Previously we assumed there were some non-executing sections at the bottom of the text section, so that we would not hit the array's bound. But on a BOLTed binary, it turned out that the .bolt section is at the bottom of the text section and can be profiled, which crashes llvm-profgen. This change fixes that.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D113238
[NFC] This patch fixes URLs containing "master". Old URLs were either broken or
redirecting to the new URL.
Reviewed By: #libc, ldionne, mehdi_amini
Differential Revision: https://reviews.llvm.org/D113186
We almost always want to use the default AA pipeline. It's very easy for
users of PassBuilder to forget to customize the AAManager to use the
default AA pipeline (for example, the NewPM C API forgets to do this).
If somebody wants a custom AA pipeline, similar to what is being done
now with the default AA pipeline registration, they can
FAM.registerPass([&] { return std::move(MyAA); });
before calling
PB.registerFunctionAnalyses(FAM);
For example, LTOBackend.cpp and NewPMDriver.cpp do this.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D113210
If a tool wants to introduce new indirections via stubs at link-time in
ORC, it can cause fidelity issues around the address of the function if
some references to the function do not have relocations. This is known
to happen inside the body of the function itself on x86_64 for example,
where a PC-relative address is formed, but without a relocation.
```
_foo:
leaq -7(%rip), %rax ## form pointer to '_foo' without relocation
_bar:
leaq (%rip), %rax ## uses X86_64_RELOC_SIGNED to '_foo'
```
The consequence of introducing a stub for such a function at link time
is that if it forms a pointer to itself without relocation, it will not
have the same value as a pointer from outside the function. If the
function pointer is used as a key, this can cause problems.
This utility provides best-effort support for adding such missing
relocations using MCDisassembler and MCInstrAnalysis to identify the
problematic instructions. Currently it is only implemented for x86_64.
Note: the related issue with call/jump instructions is not handled
here, only forming function pointers.
rdar://83514317
Differential revision: https://reviews.llvm.org/D113038
This diff makes several amendments to the local file caching mechanism
which was migrated from ThinLTO to Support in
rGe678c51177102845c93529d457b020f969125373 in response to follow-up
discussion on that commit.
Patch By: noajshu
Differential Revision: https://reviews.llvm.org/D113080
The only binary-format-related field in the BBAddrMap structure is the function address (`Addr`), which uses uint64_t in the 64-bit format and uint32_t in the 32-bit format. This patch changes it to use uint64_t in both formats.
This allows non-templated use of the struct, at the expense of a marginal additional size overhead for the 32-bit format. The size of the BB address map section does not change.
Differential Revision: https://reviews.llvm.org/D112679
As seen in https://bugs.llvm.org/show_bug.cgi?id=52213 llvm-objdump
asserts if either the --debug-vars or the --dwarf options are provided
with invalid values. As suggested, this fix adds use of a default value
to these options and errors when given bad input.
Differential Revision: https://reviews.llvm.org/D112183
By default `llvm::seq` would happily iterate over enums, which may be unsafe if the enum values are not contiguous. This patch disables enum iteration with `llvm::seq` and `llvm::seq_inclusive` and adds two new functions: `enum_seq` and `enum_seq_inclusive`.
To make sure enum iteration is safe, we require users to declare their enum types as iterable by specializing `enum_iteration_traits<SomeEnum>`. Because it's not always possible to add these traits next to the enum definition (e.g., for enums defined in external libraries), we provide an escape hatch to allow iteration on a per-callsite basis by passing `force_iteration_on_noniterable_enum`.
The main benefit of this approach is that these global declarations via traits can appear right next to enum definitions, making it easy to spot when enums are mislabeled, e.g., after introducing new enum values, whereas `force_iteration_on_noniterable_enum` should stand out and be easy to grep for.
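The opt-in trait pattern can be illustrated with a small standalone sketch. This is not the code from `llvm/ADT/Sequence.h`: the helper `enum_values` and the enums `Color` and `Sparse` are hypothetical, and the trait is a simplified stand-in.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Simplified stand-in for the opt-in trait; enums are not iterable by default.
template <typename EnumT> struct enum_iteration_traits {
  static constexpr bool is_iterable = false;
};

enum class Color { Red, Green, Blue };

// Declared right next to the enum, so a stale declaration is easy to spot.
template <> struct enum_iteration_traits<Color> {
  static constexpr bool is_iterable = true;
};

// Values are not contiguous, so this enum deliberately does not opt in.
enum class Sparse { A = 1, B = 10 };

// Hypothetical helper: collects every value in [Begin, End).
// It only compiles for enums that opted in via the trait.
template <typename EnumT>
std::vector<EnumT> enum_values(EnumT Begin, EnumT End) {
  static_assert(std::is_enum<EnumT>::value, "expected an enum type");
  static_assert(enum_iteration_traits<EnumT>::is_iterable,
                "enum is not marked as iterable");
  using U = typename std::underlying_type<EnumT>::type;
  assert(static_cast<U>(Begin) <= static_cast<U>(End) && "reversed range");
  std::vector<EnumT> Values;
  for (U V = static_cast<U>(Begin); V < static_cast<U>(End); ++V)
    Values.push_back(static_cast<EnumT>(V));
  return Values;
}
```

With this sketch, `enum_values(Color::Red, Color::Blue)` compiles, while `enum_values(Sparse::A, Sparse::B)` is rejected by the second `static_assert`.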
This emerged from a discussion with gchatelet@ about reusing llvm's `Sequence.h` in lieu of https://github.com/GPUOpen-Drivers/llpc/blob/dev/lgc/interface/lgc/EnumIterator.h.
Reviewed By: dblaikie, gchatelet, aaron.ballman
Differential Revision: https://reviews.llvm.org/D107378
Two things in this diff:
1) Warn on invalid ranges. There are currently three types of checks; see the detailed messages in the code.
2) In some situations, llvm-profgen gives lots of warnings about truncated stacks, which is noisy. This change provides a `--show-detailed-warning` switch so that those warnings can be skipped. Instead, we use a summary for those warnings and show the percentage of cases with each issue.
Example of a warning summary:
```
warning: 0.05%(1120/2428958) cases with issue: Profile context truncated due to missing probe for call instruction.
warning: 0.00%(2/178637) cases with issue: Range does not belong to any functions, likely from external function.
```
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D111902
These were added to prevent functions from being removed by WPO.
But that doesn't make sense: a correct WPO will not remove functions we actually use.
I noticed these because compiling cc1_main.cpp was pulling in random LLVM pass headers.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D112971
This is a new draft of D28234. I previously did the unorthodox thing of
pushing to it when I wasn't the original author, but since this version
- Uses `GNUInstallDirs`, rather than mimicking it, as the original author
was hesitant to do but others requested.
- Is much broader, affecting many more projects than LLVM itself.
I figured it was time to make a new revision.
I am using this patch (and many back-ports) as the basis of
https://github.com/NixOS/nixpkgs/pull/111487 for my distro (NixOS). It
looked like people were generally on board in D28234, but I make note of
this here in case extra motivation is useful.
---
As pointed out in the original issue, a central tension is that LLVM
already has some partial support for these sorts of things. For example
`LLVM_LIBDIR_SUFFIX`, or `COMPILER_RT_INSTALL_PATH`. Because it's not
quite clear yet what to do about those, we are holding off on changing
libdirs and `compiler-rt` for this initial PR.
---
On the advice of @lebedev.ri, I am splitting this up a bit per
subproject, starting with LLVM, to allow it to be more easily reviewed. This and the subsequent patch must be landed together, as this will not build alone. But the rest can be landed on their own.
Reviewed By: compnerd
Differential Revision: https://reviews.llvm.org/D100810
Notes generated in OpenBSD core files provide additional information
about the kernel state and CPU registers. These notes are described
in core.5, which can be viewed here: https://man.openbsd.org/core.5
Differential Revision: https://reviews.llvm.org/D111966
(Second try. Need to link against CodeGen and MC libs.)
The llvm-reduce tool has been extended to operate on MIR (import, clone and
export). Current limitation is that only a single machine function is
supported. A single reducer pass that operates on machine instructions (while
on SSA-form) has been added. Additional MIR specific reducer passes can be
added later as needed.
Differential Revision: https://reviews.llvm.org/D110527
Allow filling in zero counts for all function ranges even when there are no samples hitting that function. Add a switch for this.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D112858
Use the new sys::path::is_style_posix() and is_style_windows() in a few
places that need to detect the system's native path style.
In llvm/lib/Support/Path.cpp, this patch removes most uses of the
private `real_style()`, where is_style_posix() and is_style_windows()
are just a little tidier.
Elsewhere, this removes `_WIN32` macro checks. Added a FIXME to a
FileManagerTest that seemed fishy, but maintained the existing
behaviour.
Differential Revision: https://reviews.llvm.org/D112289
As with probe-based profiles, the total samples of a function should be the sum of all its body samples. This patch fixes that for line-number based profiles with a post-processing update. Tested on our internal services; results showed no performance change.
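A minimal sketch of such a post-processing update, assuming a simplified profile representation. `FunctionProfile` and `updateTotalSamples` are hypothetical names, not the llvm-profgen data structures, and inlined callee profiles are ignored here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical, simplified stand-in for a line-number based function profile.
struct FunctionProfile {
  std::map<uint32_t, uint64_t> BodySamples; // line offset -> sample count
  uint64_t TotalSamples = 0;
};

// Post-processing step: recompute the total as the sum of all body samples,
// overwriting whatever total was accumulated earlier.
void updateTotalSamples(FunctionProfile &P) {
  uint64_t Sum = 0;
  for (const auto &LineAndCount : P.BodySamples) {
    assert(Sum + LineAndCount.second >= Sum && "sample count overflow");
    Sum += LineAndCount.second;
  }
  P.TotalSamples = Sum;
}
```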
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D112672
This patch fixes:
llvm/tools/llvm-profgen/ProfiledBinary.cpp:357:12: error: variable
'EndOffset' set but not used [-Werror,-Wunused-but-set-variable]
The last use of the variable was removed on Oct 26 in commit
40ca411251.
The extractBasicBlocksFromModule, extractInstrFromModule, and other
similar functions previously performed very poorly when the number of
such elements in the program to reduce was very high. Previously, we
were creating the set which caches elements to keep by looping through
all elements in the module and adding them to the set. However, since
std::set is an ordered set, this introduces a massive amount of
rebalancing if the order of elements in the program and the order of
their pointers in memory are not the same.
The solution is straightforward: first put all the elements to be kept
in a vector, then use the constructor for std::set which takes a pair of
iterators over a collection. This constructor is optimized to avoid
doing unnecessary work when initializing large sets.
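A standalone sketch of the idea, with `Element`, `buildKeepSet` and `isKept` as hypothetical stand-ins for the llvm-reduce types and helpers. Note that the C++ standard only guarantees linear-time construction when the input range is already sorted, so how much this helps depends on the input and on the standard library implementation.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for the BasicBlock/Instruction types.
struct Element {
  int Id;
};

// Collects the elements to keep in a vector first, then builds the set in
// one shot with the iterator-pair constructor instead of repeated insert().
std::set<const Element *> buildKeepSet(const std::vector<Element> &All,
                                       bool (*ShouldKeep)(const Element &)) {
  assert(ShouldKeep && "predicate must not be null");
  std::vector<const Element *> Kept;
  Kept.reserve(All.size());
  for (const Element &E : All)
    if (ShouldKeep(E))
      Kept.push_back(&E);
  return std::set<const Element *>(Kept.begin(), Kept.end());
}

// Takes the set by const reference, so calling this does not copy the set.
bool isKept(const std::set<const Element *> &Keep, const Element &E) {
  return Keep.count(&E) != 0;
}
```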
Also in this change, we pass BBsToKeep set to functions
replaceBranchTerminator and removeUninterestingBBsFromSwitch as a const
reference rather than passing it by value. This ought to prevent the
need to copy the collection each time these functions are called, which
is expensive if the collection is large.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D112757
The previous implementation of populating the profile symbol list was wrong: it only included the profiled symbols. It should use all symbols, so this switches to using the symbols from debug info. Also turn the flag off by default.
Reviewed By: wenlei, hoy
Differential Revision: https://reviews.llvm.org/D111824
We hit a bug where some callsite names in the profile were not real functions. It turned out that there are some non-function symbols in the ELF text section, e.g. globally accessible branch labels; recall also that one function can be split into multiple ranges. We shouldn't count samples for symbols that are not the entry of a real function.
So this change fixes the issue by switching to the names and ranges from DWARF-based debug info, whose ranges ensure that we are at a real function start. For split functions, we assume that the real entry function's DWARF name always matches the symbol table name.
The switch is also consistent with the body samples' symbols, which come from DWARF.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D112282
This was checked while counting but not actually when doing the reduction, resulting in crashes.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D112766
It seems that llvm-objcopy temporarily stores data misaligned with the
requirements of the underlying struct from libBinaryFormat, and UBSan
generates a runtime error.
Instead of trying to reinterpret the memory as the struct itself, simply
access the `char *` pointer that we are interested in, which does not
have alignment restrictions.
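For illustration, a related and common way to read a field from a possibly misaligned buffer is to copy its bytes out with `memcpy` rather than forming a pointer to the struct. This is a sketch of that general technique, not necessarily what the patch does; `Header` and `readLength` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical header layout, for illustration only.
struct Header {
  uint32_t Magic;
  uint32_t Length;
};

// Reads Header::Length from a buffer that may not be aligned for Header.
// Copying the bytes avoids ever forming a misaligned Header pointer.
uint32_t readLength(const char *Buf, size_t Size) {
  assert(Size >= sizeof(Header) && "buffer too small");
  (void)Size; // Only used by the assertion above.
  uint32_t Length;
  std::memcpy(&Length, Buf + offsetof(Header, Length), sizeof(Length));
  return Length;
}
```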
This problem was pointed out in a comment of D111164.
Differential Revision: https://reviews.llvm.org/D112744
Adding support to the CS preinliner to trim cold base profiles. This makes trimming consistent with the inline decision made by the preinliner. Also disable the existing profile merger when preinliner is on unless explicitly specified.
Reviewed By: wenlei, wlei
Differential Revision: https://reviews.llvm.org/D112489
**Context:**
This is a second attempt at introducing signature regeneration to llvm-objcopy. In this diff: https://reviews.llvm.org/D109840, a script was introduced to test the validity of a code signature. In this diff: https://reviews.llvm.org/D109803 (now reverted), an effort was made to extract the signature generation behavior out of LLD into a common location for use in llvm-objcopy. In this diff: https://reviews.llvm.org/D109972 it was decided that there was no appropriate common location and that a small amount of duplication to bring signature generation to llvm-objcopy would be better. This diff introduces this duplication.
**Summary**
Prior to this change, if a LC_CODE_SIGNATURE load command
was included in the binary passed to llvm-objcopy, the command and
associated section were simply copied and included verbatim in the
new binary. If the rest of the binary was modified at all, this resulted
in an invalid Mach-O file. This change regenerates the signature
rather than copying it.
The code_signature_lc.test test was modified to include the yaml
representation of a small signed MachO executable in order to
effectively test the signature generation.
Reviewed By: alexander-shaposhnikov, #lld-macho
Differential Revision: https://reviews.llvm.org/D111164
This change allows an unsymbolized profile as input. The unsymbolized profile is created by `llvm-profgen` with `--skip-symbolization`; it is produced after sample aggregation but before symbolization, so it has a much smaller file size. It can be used for sample merging and trimming, and is also useful for debugging or adding test cases. A switch `--unsymbolized-profile=file-path` is added for this.
Format of unsymbolized profile:
```
[context stack1] # If it's a CS profile
number of entries in RangeCounter
from_1-to_1:count_1
from_2-to_2:count_2
......
from_n-to_n:count_n
number of entries in BranchCounter
src_1->dst_1:count_1
src_2->dst_2:count_2
......
src_n->dst_n:count_n
[context stack2]
......
```
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D111750
IRBuilder has been updated to support preserving metadata in a more
general manner. This patch adds `LLVMAddMetadataToInst` and
deprecates `LLVMSetInstDebugLocation` in favor of the more
general function.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D93454
We incorrectly used the duplication factor for total samples even though we already accumulate samples instead of taking the MAX. This bloated the total samples of functions with unrolled or vectorized loops. This change fixes the issue for total samples, head samples and call target samples.
Differential Revision: https://reviews.llvm.org/D112042
Having non-undef constants in a final llvm-reduce output is nicer than
having undefs.
This splits the existing reduce-operands pass into three, one which does
the same as the current pass of reducing to undef, and two more to
reduce to the constant 1 and the constant 0. Do not reduce to undef if
the operand is a ConstantData, and do not reduce 0s to 1s.
Reducing GEP operands very frequently causes invalid IR (since types may
not match up if we index differently into a struct), so don't touch GEPs.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D111765
When printing names in lldb on Windows, these names contain the full type information, while on Linux only the name is included.
This change introduces a flag in the Microsoft demangler to control whether the type information should be included.
With the flag enabled, the demangled name contains only the qualified name, e.g.:
without flag -> with flag
int (*array2d)[10] -> array2d
int (*abc::array2d)[10] -> abc::array2d
const int *x -> x
For globals there is a second inconsistency which is not yet addressed by this change: on Linux, globals (in the global namespace) are prefixed with ::, while on Windows they are not.
Reviewed By: teemperor, rnk
Differential Revision: https://reviews.llvm.org/D111715
While building llvm-project as a sub-project on Windows, we hit a build error:
libllvm-c.exports /llvm/bin\llvm-nm.exe: error: ...builds/rel64ninja/./lib/LLVMDemangle.lib: no such file or directory
libllvm-c.exports, libllvm-c.args, and lib/*.lib should be under LLVM_BINARY_DIR; using CMAKE_BINARY_DIR causes a 'no such file' error when llvm-project is built as a sub-project.
By default, such a non-template variable of non-volatile const-qualified type
having namespace-scope has internal linkage ([basic.link]), so no need for `static`.
We would like to move ThinLTO’s battle-tested file caching mechanism to
the LLVM Support library so that we can use it elsewhere in LLVM.
Patch By: noajshu
Differential Revision: https://reviews.llvm.org/D111371
Right now when we see -O# we add the corresponding 'default<O#>' into
the list of passes to run when translating legacy -pass-name. This has
the side effect of not using the default AA pipeline.
Instead, treat -O# as -passes='default<O#>', but don't allow any other
-passes or -pass-name. I think we can keep `opt -O#` as shorthand for
`opt -passes='default<O#>'` but disallow anything more than just -O#.
Tests need to be updated to not use `opt -O# -pass-name`.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D112036
Currently -W and --wide are treated as two separate options, as they are
only included for GNU readelf compatibility and are ignored. This change
makes -W an alias of --wide to be consistent with other option aliases.
Differential Revision: https://reviews.llvm.org/D111731
It can be a bit confusing to stop with no explanation so we should indicate
when further output was prevented by the cycle limit.
Differential Revision: https://reviews.llvm.org/D111753
Add `-use-dwarf-correlation` switch to allow llvm-profgen to generate AutoFDO profile for binaries built with CSSPGO (pseudo-probe).
Differential Revision: https://reviews.llvm.org/D111776
The first LBR entry can be an external branch; in that case we should ignore the whole trace.
```
7f7448e889e4 0x7f7448e889e4/0x7f7448e88826/P/-/-/1 0x7f7448e8899f/0x7f7448e889d8/P/-/-/4 ...
```
Reviewed By: wenlei, hoy
Differential Revision: https://reviews.llvm.org/D111749
With `ignore-stack-samples`, we can ignore the call stack before sample aggregation, which avoids some redundant computation.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D111577
SimpleRemoteEPC notionally allowed subclasses to override the
createMemoryManager and createMemoryAccess methods to use custom objects, but
could not actually be subclassed in practice (The construction process in
SimpleRemoteEPC::Create could not be re-used).
Instead of subclassing, this commit adds a SimpleRemoteEPC::Setup class that
can be used by clients to set up the memory manager and memory access members.
A default-constructed Setup object results in no change from previous behavior
(EPCGeneric* memory manager and memory access objects used by default).
Instead of setting operands to undef as the "operands" pass does,
convert the operands to a function argument. This avoids having to
introduce undef values into the IR which have some unpredictability
during optimizations.
For instance,
define void @func() {
entry:
%val = add i32 32, 21
store i32 %val, i32* null
ret void
}
is reduced to
define void @func(i32 %val) {
entry:
%val1 = add i32 32, 21
store i32 %val, i32* null
ret void
}
(note that the instruction %val is renamed to %val1 when printing
the IR to avoid ambiguity; ideally %val1 would be removed by dce or the
instruction reduction pass)
Any call to @func is replaced with a call to the function with the
new signature and filled with undef. This is not ideal for IPA passes,
but those are out of scope for now.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D111503
Adds explicit narrowing casts to JITLinkMemoryManager.cpp.
Honors -slab-address option in llvm-jitlink.cpp, which was accidentally
dropped in the refactor.
This effectively reverts commit 6641d29b70.
This commit substantially refactors the JITLinkMemoryManager API to: (1) add
asynchronous versions of key operations, (2) give memory manager implementations
full control over link graph address layout, (3) enable more efficient tracking
of allocated memory, and (4) support "allocation actions" and finalize-lifetime
memory.
Together these changes provide a more usable API, and enable more powerful and
efficient memory manager implementations.
To support these changes the JITLinkMemoryManager::Allocation inner class has
been split into two new classes: InFlightAllocation, and FinalizedAllocation.
The allocate method returns an InFlightAllocation that tracks memory (both
working and executor memory) prior to finalization. The finalize method returns
a FinalizedAllocation object, and the InFlightAllocation is discarded. Breaking
Allocation into InFlightAllocation and FinalizedAllocation allows
InFlightAllocation subclasses to be written more naturally, and FinalizedAlloc
to be implemented and used efficiently (see (3) below).
In addition to the memory manager changes this commit also introduces a new
MemProt type to represent memory protections (MemProt replaces use of
sys::Memory::ProtectionFlags in JITLink), and a new MemDeallocPolicy type that
can be used to indicate when a section should be deallocated (see (4) below).
Plugin/pass writers who were using sys::Memory::ProtectionFlags will have to
switch to MemProt -- this should be straightforward. Clients with out-of-tree
memory managers will need to update their implementations. Clients using
in-tree memory managers should mostly be able to ignore it.
Major features:
(1) More asynchrony:
The allocate and deallocate methods are now asynchronous by default, with
synchronous convenience wrappers supplied. The asynchronous versions allow
clients (including JITLink) to request and deallocate memory without blocking.
(2) Improved control over graph address layout:
Instead of a SegmentRequestMap, JITLinkMemoryManager::allocate now takes a
reference to the LinkGraph to be allocated. The memory manager is responsible
for calculating the memory requirements for the graph, and laying out the graph
(setting working and executor memory addresses) within the allocated memory.
This gives memory managers full control over JIT'd memory layout. For clients
that don't need or want this degree of control the new "BasicLayout" utility can
be used to get a segment-based view of the graph, similar to the one provided by
SegmentRequestMap. Once segment addresses are assigned the BasicLayout::apply
method can be used to automatically lay out the graph.
(3) Efficient tracking of allocated memory.
The FinalizedAlloc type is a wrapper for an ExecutorAddr and requires only
64 bits to store in the controller. The meaning of the address held by the
FinalizedAlloc is left up to the memory manager implementation, but the
FinalizedAlloc type enforces a requirement that deallocate be called on any
non-default values prior to destruction. The deallocate method takes a
vector<FinalizedAlloc>, allowing for bulk deallocation of many allocations in a
single call.
Memory manager implementations will typically store the address of some
allocation metadata in the executor in the FinalizedAlloc, as holding this
metadata in the executor is often cheaper and may allow for clean deallocation
even in failure cases where the connection with the controller is lost.
(4) Support for "allocation actions" and finalize-lifetime memory.
Allocation actions are pairs (finalize_act, deallocate_act) of JITTargetAddress
triples (fn, arg_buffer_addr, arg_buffer_size), that can be attached to a
finalize request. At finalization time, after memory protections have been
applied, each of the "finalize_act" elements will be called in order (skipping
any elements whose fn value is zero) as
((char*(*)(const char *, size_t))fn)((const char *)arg_buffer_addr,
(size_t)arg_buffer_size);
At deallocation time the deallocate elements will be run in reverse order (again
skipping any elements where fn is zero).
The returned char * should be null to indicate success, or a non-null
heap-allocated string error message to indicate failure.
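The ordering rules above can be sketched in a simplified, in-process model. The names `ActionCall`, `AllocAction`, `runFinalizeActions` and `runDeallocActions` are hypothetical, plain pointers replace the JITTargetAddress triples, and the sketch assumes error strings were allocated with `malloc`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

using ActionFn = char *(*)(const char *ArgData, size_t ArgSize);

// One (fn, arg_buffer_addr, arg_buffer_size) triple, modelled in-process.
struct ActionCall {
  ActionFn Fn = nullptr; // Null plays the role of "fn value is zero".
  const char *ArgData = nullptr;
  size_t ArgSize = 0;
};

// A (finalize_act, deallocate_act) pair.
struct AllocAction {
  ActionCall Finalize;
  ActionCall Dealloc;
};

// Runs one call and returns true on success. A null result means success;
// a non-null result is an error string, assumed here to be malloc'ed.
static bool runCall(const ActionCall &C) {
  if (!C.Fn)
    return true; // Skip empty slots.
  assert((C.ArgData || C.ArgSize == 0) && "non-empty buffer needs data");
  char *Err = C.Fn(C.ArgData, C.ArgSize);
  if (!Err)
    return true;
  std::free(Err);
  return false;
}

// Finalize actions run in order; this sketch stops at the first failure.
bool runFinalizeActions(const std::vector<AllocAction> &Actions) {
  for (const AllocAction &A : Actions)
    if (!runCall(A.Finalize))
      return false;
  return true;
}

// Deallocate actions run in reverse order; this sketch runs all of them
// and reports whether every one succeeded.
bool runDeallocActions(const std::vector<AllocAction> &Actions) {
  bool Ok = true;
  for (auto I = Actions.rbegin(), E = Actions.rend(); I != E; ++I)
    Ok = runCall(I->Dealloc) && Ok;
  return Ok;
}
```

How failures are propagated (stopping early versus running the remaining actions) is a choice made for this sketch only.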
These actions allow finalization and deallocation to be extended to include
operations like registering and deregistering eh-frames, TLS sections,
initializers and deinitializers, and language metadata sections. Previously these
operations required separate callWrapper invocations. Compared to callWrapper
invocations, actions require no extra IPC/RPC, reducing costs and eliminating
a potential source of errors.
Finalize lifetime memory can be used to support finalize actions: Sections with
finalize lifetime should be destroyed by memory managers immediately after
finalization actions have been run. Finalize memory can be used to support
finalize actions (e.g. with extra-metadata, or synthesized finalize actions)
without incurring permanent memory overhead.
Summary: This patch improves the error message context of the
XCOFF interfaces by providing more details.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D110320
ExecutorProcessControl objects will now have a TaskDispatcher member which
should be used to dispatch work (in particular, handling incoming packets in
the implementation of remote EPC implementations like SimpleRemoteEPC).
The GenericNamedTask template can be used to wrap function objects that are
callable as 'void()' (along with an optional name to describe the task).
The makeGenericNamedTask functions can be used to create GenericNamedTask
instances without having to name the function object type.
In a future patch ExecutionSession will be updated to use the
ExecutorProcessControl's dispatcher, instead of its DispatchTaskFunction.
Use Module& wherever possible.
Since every reduction immediately turns Chunks into an Oracle, directly pass Oracle instead.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D111122
Allow overlap/similarity comparison to use custom hot threshold cutoff, instead of using hard coded 990000 as hot cutoff.
Differential Revision: https://reviews.llvm.org/D111385
When parsing mmap events to retrieve PIDs, deduplicate them before passing the PID list to perf script. Perf script errors out when there are duplicated PIDs in the input; however, raw perf data may contain duplicated PIDs for a large binary where more than one mmap is needed to load the executable segment.
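A minimal sketch of the deduplication step. The function name `joinUniquePIDs` is hypothetical, and the comma-separated output format is an assumption about how the list is handed to perf script.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Joins PIDs into a comma-separated list, dropping repeats while keeping
// the order in which each PID was first seen.
std::string joinUniquePIDs(const std::vector<int> &PIDs) {
  std::set<int> Seen;
  std::string Result;
  for (int PID : PIDs) {
    assert(PID >= 0 && "PIDs are expected to be non-negative");
    if (!Seen.insert(PID).second)
      continue; // Already emitted this PID.
    if (!Result.empty())
      Result += ',';
    Result += std::to_string(PID);
  }
  return Result;
}
```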
Differential Revision: https://reviews.llvm.org/D111384
This adds the `--dump-blockinfo` flag to `llvm-bcanalyzer`, allowing a sufficiently motivated user to dump (parts of) the `BLOCKINFO_BLOCK` block. The default behavior is unchanged, and `--dump-blockinfo` only takes effect in the same context as other flags that control dump behavior (i.e., requires that `--dump` is also passed).
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D107536
We have a string literal (via CPP) used to construct `StringRef`, which
is used to construct a `SmallString`. Just construct the latter
directly.
Differential Revision: https://reviews.llvm.org/D111322
This moves the registry higher in the LLVM library dependency stack.
Every client of the target registry needs to link against MC anyway to
actually use the target, so we might as well move this out of Support.
This allows us to ensure that Support doesn't have includes from MC/*.
Differential Revision: https://reviews.llvm.org/D111454
This reverts commit dfd74db981.
SimpleRemoteEPC should share dispatch with the ExecutionSession, rather than
having two different dispatch systems on the controller side.
SimpleRemoteEPCServer::Dispatch doesn't need to be shared.
Renames SimpleRemoteEPCServer::Dispatcher to SimpleRemoteEPCDispatcher and
moves it into OrcShared. SimpleRemoteEPCServer::ThreadDispatcher is similarly
moved and renamed to DynamicThreadPoolSimpleRemoteEPCDispatcher.
This will allow these classes to be reused by SimpleRemoteEPC on the controller
side of the connection.
A [[ https://reviews.llvm.org/rGf6fa95b77f33c3690e4201e505cb8dce1433abd9 | recent commit ]] removed `<string>` from `ErrorHandling.h`. The removal caused `<string>` to be no longer included for `llvm/tools/llvm-cxxdump/Error.cpp` which uses the string type.
This patch adds `<string>` to `llvm/tools/llvm-cxxdump/Error.cpp`.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D111354
Some transformations, such as hot-cold splitting or coroutine splitting, outline part of a function's ranges into separate functions. Since the sample loader runs at an early stage, before any split happens, the compiler can't recognize those outlined functions, so in llvm-profgen we should attribute their samples to the original function. This is already done for the body range samples since we use the symbols from DWARF, which are created before the split.
But for branch samples, the call from the parent function to its outlined function is not actually a call to the original function, so we shouldn't add head/callsite samples for it. So instead of DWARF symbols, we use the symbols from the symbol table and ignore functions with special suffixes (like `.cold`, `.resume`) when accumulating the callsite/head samples.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D110864
This change is to keep the help text and command guide of llvm-readelf
in tandem.
- In the help text mention that --section-data, --section-relocations,
--section-symbols and --stack-sizes have no effect on GNU style
output; give the accepted values for --elf-output-style and update
the description of --gnu-hash-table to use the command guide
description.
- In the command guide add the missing options -a,
--dependent-libraries, --no-demangle, --wide and -W. Also update the
description of --symbols so it matches the help text.
Differential Revision: https://reviews.llvm.org/D111240
This change adds some missing details, clarifies some options and
brings the help text and command guide of objdump closer together.
- Added to the help that --all-headers also outputs symbols and
relocations to match the command guide.
- Added to the help that --debug-vars accepts an optional
ascii/unicode format to match the command guide.
- Changed the help descriptions for --disassemble,
--disassemble-all, --dwarf=<value>, --fault-map-section,
--line-numbers, --no-leading-addr and --source descriptions to
match the command guide.
- Added to the help that --start-address and --stop-address also
affect relocation entries and the symbol table output to match
the command guide.
- Added a note to the command guide that --unwind-info and -u
are not available for the ELF format.
Differential Revision: https://reviews.llvm.org/D110633
In the command guide, --prefix and --prefix-strip are used in the form
--prefix=<prefix>; however, currently they are used in the form --prefix
<prefix>. This change fixes these options to match the command guide.
Differential Revision: https://reviews.llvm.org/D110551
To better reflect the meaning of the now-disambiguated {GlobalValue,
GlobalAlias}::getBaseObject after breaking off GlobalIFunc::getResolverFunction
(D109792), the function is renamed to getAliaseeObject.
As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared.
This removes `WasmTagType`. `WasmTagType` contained an attribute and a
signature index:
```
struct WasmTagType {
uint8_t Attribute;
uint32_t SigIndex;
};
```
Currently the attribute field is unused, reserved for future use, and
always 0. Having `SigIndex` as a property of this class is also a little
odd, because the tag type's signature index is not an inherent property
of a tag but rather a reference to another section that changes after
linking. This makes tag handling in the linker awkward too: tag-related
methods take both `WasmTagType` and `WasmSignature` even though
`WasmTagType` contains a signature index, because the signature index
changes during linking and so carries no useful information at that
point. This change instead moves `SigIndex` to `struct WasmTag` itself,
as we did for `struct WasmFunction` in D111104.
In this CL, in lib/MC and lib/Object, this now treats tag types in the
same way as function types. Also in YAML, this removes `struct Tag`,
because now it only contains the tag index. Also tags set `SigIndex` in
`WasmImport` union, as functions do.
I think this makes things simpler and makes tag handling more in line
with function handling. The two share similar properties in that both
of them have signatures, but they are nominal, so having the same
signature doesn't mean they are the same element.
Also a drive-by fix: the encoding of the reserved 'attribute' part changed
from uleb32 to uint8 a while ago. This was fixed in lib/MC and
lib/Object but not in YAML. This doesn't change object files because the
field's value is always 0 and is encoded identically in both
encodings.
This is effectively NFC; I didn't mark it as such just because it
changed YAML test results.
Reviewed By: sbc100, tlively
Differential Revision: https://reviews.llvm.org/D111086
As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared.
We can use the raw_string_ostream::str() method to perform the implicit flush() and return a reference to the std::string container that we can then wrap inside Twine().
https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html
Excessive use of the <string> header has a massive impact on compile time; it's most commonly included via the ErrorHandling.h header, which has to be included in many key headers, impacting many source files that have no need for std::string.
As an initial step toward removing the <string> include from ErrorHandling.h, this patch proposes to update the fatal_error_handler_t handler to just take a raw const char* instead.
The next step will be to remove the report_fatal_error std::string variant, which will involve a lot of cleanup and better use of Twine/StringRef.
Differential Revision: https://reviews.llvm.org/D111049
This change adds a duplication factor multiplier while accumulating body samples for line-number-based profiles. The body sample count will be `duplication-factor * count`. The base discriminator and duplication factor are decoded from the raw discriminator, which requires some refactoring.
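A minimal sketch of the accumulation rule described above. `DecodedDiscriminator`, `BodySampleMap` and `addBodySamples` are hypothetical names used only for illustration; the decoding of the raw discriminator is assumed to have happened already and is not shown.

```cpp
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical result of decoding a raw discriminator (decoding not shown).
struct DecodedDiscriminator {
  uint32_t Base;              // base discriminator
  uint32_t DuplicationFactor; // number of copies the sampled code stands for
};

// Body samples keyed by (line offset, base discriminator).
using BodySampleMap = std::map<std::pair<uint32_t, uint32_t>, uint64_t>;

// Accumulate duplication-factor * count into the matching body sample.
void addBodySamples(BodySampleMap &Samples, uint32_t LineOffset,
                    const DecodedDiscriminator &D, uint64_t Count) {
  Samples[std::make_pair(LineOffset, D.Base)] +=
      static_cast<uint64_t>(D.DuplicationFactor) * Count;
}
```

Two samples that share a line offset and base discriminator but differ in duplication factor end up in the same entry, each weighted by its own factor.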
Differential Revision: https://reviews.llvm.org/D109934
This simplifies the code in a number of ways and avoids
having to track functions and their types separately.
Differential Revision: https://reviews.llvm.org/D111104
D104366 introduced a new llvm-cxxfilt test with non-ASCII characters,
which caused a failure on llvm-clang-x86_64-expensive-checks-win
builder, with a stack trace suggesting issue in a call to isalnum.
The argument to isalnum should be either EOF or a value that is
representable in the type unsigned char. The llvm-cxxfilt does not
perform a cast from char to unsigned char before the call, so the
value might be out of valid range.
Replace the call to isalnum with isAlnum from StringExtras, which takes
a char as the argument. This also makes the check independent of the
current locale.
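The difference can be sketched as follows. `isAsciiAlnum` and `checkedStdIsAlnum` are hypothetical helpers written for this note, not the StringExtras implementation; they only illustrate why a char-taking, locale-independent check avoids the problem.

```cpp
#include <cctype>

// Locale-independent ASCII check that is safe for any char value,
// including negative ones such as UTF-8 continuation bytes.
inline bool isAsciiAlnum(char C) {
  return (C >= '0' && C <= '9') || (C >= 'a' && C <= 'z') ||
         (C >= 'A' && C <= 'Z');
}

// If std::isalnum has to be used, convert to unsigned char first so the
// argument is always representable as unsigned char.
inline bool checkedStdIsAlnum(char C) {
  return std::isalnum(static_cast<unsigned char>(C)) != 0;
}
```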
Differential Revision: https://reviews.llvm.org/D110986
With the removal of OrcRPCExecutorProcessControl and OrcRPCTPCServer in
6aeed7b19c the ORC RPC library no longer has any in-tree users.
Clients needing serialization for ORC should move to Simple Packed
Serialization (usually by adopting SimpleRemoteEPC for remote JITing).
We expose the fact that we rely on unsigned wrapping to iterate through
all indexes. This can be confusing. Rather, keeping it as an
implementation detail through an iterator is less confusing and is less
code.
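As an illustration of the idea (not the code from this patch), the sketch below assumes an index space that starts at a sentinel equal to the maximum unsigned value and then continues with 0, 1, 2, and so on. `IndexRange` and `collectIndexes` are hypothetical names.

```cpp
#include <vector>

// Hides the reliance on unsigned wrap-around behind an iterator.
class IndexRange {
  unsigned NumAfterSentinel;

public:
  struct iterator {
    unsigned Value;
    unsigned operator*() const { return Value; }
    // Unsigned overflow is well defined: the sentinel ~0U wraps to 0.
    iterator &operator++() {
      ++Value;
      return *this;
    }
    bool operator!=(const iterator &Other) const {
      return Value != Other.Value;
    }
  };

  explicit IndexRange(unsigned N) : NumAfterSentinel(N) {}
  iterator begin() const { return iterator{~0U}; }
  iterator end() const { return iterator{NumAfterSentinel}; }
};

// Callers iterate without knowing about the wrap.
std::vector<unsigned> collectIndexes(unsigned N) {
  std::vector<unsigned> Out;
  for (unsigned I : IndexRange(N))
    Out.push_back(I);
  return Out;
}
```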
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D110885
Summary:
For XCOFF:
Implement getSymbolFlag and getSymbolType() for the --syms option.
With llvm-objdump --syms, if the symbol is a label, also print the containing section for the symbol.
With llvm-objdump --syms --symbol-description, also print the symbol index and qualname for the symbol.
For example, with --symbol-description:
00000000000000c0 l .text (csect: (idx: 2) .foov[PR]) (idx: 3) .foov
and without --symbol-description:
00000000000000c0 l .text (csect: .foov) .foov
Reviewers: James Henderson,Esme Yi
Differential Revision: https://reviews.llvm.org/D109452
Building LLVM (llvmorg-14-init) under Debian sid with the latest gcc (Debian
10.3.0-9) 10.3.0 fails due to ambiguous overloads of operators == and !=:
/root/src/llvm/src/llvm/tools/obj2yaml/elf2yaml.cpp:212:22:
error: ambiguous overload for 'operator!='
(operand types are 'llvm::ELFYAML::ELF_SHF' and 'int')
/root/src/llvm/src/llvm/tools/obj2yaml/elf2yaml.cpp:204:32:
error: ambiguous overload for 'operator!='
(operand types are 'const llvm::yaml::Hex64' and 'int')
/root/src/llvm/src/llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp:629:35:
error: ambiguous overload for 'operator=='
(operand types are 'const uint64_t' {aka 'const long unsigned int'} and
'llvm::Register')
Reviewed by: StephenTozer, jmorse, Higuoxing
Differential Revision: https://reviews.llvm.org/D109534
When replacing function calls, skip call instructions where the old
function is not the called function, but e.g. the old function is passed
as an argument.
This fixes a crash due to trying to construct invalid IR for the test
case.
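A toy model of the distinction, using plain strings instead of IR. `ToyCall` and `replaceCallee` are hypothetical and only show the filter: a call is rewritten when the old function is the callee, not when it merely appears among the arguments.

```cpp
#include <string>
#include <vector>

// Toy stand-in for a call instruction.
struct ToyCall {
  std::string Callee;
  std::vector<std::string> Args;
};

// Retarget calls to Old so that they call New. Returns the number of
// calls rewritten. Calls that only pass Old as an argument are skipped.
unsigned replaceCallee(std::vector<ToyCall> &Calls, const std::string &Old,
                       const std::string &New) {
  unsigned NumReplaced = 0;
  for (ToyCall &C : Calls) {
    if (C.Callee != Old)
      continue; // Old may still appear in C.Args; leave that call alone.
    C.Callee = New;
    ++NumReplaced;
  }
  return NumReplaced;
}
```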
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D109759
We used the segment alignment in the ELF header to infer the loader alignment. However, this is incorrect because the loader alignment is always the same as the page size. If a segment needs to be aligned at load time, the linker will set the aligned address as the virtual address in the ELF header.
Differential Revision: https://reviews.llvm.org/D110795
This change enables llvm-profgen to take raw perf data as an alternative input format. Sometimes we need to retrieve events for processes with a matching binary. Using perf data as input allows us to retrieve process IDs from mmap events for the matching binary, then filter by process ID during perf script generation.
Differential Revision: https://reviews.llvm.org/D110793
This change contains diagnostics improvements, refactoring and preparation for consuming perf data directly.
Diagnostics:
- We now have more detailed diagnostics when no mmap is found.
- We also print warning for abnormal transition to external code.
Refactoring:
- Simplify input perf trace processing to only allow a single input file. This is because 1) using multiple input perf traces (perf script) is error-prone because we may miss key mmap events, and 2) the functionality is not really being used anyway.
- Make more functions private for Readers, move non-trivial definitions out of header. Cleanup some inconsistency.
- Prepare for consuming perf data as input directly.
Differential Revision: https://reviews.llvm.org/D110729
The ReduceMetadata pass before this patch removed metadata on a per-MDNode (or NamedMDNode) basis. Either all references to an MDNode are kept, or all of them are removed. However, MDNodes are uniqued, meaning that references to MDNodes with the same data become references to the same MDNode. As a consequence, e.g. tbaa references to the same type will all have the same MDNode reference, which makes it impossible to reduce to only the metadata on those memory accesses for which it is interesting.
Moreover, MDNodes can also be referenced by some intrinsics or other MDNodes. These references were not considered for removal leading to the possibility that MDNodes are not actually removed even if selected to be removed by the oracle.
This patch changes ReduceMetadata to reduce based on removable metadata references instead. MDNodes without references are implicitly dropped anyway. References by intrinsic calls should be removed by ReduceOperands or ReduceInstructions. References in other MDNodes cannot be removed as it would violate the immutability of MDNodes.
Additionally, ReduceMetadata pass before this patch used `setMetadata(I, NULL)` to remove references, where `I` is the index in the array returned by `getAllMetadata`. However, `setMetadata` expects a MDKind (such as `MD_tbaa`) as first argument. `getAllMetadata` does not return those in consecutive order (otherwise it would not need to be a `std::pair` with `first` representing the MDKind).
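The index-versus-kind mix-up can be modelled without LLVM. `ToyInstruction` below is a hypothetical stand-in whose `getAllMetadata` and `setMetadata` only mimic the shape of the real calls; the point is that removal has to use the kind stored in `first`, not the position in the returned array.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model: attachments are keyed by a metadata kind ID, and the kinds
// present on an instruction are generally not 0, 1, 2, ...
struct ToyInstruction {
  std::map<unsigned, std::string> Attachments;

  // Returns (kind, node) pairs, mirroring the shape described above.
  std::vector<std::pair<unsigned, std::string>> getAllMetadata() const {
    std::vector<std::pair<unsigned, std::string>> Out(Attachments.begin(),
                                                      Attachments.end());
    return Out;
  }

  // Passing a null node removes the attachment of the given kind.
  void setMetadata(unsigned Kind, const std::string *Node) {
    if (Node)
      Attachments[Kind] = *Node;
    else
      Attachments.erase(Kind);
  }
};

// Correct removal: use the kind from the pair, never the loop index.
void dropAllAttachments(ToyInstruction &I) {
  for (const auto &KindAndNode : I.getAllMetadata())
    I.setMetadata(KindAndNode.first, nullptr);
}
```

With kinds 1 and 7 attached, passing the array positions 0 and 1 would remove only kind 1 and leave kind 7 in place.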
Reviewed By: aeubanks, swamulism
Differential Revision: https://reviews.llvm.org/D110534
As for now, llvm-objcopy renames only sections that are specified
explicitly in --rename-section, while GNU objcopy keeps names of
relocation sections in sync with their targets. For example:
> readelf -S test.o
...
[ 1] .foo PROGBITS
[ 2] .rela.foo RELA
> objcopy --rename-section .foo=.bar test.o gnu.o
> readelf -S gnu.o
...
[ 1] .bar PROGBITS
[ 2] .rela.bar RELA
> llvm-objcopy --rename-section .foo=.bar test.o llvm.o
> readelf -S llvm.o
...
[ 1] .bar PROGBITS
[ 2] .rela.foo RELA
This patch makes llvm-objcopy match the behavior of GNU objcopy better.
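The naming rule in the example above can be sketched like this. `renamedSectionName` is a hypothetical helper, and the assumption that only the `.rela` and `.rel` prefixes matter is made for illustration; it is not the llvm-objcopy implementation.

```cpp
#include <map>
#include <string>

// Returns the new name for a section given the explicit renames.
// A relocation section follows its target: if ".foo" becomes ".bar",
// then ".rela.foo" becomes ".rela.bar".
std::string
renamedSectionName(const std::string &Name,
                   const std::map<std::string, std::string> &Renames) {
  auto Explicit = Renames.find(Name);
  if (Explicit != Renames.end())
    return Explicit->second;

  // Check ".rela" before ".rel" because the latter is a prefix of it.
  static const char *const Prefixes[] = {".rela", ".rel"};
  for (const char *PrefixStr : Prefixes) {
    const std::string Prefix(PrefixStr);
    if (Name.compare(0, Prefix.size(), Prefix) != 0)
      continue;
    auto Target = Renames.find(Name.substr(Prefix.size()));
    if (Target != Renames.end())
      return Prefix + Target->second;
  }
  return Name; // Unaffected section.
}
```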
Differential Revision: https://reviews.llvm.org/D110352
The slab allocator is frequently used in -noexec tests where we want a
consistent memory layout. In this context we also want to set the effective
page size, rather than using the page size of the host process, since not all
systems use the same page size. The -slab-page-size option allows us to set
the page size for such tests.
The -slab-page-size option will also be honored in exec mode when using the
slab allocator, but will trigger an error if the requested size is not a
multiple of the actual process page size.
This option was motivated by test failures on a ppc64 bot that was returning
zero from sys::Process::getPageSize(), so it also contains a check for errors
and zero results from that function if the -slab-page-size option is absent.
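The rules above can be summarized in a small validation routine. This is a sketch under assumptions: `checkSlabPageSize` is a hypothetical function, a requested size of 0 stands for "option absent", and treating an unknown process page size as an error in exec mode is a choice made here, not something stated above.

```cpp
#include <cstdint>
#include <string>

// Returns an empty string if the configuration is usable, otherwise a
// description of the problem.
std::string checkSlabPageSize(uint64_t RequestedPageSize,
                              uint64_t ProcessPageSize, bool NoExec) {
  if (RequestedPageSize == 0) {
    // Option absent: we depend on the process page size, so a zero
    // result from the query is an error.
    if (ProcessPageSize == 0)
      return "process page size query returned zero";
    return "";
  }
  if (NoExec)
    return ""; // Layout-only runs may use any effective page size.
  if (ProcessPageSize == 0)
    return "process page size query returned zero"; // Assumption, see above.
  if (RequestedPageSize % ProcessPageSize != 0)
    return "requested page size is not a multiple of the process page size";
  return "";
}
```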
Existing slab allocator tests will be updated to use this option in a follow-up
commit so that we can point the failing bot at this commit and observe errors
associated with sys::Process::getPageSize().
* Add a newline before `DYNAMIC RELOCATION RECORDS` (see D101796)
* Add the missing `OFFSET TYPE VALUE` line
* Align columns
Note: llvm-readobj/ELFDumper.cpp `loadDynamicTable` has sophisticated PT_DYNAMIC
code which is unavailable in llvm-objdump.
Reviewed By: jhenderson, Higuoxing
Differential Revision: https://reviews.llvm.org/D110595
Similar to https://reviews.llvm.org/D110465, we can compute function size on demand for the functions that are hit by samples.
Here we leverage the raw range samples' addresses to compute the set of functions hit by samples. Then `BinarySizeContextTracker` only works on those function ranges for the size.
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D110466
Previously we symbolized all the functions, but we only need the symbols that are hit by the samples.
This can significantly reduce the run time for large binaries.
Optimization for the preinliner will come along with the next patch.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D110465
The MSP430 ABI supports build attributes for specifying
the ISA, code model, data model and enum size in ELF object files.
Differential Revision: https://reviews.llvm.org/D107969
This change is to add some missing details to the help text and command
guide:
- Added a note to the command guide that --debug-macro also dumps
.debug_macinfo.
- Added a note to the command guide that --debug-frame and --eh_frame
are aliases, and in cases where both sections are present one command
outputs both.
- Changed the wording in the help output for --ignore-case and --regex to
more closely match the command guide.
We want this behavior for future testing infrastructure anyway, and it may help
with the failure in https://lab.llvm.org/buildbot/#/builders/98/builds/6401:
/b/fuchsia-x86_64-linux/llvm.obj/tools/clang/stage2-bins/bin/lli: warning:
remote mcjit does not support lazy compilation
Finalization error: could not register eh-frame: __register_frame function not
found
/b/fuchsia-x86_64-linux/llvm.obj/tools/clang/stage2-bins/bin/lli: disconnecting
This reintroduces "[ORC] Introduce EPCGenericRTDyldMemoryManager."
(bef55a2b47) and "[lli] Add ChildTarget dependence
on OrcTargetProcess library." (7a219d801b) which were
reverted in 99951a5684 due to bot failures.
The root cause of the bot failures should be fixed by "[ORC] Fix uninitialized
variable." (0371049277) and "[ORC] Wait for
handleDisconnect to complete in SimpleRemoteEPC::disconnect."
(320832cc9b).
This reverts commit bef55a2b47 while I investigate
failures on some bots. Also reverts "[lli] Add ChildTarget dependence on
OrcTargetProcess library." (7a219d801b) which was
a follow-up to bef55a2b47.
EPCGenericRTDyldMemoryManager is an EPC-based implementation of the
RuntimeDyld::MemoryManager interface. It enables remote-JITing via EPC (backed
by a SimpleExecutorMemoryManager instance on the executor side) for RuntimeDyld
clients.
The lli and lli-child-target tools are updated to use SimpleRemoteEPC and
SimpleRemoteEPCServer (rather than OrcRemoteTargetClient/Server), and
EPCGenericRTDyldMemoryManager for MCJIT tests.
By enabling remote-JITing for MCJIT and RuntimeDyld-based ORC clients,
EPCGenericRTDyldMemoryManager allows us to deprecate older remote-JITing
support, including OrcTargetClient/Server, OrcRPCExecutorProcessControl, and the
Orc RPC system itself. These will be removed in future patches.
To be consistent with the compiler, which interprets a zero count as unexecuted (cold), this change reports a zero-value count for the unexecuted part of function code. The implementation leverages the range counter and initializes all the executed function ranges with a zero value. After all ranges are merged and converted into disjoint ranges, the remaining zero counts indicate the unexecuted (cold) parts of the function.
This change also extends the current `findDisjointRanges` method so that it supports adding zero-value ranges.
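The effect of seeding a zero-count range can be sketched with a small sweep over range boundaries. This is not the `findDisjointRanges` implementation; `CountedRange` and `splitIntoDisjointRanges` are hypothetical, ranges are taken to be inclusive, and `End` is assumed to be below the maximum 64-bit value.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

struct CountedRange {
  uint64_t Start; // inclusive
  uint64_t End;   // inclusive, assumed < UINT64_MAX
  uint64_t Count;
};

// Splits possibly overlapping ranges into disjoint ones whose count is
// the sum of all input ranges covering them. A zero-count input range
// therefore shows up as zero wherever nothing else overlaps it.
std::vector<CountedRange>
splitIntoDisjointRanges(const std::vector<CountedRange> &Ranges) {
  // Address -> (change in summed count, change in number of covering ranges).
  std::map<uint64_t, std::pair<int64_t, int>> Events;
  for (const CountedRange &R : Ranges) {
    Events[R.Start].first += static_cast<int64_t>(R.Count);
    Events[R.Start].second += 1;
    Events[R.End + 1].first -= static_cast<int64_t>(R.Count);
    Events[R.End + 1].second -= 1;
  }

  std::vector<CountedRange> Result;
  uint64_t Prev = 0;
  int64_t Sum = 0;
  int Covering = 0;
  for (const auto &E : Events) {
    if (Covering > 0) // Emit the piece between the previous boundary and here.
      Result.push_back({Prev, E.first - 1, static_cast<uint64_t>(Sum)});
    Sum += E.second.first;
    Covering += E.second.second;
    Prev = E.first;
  }
  return Result;
}
```

For a function range [0, 9] seeded with count 0 plus sampled ranges [2, 5] with count 3 and [4, 7] with count 2, the pieces [0, 1] and [8, 9] keep count 0 and mark the cold part of the function. Adjacent pieces with equal counts are not merged in this sketch.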
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D109713
This patch introduces non-CS AutoFDO profile generation into LLVM. The profile is meant to be consumed by the compiler using `-fprofile-sample-use=[profile]`.
After range and branch counters are extracted from the LBR sample, we go through each address for symbolization, create FunctionSamples and populate its subfields such as TotalSamples, BodySamples and HeadSamples. For inlined code, since we need to map back to the original code, we always add body samples to the leaf frame's function sample.
Reviewed By: wenlei, hoy
Differential Revision: https://reviews.llvm.org/D109551
Similar to https://reviews.llvm.org/D109637, a whole line of the perf script output can be an invalid message.
```
warning: Invalid address in LBR record at line 14118674: Processed 14138923 events and lost 1 chunks!
warning: Invalid address in LBR record at line 14118676: Check IO/CPU overload!
```
This only happened for LBR-only perf script; hybrid perf script has a check for " 0x" to make sure the line is an LBR perf line.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D110424
In ThinLTO for locals we normally compute the GUID from the name after
prepending the source path to get a unique global id. SamplePGO indirect
call profiles contain the target GUID without this uniquification,
however (unless compiling with -funique-internal-linkage-names).
In order to correctly handle the call edges added to the combined index
for these indirect calls, during importing and bitcode writing we
consult a map of original to full GUID to identify the actual callee.
However, for a large application this was consuming a lot of compile
time as we need to do this repeatedly (especially during importing where
we may traverse call edges multiple times).
To fix this implement a suggestion in one of the FIXME comments, and
actually modify the call edges during a single traversal after the index
is built to perform the fixups once. I combined this fixup with the dead
code analysis performed on the index in order to avoid adding an
additional walk of the index. The dead code analysis is the first
analysis performed on the index.
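A minimal sketch of the one-time fixup, with the index reduced to a flat list of edges. `ToyCallEdge`, `GUIDTy` and `fixupCalleeGUIDs` are hypothetical names; the real index structures and the combination with the dead code analysis are not modelled.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using GUIDTy = uint64_t;

// Toy call edge that records only the callee GUID.
struct ToyCallEdge {
  GUIDTy Callee;
};

// Rewrites original (non-uniquified) callee GUIDs to their full GUIDs in
// a single traversal, so later consumers never need to consult the map.
// Returns the number of edges that were rewritten.
unsigned fixupCalleeGUIDs(std::vector<ToyCallEdge> &Edges,
                          const std::unordered_map<GUIDTy, GUIDTy> &OrigToFull) {
  unsigned NumFixed = 0;
  for (ToyCallEdge &E : Edges) {
    auto It = OrigToFull.find(E.Callee);
    if (It == OrigToFull.end())
      continue; // Already a full GUID, or an unknown callee.
    E.Callee = It->second;
    ++NumFixed;
  }
  return NumFixed;
}
```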
This reduced the time required for a large thin link with SamplePGO by
about 20%.
No new test added, but I confirmed that there are existing tests that
will fail when no fixup is performed.
Differential Revision: https://reviews.llvm.org/D110374
This change is to keep the help text and command guide of objcopy in
tandem.
- In the help output the options --rename-section and
--set-section-flags were missing the flag exclude, which is found in
the command guide.
- In the command guide the alias -G for --keep-global-symbol was
missing, which is found in the help output.
Differential Revision: https://reviews.llvm.org/D110340
It seems we missed one spot to persist `SampleContextFrameVector` into the global table (CSProfileGenerator::populateFunctionBoundarySamples:340), which causes a crash.
This change fixes it in a centralized way, i.e. where we generate the `FunctionSamples`.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D110275
It turned out that an LBR entry target can be the first address of the text section, which causes an out-of-range crash. This change adds a boundary check.
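One way such a check can look, assuming the out-of-range access comes from stepping back one slot from the target's position in a sorted table of instruction addresses. That assumption, along with the name `getPrecedingInstruction`, is made for this sketch and is not taken from the patch.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Looks up the instruction immediately before Target in a sorted address
// table. Returns false when there is none, for example when Target is the
// very first address, instead of reading before the start of the table.
bool getPrecedingInstruction(const std::vector<uint64_t> &SortedAddrs,
                             uint64_t Target, uint64_t &PrevAddr) {
  auto It = std::lower_bound(SortedAddrs.begin(), SortedAddrs.end(), Target);
  if (It == SortedAddrs.begin())
    return false; // Boundary check: nothing precedes the first entry.
  PrevAddr = *(It - 1);
  return true;
}
```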
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D110271
Without preinliner, we need to tune down the cold count cutoff to merge/trim more context to limit profile size for large components. However it doesn't make sense for cold threshold to be higher than hot threshold, so we now change to use hot threshold as merging/trimming cut off instead.
Differential Revision: https://reviews.llvm.org/D110212
For large apps, dumping the disassembly of the whole program can be slow and result in giant output. This adds a switch to dump specific symbols only.
Reviewed By: wlei
Differential Revision: https://reviews.llvm.org/D110079
For strided accesses the loop vectorizer seems to prefer creating a
vector induction variable with a start value of the form
<i32 0, i32 1, i32 2, ...>. This value will be incremented each
loop iteration by a splat constant equal to the length of the vector.
Within the loop, arithmetic using splat values will be done on this
vector induction variable to produce indices for a vector GEP.
This pass attempts to dig through the arithmetic back to the phi
to create a new scalar induction variable and a stride. We push
all of the arithmetic out of the loop by folding it into the start,
step, and stride values. Then we create a scalar GEP to use as the
base pointer for a strided load or store using the computed stride.
Loop strength reduce will run after this pass and can do some
cleanups to the scalar GEP and induction variable.
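The folding can be checked with plain integer arithmetic. The sketch assumes the in-loop arithmetic is a multiply by a splat `Scale` followed by an add of a splat `Offset`; the names `vectorIndices`, `ScalarRecurrence`, `foldIntoScalar` and `stridedIndices` are hypothetical and no IR is involved.

```cpp
#include <cstdint>
#include <vector>

// Indices as the vectorized loop computes them in iteration Iter:
// the vector IV is <Iter*VL + 0, Iter*VL + 1, ...>, then scaled and offset.
std::vector<int64_t> vectorIndices(int64_t Iter, int64_t VL, int64_t Scale,
                                   int64_t Offset) {
  std::vector<int64_t> Out;
  for (int64_t Lane = 0; Lane < VL; ++Lane)
    Out.push_back((Iter * VL + Lane) * Scale + Offset);
  return Out;
}

// Scalar form with all arithmetic folded into start, step and stride.
struct ScalarRecurrence {
  int64_t Start;  // value of the scalar IV in iteration 0
  int64_t Step;   // increment of the scalar IV per iteration
  int64_t Stride; // distance between neighbouring lanes
};

ScalarRecurrence foldIntoScalar(int64_t VL, int64_t Scale, int64_t Offset) {
  return ScalarRecurrence{Offset, VL * Scale, Scale};
}

// Indices touched by a strided access based on the scalar IV.
std::vector<int64_t> stridedIndices(const ScalarRecurrence &R, int64_t Iter,
                                    int64_t VL) {
  std::vector<int64_t> Out;
  const int64_t Base = R.Start + Iter * R.Step;
  for (int64_t Lane = 0; Lane < VL; ++Lane)
    Out.push_back(Base + Lane * R.Stride);
  return Out;
}
```

Both forms agree because (Iter * VL + Lane) * Scale + Offset equals Offset + Iter * (VL * Scale) + Lane * Scale.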
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D107790
Finalization and deallocation actions are a key part of the upcoming
JITLinkMemoryManager redesign: They generalize the existing finalization and
deallocate concepts (basically "copy-and-mprotect", and "munmap") to include
support for arbitrary registration and deregistration of parts of JIT linked
code. This allows us to register and deregister eh-frames, TLV sections,
language metadata, etc. using regular memory management calls with no additional
IPC/RPC overhead, which should both improve JIT performance and simplify
interactions between ORC and the ORC runtime.
The SimpleExecutorMemoryManager class provides executor-side support for memory
management operations, including finalization and deallocation actions.
This support is being added in advance of the rest of the memory manager
redesign as it will simplify the introduction of an EPC based
RuntimeDyld::MemoryManager (since eh-frame registration/deregistration will be
expressible as actions). The new RuntimeDyld::MemoryManager will in turn allow
us to remove older remote allocators that are blocking the rest of the memory
manager changes.
Most PDB fields on disk are 32-bit but describe the file in terms of MSF
blocks, which are 4 kiB by default.
So PDB files can be a bit larger than 4 GiB, and much larger if you create them
with a block size > 4 kiB.
This is a first (necessary, but by far not sufficient) step towards
supporting such PDB files. Now we don't truncate in-memory file offsets (which
are in terms of bytes, not in terms of blocks).
No effective behavior change. lld-link will still error out if it were to
produce PDBs > 4 GiB.
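The truncation in question can be illustrated with two hypothetical helpers (neither is the code touched by this change): a 32-bit block index multiplied by the block size overflows 32 bits as soon as the byte offset reaches 4 GiB, so the product has to be computed in 64 bits.

```cpp
#include <cstdint>

// Byte offset of a block, computed in 64 bits so nothing is lost.
uint64_t blockByteOffset(uint32_t BlockIndex, uint32_t BlockSize) {
  return static_cast<uint64_t>(BlockIndex) * BlockSize;
}

// The same computation kept in 32 bits wraps modulo 2^32.
uint32_t blockByteOffsetTruncated(uint32_t BlockIndex, uint32_t BlockSize) {
  return BlockIndex * BlockSize;
}
```

With 4 KiB blocks, block index 2^20 sits exactly at the 4 GiB mark: the 64-bit version yields 2^32 while the 32-bit version wraps to 0.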
Differential Revision: https://reviews.llvm.org/D109923
Turn on `use-context-cost-for-preinliner` to use context-sensitive byte size cost for preinliner decisions by default.
This is a more accurate proxy of inline cost than profile size. Testing on our large workload showed that it delivers a measurable CPU improvement.
Differential Revision: https://reviews.llvm.org/D109893
New field `elements` is added to '!DIImportedEntity', representing
list of aliased entities.
This is needed to dump optimized debugging information where all names
in a module are imported, but a few names are imported with overriding
aliases.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D109343
Invalid frame addresses exist in call stack samples due to bad unwinding. This can happen with frame-pointer-based unwinding when callee functions do not have the frame pointer chain set up. It isn't common when the program is built with frame pointer omission disabled, but can still happen with third-party static libs built with the frame pointer omitted.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D109638
Perf script can sometimes give disordered LBR samples like below.
```
b022500
32de0044
3386e1d1
7f118e05720c
7f118df2d81f
0x2a0b9622/0x2a0b9610/P/-/-/1 0x2a0b79ff/0x2a0b9618/P/-/-/2 0x2a0b7a4a/0x2a0b79e8/P/-/-/1 0x2a0b7a33/0x2a0b7a46/P/-/-/1 0x2a0b7a42/0x2a0b7a23/P/-/-/1 0x2a0b7a21/0x2a0b7a37/P/-/-/2 0x2a0b79e6/0x2a0b7a07/P/-/-/1 0x2a0b79d4/0x2a0b79dc/P/-/-/2 0x2a0b7a03/0x2a0b79aa/P/-/-/1 0x2a0b79a8/0x2a0b7a00/P/-/-/234 0x2a0b9613/0x2a0b7930/P/-/-/1 0x2a0b9622/0x2a0b9610/P/-/-/1 0x2a0b79ff/0x2a0b9618/P/-/-/2 0x2a0b7a4a/0x2aWarning:
Processed 10263226 events and lost 1 chunks!
```
Note the last LBR record, `0x2a0b7a4a/0x2aWarning:`. Currently llvm-profgen does not detect this, and as a result an uninitialized branch target value will be used. The uninitialized value can cause bogus instruction ranges to be created, which in turn will result in a completely wrong profile. An example:
```
.... @ _ZN5folly13loadUnalignedIsEET_PKv]:18446744073709551615:18446744073709551615
1: 18446744073709551615
!CFGChecksum: 4294967295
!Attributes: 0
```
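A strict parse of one LBR entry would reject the truncated record shown earlier. The sketch below is not llvm-profgen's parser; `parseHexAddress` and `parseLBREntry` are hypothetical, and the entry layout `0xSRC/0xDST/...` is inferred from the sample output above.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

// Parses "0x" followed by 1 to 16 hex digits; rejects anything else.
static bool parseHexAddress(const std::string &S, uint64_t &Out) {
  if (S.size() < 3 || S.size() > 18 || S[0] != '0' || S[1] != 'x')
    return false;
  uint64_t Value = 0;
  for (std::size_t I = 2; I < S.size(); ++I) {
    const unsigned char C = static_cast<unsigned char>(S[I]);
    if (!std::isxdigit(C))
      return false;
    const unsigned Digit =
        std::isdigit(C) ? C - '0' : std::tolower(C) - 'a' + 10;
    Value = (Value << 4) | Digit;
  }
  Out = Value;
  return true;
}

// Parses one entry of the form 0xSRC/0xDST/... and only writes Src and
// Dst when the whole prefix is well formed, so callers never see
// uninitialized or half-parsed addresses.
bool parseLBREntry(const std::string &Token, uint64_t &Src, uint64_t &Dst) {
  const std::size_t FirstSlash = Token.find('/');
  if (FirstSlash == std::string::npos)
    return false;
  const std::size_t SecondSlash = Token.find('/', FirstSlash + 1);
  if (SecondSlash == std::string::npos)
    return false; // Truncated entry, e.g. "0x2a0b7a4a/0x2aWarning:".
  uint64_t S = 0, D = 0;
  if (!parseHexAddress(Token.substr(0, FirstSlash), S) ||
      !parseHexAddress(
          Token.substr(FirstSlash + 1, SecondSlash - FirstSlash - 1), D))
    return false;
  Src = S;
  Dst = D;
  return true;
}
```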
Reviewed By: wenlei, wlei
Differential Revision: https://reviews.llvm.org/D109637
We previously had a limitation that TLS variables could not
be exported (and therefore could also not be imported). This
change removes that limitation.
Differential Revision: https://reviews.llvm.org/D108877
This syncs parts from the x86 implementation to the ARMWinEH
implementation.
Currently, neither of the compilers targeting COFF/arm64 (MSVC, LLVM)
produce such relocations, but LLVM might after a later patch.
Differential Revision: https://reviews.llvm.org/D109650
This is the same as we do on arm64 already for the MSVC style label
symbols, but also handle the way GCC produces it - with all relocations
pointing at the .text section symbol, with various offsets.
Differential Revision: https://reviews.llvm.org/D109649
This reapplies bb27e45643 (SimpleRemoteEPC
support) and 2269a941a4 (#include <mutex>
fix) with further fixes to support building with LLVM_ENABLE_THREADS=Off.
https://reviews.llvm.org/D47381 / eb46c95c3e
changed the triples set up by GetHostTriple.cmake for i686 MSVC
from i686-pc-win32 to i686-pc-windows-msvc without changing
the corresponding condition in llvm-shlib.
Since then, the 32 bit x86 build of LLVM-C.dll has contained no
exported symbols at all.
Differential Revision: https://reviews.llvm.org/D109493
This reverts commit 5629afea91 ("[ORC] Add missing
include."), and bb27e45643 ("[ORC] Add
SimpleRemoteEPC: ExecutorProcessControl over SPS + abstract transport.").
The SimpleRemoteEPC patch currently assumes availability of threads, and needs
to be rewritten with LLVM_ENABLE_THREADS guards.
SimpleRemoteEPC is an ExecutorProcessControl implementation (with corresponding
new server class) that uses ORC SimplePackedSerialization (SPS) to serialize and
deserialize EPC-messages to/from byte-buffers. The byte-buffers are sent and
received via a new SimpleRemoteEPCTransport interface that can be implemented to
run SimpleRemoteEPC over whatever underlying transport system (IPC, RPC, network
sockets, etc.) best suits your use case.
The SimpleRemoteEPCServer class provides executor-side support. It uses a
customizable SimpleRemoteEPCServer::Dispatcher object to dispatch wrapper
function calls to prevent the RPC thread from being blocked (a problem in some
earlier remote-JIT server implementations). Almost all functionality (beyond the
bare basics needed to bootstrap) is implemented as wrapper functions to keep the
implementation simple and uniform.
Compared to previous remote JIT utilities (OrcRemoteTarget*,
OrcRPCExecutorProcessControl), more consideration has been given to
disconnection and error handling behavior: Graceful disconnection is now always
initiated by the ORC side of the connection, and failure at either end (or in
the transport) will result in Errors being delivered to both ends to enable
controlled tear-down of the JIT and Executor (in the Executor's case this means
"as controlled as the JIT'd code allows").
The introduction of SimpleRemoteEPC will allow us to remove other remote-JIT
support from ORC (including the legacy OrcRemoteTarget* code used by lli, and
the OrcRPCExecutorProcessControl and OrcRPCEPCServer classes), and then remove
ORC RPC itself.
The llvm-jitlink and llvm-jitlink-executor tools have been updated to use
SimpleRemoteEPC over file descriptors. Future commits will move lli and other
tools and example code to this system, and remove ORC RPC.
If the number of directories was 6 (equal to the DEBUG_DIRECTORY
index), patchDebugDirectory() was run even though the debug directory
is actually the 7th entry. Use <= in the comparison to fix that.
This fixes https://llvm.org/PR51243
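The off-by-one is easy to see in isolation. `DebugDirectoryIndex` and `shouldPatchDebugDirectory` are hypothetical names for this sketch; only the index value 6 and the `<=` comparison come from the description above.

```cpp
#include <cstdint>

// Zero-based index of the debug directory in the data directory table,
// which makes it the 7th entry.
constexpr uint32_t DebugDirectoryIndex = 6;

// The debug directory is present only if the table has at least 7
// entries. Comparing with < instead of <= would wrongly accept 6.
bool shouldPatchDebugDirectory(uint32_t NumberOfDirectories) {
  if (NumberOfDirectories <= DebugDirectoryIndex)
    return false;
  return true;
}
```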
Differential Revision: https://reviews.llvm.org/D106940
Reviewed by: jhenderson
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`. This achieves two things:
1) This starts standardizing predicates across the LLVM codebase,
following (in this case) ConstantInt. The word "Value" doesn't
convey anything of merit, and is missing in some of the other things.
2) Calling an integer "null" doesn't make any sense. The original sin
here is mine and I've regretted it for years. This moves us to calling
it "zero" instead, which is correct!
APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go. As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.
Included in this patch are changes to a bunch of the codebase, but there are
more. We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.
Differential Revision: https://reviews.llvm.org/D109483
Allow variable number of directories, as allowed by the
specification. NumberOfRvaAndSize will default to 16 if not specified,
as in the past.
Reviewed by: jhenderson
Differential Revision: https://reviews.llvm.org/D108825
This patch continues refactoring done by D99055. It puts format specific
options into the corresponding CopyConfig structures.
Differential Revision: https://reviews.llvm.org/D102277
Functions can have a personality function, as well as prefix and
prologue data as additional operands. Unused operands are assigned
a dummy value of i1* null. This patch addresses multiple issues in
use-list order preservation for these:
* Fix verify-uselistorder to also enumerate the dummy values.
This means that now use-list order values of these values are
shuffled even if there is no other mention of i1* null in the
module. This results in failures of Assembler/call-arg-is-callee.ll,
Assembler/opaque-ptr.ll and Bitcode/use-list-order2.ll.
* The use-list order prediction in ValueEnumerator does not take
into account the fact that a global may use a value more than
once and leaves uses in the same global effectively unordered.
We should be comparing the operand number here, as we do for
the more general case.
* While we enumerate all operands of a function together (which
seems sensible to me), the bitcode reader would first resolve
prefix data for all functions, then prologue data for all
functions, then personality functions for all functions. Change
this to resolve all operands for a given function together
instead.
Differential Revision: https://reviews.llvm.org/D109282
Print relocations interleaved with disassembled instructions for
executables with relocatable sections, e.g. those built with "-Wl,-q".
Differential Revision: https://reviews.llvm.org/D109016
Currently native clusterization simply groups all benchmarks
by the opcode of key instruction, but that is suboptimal in certain cases,
e.g. where we can already tell that the particular instructions
already resolve into different sched classes.
In preparation for passing the MCSubtargetInfo (STI) through to writeNops
so that it can use the STI in operation at the time, we need to record the
STI in operation when a MCAlignFragment may write nops as padding. The
STI is currently unused, a further patch will pass it through to
writeNops.
There are many places that can create an MCAlignFragment, in most cases
we can find out the STI in operation at the time. In a few places this
isn't possible as we are in initialisation or finalisation, or are
emitting constant pools. When possible I've tried to find the most
appropriate existing fragment to obtain the STI from, when none is
available use the per module STI.
For constant pools we don't actually need to use EmitCodeAlign as the
constant pools are data anyway so falling through into it via an
executable NOP is no better than falling through into data padding.
This is a prerequisite for D45962 which uses the STI to emit the
appropriate NOP for the STI, which can differ per fragment.
Note that this involves an interface change to InitSections. It is now
called initSections and requires a SubtargetInfo as a parameter.
Differential Revision: https://reviews.llvm.org/D45961
In the case of no tied variables, we pick random defs, and then random uses that don't alias with defs we just picked.
Sounds good, except that an X86 instruction may have implicit reg uses,
e.g. for `MULX` it's `EDX`/`RDX`: `Intel SDM, 4-162 Vol. 2B MULX — Unsigned Multiply Without Affecting Flags`
> Performs an unsigned multiplication of the implicit source operand (EDX/RDX) and the specified source operand
> (the third operand) and stores the low half of the result in the second destination (second operand), the high half
> of the result in the first destination operand (first operand), without reading or writing the arithmetic flags.
And indeed, every once in a while `llvm-exegesis` happened to pick EDX as a def while measuring throughput,
and produced garbage output:
```
$ ./bin/llvm-exegesis -num-repetitions=1000000 -mode=inverse_throughput -repetition-mode=min --loop-body-size=4096 -dump-object-to-disk=false -opcode-name=MULX32rr --max-configs-per-opcode=65536
---
mode: inverse_throughput
key:
instructions:
- 'MULX32rr EDX R11D R12D'
config: ''
register_initial_values:
- 'R12D=0x0'
- 'EDX=0x0'
cpu_name: znver3
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
- { key: inverse_throughput, value: 4.00014, per_snippet_value: 4.00014 }
error: ''
info: instruction has no tied variables picking Uses different from defs
assembled_snippet: 415441BC00000000BA00000000C4C223F6D4C4C223F6D4C4C223F6D4C4C223F6D4415CC3415441BC00000000BA0000000049B80200000000000000C4C223F6D4C4C223F6D44983C0FF75F0415CC3
...
```
```
$ ./bin/llvm-exegesis -num-repetitions=1000000 -mode=inverse_throughput -repetition-mode=min --loop-body-size=4096 -dump-object-to-disk=false -opcode-name=MULX32rr --max-configs-per-opcode=65536
---
mode: inverse_throughput
key:
instructions:
- 'MULX32rr R13D EDX ECX'
config: ''
register_initial_values:
- 'ECX=0x0'
- 'EDX=0x0'
cpu_name: znver3
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
- { key: inverse_throughput, value: 3.00013, per_snippet_value: 3.00013 }
error: ''
info: instruction has no tied variables picking Uses different from defs
assembled_snippet: 4155B900000000BA00000000C4626BF6E9C4626BF6E9C4626BF6E9C4626BF6E9415DC34155B900000000BA0000000049B80200000000000000C4626BF6E9C4626BF6E94983C0FF75F0415DC3
...
```
Oops! Not only does that not look fun, i did hit that pitfall during AMD Zen 3 enablement.
While i have since then addressed this in rGd4d459e7475b4bb0d15280f12ed669342fa5edcd,
i suspect there may be other buggy results lying around, so we should at least stop producing them.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D109275
UsedTLSStorage is only used in allocateTLSSection,
which is guarded for x86 ELF only.
So clang will emit an error with -Werror on.
.../llvm/tools/llvm-rtdyld/llvm-rtdyld.cpp:288:12:
error: private field 'UsedTLSStorage' is not used
[-Werror,-Wunused-private-field]
unsigned UsedTLSStorage = 0;
^
We merge cold contexts by default to save profile size. However, trimming cold contexts after merging doesn't save much size, so trimming now defaults to off to reflect how it's commonly used.
Differential Revision: https://reviews.llvm.org/D109166
This change improves the warning for truncated context by: 1) deduplicating the warnings, since one call without a probe can appear in many different contexts, leading to duplicated warnings, and 2) rephrasing the message to make it easier to understand. The term "untracked frame" can be confusing.
Differential Revision: https://reviews.llvm.org/D109115
Added opt option -print-pipeline-passes to print a -passes compatible
string describing the built pass pipeline.
As an example:
$ opt -enable-new-pm=1 -adce -licm -simplifycfg -o /dev/null /dev/null -print-pipeline-passes
verify,function(adce),function(loop-mssa(licm)),function(simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts>),verify,BitcodeWriterPass
At the moment this is best-effort only and there are some known
limitations:
- Not all passes accepting parameters will print their parameters
(currently only implemented for simplifycfg).
- Some ClassName to pass-name mappings are not unique.
- Some ClassName to pass-name mappings are missing (e.g.
BitcodeWriterPass).
Differential Revision: https://reviews.llvm.org/D108298
Adding compiler support for MD5 CS profile, based on the previous context split work D107299. An MD5 CS profile is about 40% smaller than the string-based extbinary profile. As a result, the compilation is 15% faster.
A few conversions from real names to md5 names have been made on the sample loader and context tracker side to get it to work.
Reviewed By: wenlei, wmi
Differential Revision: https://reviews.llvm.org/D108342
The current help msg isn't super clear on whether t prints the content of the files or just the list of files.
(I'd certainly thought it'd print the list of files, and accidentally had a bunch of "garbage" printed to my terminal).
Similarly, t sounded like it'd do what p actually did.
Differential Revision: https://reviews.llvm.org/D109018
This change adds support for LBR-only sample perf scripts, which are used for regular (non-CS) profile generation. An LBR perf script consists of a batch of LBR samples, each starting with an instruction pointer followed by a group of 32 LBR entries. The FROM/TO LBR pair and the range between two consecutive entries (the former entry's TO and the latter entry's FROM) are used to infer function profile info.
An example of an LBR perf script (created by `perf script -F ip,brstack -i perf.data`):
```
40062f 0x40062f/0x4005b0/P/-/-/9 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 ...
4005d7 0x4005d7/0x4005e5/P/-/-/8 0x40062f/0x4005b0/P/-/-/6 0x400645/0x4005ff/P/-/-/1 ...
...
```
For implementation:
- Added a new child class `LBRPerfReader` for sample parsing; it reuses all the functionality in `extractLBRStack` and adds parsing of the leading instruction pointer.
- `HybridSample` is reused (the call stack is just left empty) and the parsed samples are still aggregated in `AggregatedSamples`. After that, range samples, branch samples and address samples are computed and recorded.
- Reused `ContextSampleCounterMap` to store the raw profile. Since there is no need to aggregate by context, only one sample counter is registered, with a fake context key.
- Unified on `show-raw-profile` instead of `show-unwinder-output` for dumping the intermediate raw profile; see the comments for the format of the raw profile. For CS profile, it still outputs the unwinder output.
The profile generation part will come soon.
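As an illustration of the sample layout above, here is a minimal sketch of parsing one line of such a script into a leading instruction pointer and a list of FROM/TO pairs. It is not the `LBRPerfReader` code; the `LBRSample` struct and `parseLBRLine` function are hypothetical names, and the sketch assumes whitespace-separated entries whose first two `/`-separated fields are hexadecimal addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container for one parsed LBR sample.
struct LBRSample {
  uint64_t LeadingIP;
  std::vector<std::pair<uint64_t, uint64_t>> Branches; // (FROM, TO)
};

// Parses "<ip> <from>/<to>/... <from>/<to>/... ..." into an LBRSample.
LBRSample parseLBRLine(const std::string &Line) {
  LBRSample Sample;
  Sample.LeadingIP = 0;
  std::istringstream In(Line);
  std::string Token;
  In >> Token;
  assert(!Token.empty() && "expected a leading instruction pointer");
  Sample.LeadingIP = std::stoull(Token, nullptr, 16);
  while (In >> Token) {
    std::size_t First = Token.find('/');
    if (First == std::string::npos)
      continue; // Not an LBR entry, e.g. a trailing "...".
    std::size_t Second = Token.find('/', First + 1);
    if (Second == std::string::npos)
      continue;
    // Base 16 also accepts the optional "0x" prefix.
    uint64_t From = std::stoull(Token.substr(0, First), nullptr, 16);
    uint64_t To =
        std::stoull(Token.substr(First + 1, Second - First - 1), nullptr, 16);
    Sample.Branches.emplace_back(From, To);
  }
  return Sample;
}
```

Calling `parseLBRLine` on the first example line yields a leading IP of `0x40062f` and the pair `(0x40062f, 0x4005b0)` as its first branch.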
Differential Revision: https://reviews.llvm.org/D108153
Currently context strings contain a lot of duplicated function names, which significantly increases the profile size. This change splits the context into a series of {name, offset, discriminator} tuples so that function names used in the context can be replaced by an index into the name table, which significantly reduces the size consumed by context.
A follow-up improvement made in the compiler and profiling tools is to avoid reconstructing full context strings, which is time- and memory-consuming. Instead a context vector of `StringRef` is adopted to represent the full context in all scenarios. As a result, the previously prevalent profile map, which was implemented as a `StringMap`, is now an unordered map keyed by `SampleContext`. `SampleContext` is reshaped to use an `ArrayRef` to represent a full context for CS profile. For non-CS profile, it falls back to using `StringRef` to represent a contextless function name. Both the `ArrayRef` and `StringRef` objects are backed by real array and string objects stored in producer buffers. For the compiler, they are maintained by the sample reader. For llvm-profgen, they are maintained in `ProfiledBinary` and `ProfileGenerator`. Full context strings are generated only for debugging and printing.
When it comes to profile format, nothing has changed in the text format, though internally the CS context is implemented as a vector. The extbinary format is only changed for CS profile, with an additional `SecCSNameTable` section which stores all full contexts logically in the form of `vector<int>`, with each element being an offset that points into `SecNameTable`. All occurrences of contexts elsewhere are redirected to use the offset into `SecCSNameTable`.
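To make the size saving concrete, here is a minimal sketch of the idea of replacing repeated function names with indices into a name table. It is not the sample-profile implementation; `NameTable`, `NamedFrame`, `ContextFrame` and `encodeContext` are hypothetical names introduced for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical input frame: a function name plus its callsite location.
struct NamedFrame {
  std::string Name;
  uint32_t Offset;
  uint32_t Discriminator;
};

// Hypothetical encoded frame: the name is replaced by a name-table index.
struct ContextFrame {
  uint32_t NameIndex;
  uint32_t Offset;
  uint32_t Discriminator;
};

// Stores each distinct function name once and hands out stable indices.
class NameTable {
  std::vector<std::string> Names;
  std::unordered_map<std::string, uint32_t> Indices;

public:
  uint32_t intern(const std::string &Name) {
    auto It = Indices.find(Name);
    if (It != Indices.end())
      return It->second;
    uint32_t Index = static_cast<uint32_t>(Names.size());
    Names.push_back(Name);
    Indices.emplace(Name, Index);
    return Index;
  }
  const std::string &lookup(uint32_t Index) const {
    assert(Index < Names.size() && "name index out of range");
    return Names[Index];
  }
  std::size_t size() const { return Names.size(); }
};

// Encodes a context so that each name occurrence costs one index.
std::vector<ContextFrame> encodeContext(NameTable &Table,
                                        const std::vector<NamedFrame> &Context) {
  std::vector<ContextFrame> Encoded;
  for (const NamedFrame &Frame : Context)
    Encoded.push_back(
        {Table.intern(Frame.Name), Frame.Offset, Frame.Discriminator});
  return Encoded;
}
```

A context that mentions `main` twice stores the string once; both frames carry the same index.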
Testing
This is a no-diff change in terms of code quality and profile content (for text profile).
For our internal large service (aka ads), the profile generation is cut in half, with a 20x smaller string-based extbinary format generated.
The compile time of ads dropped by 25%.
Differential Revision: https://reviews.llvm.org/D107299
Summary: This patch is trying to add support for llvm-readobj
--needed-libs option under XCOFF.
For XCOFF, the needed libraries can be found from the Import
File ID Name Table of the Loader Section.
Currently, I am using binary inputs in the test since yaml2obj
does not yet support writing the Loader Section and the
import file table.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D106643
The change adds a switch to allow the sample loader to use the global pre-inliner's decision instead. The pre-inliner in llvm-profgen makes inline decisions globally based on the whole program profile and function byte size as a cost proxy.
Since the pre-inliner also adjusts/merges context profile based on its inline decision, honoring its inline decision in the sample loader would lead to better post-inline profile quality, especially for thinlto where cross module profile merging isn't possible without the pre-inliner.
A minor fix in the profile reader is also included. When the pre-inliner is used, we now also turn off the default merging and trimming logic unless it's explicitly asked for.
Differential Revision: https://reviews.llvm.org/D108677
The --set-section-flags option was being ignored when adding a new
section. Take it into account if present.
Fixes https://llvm.org/PR51244
Reviewed By: jhenderson, MaskRay
Differential Revision: https://reviews.llvm.org/D106942
Moved View.h and View.cpp from /tools/llvm-mca/Views/ to /lib/MCA/ and
/include/llvm/MCA/. This is so that targets can define their own Views within
the /lib/Target/ directory (so that the View can use backend functionality).
To enable these Views within mca, targets will need to add them to the vector of
Views returned by their target's CustomBehaviour::getViews() methods.
Differential Revision: https://reviews.llvm.org/D108520
This is a follow-up diff for BinarySizeContextTracker to track zero size for fully optimized inlinees. When an inlinee is fully optimized away, we won't be able to get its size through symbolizing instructions, hence we will treat the corresponding context size as unknown. However, by traversing the inlined probe forest, we know what the original inlinees are regardless of optimization. If a context shows up in inlined probes, but not during symbolization, we know that it's fully optimized away, hence its size is zero instead of unknown. It should provide more accurate size cost estimation for the pre-inliner to make better inline decisions in llvm-profgen.
Differential Revision: https://reviews.llvm.org/D108350
No demangling may be a better default in the future.
Add `--demangle` for migration convenience.
Reviewed By: Enna1
Differential Revision: https://reviews.llvm.org/D108100
The implementation uses the int_asan_check_memaccess intrinsic to instrument the code. The intrinsic is replaced by a call to a function which performs the access check. The generated function names encode the input register name as a number using the formula Reg - X86::NoRegister.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D107850
This removes the data layout, target triple, source filename, and module
identifier when possible.
Reviewed By: swamulism
Differential Revision: https://reviews.llvm.org/D108568
Commit 9f2967bcfe introduced support for
branch coverage including export to the LCOV format.
This commit corrects the LCOV field name for branches from BFH to BRH.
The mistake seems to have slipped in as typo because the correct field
name BRH is used in the comment section at the beginning of the file.
Differential Revision: https://reviews.llvm.org/D108358
A couple of passes that are parameterized in new-PM used different
pass names (in cmd line interface) while using the same pass class
name. This patch updates the PassRegistry to model pass parameters
more properly using PASS_WITH_PARAMS.
Reason for the change is to ensure that we have a 1-1 mapping
between class name and pass name (when disregarding the params).
With a 1-1 mapping it is more obvious which pass name to use in
options such as -debug-only, -print-after etc.
The opt -passes syntax is changed for the following passes:
early-cse-memssa => early-cse<memssa>
post-inline-ee-instrument => ee-instrument<post-inline>
loop-extract-single => loop-extract<single>
lower-matrix-intrinsics-minimal => lower-matrix-intrinsics<minimal>
This patch does not update pass names in docs/Passes.rst. Not quite
sure what the status is for that document (e.g. when it comes to
listing pass parameters). Of the passes mentioned above, only
loop-extract-single is mentioned in Passes.rst today.
Differential Revision: https://reviews.llvm.org/D108362
Add relocation reading support to XCOFFDumper.
This patch is part of the partition of D103696.
Reviewed By: daltenty, Helflym
Differential Revision: https://reviews.llvm.org/D104646
Refactored implementation of AddressSanitizerPass and
HWAddressSanitizerPass to use pass options similar to passes like
MemorySanitizerPass. This makes sure that there is a single mapping
from class name to pass name (needed by D108298), and options like
-debug-only and -print-after make a bit more sense (even though
it is the unparameterized pass name that should be used in those
options).
A result of the above is that some pass names are removed in favor
of the parameterized versions:
- "khwasan" is now "hwasan<kernel;recover>"
- "kasan" is now "asan<kernel>"
- "kmsan" is now "msan<kernel>"
Differential Revision: https://reviews.llvm.org/D105007
Currently, `printHelp` behaves differently for options that:
* do not define `HelpText` (such options _are not printed_), and
* define its `HelpText` as `HelpText<"">` (such options _are printed_).
In practice, both approaches lead to no help text and `printHelp` should
treat them consistently. This patch addresses that by making
`printHelp` check the length of the help text to be printed.
All affected tests have been updated accordingly. The option definitions
for llvm-cvtres have been updated with a short description or "Not
implemented" for options that are ignored by the tool.
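The consistency rule described above can be sketched as follows. This is not the `printHelp` implementation from the option library; `OptionInfo`, `hasHelpText` and `printableOptions` are hypothetical names, and the sketch only shows that a missing help text and an empty help text are treated the same way.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical option record; HelpText may be null (not defined) or "".
struct OptionInfo {
  const char *Name;
  const char *HelpText;
};

// An option is printable only if its help text exists and is non-empty.
bool hasHelpText(const OptionInfo &Opt) {
  return Opt.HelpText != nullptr && std::strlen(Opt.HelpText) != 0;
}

// Returns the names of the options that would appear in the help output.
std::vector<std::string> printableOptions(const std::vector<OptionInfo> &Opts) {
  std::vector<std::string> Names;
  for (const OptionInfo &Opt : Opts) {
    assert(Opt.Name && "every option needs a name");
    if (hasHelpText(Opt))
      Names.push_back(Opt.Name);
  }
  return Names;
}
```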
Differential Revision: https://reviews.llvm.org/D107557
This change enables llvm-profgen to use accurate context-sensitive post-optimization function byte size as a cost proxy to drive global preinline decisions.
To do this, BinarySizeContextTracker is introduced to track function byte size under different inline contexts during disassembling. In the preinliner, we can now query context byte size under the switch `context-cost-for-preinliner`. The tracker uses a reverse trie to keep the size of functions under different contexts (callee as parent, caller as child), and it can give the best/longest possible matching context size for a given input context.
The new size cost is off by default. There are a few TODOs that need to be addressed: 1) avoid dangling strings from `Offset2LocStackMap`, which will be addressed in the split context work; 2) use the inlinee's entry probe to make sure we have the correct zero size for an inlinee that's completely optimized away after inlining. Some tuning is also needed.
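The reverse-trie lookup described above can be sketched like this. It is a simplified illustration, not the BinarySizeContextTracker code: `ContextSizeTrie`, `addSize` and `lookup` are hypothetical names, contexts are plain vectors of function names ordered caller first, and callsite locations are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical reverse trie: children of the root are leaf functions
// (callees); each level further down adds one more caller.
class ContextSizeTrie {
  struct Node {
    bool HasSize = false;
    uint32_t Size = 0;
    std::map<std::string, std::unique_ptr<Node>> Callers;
  };
  Node Root;

public:
  // Context is ordered caller first, e.g. {"main", "foo"} for foo in main.
  void addSize(const std::vector<std::string> &Context, uint32_t Bytes) {
    assert(!Context.empty() && "context needs at least the leaf function");
    Node *Cur = &Root;
    for (auto It = Context.rbegin(); It != Context.rend(); ++It) {
      std::unique_ptr<Node> &Child = Cur->Callers[*It];
      if (!Child)
        Child.reset(new Node());
      Cur = Child.get();
    }
    Cur->HasSize = true;
    Cur->Size += Bytes;
  }

  // Walks from the leaf towards the callers and reports the size recorded
  // for the longest matching suffix of Context. Returns false if none.
  bool lookup(const std::vector<std::string> &Context, uint32_t &Size) const {
    const Node *Cur = &Root;
    bool Found = false;
    for (auto It = Context.rbegin(); It != Context.rend(); ++It) {
      auto Next = Cur->Callers.find(*It);
      if (Next == Cur->Callers.end())
        break;
      Cur = Next->second.get();
      if (Cur->HasSize) {
        Size = Cur->Size;
        Found = true;
      }
    }
    return Found;
  }
};
```

Keying the trie by callee first means a query with extra, unknown outer callers still finds the size of the longest inner context that was recorded.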
Differential Revision: https://reviews.llvm.org/D108180
This patch implements Flow Sensitive Sample FDO (FSAFDO) profile
loader. We have two profile loaders for FS profile,
one before RegAlloc and one before BlockPlacement.
To enable it, when -fprofile-sample-use=<profile> is specified,
add "-enable-fs-discriminator=true \
-disable-ra-fsprofile-loader=false \
-disable-layout-fsprofile-loader=false"
to turn on the FS profile loaders.
Differential Revision: https://reviews.llvm.org/D107878
This change adds support to ORCv2 and the Orc runtime library for static
initializers, C++ static destructors, and exception handler registration for
ELF-based platforms, at present Linux and FreeBSD on x86_64. It is based on the
MachO platform and runtime support introduced in bb5f97e3ad.
Patch by Peter Housel. Thanks very much Peter!
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D108081
When option `--symbolize` is true, llvm-xray convert demangles function
names by default. This patch adds an llvm-xray convert option `no-demangle` to
determine whether to demangle function names when symbolizing function ids from
the input log.
Reviewed By: MaskRay, smeenai
Differential Revision: https://reviews.llvm.org/D108019
Change to use a unique pointer for the profiled binary to unblock asan.
At the same time, I realized we can move the profiled binary loading out of PerfReader to decouple the two, so I made some other related refactors.
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D108254
The current implementation of printAttributes makes it fiddly to extend
attribute support for new targets.
By refactoring the code so all target specific variables are
initialized in a switch/case statement, it becomes simpler to extend
attribute support for new targets.
Reviewed By: jhenderson, MaskRay
Differential Revision: https://reviews.llvm.org/D107968
As we decided to support only one binary each time, this patch cleans up the related code dealing with multiple binaries. We can use `llvm-profdata` to merge profile from multiple binaries.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D108002
Similar to D94907 (llvm-nm -D).
The output will match GNU objdump 2.37.
Older versions don't use ` (version)` for undefined symbols.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D108097
The new ELF notes are added in clang-offload-wrapper, and llvm-readobj has to visualize them properly.
Differential Revision: https://reviews.llvm.org/D99552
A DiffConsumer object may be reused, but we'd like to reset it before
the next use.
No functionality change intended.
Differential Revision: https://reviews.llvm.org/D107985
Previously debug-info-for-profiling and pseudo-probe-for-profiling were mutually exclusive because they compete for the dwarf discriminator for callsites on the IR. This change allows the two switches to be used together. The side effect is that callsite discriminators will be taken by pseudo probe, while discriminators for other instructions are still available for AutoFDO use. This is less than ideal; however, it still gives us a chance to smoothly transition from AutoFDO to CSSPGO, by collecting both profiles from a CSSPGO binary.
Reviewed By: wenlei, wmi
Differential Revision: https://reviews.llvm.org/D107876
As of now, llvm-objcopy sorts section headers according to the offsets
of the sections in the input file. That can corrupt section references
in the dynamic symbol table because it is a loadable section and as such
is not updated by the tool. Even though the section references are not
required for loading the binary correctly, they are still handy for a
user who analyzes the file.
While the patch removes global reordering of section headers, it lays out
the sections in the same way as before, i.e. according to their original
offsets. All that helps the output file to resemble the input better.
Note that the patch removes sorting SHT_GROUP sections to the start of
the list, which was introduced in D62620 in order to ensure that they
come before the group members, along with the corresponding test. The
original issue was caused by the sorting of section headers, so dropping
the sorting also resolves the issue.
Differential Revision: https://reviews.llvm.org/D107653
Currently we use a centralized string map (StringMap<FunctionSamples> ProfileMap) to store the profile while populating the samples, which can become a memory usage bottleneck. In one extreme case I saw, there were thousands of samples whose context stack depth was >= 100, and memory consumption could exceed 100GB.
Since the context is used for inlining here, we can assume we won't have that many inlinees inlined into the same root function, so this change caps the context stack depth and merges the samples to reduce peak memory. This is done after recursion compression.
The default value is -1, meaning no depth limit; in the future we can tune it to a smaller one.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D107800
Some files still contained the old University of Illinois Open Source
Licence header. This patch replaces that with the Apache 2 with LLVM
Exception licence.
Differential Revision: https://reviews.llvm.org/D107528
The patch removes mutable accessor methods for sections and segments.
As of now, the const variants of them are not used because all callers have
mutable access to an instance of Object. On the other hand, they do not
actually modify the sets, so it looks better to keep only the const ones.
Differential Revision: https://reviews.llvm.org/D107652
This is related to PR51392.
Before this patch, the timeline view was rounding doubles to the first decimal,
using a logic similar to this:
```
double AverageTime = (double)Input / CumulativeExecutions;
double Result = floor((AverageTime * 10) + 0.5) / 10;
```
Here, Input and CumulativeExecutions are both unsigned integers.
The last operation is what effectively performs the rounding of AverageTime.
PR51392 has been raised because - under specific -m32 configurations of GCC -
one of the timeline tests reports slightly different values (due to a different
rounding choice).
This patch tries to minimise the propagation of floating-point error by
hoisting the multiply by 10, so that it is performed on the unsigned.
```
double AverageTime = (double)(Input * 10) / CumulativeExecutions;
double Result = floor(AverageTime + 0.5) / 10;
```
So we are trading a floating point multiply for an integer multiply (which can be
expanded using a simple MUL or using an `ADD + LEA` sequence). This decrease in
floating point operations executed should also help with decreasing the error in
the computation.
Strictly speaking, that computation will always be potentially subject to error
(depending on what values are passed in input). However, this patch should
improve the situation and make bugs like PR51392 less frequent.
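For reference, the two rounding schemes can be written as standalone functions. This is only a sketch of the arithmetic quoted above, not the timeline view code; the function names `roundAverageBefore` and `roundAverageAfter` are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Pre-patch scheme: divide first, then scale by 10 and round, all in
// floating point.
double roundAverageBefore(unsigned Input, unsigned CumulativeExecutions) {
  assert(CumulativeExecutions != 0 && "average over zero executions");
  double AverageTime = (double)Input / CumulativeExecutions;
  return std::floor((AverageTime * 10) + 0.5) / 10;
}

// Post-patch scheme: scale the unsigned input by 10 first, so a single
// floating point division precedes the rounding step.
// Note: Input * 10 is unsigned arithmetic and wraps for very large inputs.
double roundAverageAfter(unsigned Input, unsigned CumulativeExecutions) {
  assert(CumulativeExecutions != 0 && "average over zero executions");
  double AverageTime = (double)(Input * 10) / CumulativeExecutions;
  return std::floor(AverageTime + 0.5) / 10;
}
```

For inputs such as `Input = 7` and `CumulativeExecutions = 2` both variants return 3.5; they can only diverge where the intermediate floating point multiply lands on the other side of a rounding boundary.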
Fix an edge case missed by https://reviews.llvm.org/D78921. For example,
the Repro debug entry (generated with the /Brepro linker flag) does not
have a debug-directory payload. Do not attempt to patch Debug entries
without a payload.
Differential Revision: https://reviews.llvm.org/D107324
A performance issue showed up in profile generation, and it turned out the loop at line 525 is the bottleneck.
Moving the code outside of the loop scope fixes this issue. The run time improves from 30+ mins to ~30s.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D107529
Some tools may want to use the LLVM "diff" code. Move the code into a
library for easy use.
No functionality change intended.
Differential Revision: https://reviews.llvm.org/D107392
This option is always interpreted strictly as a hexadecimal string,
even if it has no prefix that indicates the number format, hence
the existing call to StringRef::getAsInteger(16, ...).
StringRef::getAsInteger(0, ...) consumes a leading "0x" prefix if
present, but when the radix is specified, the radix prefix shouldn't
be included.
Both MS rc.exe and GNU windres accept the language with that
prefix.
Also allow specifying the codepage to llvm-windres with a different
radix, as GNU windres allows that (but MS rc.exe doesn't).
This fixes https://llvm.org/PR51295.
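The intended behaviour, an option that is always read as hexadecimal but tolerates an optional "0x" prefix, can be sketched in standard C++ as below. This is not the llvm-rc code and does not use `StringRef`; `parseHexOption` is a hypothetical helper that strips the prefix by hand before parsing with a fixed radix of 16.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>

// Parses Text as a hexadecimal number, with or without a leading "0x".
// Returns false on empty input, a non-hex digit, or overflow of 32 bits.
bool parseHexOption(std::string Text, uint32_t &Value) {
  // Drop the optional radix prefix; the radix itself is fixed at 16.
  if (Text.size() >= 2 && Text[0] == '0' && (Text[1] == 'x' || Text[1] == 'X'))
    Text.erase(0, 2);
  if (Text.empty())
    return false;
  uint64_t Result = 0;
  for (char C : Text) {
    unsigned char U = static_cast<unsigned char>(C);
    if (!std::isxdigit(U))
      return false;
    unsigned Digit = std::isdigit(U) ? U - '0' : std::tolower(U) - 'a' + 10;
    assert(Digit < 16 && "hex digit out of range");
    Result = Result * 16 + Digit;
    if (Result > UINT32_MAX)
      return false;
  }
  Value = static_cast<uint32_t>(Result);
  return true;
}
```

With this helper, both `409` and `0x409` produce the value `0x409`.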
Differential Revision: https://reviews.llvm.org/D107263
Migrate pseudo probe decoding logic in llvm-profgen to MC, so other LLVM-based programs can reuse the existing code. Redesign the object layout of encoded and decoded pseudo probes.
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D106861
This change integrates a new count-based aggregated type of perf script. The only difference in the format is that an aggregated count is added at the head of the original sample, meaning that the same sample is repeated the given number of times. This is used to reduce the perf script size.
e.g.
```
2
4005dc
400634
400684
7f68c5788793
0x4005c8/0x4005dc/P/-/-/0 ....
```
Implemented by a dedicated PerfReader `AggregatedHybridPerfReader`.
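A minimal sketch of how such a leading count could be folded into an aggregation map is shown below. It is not the `AggregatedHybridPerfReader` code; `addAggregatedRecord` is a hypothetical helper, and it assumes each record arrives as one string whose first line is the count and whose remaining lines form the sample body used as the key.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Adds one "<count>\n<sample body>" record to the aggregation map, so a
// sample with count N contributes exactly as N identical samples would.
void addAggregatedRecord(const std::string &Record,
                         std::map<std::string, uint64_t> &AggregatedSamples) {
  std::size_t NewLine = Record.find('\n');
  assert(NewLine != std::string::npos &&
         "record needs a count line and a sample body");
  uint64_t Count = std::stoull(Record.substr(0, NewLine));
  AggregatedSamples[Record.substr(NewLine + 1)] += Count;
}
```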
Differential Revision: https://reviews.llvm.org/D107192
This change supports running without parsing MMap binary loading events; instead it always assumes the binary is loaded at the preferred address. This is used when we are sure there are no binary load address changes or when address resolution has been pre-processed. Warn if there's an interior mmap event without a leading mmap event.
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D107097
As detailed on https://pvs-studio.com/en/blog/posts/cpp/0771/ and raised on D62583, the SecNo++ increment is not guaranteed to occur before the second use of SecNo in the same addSection() call.
This patch pulls out the increment (just for clarity) and replaces the second use of SecNo with a constant zero value (we're using stable_sort so the value isn't critical).
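The hazard and the fix can be sketched in isolation. This is not the code touched by the patch; `SectionEntry`, `addSection` and `addAll` are hypothetical names. The point is only that a call like `addSection(Sections, SecNo++, SecNo)` reads and modifies `SecNo` in two arguments whose evaluation order is not guaranteed, whereas hoisting the increment makes the result well defined.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record: a section index plus a secondary sort key.
struct SectionEntry {
  unsigned Index;
  unsigned TieBreak;
};

void addSection(std::vector<SectionEntry> &Sections, unsigned Index,
                unsigned TieBreak) {
  Sections.push_back({Index, TieBreak});
}

void addAll(std::vector<SectionEntry> &Sections, unsigned Count) {
  unsigned SecNo = 0;
  for (unsigned I = 0; I != Count; ++I) {
    // Problematic form (do not use): addSection(Sections, SecNo++, SecNo);
    // Fixed form: pass a constant as the second key and increment afterwards.
    addSection(Sections, SecNo, /*TieBreak=*/0);
    ++SecNo;
  }
  assert(SecNo == Count && "one index handed out per section");
}
```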
Differential Revision: https://reviews.llvm.org/D107273
item of StringTable.
Summary: For the string table in XCOFF, the first 4 bytes
contain the length of the string table, so we should
print the string entries starting from the fifth byte. This patch
also adds tests for llvm-readobj dumping the string
table.
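The layout can be illustrated with a small standalone reader. This is a sketch, not the llvm-readobj implementation: `readStringTableEntries` is a hypothetical function, and it assumes the 4-byte length is big-endian and counts the length field itself, with entries stored as null-terminated strings after it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Returns the null-terminated entries that follow the 4-byte length field.
std::vector<std::string>
readStringTableEntries(const std::vector<uint8_t> &Table) {
  assert(Table.size() >= 4 && "table must hold the 4-byte length field");
  // Assumed encoding: big-endian length that includes these 4 bytes.
  uint32_t Length = (uint32_t(Table[0]) << 24) | (uint32_t(Table[1]) << 16) |
                    (uint32_t(Table[2]) << 8) | uint32_t(Table[3]);
  std::size_t End = std::min<std::size_t>(Length, Table.size());
  std::vector<std::string> Entries;
  std::size_t Pos = 4; // Entries start at the fifth byte.
  while (Pos < End) {
    std::size_t Start = Pos;
    while (Pos < End && Table[Pos] != 0)
      ++Pos;
    Entries.emplace_back(Table.begin() + Start, Table.begin() + Pos);
    ++Pos; // Skip the terminating null byte.
  }
  return Entries;
}
```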
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D105522
In order to support different types of perf scripts, this change refactors `PerfReader` by adding the base class `PerfReaderBase`; the current HybridPerfReader is derived from it for CS profile generation. Common functions like passMM2PEvents, extract_lbrs, extract_callstack, etc. can be reused.
The next step is to add an LBR-only reader (for non-CS profile) and an aggregated perf script reader (which does a pre-aggregation of scripts).
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D107014
When we build with split dwarf in single mode, the .o files contain both "normal" debug sections and dwo sections, along with relocation sections for the "normal" debug sections.
When we create the DWARF context in DWARFObjInMemory we process relocations and store them in the map for .debug_info, etc. sections.
For the DWO context we also do this for non-dwo dwarf sections, which I believe is not necessary. This leads to a lot of memory being wasted. We observed 70GB of extra memory being used.
I went with a context-sensitive approach: a flag is passed in. I am not sure if it's always safe not to process relocations for regular debug sections if Obj contains .dwo sections.
If it is, an alternative might be to just scan the sections in the constructor and, if there are .dwo sections, not process the regular debug ones.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D106624
Pulled out the OptimizationLevel class from PassBuilder in order to be able to access it from within the PassManager and avoid include conflicts.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D107025
The LC_SUB_FRAMEWORK, LC_SUB_UMBRELLA, LC_SUB_CLIENT, and LC_SUB_LIBRARY
load commands are used to indicate related libraries, binaries or framework names.
Their only payload is the string with the name of the object. Adding
those commands to the list of ignored/skipped load commands avoids
an error that stops the process of copying/stripping, and copies their
contents verbatim.
Additionally, in order to have a test for this case, `yaml2obj` now
allows those four commands to contain a `Content`.
Differential Revision: https://reviews.llvm.org/D106412
[[noreturn]] can be used since Oct 2016 when the minimum compiler requirement was bumped to GCC 4.8/MSVC 2015.
Note: the definition of LLVM_ATTRIBUTE_NORETURN is kept for now.
On AIX, the linker needs to check whether a given lto_module_t contains
any constructor/destructor functions, in order to implement the behavior
of the -bcdtors:all flag. See
https://www.ibm.com/docs/en/aix/7.2?topic=l-ld-command for the flag's
documentation.
In llvm IR, constructor (destructor) functions are added to a special
global array @llvm.global_ctors (@llvm.global_dtors).
However, because these two symbols are artificial, they are not visited
during the symbol traversal (using the
lto_module_get_[num_symbols|symbol_name|symbol_attribute] API).
This patch adds a new function to the libLTO interface that checks the
presence of one or both of these two symbols.
Reviewed By: steven_wu
Differential Revision: https://reviews.llvm.org/D106887
Wrapper function call and dispatch handler helpers are moved to
ExecutionSession, and existing EPC-based tools are re-written to take an
ExecutionSession argument instead.
Requiring an ExecutorProcessControl instance simplifies existing EPC based
utilities (which only need to take an ES now), and should encourage more
utilities to use the EPC interface. It also simplifies process termination,
since the session can automatically call ExecutorProcessControl::disconnect
(previously this had to be done manually, and carefully ordered with the
rest of JIT tear-down to work correctly).
These tests access private symbols in the backends, so they cannot link
against libLLVM.so and must be statically linked. Linking these tests
can be slow and with debug builds the resulting binaries use a lot of
disk space.
Merging them into a single test binary means we now only need to
statically link 1 test instead of 6, which helps reduce the build
times and saves disk space.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D106464
See [GRP_COMDAT group with STB_LOCAL signature](https://groups.google.com/g/generic-abi/c/2X6mR-s2zoc)
objcopy PR: https://sourceware.org/bugzilla/show_bug.cgi?id=27931
GRP_COMDAT deduplication is purely based on the signature symbol name in
ld.lld/GNU ld/gold. The local/global status is not part of the equation.
If the signature symbol is localized by --localize-hidden or
--keep-global-symbol, the intention is likely to make the group fully
localized. Drop GRP_COMDAT to suppress deduplication.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D106782
The current implementation of displaying .stack_size information
presumes that each entry represents a single function but this is not
always the case. For example with the use of ICF multiple functions can
be represented with the same code, meaning that the address found in a
.stack_size entry corresponds to multiple function symbols.
This change allows multiple function names to be displayed when
appropriate.
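The idea of reporting every symbol that shares an address can be sketched as follows. It is not the llvm-readobj code; `groupByAddress` and `namesForEntry` are hypothetical helpers, and the `"?"` placeholder for an unmatched address is an assumption made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::pair<std::string, uint64_t>> SymbolList;
typedef std::map<uint64_t, std::vector<std::string>> AddressMap;

// Groups function symbols by address; with ICF several names can share one.
AddressMap groupByAddress(const SymbolList &Symbols) {
  AddressMap ByAddress;
  for (const auto &Sym : Symbols)
    ByAddress[Sym.second].push_back(Sym.first);
  return ByAddress;
}

// Returns all names for a stack size entry's address, joined with ", ".
std::string namesForEntry(const AddressMap &ByAddress, uint64_t Address) {
  auto It = ByAddress.find(Address);
  if (It == ByAddress.end())
    return "?";
  assert(!It->second.empty() && "an address entry always has a name");
  std::string Joined;
  for (const std::string &Name : It->second) {
    if (!Joined.empty())
      Joined += ", ";
    Joined += Name;
  }
  return Joined;
}
```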
Differential Revision: https://reviews.llvm.org/D105884
This matches what MS rc.exe allows in practice. I'm not aware of
any legal syntax cases that are broken by allowing dashes as part
of what the tokenizer considers an Identifier - but I'm not
very well versed in the RC syntax either, can @amccarth think of
any case that would be broken by this?
This fixes downstream bug
https://github.com/msys2/MINGW-packages/issues/9180.
Additionally, rc.exe allows such resource name strings to be surrounded
by quotes, ending up with e.g.
Resource name (string): "QUOTEDNAME"
(i.e., the quotes end up as part of the string), which llvm-rc doesn't
support yet either. (I'm not aware of such cases in the wild though,
but resource string names with dashes do exist.)
This also allows including files with unquoted paths, with filenames
containing dashes (which fixes
https://github.com/msys2/MINGW-packages/issues/9130, which has been
worked around differently so far).
Differential Revision: https://reviews.llvm.org/D106598
Most modern tools only accept two-dash long options. Remove one-dash
long options which are not recognized by GNU style `getopt_long`.
This ensures long options cannot collide with grouped short options.
Note: llvm-symbolizer has `-demangle={true,false}` for pprof compatibility
(for a while). They are kept.
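To illustrate the collision, here is a small sketch of how a GNU-style parser classifies an argument. `ParsedArg` and `parseGetoptStyle` are hypothetical names and this is not the option parsing code used by the tools; option values after `=` are not split off.

```cpp
#include <string>

// Hypothetical result of classifying one command-line argument.
struct ParsedArg {
  bool IsLong = false;
  std::string LongName;   // set for "--name"
  std::string ShortFlags; // set for "-abc": one character per short option
};

// Classify an argument the way GNU-style getopt_long does: two dashes start a
// long option, one dash starts a group of single-letter short options.
inline ParsedArg parseGetoptStyle(const std::string &Arg) {
  ParsedArg P;
  if (Arg.size() > 2 && Arg.compare(0, 2, "--") == 0) {
    P.IsLong = true;
    P.LongName = Arg.substr(2);
  } else if (Arg.size() > 1 && Arg[0] == '-') {
    P.ShortFlags = Arg.substr(1);
  }
  return P;
}
```

With this rule `--all` is the long option `all`, while `-all` is the short options `a`, `l`, `l`. Accepting `-all` as a long option as well would make the two readings ambiguous.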
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D106377
When running the command from the llvm-mc-assemble-fuzzer documentation,
```
llvm-mc-fuzzer --triple=aarch64-linux-gnu --fuzzer-args -max_len=4
```
it triggers the following assertion:
```
llvm-mc-assemble-fuzzer:
llvm-project/llvm/lib/MC/MCTargetOptionsCommandFlags.cpp:38:
bool llvm::mc::getRelaxAll(): Assertion `RelaxAllView &&
"RegisterMCTargetOptionsFlags not created."' failed.
```
It is caused by the absence of a global RegisterMCTargetOptionsFlags object
to initialize the MC target options.
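The failure mode can be sketched with a self-contained imitation of the registration pattern. Everything in the `sketch` namespace is hypothetical and only mimics the shape of the LLVM code: the getter asserts unless a registration object has been constructed first.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical stand-in for the set of MC target option flags.
struct Flags {
  bool RelaxAll = false;
};

// View onto the flags; stays null until a RegisterFlags object is constructed.
inline Flags *&flagsView() {
  static Flags *View = nullptr;
  return View;
}

// Constructing this object creates the flags and publishes the view.
struct RegisterFlags {
  RegisterFlags() {
    static Flags Storage;
    flagsView() = &Storage;
  }
};

inline bool isRegistered() { return flagsView() != nullptr; }

// Mirrors the shape of the failing assertion: reading a flag before any
// registration object exists trips the assert.
inline bool getRelaxAll() {
  assert(flagsView() && "RegisterFlags not created.");
  return flagsView()->RelaxAll;
}

} // namespace sketch
```

Declaring one `sketch::RegisterFlags` object at file scope in the tool guarantees that the view is set before any getter runs.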
Differential Revision: https://reviews.llvm.org/D106417
Add support for all built-in text macros supported by ML64:
@Date, @Time, @FileName, @FileCur, and @CurSeg.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D104965
This patch allows iterating typed enums via the ADT/Sequence utility.
It also changes the original design to better separate concerns:
- `StrongInt` only deals with safe `intmax_t` operations,
- `SafeIntIterator` presents the iterator and reverse iterator
interface but only deals with safe `StrongInt` internally.
- `iota_range` only deals with `SafeIntIterator` internally.
This design ensures that operations are always valid. In particular,
"Out of bounds" assertions fire when:
- the `value_type` is not representable as an `intmax_t`
- iterator operations make internal computation underflow/overflow
- the internal representation cannot be converted back to `value_type`
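A rough sketch of the layering, under stated assumptions: `CheckedInt`, `enumRange` and `Color` are hypothetical names rather than the ADT/Sequence API, and `tryAdd` reports overflow through its return value so the check is visible, whereas the patch describes assertions.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical wrapper that only performs overflow-checked intmax_t arithmetic.
class CheckedInt {
  intmax_t Value;

public:
  explicit CheckedInt(intmax_t V) : Value(V) {}
  intmax_t get() const { return Value; }

  // Stores Value + Delta in Out and returns true, or returns false if the
  // addition would overflow or underflow intmax_t.
  bool tryAdd(intmax_t Delta, CheckedInt &Out) const {
    if (Delta > 0 && Value > std::numeric_limits<intmax_t>::max() - Delta)
      return false;
    if (Delta < 0 && Value < std::numeric_limits<intmax_t>::min() - Delta)
      return false;
    Out = CheckedInt(Value + Delta);
    return true;
  }
};

// Hypothetical typed enum used for illustration.
enum class Color : unsigned char { Red, Green, Blue, Last };

// Enumerate [Begin, End) of an enum by stepping a CheckedInt internally.
template <typename E> std::vector<E> enumRange(E Begin, E End) {
  std::vector<E> Out;
  CheckedInt Cur(static_cast<intmax_t>(Begin));
  const intmax_t Stop = static_cast<intmax_t>(End);
  while (Cur.get() < Stop) {
    Out.push_back(static_cast<E>(Cur.get()));
    CheckedInt Next(0);
    bool Ok = Cur.tryAdd(1, Next);
    assert(Ok && "Out of bounds");
    (void)Ok;
    Cur = Next;
  }
  return Out;
}
```

`enumRange(Color::Red, Color::Last)` yields `Red`, `Green`, `Blue`. The enum value is converted to `intmax_t` once on entry and converted back only when a value is emitted.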
Differential Revision: https://reviews.llvm.org/D106279
This is step 1, a mechanical refactor, of moving the bulk of llvm-dwp functionality into a library. This should allow other tools, like BOLT, to re-use some of the llvm-dwp functionality.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D106198
In PGO, a C++ external linkage function `foo` has a private counter
`__profc_foo` and a private `__profd_foo` in a `comdat nodeduplicate`.
A `__attribute__((weak))` function `foo` has a weak hidden counter `__profc_foo`
and a private `__profd_foo` in a `comdat nodeduplicate`.
In `ld.lld a.o b.o`, say a.o defines an external linkage `foo` and b.o
defines a weak `foo`. Because we currently treat `comdat nodeduplicate` as
`comdat any`, ld.lld will incorrectly consider `b.o:__profc_foo` non-prevailing.
In the worst case, when `b.o:__profd_foo` is retained and `b.o:__profc_foo`
isn't, there will be a dangling reference causing an `undefined hidden symbol` error.
Add SelectionKind to `Comdat` in IRSymtab and let linkers ignore nodeduplicate comdat.
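A sketch of how a linker might use the selection kind, under stated assumptions: `ComdatRef` and `ComdatResolver` are hypothetical, and the enum is reduced to the two kinds discussed here. This is not the ld.lld implementation.

```cpp
#include <set>
#include <string>

// Reduced, hypothetical selection kinds: only the two discussed above.
enum class SelectionKind { Any, NoDeduplicate };

// Hypothetical comdat entry as a linker would read it from the symbol table.
struct ComdatRef {
  std::string Name;
  SelectionKind Kind;
};

class ComdatResolver {
  std::set<std::string> Seen; // names of `comdat any` groups already kept

public:
  // Returns true if the members of this comdat should be kept.
  bool prevails(const ComdatRef &C) {
    // nodeduplicate comdats are never merged, so every copy prevails.
    if (C.Kind == SelectionKind::NoDeduplicate)
      return true;
    // `comdat any`: only the first group with a given name prevails.
    return Seen.insert(C.Name).second;
  }
};
```

With the kind available, both a.o's and b.o's nodeduplicate comdats are kept, so `b.o:__profd_foo` cannot be left referring to a discarded `b.o:__profc_foo`.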
Differential Revision: https://reviews.llvm.org/D106228
This diff changes llvm-ifs to use the unified IFS file format
and performs other renaming changes in preparation for the
elfabi/ifs merge.
Differential Revision: https://reviews.llvm.org/D99810
This change implements the unified text stub format and command line
interface proposed in the elfabi/ifs merge plan.
Differential Revision: https://reviews.llvm.org/D99399
Adds support for MachO static initializers/deinitializers and eh-frame
registration via the ORC runtime.
This commit introduces cooperative support code into the ORC runtime and ORC
LLVM libraries (especially the MachOPlatform class) to support macho runtime
features for JIT'd code. This commit introduces support for static
initializers, static destructors (via cxa_atexit interposition), and eh-frame
registration. Near-future commits will add support for MachO native
thread-local variables, and language runtime registration (e.g. for Objective-C
and Swift).
The llvm-jitlink tool is updated to use the ORC runtime where available, and
regression tests for the new MachOPlatform support are added to compiler-rt.
Notable changes on the ORC runtime side:
1. The new macho_platform.h / macho_platform.cpp files contain the bulk of the
runtime-side support. This includes eh-frame registration; jit versions of
dlopen, dlsym, and dlclose; a cxa_atexit interpose to record static destructors,
and an '__orc_rt_macho_run_program' function that defines running a JIT'd MachO
program in terms of the jit- dlopen/dlsym/dlclose functions.
2. Replaces JITTargetAddress (and casting operations) with ExecutorAddress
(copied from LLVM) to improve type-safety of address management.
3. Adds serialization support for ExecutorAddress and unordered_map types to
the runtime-side Simple Packed Serialization code.
4. Adds orc-runtime regression tests to ensure that static initializers and
cxa-atexit interposes work as expected.
Notable changes on the LLVM side:
1. The MachOPlatform class is updated to:
1.1. Load the ORC runtime into the ExecutionSession.
1.2. Set up standard aliases for macho-specific runtime functions. E.g.
___cxa_atexit -> ___orc_rt_macho_cxa_atexit.
1.3. Install the MachOPlatformPlugin to scrape LinkGraphs for information
needed to support MachO features (e.g. eh-frames, mod-inits), and
communicate this information to the runtime.
1.4. Provide entry-points that the runtime can call to request initializers,
perform symbol lookup, and request deinitializers (the latter is
implemented as an empty placeholder as macho object deinits are rarely
used).
1.5. Create a MachO header object for each JITDylib (defining the __mh_header
and __dso_handle symbols).
2. The llvm-jitlink tool (and llvm-jitlink-executor) are updated to use the
runtime when available.
3. A `lookupInitSymbolsAsync` method is added to the Platform base class. This
can be used to issue an async lookup for initializer symbols. The existing
`lookupInitSymbols` method is retained (the GenericIRPlatform code is still
using it), but is deprecated and will be removed soon.
4. JIT-dispatch support code is added to ExecutorProcessControl.
The JIT-dispatch system allows handlers in the JIT process to be associated with
'tag' symbols in the executor, and allows the executor to make remote procedure
calls back to the JIT process (via __orc_rt_jit_dispatch) using those tags.
The primary use case is ORC runtime code that needs to call back to handlers in
orc::Platform subclasses. E.g. __orc_rt_macho_jit_dlopen calling back to
MachOPlatform::rt_getInitializers using __orc_rt_macho_get_initializers_tag.
(The system is generic however, and could be used by non-runtime code).
The new ExecutorProcessControl::JITDispatchInfo struct provides the address
(in the executor) of the jit-dispatch function and a jit-dispatch context
object, and implementations of the dispatch function are added to
SelfExecutorProcessControl and OrcRPCExecutorProcessControl.
5. OrcRPCTPCServer is updated to support JIT-dispatch calls over ORC-RPC.
6. Serialization support for StringMap is added to the LLVM-side Simple Packed
Serialization code.
7. A JITLink::allocateBuffer operation is introduced to allocate writable memory
attached to the graph. This is used by the MachO header synthesis code, and will
be generically useful for other clients who want to create new graph content
from scratch.
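The tag-based JIT-dispatch mechanism from item 4 can be sketched as a table from tag addresses to handlers. This is a simplified, single-process illustration: `DispatchTable`, `TagAddr` and `Handler` are hypothetical, arguments and results are modelled as plain byte strings, and none of this is the ExecutorProcessControl API.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical types: a tag is identified by its address in the executor, and
// a handler maps serialized argument bytes to serialized result bytes.
using TagAddr = uint64_t;
using Handler = std::function<std::string(const std::string &)>;

class DispatchTable {
  std::map<TagAddr, Handler> Handlers;

public:
  // JIT-process side: associate a handler with a tag symbol's address.
  void associate(TagAddr Tag, Handler H) { Handlers[Tag] = std::move(H); }

  // Entry point reached from the executor: look up the tag and run its
  // handler. Returns false if no handler is associated with the tag.
  bool dispatch(TagAddr Tag, const std::string &ArgBytes,
                std::string &ResultBytes) const {
    auto It = Handlers.find(Tag);
    if (It == Handlers.end())
      return false;
    ResultBytes = It->second(ArgBytes);
    return true;
  }
};
```

In the real system the call crosses from the executor to the JIT process, possibly over RPC. The sketch keeps only the tag lookup and the handler invocation.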
If a file has no symbols, perhaps because it is a linked executable,
synthesize some symbols by walking the code section. Otherwise the
disassembler will try to treat the whole code section as a function,
which won't parse. Fixes https://bugs.llvm.org/show_bug.cgi?id=50957.
Differential Revision: https://reviews.llvm.org/D105539
llvm-readelf is a user-facing tool which emulates GNU readelf. Remove one-dash
long options which are not recognized by GNU style `getopt_long`. This ensures
long options cannot collide with grouped short options.
Note: the documentation (D63719)/help messages have recommended the double-dash
forms since LLVM 9.0.0.
llvm-readobj is intended as an internal tool which has some flexibility.
llvm-readelf/llvm-readobj use the same option parsing code and llvm-readobj's
one-dash long options aren't used after test migration.
Differential Revision: https://reviews.llvm.org/D106037
We can build it with -Werror=global-constructors now. This helps
in situations where libSupport is embedded as a shared library,
potentially in a dlopen/dlclose scenario, and when command-line
parsing or other facilities may not be involved. Avoiding the
implicit construction of these cl::opt objects can avoid
double-registration issues and other kinds of unexpected behavior.
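One common way to avoid a global constructor is to construct the object lazily on first use. The sketch below shows that idea only: `Option`, `registry` and `getVerboseOption` are hypothetical, and the mechanism used in the patch itself may differ.

```cpp
#include <string>
#include <vector>

// Hypothetical option type and registry, standing in for cl::opt and the
// command-line parser's option table.
struct Option {
  std::string Name;
};

inline std::vector<std::string> &registry() {
  static std::vector<std::string> Registered;
  return Registered;
}

// The option is a function-local static, so it is constructed (and registered)
// on the first call rather than when the library is loaded.
inline Option &getVerboseOption() {
  static Option Opt = [] {
    registry().push_back("verbose");
    return Option{"verbose"};
  }();
  return Opt;
}
```

Nothing is registered until `getVerboseOption()` is first called, and repeated calls register the option only once.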
Reviewed By: lattner, jpienaar
Differential Revision: https://reviews.llvm.org/D105959