This removes `WasmTagType`. `WasmTagType` contained an attribute and a
signature index:
```
struct WasmTagType {
uint8_t Attribute;
uint32_t SigIndex;
};
```
Currently the attribute field is not used and reserved for future use,
and always 0. And that this class contains `SigIndex` as its property is
a little weird in the place, because the tag type's signature index is
not an inherent property of a tag but rather a reference to another
section that changes after linking. This makes tag handling in the
linker also weird that tag-related methods are taking both `WasmTagType`
and `WasmSignature` even though `WasmTagType` contains a signature
index. This is because the signature index changes in linking so it
doesn't have any info at this point. This instead moves `SigIndex` to
`struct WasmTag` itself, as we did for `struct WasmFunction` in D111104.
In this CL, in lib/MC and lib/Object, this now treats tag types in the
same way as function types. Also in YAML, this removes `struct Tag`,
because now it only contains the tag index. Also tags set `SigIndex` in
`WasmImport` union, as functions do.
I think this makes things simpler and makes tag handling more in line
with function handling. These two shares similar properties in that both
of them have signatures, but they are kind of nominal so having the same
signature doesn't mean they are the same element.
Also a drive-by fix: the reserved 'attirubute' part's encoding changed
from uleb32 to uint8 a while ago. This was fixed in lib/MC and
lib/Object but not in YAML. This doesn't change object files because the
field's value is always 0 and its encoding is the same for the both
encoding.
This is effectively NFC; I didn't mark it as such just because it
changed YAML test results.
Reviewed By: sbc100, tlively
Differential Revision: https://reviews.llvm.org/D111086
As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared.
We can use the raw_string_ostream::str() method to perform the implicit flush() and return a reference to the std::string container that we can then wrap inside Twine().
https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html
Excessive use of the <string> header has a massive impact on compile time; its most commonly included via the ErrorHandling.h header, which has to be included in many key headers, impacting many source files that have no need for std::string.
As an initial step toward removing the <string> include from ErrorHandling.h, this patch proposes to update the fatal_error_handler_t handler to just take a raw const char* instead.
The next step will be to remove the report_fatal_error std::string variant, which will involve a lot of cleanup and better use of Twine/StringRef.
Differential Revision: https://reviews.llvm.org/D111049
This change adds duplication factor multiplier while accumulating body samples for line-number based profile. The body sample count will be `duplication-factor * count`. Base discriminator and duplication factor is decoded from the raw discriminator, this requires some refactor works.
Differential Revision: https://reviews.llvm.org/D109934
This simplifies the code in a number of ways and avoids
having to track functions and their types separately.
Differential Revision: https://reviews.llvm.org/D111104
D104366 introduced a new llvm-cxxfilt test with non-ASCII characters,
which caused a failure on llvm-clang-x86_64-expensive-checks-win
builder, with a stack trace suggesting issue in a call to isalnum.
The argument to isalnum should be either EOF or a value that is
representable in the type unsigned char. The llvm-cxxfilt does not
perform a cast from char to unsigned char before the call, so the
value might be out of valid range.
Replace the call to isalnum with isAlnum from StringExtras, which takes
a char as the argument. This also makes the check independent of the
current locale.
Differential Revision: https://reviews.llvm.org/D110986
With the removal of OrcRPCExecutorProcessControl and OrcRPCTPCServer in
6aeed7b19c the ORC RPC library no longer has any in-tree users.
Clients needing serialization for ORC should move to Simple Packed
Serialization (usually by adopting SimpleRemoteEPC for remote JITing).
We expose the fact that we rely on unsigned wrapping to iterate through
all indexes. This can be confusing. Rather, keeping it as an
implementation detail through an iterator is less confusing and is less
code.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D110885
Summary:
for xcoff :
implement the getSymbolFlag and getSymbolType() for option --syms.
llvm-objdump --sym , if the symbol is label, print the containing section for the symbol too.
when using llvm-objdump --sym --symbol--description, print the symbol index and qualname for symbol.
for example:
--symbol-description
00000000000000c0 l .text (csect: (idx: 2) .foov[PR]) (idx: 3) .foov
and without --symbol-description
00000000000000c0 l .text (csect: .foov) .foov
Reviewers: James Henderson,Esme Yi
Differential Revision: https://reviews.llvm.org/D109452
LLVM (llvmorg-14-init) under Debian sid using latest gcc (Debian
10.3.0-9) 10.3.0 fails due to ambiguous overload on operators == and !=:
/root/src/llvm/src/llvm/tools/obj2yaml/elf2yaml.cpp:212:22:
error: ambiguous overload for 'operator!='
(operand types are 'llvm::ELFYAML::ELF_SHF' and 'int')
/root/src/llvm/src/llvm/tools/obj2yaml/elf2yaml.cpp:204:32:
error: ambiguous overload for 'operator!='
(operand types are 'const llvm::yaml::Hex64' and 'int')
/root/src/llvm/src/llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp:629:35:
error: ambiguous overload for 'operator=='
(operand types are 'const uint64_t' {aka 'const long unsigned int'} and
'llvm::Register')
Reviewed by: StephenTozer, jmorse, Higuoxing
Differential Revision: https://reviews.llvm.org/D109534
When replacing function calls, skip call instructions where the old
function is not the called function, but e.g. the old function is passed
as an argument.
This fixes a crash due to trying to construct invalid IR for the test
case.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D109759
We used the segment alignment in elf header to assume the loader alignment. However this is incorrect because loader alignment is always the same as page size. If segment needs to be aligned at load time, linker will set aligned address as virtual address in elf header.
Differential Revision: https://reviews.llvm.org/D110795
This change enables llvm-profgen to take raw perf data as alternative input format. Sometimes we need to retrieve evenets for processes with matching binary. Using perf data as input allows us to retrieve process Ids from mmap events for matching binary, then filter by process id during perf script generation.
Differential Revision: https://reviews.llvm.org/D110793
This change contains diagnostics improvments, refactoring and preparation for consuming perf data directly.
Diagnostics:
- We now have more detailed diagnostics when no mmap is found.
- We also print warning for abnormal transition to external code.
Refactoring:
- Simplify input perf trace processing to only allow a single input file. This is because 1) using multiple input perf trace (perf script) is error prone because we may miss key mmap events. 2) the functionality is not really being used anyways.
- Make more functions private for Readers, move non-trivial definitions out of header. Cleanup some inconsistency.
- Prepare for consuming perf data as input directly.
Differential Revision: https://reviews.llvm.org/D110729
The ReduceMetadata pass before this patch removed metadata on a per-MDNode (or NamedMDNode) basis. Either all references to an MDNode are kept, or all of them are removed. However, MDNodes are uniqued, meaning that references to MDNodes with the same data become references to the same MDNodes. As a consequence, e.g. tbaa references to the same type will all have the same MDNode reference and hence make it impossible to reduce only keeping metadata on those memory access for which they are interesting.
Moreover, MDNodes can also be referenced by some intrinsics or other MDNodes. These references were not considered for removal leading to the possibility that MDNodes are not actually removed even if selected to be removed by the oracle.
This patch changes ReduceMetadata to reduces based on removable metadata references instead. MDNodes without references implicitly dropped anyway. References by intrinsic calls should be removed by ReduceOperands or ReduceInstructions. References in other MDNodes cannot be removed as it would violate the immutability of MDNodes.
Additionally, ReduceMetadata pass before this patch used `setMetadata(I, NULL)` to remove references, where `I` is the index in the array returned by `getAllMetadata`. However, `setMetadata` expects a MDKind (such as `MD_tbaa`) as first argument. `getAllMetadata` does not return those in consecutive order (otherwise it would not need to be a `std::pair` with `first` representing the MDKind).
Reviewed By: aeubanks, swamulism
Differential Revision: https://reviews.llvm.org/D110534
As for now, llvm-objcopy renames only sections that are specified
explicitly in --rename-section, while GNU objcopy keeps names of
relocation sections in sync with their targets. For example:
> readelf -S test.o
...
[ 1] .foo PROGBITS
[ 2] .rela.foo RELA
> objcopy --rename-section .foo=.bar test.o gnu.o
> readelf -S gnu.o
...
[ 1] .bar PROGBITS
[ 2] .rela.bar RELA
> llvm-objcopy --rename-section .foo=.bar test.o llvm.o
> readelf -S llvm.o
...
[ 1] .bar PROGBITS
[ 2] .rela.foo RELA
This patch makes llvm-objcopy to match the behavior of GNU objcopy better.
Differential Revision: https://reviews.llvm.org/D110352
The slab allocator is frequently used in -noexec tests where we want a
consistent memory layout. In this context we also want to set the effective
page size, rather than using the page size of the host process, since not all
systems use the same page size. The -slab-page-size option allows us to set
the page size for such tests.
The -slab-page-size option will also be honored in exec mode when using the
slab allocator, but will trigger an error if the requested size is not a
multiple of the actual process page size.
This option was motivated by test failures on a ppc64 bot that was returning
zero from sys::Process::getPageSize(), so it also contains a check for errors
and zero results from that function if the -slab-page-size option is absent.
Existing slab allocator tests will be updated to use this option in a follow-up
commit so that we can point the failing bot at this commit and observe errors
associated with sys::Process::getPageSize().
* Add a newline before `DYNAMIC RELOCATION RECORDS` (see D101796)
* Add the missing `OFFSET TYPE VALUE` line
* Align columns
Note: llvm-readobj/ELFDumper.cpp `loadDynamicTable` has sophisticated PT_DYNAMIC
code which is unavailable in llvm-objdump.
Reviewed By: jhenderson, Higuoxing
Differential Revision: https://reviews.llvm.org/D110595
Similar to https://reviews.llvm.org/D110465, we can compute function size on-demand for the functions that's hit by samples.
Here we leverage the raw range samples' address to compute a set of sample hit function. Then `BinarySizeContextTracker` just works on those function range for the size.
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D110466
Previously we do symbolization for all the functions and actually we only need the symbols that's hit by the samples.
This can significantly speed up the time for large size binary.
Optimization for per-inliner will come along with next patch.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D110465
The MSP430 ABI supports build attributes for specifying
the ISA, code model, data model and enum size in ELF object files.
Differential Revision: https://reviews.llvm.org/D107969
This change is to add some missing details to the help text and command
guide:
- Added a note to the command guide that --debug-macro also dumps
.debug_macinfo.
- Added a note to the command guide that --debug-frame and --eh_frame
are aliases, and in cases where both sections are present one command
outputs both.
- Changed the wording in the help output for --ignore-case and --regex to
closer match the command guide.
We want this behavior for future testing infrastructure anyway, and it may help
with the failure in https://lab.llvm.org/buildbot/#/builders/98/builds/6401:
/b/fuchsia-x86_64-linux/llvm.obj/tools/clang/stage2-bins/bin/lli: warning:
remote mcjit does not support lazy compilation
Finalization error: could not register eh-frame: __register_frame function not
found
/b/fuchsia-x86_64-linux/llvm.obj/tools/clang/stage2-bins/bin/lli: disconnecting
This reintroduces "[ORC] Introduce EPCGenericRTDyldMemoryManager."
(bef55a2b47) and "[lli] Add ChildTarget dependence
on OrcTargetProcess library." (7a219d801b) which were
reverted in 99951a5684 due to bot failures.
The root cause of the bot failures should be fixed by "[ORC] Fix uninitialized
variable." (0371049277) and "[ORC] Wait for
handleDisconnect to complete in SimpleRemoteEPC::disconnect."
(320832cc9b).
This reverts commit bef55a2b47 while I investigate
failures on some bots. Also reverts "[lli] Add ChildTarget dependence on
OrcTargetProcess library." (7a219d801b) which was
a fallow-up to bef55a2b47.
EPCGenericRTDyldMemoryMnaager is an EPC-based implementation of the
RuntimeDyld::MemoryManager interface. It enables remote-JITing via EPC (backed
by a SimpleExecutorMemoryManager instance on the executor side) for RuntimeDyld
clients.
The lli and lli-child-target tools are updated to use SimpleRemoteEPC and
SimpleRemoteEPCServer (rather than OrcRemoteTargetClient/Server), and
EPCGenericRTDyldMemoryManager for MCJIT tests.
By enabling remote-JITing for MCJIT and RuntimeDyld-based ORC clients,
EPCGenericRTDyldMemoryManager allows us to deprecate older remote-JITing
support, including OrcTargetClient/Server, OrcRPCExecutorProcessControl, and the
Orc RPC system itself. These will be removed in future patches.
In order to be consistent with compiler that interprets zero count as unexecuted(cold), this change reports zero-value count for unexecuted part of function code. For the implementation, it leverages the range counter, initializes all the executed function range with the zero-value. After all ranges are merged and converted into disjoint ranges, the remaining zero count will indicates the unexecuted(cold) part of the function.
This change also extends the current `findDisjointRanges` method which now can support adding zero-value range.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D109713
This patch introduces non-CS AutoFDO profile generation into LLVM. The profile is supposed to be well consumed by compiler using `-fprofile-sample-use=[profile]`.
After range and branch counters are extracted from the LBR sample, here we go through each addresses for symbolization, create FunctionSamples and populate its sub fields like TotalSamples, BodySamples and HeadSamples etc. For inlined code, as we need to map back to original code, so we always add body samples to the leaf frame's function sample.
Reviewed By: wenlei, hoy
Differential Revision: https://reviews.llvm.org/D109551
Similar to https://reviews.llvm.org/D109637, there is a whole invalid line of message in perfscript.
```
warning: Invalid address in LBR record at line 14118674: Processed 14138923 events and lost 1 chunks!
warning: Invalid address in LBR record at line 14118676: Check IO/CPU overload!
```
This only happened for LBR only perfscript, hybridperfscript have a check of " 0x" to make sure it's the LBR perf line.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D110424
In ThinLTO for locals we normally compute the GUID from the name after
prepending the source path to get a unique global id. SamplePGO indirect
call profiles contain the target GUID without this uniquification,
however (unless compiling with -funique-internal-linkage-names).
In order to correctly handle the call edges added to the combined index
for these indirect calls, during importing and bitcode writing we
consult a map of original to full GUID to identify the actual callee.
However, for a large application this was consuming a lot of compile
time as we need to do this repeatedly (especially during importing where
we may traverse call edges multiple times).
To fix this implement a suggestion in one of the FIXME comments, and
actually modify the call edges during a single traversal after the index
is built to perform the fixups once. I combined this fixup with the dead
code analysis performed on the index in order to avoid adding an
additional walk of the index. The dead code analysis is the first
analysis performed on the index.
This reduced the time required for a large thin link with SamplePGO by
about 20%.
No new test added, but I confirmed that there are existing tests that
will fail when no fixup is performed.
Differential Revision: https://reviews.llvm.org/D110374
This change is to keep the help text and command guide of objcopy in
tandem.
- In the help output the options --rename-section and
--set-section-flags were missing the flag exclude, which is found in
the command guide.
- In the command guide the alias -G for --keep-global-symbol was
missing, which is found in the help output.
Differential Revision: https://reviews.llvm.org/D110340
It seems we missed one spot to persist `SampleContextFrameVector` into the global table (CSProfileGenerator::populateFunctionBoundarySamples:340) which causes a crash.
This change tried to fix it in a centralized way i. e. where we generate the `FunctionSamples`.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D110275
It happened that the LBR entry target can be the first address of text section which causes an out-of-range crash. So here add a boundary check.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D110271
Without preinliner, we need to tune down the cold count cutoff to merge/trim more context to limit profile size for large components. However it doesn't make sense for cold threshold to be higher than hot threshold, so we now change to use hot threshold as merging/trimming cut off instead.
Differential Revision: https://reviews.llvm.org/D110212
For large app, dumping disasm of the whole program can be slow and result in gianant output. Adding a switch to dump specific symbols only.
Reviewed By: wlei
Differential Revision: https://reviews.llvm.org/D110079
For strided accesses the loop vectorizer seems to prefer creating a
vector induction variable with a start value of the form
<i32 0, i32 1, i32 2, ...>. This value will be incremented each
loop iteration by a splat constant equal to the length of the vector.
Within the loop, arithmetic using splat values will be done on this
vector induction variable to produce indices for a vector GEP.
This pass attempts to dig through the arithmetic back to the phi
to create a new scalar induction variable and a stride. We push
all of the arithmetic out of the loop by folding it into the start,
step, and stride values. Then we create a scalar GEP to use as the
base pointer for a strided load or store using the computed stride.
Loop strength reduce will run after this pass and can do some
cleanups to the scalar GEP and induction variable.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D107790
Finalization and deallocation actions are a key part of the upcoming
JITLinkMemoryManager redesign: They generalize the existing finalization and
deallocate concepts (basically "copy-and-mprotect", and "munmap") to include
support for arbitrary registration and deregistration of parts of JIT linked
code. This allows us to register and deregister eh-frames, TLV sections,
language metadata, etc. using regular memory management calls with no additional
IPC/RPC overhead, which should both improve JIT performance and simplify
interactions between ORC and the ORC runtime.
The SimpleExecutorMemoryManager class provides executor-side support for memory
management operations, including finalization and deallocation actions.
This support is being added in advance of the rest of the memory manager
redesign as it will simplify the introduction of an EPC based
RuntimeDyld::MemoryManager (since eh-frame registration/deregistration will be
expressible as actions). The new RuntimeDyld::MemoryManager will in turn allow
us to remove older remote allocators that are blocking the rest of the memory
manager changes.
Most PDB fields on disk are 32-bit but describe the file in terms of MSF
blocks, which are 4 kiB by default.
So PDB files can be a bit larger than 4 GiB, and much larger if you create them
with a block size > 4 kiB.
This is a first (necessary, but by far not not sufficient) step towards
supporting such PDB files. Now we don't truncate in-memory file offsets (which
are in terms of bytes, not in terms of blocks).
No effective behavior change. lld-link will still error out if it were to
produce PDBs > 4 GiB.
Differential Revision: https://reviews.llvm.org/D109923
Turn on `use-context-cost-for-preinliner` to use context-sensitive byte size cost for preinliner decisions by default.
This is a more accurate proxy of inline cost than profile size. We tested on our large workload that it delivers measureable CPU improvement.
Differential Revision: https://reviews.llvm.org/D109893
New field `elements` is added to '!DIImportedEntity', representing
list of aliased entities.
This is needed to dump optimized debugging information where all names
in a module are imported, but a few names are imported with overriding
aliases.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D109343
Invalid frame addresses exist in call stack samples due to bad unwinding. This could happen to frame-pointer-based unwinding and the callee functions that do not have the frame pointer chain set up. It isn't common when the program is built with the frame pointer omission disabled, but can still happen with third-party static libs built with frame pointer omitted.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D109638
Perf script can sometimes give disordered LBR samples like below.
```
b022500
32de0044
3386e1d1
7f118e05720c
7f118df2d81f
0x2a0b9622/0x2a0b9610/P/-/-/1 0x2a0b79ff/0x2a0b9618/P/-/-/2 0x2a0b7a4a/0x2a0b79e8/P/-/-/1 0x2a0b7a33/0x2a0b7a46/P/-/-/1 0x2a0b7a42/0x2a0b7a23/P/-/-/1 0x2a0b7a21/0x2a0b7a37/P/-/-/2 0x2a0b79e6/0x2a0b7a07/P/-/-/1 0x2a0b79d4/0x2a0b79dc/P/-/-/2 0x2a0b7a03/0x2a0b79aa/P/-/-/1 0x2a0b79a8/0x2a0b7a00/P/-/-/234 0x2a0b9613/0x2a0b7930/P/-/-/1 0x2a0b9622/0x2a0b9610/P/-/-/1 0x2a0b79ff/0x2a0b9618/P/-/-/2 0x2a0b7a4a/0x2aWarning:
Processed 10263226 events and lost 1 chunks!
```
Note that the last LBR record `0x2a0b7a4a/0x2aWarning:` . Currently llvm-profgen does not detect that and as a result an uninitialized branch target value will be used. The uninitialized value can cause creepy instruction ranges created which which in turn will result in a completely wrong profile. An example is like
```
.... @ _ZN5folly13loadUnalignedIsEET_PKv]:18446744073709551615:18446744073709551615
1: 18446744073709551615
!CFGChecksum: 4294967295
!Attributes: 0
```
Reviewed By: wenlei, wlei
Differential Revision: https://reviews.llvm.org/D109637
We previously had a limitation that TLS variables could not
be exported (and therefore could also not be imported). This
change removed that limitation.
Differential Revision: https://reviews.llvm.org/D108877
This syncs parts from the x86 implementation to the ARMWinEH
implementation.
Currently, neither of the compilers targeting COFF/arm64 (MSVC, LLVM)
produce such relocations, but LLVM might after a later patch.
Differential Revision: https://reviews.llvm.org/D109650
This is the same as we do on arm64 already for the MSVC style label
symbols, but also handle the way GCC produces it - with all relocations
pointing at the .text section symbol, with various offsets.
Differential Revision: https://reviews.llvm.org/D109649
This reapplies bb27e45643 (SimpleRemoteEPC
support) and 2269a941a4 (#include <mutex>
fix) with further fixes to support building with LLVM_ENABLE_THREADS=Off.
https://reviews.llvm.org/D47381 / eb46c95c3e
changed the triples set up by GetHostTriple.cmake for i686 MSVC
from i686-pc-win32 to i686-pc-windows-msvc without changing
the corresponding condition in llvm-shlib.
Since then, the 32 bit x86 build of LLVM-C.dll has contained no
exported symbols at all.
Differential Revision: https://reviews.llvm.org/D109493
This reverts commit 5629afea91 ("[ORC] Add missing
include."), and bb27e45643 ("[ORC] Add
SimpleRemoteEPC: ExecutorProcessControl over SPS + abstract transport.").
The SimpleRemoteEPC patch currently assumes availability of threads, and needs
to be rewritten with LLVM_ENABLE_THREADS guards.
SimpleRemoteEPC is an ExecutorProcessControl implementation (with corresponding
new server class) that uses ORC SimplePackedSerialization (SPS) to serialize and
deserialize EPC-messages to/from byte-buffers. The byte-buffers are sent and
received via a new SimpleRemoteEPCTransport interface that can be implemented to
run SimpleRemoteEPC over whatever underlying transport system (IPC, RPC, network
sockets, etc.) best suits your use case.
The SimpleRemoteEPCServer class provides executor-side support. It uses a
customizable SimpleRemoteEPCServer::Dispatcher object to dispatch wrapper
function calls to prevent the RPC thread from being blocked (a problem in some
earlier remote-JIT server implementations). Almost all functionality (beyond the
bare basics needed to bootstrap) is implemented as wrapper functions to keep the
implementation simple and uniform.
Compared to previous remote JIT utilities (OrcRemoteTarget*,
OrcRPCExecutorProcessControl), more consideration has been given to
disconnection and error handling behavior: Graceful disconnection is now always
initiated by the ORC side of the connection, and failure at either end (or in
the transport) will result in Errors being delivered to both ends to enable
controlled tear-down of the JIT and Executor (in the Executor's case this means
"as controlled as the JIT'd code allows").
The introduction of SimpleRemoteEPC will allow us to remove other remote-JIT
support from ORC (including the legacy OrcRemoteTarget* code used by lli, and
the OrcRPCExecutorProcessControl and OrcRPCEPCServer classes), and then remove
ORC RPC itself.
The llvm-jitlink and llvm-jitlink-executor tools have been updated to use
SimpleRemoteEPC over file descriptors. Future commits will move lli and other
tools and example code to this system, and remove ORC RPC.
If the number of directories was 6 (equal to the DEBUG_DIRECTORY
index), patchDebugDirectory() was run even though the debug directory
is actually the 7th entry. Use <= in the comparison to fix that.
This fixes https://llvm.org/PR51243
Differential Revision: https://reviews.llvm.org/D106940
Reviewed by: jhenderson
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`. This achieves two things:
1) This starts standardizing predicates across the LLVM codebase,
following (in this case) ConstantInt. The word "Value" doesn't
convey anything of merit, and is missing in some of the other things.
2) Calling an integer "null" doesn't make any sense. The original sin
here is mine and I've regretted it for years. This moves us to calling
it "zero" instead, which is correct!
APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go. As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.
Included in this patch are changes to a bunch of the codebase, but there are
more. We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.
Differential Revision: https://reviews.llvm.org/D109483
Allow variable number of directories, as allowed by the
specification. NumberOfRvaAndSize will default to 16 if not specified,
as in the past.
Reviewed by: jhenderson
Differential Revision: https://reviews.llvm.org/D108825
This patch continues refactoring done by D99055. It puts format specific
options into the correponding CopyConfig structures.
Differential Revision: https://reviews.llvm.org/D102277
Functions can have a personality function, as well as prefix and
prologue data as additional operands. Unused operands are assigned
a dummy value of i1* null. This patch addresses multiple issues in
use-list order preservation for these:
* Fix verify-uselistorder to also enumerate the dummy values.
This means that now use-list order values of these values are
shuffled even if there is no other mention of i1* null in the
module. This results in failures of Assembler/call-arg-is-callee.ll,
Assembler/opaque-ptr.ll and Bitcode/use-list-order2.ll.
* The use-list order prediction in ValueEnumerator does not take
into account the fact that a global may use a value more than
once and leaves uses in the same global effectively unordered.
We should be comparing the operand number here, as we do for
the more general case.
* While we enumerate all operands of a function together (which
seems sensible to me), the bitcode reader would first resolve
prefix data for all function, then prologue data for all
functions, then personality functions for all functions. Change
this to resolve all operands for a given function together
instead.
Differential Revision: https://reviews.llvm.org/D109282
Print relocations interleaved with disassembled instructions for
executables with relocatable sections, e.g. those built with "-Wl,-q".
Differential Revision: https://reviews.llvm.org/D109016
Currently native clusterization simply groups all benchmarks
by the opcode of key instruction, but that is suboptimal in certain cases,
e.g. where we can already tell that the particular instructions
already resolve into different sched classes.
In preparation for passing the MCSubtargetInfo (STI) through to writeNops
so that it can use the STI in operation at the time, we need to record the
STI in operation when a MCAlignFragment may write nops as padding. The
STI is currently unused, a further patch will pass it through to
writeNops.
There are many places that can create an MCAlignFragment, in most cases
we can find out the STI in operation at the time. In a few places this
isn't possible as we are in initialisation or finalisation, or are
emitting constant pools. When possible I've tried to find the most
appropriate existing fragment to obtain the STI from, when none is
available use the per module STI.
For constant pools we don't actually need to use EmitCodeAlign as the
constant pools are data anyway so falling through into it via an
executable NOP is no better than falling through into data padding.
This is a prerequisite for D45962 which uses the STI to emit the
appropriate NOP for the STI. Which can differ per fragment.
Note that involves an interface change to InitSections. It is now
called initSections and requires a SubtargetInfo as a parameter.
Differential Revision: https://reviews.llvm.org/D45961
In the case of no tied variables, we pick random defs, and then random uses that don't alias with defs we just picked.
Sounds good, except that an X86 instruction may have implicit reg uses,
e.g. for `MULX` it's `EDX`/`RDX`: `Intel SDM, 4-162 Vol. 2B MULX — Unsigned Multiply Without Affecting Flags`
> Performs an unsigned multiplication of the implicit source operand (EDX/RDX) and the specified source operand
> (the third operand) and stores the low half of the result in the second destination (second operand), the high half
> of the result in the first destination operand (first operand), without reading or writing the arithmetic flags.
And indeed, every once in a while `llvm-exegesis` happened to pick EDX as a def while measuring throughput,
and producing garbage output:
```
$ ./bin/llvm-exegesis -num-repetitions=1000000 -mode=inverse_throughput -repetition-mode=min --loop-body-size=4096 -dump-object-to-disk=false -opcode-name=MULX32rr --max-configs-per-opcode=65536
---
mode: inverse_throughput
key:
instructions:
- 'MULX32rr EDX R11D R12D'
config: ''
register_initial_values:
- 'R12D=0x0'
- 'EDX=0x0'
cpu_name: znver3
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
- { key: inverse_throughput, value: 4.00014, per_snippet_value: 4.00014 }
error: ''
info: instruction has no tied variables picking Uses different from defs
assembled_snippet: 415441BC00000000BA00000000C4C223F6D4C4C223F6D4C4C223F6D4C4C223F6D4415CC3415441BC00000000BA0000000049B80200000000000000C4C223F6D4C4C223F6D44983C0FF75F0415CC3
...
```
```
$ ./bin/llvm-exegesis -num-repetitions=1000000 -mode=inverse_throughput -repetition-mode=min --loop-body-size=4096 -dump-object-to-disk=false -opcode-name=MULX32rr --max-configs-per-opcode=65536
---
mode: inverse_throughput
key:
instructions:
- 'MULX32rr R13D EDX ECX'
config: ''
register_initial_values:
- 'ECX=0x0'
- 'EDX=0x0'
cpu_name: znver3
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
- { key: inverse_throughput, value: 3.00013, per_snippet_value: 3.00013 }
error: ''
info: instruction has no tied variables picking Uses different from defs
assembled_snippet: 4155B900000000BA00000000C4626BF6E9C4626BF6E9C4626BF6E9C4626BF6E9415DC34155B900000000BA0000000049B80200000000000000C4626BF6E9C4626BF6E94983C0FF75F0415DC3
...
```
Oops! Not only does that not look fun, i did hit that pitfail during AMD Zen 3 enablement.
While i have since then addressed this in rGd4d459e7475b4bb0d15280f12ed669342fa5edcd,
i suspect there may be other buggy results lying around, so we should at least stop producing them.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D109275
UsedTLSStorage is only used in allocateTLSSection,
guarded in x87 ELF only.
So clang will emit error with -Werror on.
.../llvm/tools/llvm-rtdyld/llvm-rtdyld.cpp:288:12:
error: private field 'UsedTLSStorage' is not used
[-Werror,-Wunused-private-field]
unsigned UsedTLSStorage = 0;
^
We merge cold context by default to save profile size. However trimming cold context after merging doesn't save size much, so default to off to reflect how it's commonly used.
Differential Revision: https://reviews.llvm.org/D109166
This change improves the warning for truncated context by: 1) deduplicate them as one call without probe can appear in many different context leading to duplicated warnings , 2) rephrase the message to make it easier to understand. The term "untracked frame" can be confusing.
Differential Revision: https://reviews.llvm.org/D109115
Added opt option -print-pipeline-passes to print a -passes compatible
string describing the built pass pipeline.
As an example:
$ opt -enable-new-pm=1 -adce -licm -simplifycfg -o /dev/null /dev/null -print-pipeline-passes
verify,function(adce),function(loop-mssa(licm)),function(simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts>),verify,BitcodeWriterPass
At the moment this is best-effort only and there are some known
limitations:
- Not all passes accepting parameters will print their parameters
(currently only implemented for simplifycfg).
- Some ClassName to pass-name mappings are not unique.
- Some ClassName to pass-name mappings are missing (e.g.
BitcodeWriterPass).
Differential Revision: https://reviews.llvm.org/D108298
Added opt option -print-pipeline-passes to print a -passes compatible
string describing the built pass pipeline.
As an example:
$ opt -enable-new-pm=1 -adce -licm -simplifycfg -o /dev/null /dev/null -print-pipeline-passes
verify,function(adce),function(loop-mssa(licm)),function(simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts>),verify,BitcodeWriterPass
At the moment this is best-effort only and there are some known
limitations:
- Not all passes accepting parameters will print their parameters
(currently only implemented for simplifycfg).
- Some ClassName to pass-name mappings are not unique.
- Some ClassName to pass-name mappings are missing (e.g.
BitcodeWriterPass).
Adding the compiler support of MD5 CS profile based on pervious context split work D107299. A MD5 CS profile is about 40% smaller than the string-based extbinary profile. As a result, the compilation is 15% faster.
There are a few conversion from real names to md5 names that have been made on the sample loader and context tracker side to get it work.
Reviewed By: wenlei, wmi
Differential Revision: https://reviews.llvm.org/D108342
The current help msg isn't super clear on whether t prints the content of the files or just the list of files.
(I'd certainly thought it'd print the list of files, and accidentally had a bunch of "gargabe" printed to my terminal).
Similarly, t sounded like it'd do what p actually did.
Differential Revision: https://reviews.llvm.org/D109018
This change aims at supporting LBR only sample perf script which is used for regular(Non-CS) profile generation. A LBR perf script includes a batch of LBR sample which starts with a frame pointer and a group of 32 LBR entries is followed. The FROM/TO LBR pair and the range between two consecutive entries (the former entry's TO and the latter entry's FROM) will be used to infer function profile info.
An example of LBR perf script(created by `perf script -F ip,brstack -i perf.data`)
```
40062f 0x40062f/0x4005b0/P/-/-/9 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 ...
4005d7 0x4005d7/0x4005e5/P/-/-/8 0x40062f/0x4005b0/P/-/-/6 0x400645/0x4005ff/P/-/-/1 ...
...
```
For implementation:
- Extended a new child class `LBRPerfReader` for the sample parsing, reused all the functionalities in `extractLBRStack` except for an extension to parsing leading instruction pointer.
- `HybridSample` is reused(just leave the call stack empty) and the parsed samples is still aggregated in `AggregatedSamples`. After that, range samples, branch sample, address samples are computed and recorded.
- Reused `ContextSampleCounterMap` to store the raw profile, since it's no need to aggregation by context, here it just registered one sample counter with a fake context key.
- Unified to use `show-raw-profile` instead of `show-unwinder-output` to dump the intermediate raw profile, see the comments of the format of the raw profile. For CS profile, it remains to output the unwinder output.
Profile generation part will come soon.
Differential Revision: https://reviews.llvm.org/D108153
Currently context strings contain a lot of duplicated function names and that significantly increase the profile size. This change split the context into a series of {name, offset, discriminator} tuples so function names used in the context can be replaced by the index into the name table and that significantly reduce the size consumed by context.
A follow-up improvement made in the compiler and profiling tools is to avoid reconstructing full context strings which is time- and memory- consuming. Instead a context vector of `StringRef` is adopted to represent the full context in all scenarios. As a result, the previous prevalent profile map which was implemented as a `StringRef` is now engineered as an unordered map keyed by `SampleContext`. `SampleContext` is reshaped to using an `ArrayRef` to represent a full context for CS profile. For non-CS profile, it falls back to use `StringRef` to represent a contextless function name. Both the `ArrayRef` and `StringRef` objects are underpinned by real array and string objects that are stored in producer buffers. For compiler, they are maintained by the sample reader. For llvm-profgen, they are maintained in `ProfiledBinary` and `ProfileGenerator`. Full context strings can be generated only in those cases of debugging and printing.
When it comes to profile format, nothing has changed to the text format, though internally CS context is implemented as a vector. Extbinary format is only changed for CS profile, with an additional `SecCSNameTable` section which stores all full contexts logically in the form of `vector<int>`, which each element as an offset points to `SecNameTable`. All occurrences of contexts elsewhere are redirected to using the offset of `SecCSNameTable`.
Testing
This is no-diff change in terms of code quality and profile content (for text profile).
For our internal large service (aka ads), the profile generation is cut to half, with a 20x smaller string-based extbinary format generated.
The compile time of ads is dropped by 25%.
Differential Revision: https://reviews.llvm.org/D107299