Summary: When disassembling, symbolize a branch target operand
to print a label instead of a real address.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D114492
One of the uses of `LTOCodeGenerator` is to take it as a middle+back end. Sometimes
it is very helpful to access, especially get information from the optimized module.
If the information can be changed in optimization, it cannot be get before the
module is added to `LTOCodeGenerator`. This patch adds a function
`LTOCodeGenerator::getMergedModule` to access the `MergedModule`.
Reviewed By: steven_wu
Differential Revision: https://reviews.llvm.org/D114201
The IceLake scheduler model is still mainly a copy of the SkylakeServer model.
This patch adjusts the fp shuffle classes to account for most instructions now working on Port 1 as well as Port 5.
This is based off Agner + uops.info as well as the PR48110 report.
Differential Revision: https://reviews.llvm.org/D115752
For scripting purposes, use different error code for cases that aren't a
result of different tbd files, e.g. bad input like a path to a broken
symlink. This behaves more similiarly to how the unix `diff` command behaves.
This also includes minor tweaks on error messages.
Reviewed By: ributzka, JDevlieghere
Differential Revision: https://reviews.llvm.org/D115905
When reading location lists in dwo files the addresses cannot be
resolved, but that's not a problem.
Long term this probably should be fixed with a different API that
exposes location expressions without the need to resolve the address
ranges, since that's all the verifier (in its current state) requires.
(though the verifier should probably also eventually verify the address
ranges in location lists are a subset of the enclosing scope's address
range)
When verifying dwo files address ranges won't be able to be resolved due
to missing debug_addr (or missing debug_ranges in the case of DWARFv4
Split DWARF).
Since they refer to the debug_line in the skeleton unit, they can't be
resolved from the dwo CU.
But they can be resolved for split TUs, since those refer to
.debug_line.dwo, which is available in the dwo file.
Add a test to `llvm-dwarfdump` to simply test that the error messages
make sense when passing bad `.dSYM`s.
Reviewed By: clayborg
Differential Revision: https://reviews.llvm.org/D115889
This removes a python3 f-String from the llvm-debuginfod-find lit test script to maximize compatibility with different python versions, as it seemed to cause some issues.
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D115814
preinliner has been tuned on large server workloads and it's not ready to be turned on by default. this change also updates the thresholds based on tuning.
Differential Revision: https://reviews.llvm.org/D115770
Before we have an issue with artificial LBR whose source is a return, recalling that "an internal code(A) can return to external address, then from the external address call a new internal code(B), making an artificial branch that looks like a return from A to B can confuse the unwinder". We just ignore the LBRs after this artificial LBR which can miss some samples. This change aims at fixing this by correctly unwinding them instead of ignoring them.
List some typical scenarios covered by this change.
1) multiple sequential call back happen in external address, e.g.
```
[ext, call, foo] [foo, return, ext] [ext, call, bar]
```
Unwinder should avoid having foo return from bar. Wrong call stack is like [foo, bar]
2) the call stack before and after external call should be correctly unwinded.
```
{call stack1} {call stack2}
[foo, call, ext] [ext, call, bar] [bar, return, ext] [ext, return, foo ]
```
call stack 1 should be the same to call stack2. Both shouldn't be truncated
3) call stack should be truncated after call into external code since we can't do inlining with external code.
```
[foo, call, ext] [ext, call, bar] [bar, call, baz] [baz, return, bar ] [bar, return, ext]
```
the call stack of code in baz should not include foo.
### Implementation:
We leverage artificial frame to fix#2 and #3: when we got a return artificial LBR, push an extra artificial frame to the stack. when we pop frame, check if the parent is an artificial frame to pop(fix#2). Therefore, call/ return artificial LBR is just the same as regular LBR which can keep the call stack.
While recording context on the trie, artificial frame is used as a tag indicating that we should truncate the call stack(fix#3).
To differentiate #1 and #2, we leverage `getCallAddrFromFrameAddr`. Normally the target of the return should be the next inst of a call inst and `getCallAddrFromFrameAddr` will return the address of call inst. Otherwise, getCallAddrFromFrameAddr will return to 0 which is the case of #1.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D115550
We can have the sampling just hit into the external addresses, in that case, both the top stack frame and the latest LBR target are external addresses. For example:
```
ffffffff
0x4006c8/0xffffffff/P/-/-/0 0x40069b/0x400670/M/-/-/0
ffffffff
40067e
0xffffffff/0xffffffff/P/-/-/0 0x4006c8/0xffffffff/P/-/-/0 0x40069b/0x400670/M/-/-/0
```
Before we will ignore the entire samples. However, we found there exists some internal LBRs in the remaining part of sample, the range between them is still a valid range, we will lose some valid LBRs. Those LBRs will be unwinded based on a empty(context-less) call stack.
This change tries to fix it, instead of ignoring the entire sample, we only ignore the leading external addresses.
Note that the first outgoing LBR is useful since there is a valid range between it's source and next LBR's target.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D115538
This produced a few verifier warnings that came up while I was
investigating something else here. Change the assembly to match the
comment so it's warning free. Doesn't seem necessary to change the
CHECKs for the test since it's just a bug in the test, not in the code
under test.
Avoid duplicating the string decoding - improve the error messages down
in form parsing (& produce an Expected<const char*> instead of
Optional<const char*> to communicate the extra error details)
CSSPGO currently employs a flat profile format for context-sensitive profiles. Such a flat profile allows for precisely manipulating contexts that is either inlined or not inlined. This is a benefit over the nested profile format used by non-CS AutoFDO. A downside of this is the longer build time due to parsing the indexing the full CS contexts.
For a CS flat profile, though only the context profiles relevant to a module are loaded when that module is compiled, the cost to figure out what profiles are relevant is noticeably high when there're many contexts, since the sample reader will need to scan all context strings anyway. On the contrary, a nested function profile has its related inline subcontexts isolated from other unrelated contexts. Therefore when compiling a set of functions, unrelated contexts will never need to be scanned.
In this change we are exploring using nested profile format for CSSPGO. This is expected to work based on an assumption that with a preinliner-computed profile all contexts are precomputed and expected to be inlined by the compiler. Contexts not expected to be inlined will be cut off and returned to corresponding base profiles (for top-level outlined functions). This naturally forms a nested profile where all nested contexts are expected to be inlined. The compiler will less likely optimize on derived contexts that are not precomputed.
A CS-nested profile will look exactly the same with regular nested profile except that each nested profile can come with an attributes. With pseudo probes, a nested profile shown as below can also have a CFG checksum.
```
main:1968679:12
2: 24
3: 28 _Z5funcAi:18
3.1: 28 _Z5funcBi:30
3: _Z5funcAi:1467398
0: 10
1: 10 _Z8funcLeafi:11
3: 24
1: _Z8funcLeafi:1467299
0: 6
1: 6
3: 287884
4: 287864 _Z3fibi:315608
15: 23
!CFGChecksum: 138828622701
!Attributes: 2
!CFGChecksum: 281479271677951
!Attributes: 2
```
Specific work included in this change:
- A recursive profile converter to convert CS flat profile to nested profile.
- Extend function checksum and attribute metadata to be stored in nested way for text profile and extbinary profile.
- Unifiy sample loader inliner path for CS and preinlined nested profile.
- Changes in the sample loader to support probe-based nested profile.
I've seen promising results regarding build time. A nested profile can result in a 20% shorter build time than a CS flat profile while keep an on-par performance. This is with -duplicate-contexts-into-base=1.
Test Plan:
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D115205
The IceLake scheduler model is still mainly a copy of the SkylakeServer model.
This patch adjusts the integer shuffle classes to account for most instructions now working on Port 1 as well as Port 5.
This is based off Agner + uops.info as well as the PR48110 report.
Differential Revision: https://reviews.llvm.org/D115547
In debug info, we expect variables to have location info if they are used and we don't want location info for functions that are not used. However, if an unused function is inlined, we could have the scenario where a function is not in the final binary but its static variable is in the final binary. Ensure that variables in the final binary have location debug info even if their scope was inlined.
Also add `--implicit-check-not` to a test for clarity.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D115565
MIPS64 little endian target has a "special" encoding of `r_info`
relocation record field. Instead of one 64-bit little endian number, it
is a little endian 32-bit number followed by a 32-bit big endian number.
For correct reading and writing such fields we must provide information
about target machine into the corresponding routine. This patch does
this for the `llvm-objcopy` tool and fix handling of MIPS64 little
endian files.
The bug was reported in the issue #52647.
Differential Revision: https://reviews.llvm.org/D115635
Seems helpful when you're dealing with invalid/problematic DWARF. Some
diagnostic messages are probably redundant with the verbose dumping and
could be simplified with this.
Adds a fallback to use the debuginfod client library (386655) in `findDebugBinary`.
Fixed a cast of Erorr::success() to Expected<> in debuginfod library.
Added Debuginfod to Symbolize deps in gn.
Updates compiler-rt/lib/sanitizer_common/symbolizer/scripts/build_symbolizer.sh to include Debuginfod library to fix sanitizer-x86_64-linux breakage.
Reviewed By: jhenderson, vitalybuka
Differential Revision: https://reviews.llvm.org/D113717
One of the uses of `LTOCodeGenerator` is to take it as a middle+back end. Sometimes
it is very helpful to access, especially get information from the optimized module.
If the information can be changed in optimization, it cannot be get before the
module is added to `LTOCodeGenerator`. This patch adds a function
`LTOCodeGenerator::getMergedModule` to access the `MergedModule`.
Reviewed By: steven_wu
Differential Revision: https://reviews.llvm.org/D114201
Fix overrides to use both ports. Update the uops counts + port usage based off the most recent llvm-exegesis captures (PR36895) and what Intel AoM / Agner reports as well.
Adds JSONScopedPrinter to llvm-readelf. It includes an empty
JSONELFDumper class which will be used to override any LLVMELFDumper
methods which utilize startLine() which JSONScopedPrinter cannot
provide.
This introduces a change where calls to llvm-readelf with non-ELF object
files that specify --elf-output-style=GNU will now print file summary
information where it previously didn't.
Fixes previous Windows test failure which occured due to JSON escaping
of '\' by not relying on LIT substitution.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D114225
Adds JSONScopedPrinter to llvm-readelf. It includes an empty
JSONELFDumper class which will be used to override any LLVMELFDumper
methods which utilize startLine() which JSONScopedPrinter cannot
provide.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D114225
Adds a fallback to use the debuginfod client library (386655) in `findDebugBinary`.
Fixed a cast of Erorr::success() to Expected<> in debuginfod library.
Added Debuginfod to Symbolize deps in gn.
Updates compiler-rt/lib/sanitizer_common/symbolizer/scripts/build_symbolizer.sh to include Debuginfod library to fix sanitizer-x86_64-linux breakage.
Reviewed By: jhenderson, vitalybuka
Differential Revision: https://reviews.llvm.org/D113717