Summary:
Since bottleneck hints are enabled via user request, it can be
confusing if no bottleneck information is presented. Such is the
case when no bottlenecks are identified. This patch emits a message
in that case.
Reviewers: andreadb
Reviewed By: andreadb
Subscribers: tschuett, gbedwell, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59098
llvm-svn: 355628
I need this to remove a binary from LLD test suite.
The patch also simplifies the code a bit.
Differential revision: https://reviews.llvm.org/D59082
llvm-svn: 355591
This allows us to store more info about where we're emitting the remarks
without cluttering LLVMContext. This is needed for future support for
the remark section.
Differential Revision: https://reviews.llvm.org/D58996
Original llvm-svn: 355507
llvm-svn: 355514
This allows us to store more info about where we're emitting the remarks
without cluttering LLVMContext. This is needed for future support for
the remark section.
Differential Revision: https://reviews.llvm.org/D58996
llvm-svn: 355507
We should create CompressedSection only if the section has SHF_COMPRESSED flag
or it's name starts from '.zdebug'.
Currently, we create it if section's data starts from ZLIB signature.
Differential revision: https://reviews.llvm.org/D59018
llvm-svn: 355501
Partly addresses PR15026.
There are a few tests that passed in invalid architectures, which are fixed in: rL355349 and D58931
Reviewers: echristo, efriedma, rengolin, atrick
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D58933
llvm-svn: 355455
Getting rid of the name "optimization remarks" for anything that
involves handling remarks on the client side.
It's safer to do this now, before we get stuck with that name in all the
APIs and public interfaces we decide to export to users in the future.
This renames llvm/tools/opt-remarks to llvm/tools/remarks-shlib, and now
generates `libRemarks.dylib` instead of `libOptRemarks.dylib`.
Differential Revision: https://reviews.llvm.org/D58535
llvm-svn: 355439
When --compress-debug-sections is given, llvm-objcopy do not compress
sections that have "ZLIB" header in data. Normally this signature is used
in zlib-gnu compression format. But if zlib-gnu used then the name of the compressed
section should start from .z* (e.g .zdebug_info). If it does not, then it is not
a zlib-gnu format and section should be treated as a normal uncompressed section.
Differential revision: https://reviews.llvm.org/D58908
llvm-svn: 355399
If zlib is not available, and --compress-debug-sections is passed,
we want to report an error. Currently, it is only reported for
--compress_debug_sections= form of the option.
Fixes the https://bugs.llvm.org/show_bug.cgi?id=40886.
I do not think there is a way to write a test for this.
Differential revision: https://reviews.llvm.org/D58909
llvm-svn: 355391
This patch adds a new flag named -bottleneck-analysis to print out information
about throughput bottlenecks.
MCA knows how to identify and classify dynamic dispatch stalls. However, it
doesn't know how to analyze and highlight kernel bottlenecks. The goal of this
patch is to teach MCA how to correlate increases in backend pressure to backend
stalls (and therefore, the loss of throughput).
From a Scheduler point of view, backend pressure is a function of the scheduler
buffer usage (i.e. how the number of uOps in the scheduler buffers changes over
time). Backend pressure increases (or decreases) when there is a mismatch
between the number of opcodes dispatched, and the number of opcodes issued in
the same cycle. Since buffer resources are limited, continuous increases in
backend pressure would eventually leads to dispatch stalls. So, there is a
strong correlation between dispatch stalls, and how backpressure changed over
time.
This patch teaches how to identify situations where backend pressure increases
due to:
- unavailable pipeline resources.
- data dependencies.
Data dependencies may delay execution of instructions and therefore increase the
time that uOps have to spend in the scheduler buffers. That often translates to
an increase in backend pressure which may eventually lead to a bottleneck.
Contention on pipeline resources may also delay execution of instructions, and
lead to a temporary increase in backend pressure.
Internally, the Scheduler classifies instructions based on whether register /
memory operands are available or not.
An instruction is marked as "ready to execute" only if data dependencies are
fully resolved.
Every cycle, the Scheduler attempts to execute all instructions that are ready
to execute. If an instruction cannot execute because of unavailable pipeline
resources, then the Scheduler internally updates a BusyResourceUnits mask with
the ID of each unavailable resource.
ExecuteStage is responsible for tracking changes in backend pressure. If backend
pressure increases during a cycle because of contention on pipeline resources,
then ExecuteStage sends a "backend pressure" event to the listeners.
That event would contain information about instructions delayed by resource
pressure, as well as the BusyResourceUnits mask.
Note that ExecuteStage also knows how to identify situations where backpressure
increased because of delays introduced by data dependencies.
The SummaryView observes "backend pressure" events and prints out a "bottleneck
report".
Example of bottleneck report:
```
Cycles with backend pressure increase [ 99.89% ]
Throughput Bottlenecks:
Resource Pressure [ 0.00% ]
Data Dependencies: [ 99.89% ]
- Register Dependencies [ 0.00% ]
- Memory Dependencies [ 99.89% ]
```
A bottleneck report is printed out only if increases in backend pressure
eventually caused backend stalls.
About the time complexity:
Time complexity is linear in the number of instructions in the
Scheduler::PendingSet.
The average slowdown tends to be in the range of ~5-6%.
For memory intensive kernels, the slowdown can be significant if flag
-noalias=false is specified. In the worst case scenario I have observed a
slowdown of ~30% when flag -noalias=false was specified.
We can definitely recover part of that slowdown if we optimize class LSUnit (by
doing extra bookkeeping to speedup queries). For now, this new analysis is
disabled by default, and it can be enabled via flag -bottleneck-analysis. Users
of MCA as a library can enable the generation of pressure events through the
constructor of ExecuteStage.
This patch partially addresses https://bugs.llvm.org/show_bug.cgi?id=37494
Differential Revision: https://reviews.llvm.org/D58728
llvm-svn: 355308
Add statistics for abstract origins, function, variable and parameter
locations; break the 'variable' counts down into variables and
parameters. Also update call site counting to check for
DW_AT_call_{file,line} in addition to DW_TAG_call_site.
Differential revision: https://reviews.llvm.org/D58849
llvm-svn: 355243
This was sometimes causing clang or llvm-mc to crash, and in other
cases could emit a bogus DWARF line-table header. I did an interim
patch in r352541; this patch should be a cleaner and more complete
fix, and retains the test.
Addresses PR40538.
Differential Revision: https://reviews.llvm.org/D58750
llvm-svn: 355226
Summary:
This patch will obtain the section name for symbols that refer to a section. Prior to this patch the Name field for STT_SECTIONs was blank, now it is populated.
Before:
```
Symbol table '.symtab' contains 6 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 SECTION LOCAL DEFAULT 1
2: 0000000000000000 0 SECTION LOCAL DEFAULT 3
3: 0000000000000000 0 SECTION LOCAL DEFAULT 4
4: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _GLOBAL_OFFSET_TABLE_
5: 0000000000000000 0 TLS GLOBAL DEFAULT UND sym
```
With this patch:
```
Symbol table '.symtab' contains 6 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 SECTION LOCAL DEFAULT 1 .text
2: 0000000000000000 0 SECTION LOCAL DEFAULT 3 .data
3: 0000000000000000 0 SECTION LOCAL DEFAULT 4 .bss
4: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _GLOBAL_OFFSET_TABLE_
5: 0000000000000000 0 TLS GLOBAL DEFAULT UND sym
```
This fixes PR40788
Reviewers: jhenderson, rupprecht, espindola
Reviewed By: rupprecht
Subscribers: emaste, javed.absar, arichardson, MaskRay, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58796
llvm-svn: 355207
This is for tweaking SHT_SYMTAB sections.
Their sh_info contains the (number of symbols + 1) usually.
But for creating invalid inputs for test cases it would be convenient
to allow explicitly override this field from YAML.
Differential revision: https://reviews.llvm.org/D58779
llvm-svn: 355193
This patch allows all forms of values for options to be used at the end
of a group. With the fix, it is possible to follow the way GNU binutils
tools handle grouping options better. For example, the -j option can be
used with objdump in any of the following ways:
$ objdump -d -j .text a.o
$ objdump -d -j.text a.o
$ objdump -dj .text a.o
$ objdump -dj.text a.o
Differential Revision: https://reviews.llvm.org/D58711
llvm-svn: 355185
There's no reason to limit the DWARF CFI dumper to EM_386 and EM_X86_64;
ELF files could contain DWARF CFI on almost any platform (even 32-bit ARM;
NetBSD uses DWARF CFI on that platform). So start using the DWARF CFI dumper
unconditionally so that we can dump .eh_frame sections on the remaining ELF
platforms as well as in NetBSD binaries.
Differential Revision: https://reviews.llvm.org/D58761
llvm-svn: 355151
Add support for cloning DWARF expressions that contain base type DIE
references in dsymutil.
<rdar://problem/48167812>
Differential Revision: https://reviews.llvm.org/D58534
llvm-svn: 355148
Part 2 of CSPGO changes (mostly related to ProfileSummary).
Note that I use a default parameter in setProfileSummary() and getSummary().
This is to break the dependency in clang. I will make the parameter explicit
after changing clang in a separated patch.
Differential Revision: https://reviews.llvm.org/D54175
llvm-svn: 355131
Dsymutil gets library member information is through the ambiguous
/path/to/archive.a(member.o). The current logic we use would get
confused by additional parentheses. Using rfind mitigates this issue.
llvm-svn: 355114
Summary:
This was removed in r349068, but it is needed when llvm is compiled
using the non-default c++ standard library on a platform.
Reviewers: sylvestre.ledru, infinity0, mgorny, cuviper
Reviewed By: sylvestre.ledru
Subscribers: jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57859
llvm-svn: 355107
This is https://bugs.llvm.org/show_bug.cgi?id=40861,
Previously llvm-readobj would print the DT_NULL sometimes
for the dynamic section that has no terminator entry.
The logic of printDynamicTable was a bit overcomplicated.
I rewrote it slightly to fix the issue and commented.
Differential revision: https://reviews.llvm.org/D58716
llvm-svn: 355073
This restores the patch that splits demangled stdin input on
non-alphanumerics. I had reverted this patch earlier because it broke
Windows build-bots. I have updated the test so that it passes on
Windows.
I was running the test from powershell and never saw the issue until I
switched to the mingw shell.
This reverts commit 628ab5c682.
llvm-svn: 355031
This reverts commit 5cd5f8f256.
The test passes on linux, but fails on the windows build-bots.
This test failure seems to be a quoting issue between my test and
FileCheck on Windows. I'm reverting this patch until I can replicate
and fix in my Windows environment.
llvm-svn: 355021
Summary:
This patch displays a hexadecimal section value (Elf_Shdr::sh_type) or section-relative offset when printing unknown sections.
Here is a subset of the output (ignoring the fields following "Type" when dumping an ELF's GNU `--section-headers` table).
Section Headers:
```
[Nr] Name Type
[16] android_rel LOOS+0x1
[17] android_rela LOOS+0x2
[27] unknown 0x1000: <unknown>
[28] loos LOOS+0
[30] hios VERSYM
[31] loproc LOPROC+0
[33] hiproc LOPROC+0xFFFFFFF
[34] louser LOUSER+0
[36] hiuser LOUSER+0x7FFFFFFF
```
As a comparison, the previous output looked something like the above, but with a blank "Type" field:
```
[Nr] Name Type
[27] unknown
[28] loos
[30] hios VERSYM
[31] loproc
[33] hiproc
[34] louser
[36] hiuser
```
This fixes PR40773
Reviewers: jhenderson, rupprecht, Bigcheese
Reviewed By: jhenderson, rupprecht, Bigcheese
Subscribers: MaskRay, Bigcheese, srhines, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58701
llvm-svn: 355014
Summary:
Before:
```
Dynamic Section:
NEEDED libpthread.so.0
...
NEEDED ld-linux-x86-64.so.2
RPATH 0x00000000001c2e61
```
After:
```
Dynamic Section:
NEEDED libpthread.so.0
...
NEEDED ld-linux-x86-64.so.2
RPATH $ORIGIN/../lib
```
Only a small problem here, I have no idea on choosing test case. I see there's a test
file(test/tools/llvm-objdump/private-headers-dynamic-section.test). But it has no DT_RPATH and DT_RUNPATH tags. Shall I replace the ELF file in the
Inputs dir by a new one?
Reviewers: jhenderson, grimar
Reviewed By: jhenderson
Subscribers: srhines, rupprecht, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58707
llvm-svn: 355001
Summary:
This patch attempts to replicate GNU c++-filt behavior when splitting stdin input for demangling.
Previously, cxx-filt would split input only on spaces. Each delimited item is then demangled.
From what I have tested, GNU c++filt also splits input on any character that does not make
up the mangled name (notably commas, but also a large set of non-alphanumeric characters).
This patch splits stdin input on any character that does not belong to the Itanium mangling
format (since Itanium is currently the only supported format in llvm-cxxfilt).
This is an update to PR39990
Reviewers: jhenderson, tejohnson, compnerd
Reviewed By: compnerd
Subscribers: erik.pilkington, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58416
llvm-svn: 354998
That patch is the fix for https://bugs.llvm.org/show_bug.cgi?id=40703
"wrong line number info for obj file compiled with -ffunction-sections"
bug. The problem happened with only .o files. If object file contains
several .text sections then line number information showed incorrectly.
The reason for this is that DwarfLineTable could not detect section which
corresponds to specified address(because address is the local to the
section). And as the result it could not select proper sequence in the
line table. The fix is to pass SectionIndex with the address. So that it
would be possible to differentiate addresses from various sections. With
this fix llvm-objdump shows correct line numbers for disassembled code.
Differential review: https://reviews.llvm.org/D58194
llvm-svn: 354972
llvm-readobj's error messages were broken for bad archive members. This
patch fixes them, and also adds testing for archive and thin archive
handling within llvm-readobj.
Reviewed by: rupprecht, grimar, higuoxing
Differential Revision: https://reviews.llvm.org/D58681
llvm-svn: 354960
Current PGO profile counts are not context sensitive. The branch probabilities
for the inlined functions are kept the same for all call-sites, and they might
be very different from the actual branch probabilities. These suboptimal
profiles can greatly affect some downstream optimizations, in particular for
the machine basic block placement optimization.
In this patch, we propose to have a post-inline PGO instrumentation/use pass,
which we called Context Sensitive PGO (CSPGO). For the users who want the best
possible performance, they can perform a second round of PGO instrument/use on
the top of the regular PGO. They will have two sets of profile counts. The
first pass profile will be manly for inline, indirect-call promotion, and
CGSCC simplification pass optimizations. The second pass profile is for
post-inline optimizations and code-gen optimizations.
A typical usage:
// Regular PGO instrumentation and generate pass1 profile.
> clang -O2 -fprofile-generate source.c -o gen
> ./gen
> llvm-profdata merge default.*profraw -o pass1.profdata
// CSPGO instrumentation.
> clang -O2 -fprofile-use=pass1.profdata -fcs-profile-generate -o gen2
> ./gen2
// Merge two sets of profiles
> llvm-profdata merge default.*profraw pass1.profdata -o profile.profdata
// Use the combined profile. Pass manager will invoke two PGO use passes.
> clang -O2 -fprofile-use=profile.profdata -o use
This change touches many components in the compiler. The reviewed patch
(D54175) will committed in phrases.
Differential Revision: https://reviews.llvm.org/D54175
llvm-svn: 354930
The --disassembler-options, or -M, are used to customize
the disassembler and affect its output.
The two implemented options allow selecting register names on ARM:
* With -Mreg-names-raw, the disassembler uses rNN for all registers.
* With -Mreg-names-std it prints sp, lr and pc for r13, r14 and r15,
which is the default behavior of llvm-objdump.
Differential Revision: https://reviews.llvm.org/D57680
llvm-svn: 354870