In general, llvm-symbolizer follows the output style of GNU's addr2line.
However, there are still some differences; in particular, for a requested
address, llvm-symbolizer prints line and column, while addr2line prints
only the line number.
This patch adds a new switch to select the preferred style.
Differential Revision: https://reviews.llvm.org/D60190
llvm-svn: 357675
Summary:
Now CVType and CVSymbol are effectively type-safe wrappers around
ArrayRef<uint8_t>. Make the kind() accessor load it from the
RecordPrefix, which is the same for types and symbols.
Reviewers: zturner, aganea
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60018
llvm-svn: 357658
The standard doesn't require a DW_TAG_variable, DW_TAG_formal_parameter
or DW_TAG_constant to A DW_AT_type attribute describing the type of the
variable. It only specifies that it *can* have one.
llvm-svn: 357628
This avoids allocating a few KB of heap memory on startup, and instead
allocates these maps lazily. I noticed this while profiling LLD.
llvm-svn: 357192
This reverts commit rL357020.
The commit broke the test llvm/test/tools/llvm-objdump/embedded-source.test
on some builds including clang-ppc64be-linux-multistage,
clang-s390x-linux, clang-with-lto-ubuntu, clang-x64-windows-msvc,
llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast (and others).
llvm-svn: 357026
Reapply rL356941 after regenerating the object file in the failing test
llvm/test/tools/llvm-objdump/embedded-source.test from source.
Original commit message:
[llvm] Prevent duplicate files in debug line header in dwarf 5.
Motivation: In previous dwarf versions, file name indexes started from 1, and
the primary source file was not explicit. Dwarf 5 standard (6.2.4) prescribes
the primary source file to be explicitly given an entry with an index number 0.
The current implementation honors the specification by just duplicating the
main source file, once with index number 0, and later maybe with another
index number. While this is compliant with the letter of the standard, the
duplication causes problems for consumers of this information such as lldb.
(Some files are duplicated, where only some of them have a line table although
all refer to the same file)
With this change, dwarf 5 debug line section files always start from 0, and
the zeroth entry is not duplicated whenever possible. This requires different
handling of dwarf 4 and dwarf 5 during generation (e.g. when a function returns
an index zero for a file name, it signals an error in dwarf 4, but not in dwarf 5)
However, I think the minor complication is worth it, because it enables all
consumers (lldb, gdb, dwarfdump, objdump, and so on) to treat all files in the
file name list homogenously.
Tags: #llvm, #debug-info
Differential Revision: https://reviews.llvm.org/D59515
llvm-svn: 357018
Summary:
Motivation: In previous dwarf versions, file name indexes started from 1, and
the primary source file was not explicit. Dwarf 5 standard (6.2.4) prescribes
the primary source file to be explicitly given an entry with an index number 0.
The current implementation honors the specification by just duplicating the
main source file, once with index number 0, and later maybe with another
index number. While this is compliant with the letter of the standard, the
duplication causes problems for consumers of this information such as lldb.
(Some files are duplicated, where only some of them have a line table although
all refer to the same file)
With this change, dwarf 5 debug line section files always start from 0, and
the zeroth entry is not duplicated whenever possible. This requires different
handling of dwarf 4 and dwarf 5 during generation (e.g. when a function returns
an index zero for a file name, it signals an error in dwarf 4, but not in dwarf 5)
However, I think the minor complication is worth it, because it enables all
consumers (lldb, gdb, dwarfdump, objdump, and so on) to treat all files in the
file name list homogenously.
Reviewers: dblaikie, probinson, aprantl, espindola
Reviewed By: probinson
Subscribers: emaste, jvesely, nhaehnle, aprantl, javed.absar, arichardson, hiraditya, MaskRay, rupprecht, jdoerfert, llvm-commits
Tags: #llvm, #debug-info
Differential Revision: https://reviews.llvm.org/D59515
llvm-svn: 356941
[Symbolizer] Add getModuleSectionIndexForAddress() helper routine
The https://reviews.llvm.org/D58194 patch changed symbolizer interface.
Particularily it requires not only Address but SectionIndex also.
Note object::SectionedAddress parameter:
Expected<DILineInfo> symbolizeCode(const std::string &ModuleName,
object::SectionedAddress ModuleOffset,
StringRef DWPName = "");
There are callers of symbolizer which do not know particular section index.
That patch creates getModuleSectionIndexForAddress() routine which
will detect section index for the specified address. Thus if caller
set ModuleOffset.SectionIndex into object::SectionedAddress::UndefSection
state then symbolizer would detect section index using
getModuleSectionIndexForAddress routine.
Differential Revision: https://reviews.llvm.org/D58848
llvm-svn: 356829
Summary:
getRelocatedValue may compute incorrect value for SHT_RELA-typed relocation entries.
// DWARFDataExtractor.cpp
uint64_t DWARFDataExtractor::getRelocatedValue(uint32_t Size, uint32_t *Off,
...
// This formula is correct for REL, but may be incorrect for RELA if the value
// stored in the location (getUnsigned(Off, Size)) is not zero.
return getUnsigned(Off, Size) + Rel->Value;
In this patch, we
* refactor these visit* functions to include a new parameter `uint64_t A`.
Since these visit* functions are no longer used as visitors, rename them to resolve*.
+ REL: A is used as the addend. A is the value stored in the location where the
relocation applies: getUnsigned(Off, Size)
+ RELA: The addend encoded in RelocationRef is used, e.g. getELFAddend(R)
* and add another set of supports* functions to check if a given relocation type is handled.
DWARFObjInMemory uses them to fail early.
Reviewers: echristo, dblaikie
Reviewed By: echristo
Subscribers: mgorny, aprantl, aheejin, fedor.sergeev, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57939
llvm-svn: 356729
Summary:
This considers module symbol streams and the global symbol stream to be
roots. Most types that this considers "unreferenced" are referenced by
LF_UDT_MOD_SRC_LINE id records, which VC seems to always include.
Essentially, they are types that the user can only find in the debugger
if they call them by name, they cannot be found by traversing a symbol.
In practice, around 80% of type information in a PDB is referenced by a
symbol. That seems like a reasonable number.
I don't really plan to do anything with this tool. It mostly just exists
for informational purposes, and to confirm that we probably don't need
to implement type reference tracking in LLD. We can continue to merge
all types as we do today without wasting space.
Reviewers: zturner, aganea
Subscribers: mgorny, hiraditya, arphaman, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59620
llvm-svn: 356692
Introduce a DW_OP_LLVM_convert Dwarf expression pseudo op that allows
for a convenient way to perform type conversions on the Dwarf expression
stack. As an additional bonus it paves the way for using other Dwarf
v5 ops that need to reference a base_type.
The new DW_OP_LLVM_convert is used from lib/Transforms/Utils/Local.cpp
to perform sext/zext on debug values but mainly the patch is about
preparing terrain for adding other Dwarf v5 ops that need to reference a
base_type.
For Dwarf v5 the op maps to DW_OP_convert and for earlier versions a
complex shift & mask pattern is generated to emulate sext/zext.
This is a recommit of r356442 with trivial fixes for the failing tests.
Differential Revision: https://reviews.llvm.org/D56587
llvm-svn: 356451
Introduce a DW_OP_LLVM_convert Dwarf expression pseudo op that allows
for a convenient way to perform type conversions on the Dwarf expression
stack. As an additional bonus it paves the way for using other Dwarf
v5 ops that need to reference a base_type.
The new DW_OP_LLVM_convert is used from lib/Transforms/Utils/Local.cpp
to perform sext/zext on debug values but mainly the patch is about
preparing terrain for adding other Dwarf v5 ops that need to reference a
base_type.
For Dwarf v5 the op maps to DW_OP_convert and for earlier versions a
complex shift & mask pattern is generated to emulate sext/zext.
Differential Revision: https://reviews.llvm.org/D56587
llvm-svn: 356442
Before, empty debug streams were written as 8 bytes (4 bytes signature + 4 bytes for the GlobalRefs count).
With this patch, unused empty streams aren't emitted anymore. Modules now encode 65535 as an 'unused stream' value, by convention.
Also fix the * Linker * contrib section which wasn't correctly emitted previously.
Differential Revision: https://reviews.llvm.org/D59502
llvm-svn: 356395
Summary:
This is similar to how addr2line handles consecutive entries with the
same address - pick the last one.
Reviewers: dblaikie, friss, JDevlieghere
Reviewed By: dblaikie
Subscribers: eugenis, vitalybuka, echristo, JDevlieghere, probinson, aprantl, hiraditya, rupprecht, jdoerfert, llvm-commits
Tags: #llvm, #debug-info
Differential Revision: https://reviews.llvm.org/D58952
llvm-svn: 356265
Summary:
This is similar to how addr2line handles consecutive entries with the
same address - pick the last one.
Reviewers: dblaikie, friss, JDevlieghere
Reviewed By: dblaikie
Subscribers: ormris, echristo, JDevlieghere, probinson, aprantl, hiraditya, rupprecht, jdoerfert, llvm-commits
Tags: #llvm, #debug-info
Differential Revision: https://reviews.llvm.org/D58952
llvm-svn: 355972
Summary:
Swift now generates PDBs for debugging on Windows. llvm and lldb
need a language enumerator value too properly handle the output
emitted by swiftc.
Subscribers: jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59231
llvm-svn: 355882
Change the format type of *Personality and *LSDAAddress to PRIx64 since
they are of type uint64_t.
The problem was detected on mips builds, where it was printing junk values
and causing test failure.
Patch by Milos Stojanovic.
Differential Revision: https://reviews.llvm.org/D58451
llvm-svn: 355607
When dumping ToT clan's debug info with dwarfdump, we were seeing an
error saying that that the location list overflows the debug_loc
section. After reducing the testcase we figured out that we were
interpreting the DW_FORM_data4 as a section offset.
In DWARF3 DW_FORM_data4 and DW_FORM_data8 served also as a section
offset. Until now we didn't check check for the DWARF version, because
some producers (read old versions of clang) were still emitting this.
The relevant code/comment was added in 2013, and I believe it's now
reasonable to start checking the version.
The FormValue class is a little bit of a mess because it cashes the
DWARF unit and context when it extracted the value itself. Several
methods of the class rely on it being present, or return an Optional for
the code path that needs it. At the same time the FormValue class also
used in places where there's no DWARF unit.
For this patch I went with the least invasive change: checking the
version from the CU when it's available. If it's not (because the form
value was created from a value directly) we default to the old behavior.
Differential revision: https://reviews.llvm.org/D58698
llvm-svn: 355456
Add support for cloning DWARF expressions that contain base type DIE
references in dsymutil.
<rdar://problem/48167812>
Differential Revision: https://reviews.llvm.org/D58534
llvm-svn: 355148
That patch is the fix for https://bugs.llvm.org/show_bug.cgi?id=40703
"wrong line number info for obj file compiled with -ffunction-sections"
bug. The problem happened with only .o files. If object file contains
several .text sections then line number information showed incorrectly.
The reason for this is that DwarfLineTable could not detect section which
corresponds to specified address(because address is the local to the
section). And as the result it could not select proper sequence in the
line table. The fix is to pass SectionIndex with the address. So that it
would be possible to differentiate addresses from various sections. With
this fix llvm-objdump shows correct line numbers for disassembled code.
Differential review: https://reviews.llvm.org/D58194
llvm-svn: 354972
DWARFFormValues can be created from a data extractor or by passing its
value directly. Until now this was done by member functions that
modified an existing object's internal state. This patch replaces a
subset of these methods with static method that return a new
DWARFFormValue.
llvm-svn: 354941
Adds llvm-dwarfdump support for pretty printing Dwarf5 expressions ops
that reference a base type (right now only DW_OP_convert is added).
Includes verification to verify that the ops operand is actually a
DW_TAG_base_type DIE.
Differential Revision: https://reviews.llvm.org/D58442
llvm-svn: 354552
Summary:
llvm-symbolizer would originally report symbols that belonged to an invalid object file section.
Specifically the case where: `*Symbol.getSection() == ObjFile.section_end()`
This patch prevents the Symbolizer from collecting symbols that belong to invalid sections.
The test (from PR40591) introduces a case where two symbols have address 0,
one symbol is defined, 'foo', and the other is not defined, 'bar'. This patch will cause
the Symbolizer to keep 'foo' and ignore 'bar'.
As a side note, the logic for adding symbols to the Symbolizer's store
(`SymbolizableObjectFile::addSymbol`) replaces symbols with the
same <address, size> pair. At some point that logic should be revisited as in the
aforementioned case, 'bar' was overwriting 'foo' in the Symbolizer's store,
and 'foo' was forgotten.
This fixes PR40591
Reviewers: jhenderson, rupprecht
Reviewed By: rupprecht
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58146
llvm-svn: 354083
Summary:
rL189250 added a realpath call, and rL352916 because realpath breaks assumptions with some build systems. However, the /usr/lib/debug case has been clarified, falling back to /usr/lib/debug is currently broken if the obj passed in is a relative path. Adding a call to use absolute paths when falling back to /usr/lib/debug fixes that while still not making any realpath assumptions.
This also adds a --fallback-debug-path command line flag for testing (since we probably can't write to /usr/lib/debug from buildbot environments), but was also verified manually:
```
$ rm -f path/to/dwarfdump-test.elf-x86-64
$ strace llvm-symbolizer --obj=relative/path/to/dwarfdump-test.elf-x86-64.debuglink 0x40113f |& grep dwarfdump
```
Lookups went to relative/path/to/dwarfdump-test.elf-x86-64, relative/path/to/.debug/dwarfdump-test.elf-x86-64, and then finally /usr/lib/debug/absolute/path/to/dwarfdump-test.elf-x86-64.
Reviewers: dblaikie, samsonov
Reviewed By: dblaikie
Subscribers: krytarowski, aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57916
llvm-svn: 353730
When type streams with forward references were merged using GHashes, cycles
were introduced in the debug info. This was caused by
GlobalTypeTableBuilder::insertRecordAs() not inserting the record on the second
pass, thus yielding an empty ArrayRef at that record slot. Later on, upon PDB
emission, TpiStreamBuilder::commit() would skip that empty record, thus
offseting all indices that came after in the stream.
This solution comes in two steps:
1. Fix the hash calculation, by doing a multiple-step resolution, iff there are
forward references in the input stream.
2. Fix merge by resolving with multiple passes, therefore moving records with
forward references at the end of the stream.
This patch also adds support for llvm-readoj --codeview-ghash.
Finally, fix dumpCodeViewMergedTypes() which previously could reference deleted
memory.
Fixes PR40221
Differential Revision: https://reviews.llvm.org/D57790
llvm-svn: 353412
The wrong variable was being used when printing the address increment in
verbose output of .debug_line. This patch fixes this.
Reviewed by: JDevlieghere
Differential Revision: https://reviews.llvm.org/D57693
llvm-svn: 353288
Summary:
Using realpath makes assumptions about build systems that do not always hold true. The debug binary referred to from the .gnu_debuglink should exist in the same directory (or in a .debug directory, etc.), but the files may only exist as symlinks to a differently named files elsewhere, and using realpath causes that lookup to fail.
This was added in r189250, and this is basically a revert + regression test case.
Reviewers: dblaikie, samsonov, jhenderson
Reviewed By: dblaikie
Subscribers: llvm-commits, hiraditya
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57609
llvm-svn: 352916
Summary:
This patch fixes access to fpo streams in native pdb from DbiStream and makes
code consistent with DbiStreamBuilder.
Patch By: leonid.mashinskiy
Reviewers: zturner, aleksandr.urakov
Reviewed By: zturner
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56725
llvm-svn: 352615
PDBs contain several serialized hash tables. In the microsoft-pdb
repo published to support LLVM implementing PDB support, the
provided initializes the bucket count for the TPI and IPI streams
to the maximum size. This occurs in tpi.cpp L33 and tpi.cpp L398.
In the LLVM code for generating PDBs, these streams are created with
minimum number of buckets. This difference makes LLVM generated
PDBs slower for when used for debugging.
Patch by C.J. Hebert
Differential Revision: https://reviews.llvm.org/D56942
llvm-svn: 352117
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
This is difficult/not possible to test in LLVM, but is visible as a
crash in LLD when parsing DWARF to generate gdb-index.
This function is called by llvm-dwarfdump when parsing high_pc for
non-verbose output (to print the actual high_pc rather than the low_pc
relative value), but in that case llvm-dwarfdump doesn't print section
names (if it did, it would hit this problem).
We could add some other features to llvm-dwarfdump to expose this, but
nothing really springs to my mind. I will add a test to lld, though.
llvm-svn: 350010
Currently the section name (& possibly number) is only printed on
addresses in ranges - but no reason it couldn't also be displayed on
other addresses (like low/high PC).
Refactor in that direction by pulling out the section lookup and name
ambiguity dumping logic into a reusable helper.
llvm-svn: 349995
Propagate the llvm::Error a little further up. This is NFC for
llvm-dwarfdump in this change, but allows ld.lld to emit more precise
error messages about which object and archive the erroneous DWARF is in.
llvm-svn: 349978
Originally committed in r349333, reverted in r349353.
GCC emitted these unconditionally on/before 4.4/March 2012
Clang emitted these unconditionally on/before 3.5/March 2014
This improves performance when parsing CUs (especially those using split
DWARF) that contain no code ranges (such as the mini CUs that may be
created by ThinLTO importing - though generally they should be/are
avoided, especially for Split DWARF because it produces a lot of very
small CUs, which don't scale well in a bunch of other ways too
(including size)).
The revert was due to a (Google internal) test that had some checked in old
object files missing DW_AT_ranges. That's since been fixed.
llvm-svn: 349968
- When signing return addresses with -msign-return-address=<scope>{+<key>},
either the A key instructions or the B key instructions can be used. To
correctly authenticate the return address, the unwinder/debugger must know
which key was used to sign the return address.
- When and exception is thrown or a break point reached, it may be necessary to
unwind the stack. To accomplish this, the unwinder/debugger must be able to
first authenticate an the return address if it has been signed.
- To enable this, the augmentation string of CIEs has been extended to allow
inclusion of a 'B' character. Functions that are signed using the B key
variant of the instructions should have and FDE whose associated CIE has a 'B'
in the augmentation string.
- One must also be able to preserve these semantics when first stepping from a
high level language into assembly and then, as a second step, into an object
file. To achieve this, I have introduced a new assembly directive
'.cfi_b_key_frame ', that tells the assembler the current frame uses return
address signing with the B key.
- This ensures that the FDE is associated with a CIE that has 'B' in the
augmentation string.
Differential Revision: https://reviews.llvm.org/D51798
llvm-svn: 349895
This is to address post-commit feedback from Paul Robinson on r348954.
The original commit misinterprets count and upper bound as the same thing (I thought I saw GCC producing an upper bound the same as Clang's count, but GCC correctly produces an upper bound that's one less than the count (in C, that is, where arrays are zero indexed)).
I want to preserve the C-like output for the common case, so in the absence of a lower bound the count (or one greater than the upper bound) is rendered between []. In the trickier cases, where a lower bound is specified, a half-open range is used (eg: lower bound 1, count 2 would be "[1, 3)" and an unknown parts use a '?' (eg: "[1, ?)" or "[?, 7)" or "[?, ? + 3)").
Reviewers: aprantl, probinson, JDevlieghere
Differential Revision: https://reviews.llvm.org/D55721
llvm-svn: 349670
- Reapply changes intially introduced in r343089
- The archtecture info is no longer loaded whenever a DWARFContext is created
- The runtimes libraries (santiziers) make use of the dwarf context classes but
do not intialise the target info
- The architecture of the object can be obtained without loading the target info
- Adding a method to the dwarf context to get this information and multiplex the
string printing later on
Differential Revision: https://reviews.llvm.org/D55774
llvm-svn: 349472
GCC emitted these unconditionally on/before 4.4/March 2012
Clang emitted these unconditionally on/before 3.5/March 2014
This improves performance when parsing CUs (especially those using split
DWARF) that contain no code ranges (such as the mini CUs that may be
created by ThinLTO importing - though generally they should be/are
avoided, especially for Split DWARF because it produces a lot of very
small CUs, which don't scale well in a bunch of other ways too
(including size)).
llvm-svn: 349333
Doesn't handle varargs and other fun things, but it's a start. (also
doesn't print these strictly as valid C++ when it's a pointer to
function, it'll print as "void(int)*" instead of "void (*)(int)")
llvm-svn: 348965
This lays the foundation for dumping types not referenced by DW_AT_type
attributes (in the near-term, that'll be DW_AT_containing_type for a
DW_TAG_ptr_to_member_type - in the future, potentially dumping the
pretty printed name next to the DW_TAG for the type, rather than only
when the type is referenced from elsewhere)
llvm-svn: 348961
Previously we would create an lldb::Function object for each function
parsed, but we would not add these to the clang AST. This is a first
step towards getting local variable support working, as we first need an
AST decl so that when we create local variable entries, they have the
proper DeclContext.
Differential Revision: https://reviews.llvm.org/D55384
llvm-svn: 348631
VarStreamArray was built on the assumption that it is backed by a
StreamRef, and offset 0 of that StreamRef is the first byte of the first
record in the array.
This is a logical and intuitive assumption, but unfortunately we have
use cases where it doesn't hold. Specifically, a PDB module's symbol
stream is prefixed by 4 bytes containing a magic value, and the first
byte of record data in the array is actually at offset 4 of this byte
sequence.
Previously, we would just truncate the first 4 bytes and then construct
the VarStreamArray with the resulting StreamRef, so that offset 0 of the
underlying stream did correspond to the first byte of the first record,
but this is problematic, because symbol records reference other symbol
records by the absolute offset including that initial magic 4 bytes. So
if another record wants to refer to the first record in the array, it
would say "the record at offset 4".
This led to extremely confusing hacks and semantics in loading code, and
after spending 30 minutes trying to get some math right and failing, I
decided to fix this in the underlying implementation of VarStreamArray.
Now, we can say that a stream is skewed by a particular amount. This
way, when we access a record by absolute offset, we can use the same
values that the records themselves contain, instead of having to do
fixups.
Differential Revision: https://reviews.llvm.org/D55344
llvm-svn: 348499
Previously these were dropped. We now understand them sufficiently
well to start emitting them. From the debugger's perspective, this
now enables us to have debug info about typedefs (both global and
function-locally scoped)
Differential Revision: https://reviews.llvm.org/D55228
llvm-svn: 348306
Part of the patch to not build the hash map eagerly was omitted
due to a merge conflict. Add it back, which should fix the failing
tests.
llvm-svn: 348166
When there is no .debug_addr section for some reason,
llvm-dwarfdump would print the bogus empty section name when dumping ranges
in .debug_info:
DW_AT_ranges [DW_FORM_rnglistx] (indexed (0x0) rangelist = 0x00000004
[0x0000000000000000, 0x0000000000000001) ""
[0x0000000000000000, 0x0000000000000002) "")
That happens because of the code which uses 0 (zero) as a section index as a default value.
The code should use -1ULL instead because technically 0 is a valid zero section index
in ELF and -1ULL is a special constant used that means "no section available".
This is mostly a fix for the overall correctness/safety of the code,
but a test case is provided too.
Differential revision: https://reviews.llvm.org/D55113
llvm-svn: 348115
Summary:
This speeds up linking clang.exe/pdb with /DEBUG:GHASH by 31%, from
12.9s to 9.8s.
Symbol records are typically small (16.7 bytes on average), but we
processed them one at a time. CVSymbol is a relatively "large" type. It
wraps an ArrayRef<uint8_t> with a kind an optional 32-bit hash, which we
don't need. Before this change, each DbiModuleDescriptorBuilder would
maintain an array of CVSymbols, and would write them individually with a
BinaryItemStream.
With this change, we now add symbols that happen to appear contiguously
in bulk. For each .debug$S section (roughly one per function), we
allocate two copies, one for relocation, and one for realignment
purposes. For runs of symbols that go in the module stream, which is
most symbols, we now add them as a single ArrayRef<uint8_t>, so the
vector DbiModuleDescriptorBuilder is roughly linear in the number of
.debug$S sections (O(# funcs)) instead of the number of symbol records
(very large).
Some stats on symbol sizes for the curious:
PDB size: 507M
sym bytes: 316,508,016
sym count: 18,954,971
sym byte avg: 16.7
As future work, we may be able to skip copying symbol records in the
linker for realignment purposes if we make LLVM write them aligned into
the object file. We need to double check that such symbol records are
still compatible with link.exe, but if so, it's definitely worth doing,
since my profile shows we spend 500ms in memcpy in the symbol merging
code. We could potentially cut that in half by saving a copy.
Alternatively, we could apply the relocations *after* we iterate the
symbols. This would require some careful re-engineering of the
relocation processing code, though.
Reviewers: zturner, aganea, ruiu
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D54554
llvm-svn: 347687
When you have a member function with a ref-qualifier, for example:
struct Foo {
void Func() &;
void Func2() &&;
};
clang-cl was not emitting this information. Doing so is a bit
awkward, because it's not a property of the LF_MFUNCTION type, which
is what you'd expect. Instead, it's a property of the this pointer
which is actually an LF_POINTER. This record has an attributes
bitmask on it, and our handling of this bitmask was all wrong. We
had some parts of the bitmask defined incorrectly, but importantly
for this bug, we didn't know about these extra 2 bits that represent
the ref qualifier at all.
Differential Revision: https://reviews.llvm.org/D54667
llvm-svn: 347354
PointerAttributes is a bitwise-or of several other fields, each of
which is already printed on its own line with a better explanation.
So this doesn't really help much.
llvm-svn: 347275
Especially for symbolizer it can be efficient to have to search through
the entire index when it isn't needed - llvm-symbolizer looks up only a
few CUs & already has an index available in getUnitForEntry, once it's
passed down to DWARFUnitHeader::extract then there's no need for it to
call getFromOffset.
llvm-svn: 347134
This is a follow-up to r346715. Use PRIx64 to formatted print of 64-bit
value in the `DWARFDebugLoclists::LocationList::dump` to escape problem
on big-endian hosts.
llvm-svn: 347049
In a previous patch, we pre-processed the TPI stream in order to build
the reverse mapping from nested type -> parent type so that we could
accurately reconstruct a DeclContext hierarchy.
However, there were some issues. An LF_NESTTYPE record is really just a
typedef, so although it happens to be used to indicate the name of the
nested type and referring to the global record which defines the type,
it is also used for every other kind of nested typedef. When we rebuild
the DeclContext hierarchy, we want it to be as accurate as possible,
which means that if we have something like:
struct A {
struct B {};
using C = B;
};
We don't want to create two CXXRecordDecls in the AST each with the
exact same definition. We just want to create one for B and then
define C as an alias to B. Previously, however, it would not be able
to distinguish between the two cases and it would treat A::B and
A::C as being two classes each with separate definitions. We address
the first half of improving the pre-processing logic so that only
actual definitions are treated this way.
Later, in a followup patch, we can handle the case of nested
typedefs since we're already going to be enumerating the field list
anyway and this patch introduces the general framework for
distinguishing between the two cases.
Differential Revision: https://reviews.llvm.org/D54357
llvm-svn: 346786
The `DWARFDebugAddrTable::dump` routine prints 32/64-bits addresses.
These values are stored in a vector of `uint64_t` independently of their
original sizes. But `format` function gets format string with PRIx32
suffix in case of 32-bit address size. At least on MIPS 32-bit targets
that leads to incorrect output.
This patch changes formats strings and always use PRIx64 to print
`uint64_t` values.
Differential Revision: http://reviews.llvm.org/D54424
llvm-svn: 346715
This was being used as a sort of indirect out parameter from shouldDump
- seems simpler to use it as the actual result of the call. (this does
mean using a pointer to an Optional & actually using all 3 states (null,
None, and present) which is, admittedly, a tad subtle - but given the
limited scope, seems OK to me - open to discussion though, if others
feel strongly about it)
llvm-svn: 346691
Summary: The debug_info_offset values in .debug_{,gnu_}pub{name,types} may be relocated. Change it to DWARFSection so that we can get relocated values.
Reviewers: ruiu, dblaikie, grimar, JDevlieghere
Reviewed By: JDevlieghere
Subscribers: aprantl, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D54375
llvm-svn: 346615
This change allows for link-time merging of debugging information from
Microsoft precompiled types OBJs compiled with cl.exe /Z7 /Yc and /Yu.
This fixes llvm.org/PR34278
Differential Revision: https://reviews.llvm.org/D45213
llvm-svn: 346154
Adding functionality to the DWARF verifier for DWARF v5 strx* forms which
index into the string offsets table.
Differential Revision: https://reviews.llvm.org/D54049
llvm-svn: 346061
This is a minor bug fix. Previously, if you tried to encode the RSP
register on the x86 platform, that might have succeeded and been encoded
incorrectly. However, no existing producer or consumer passes the x86_64
registers when targeting x86_32.
llvm-svn: 345879
The TypeIndex used by cl.exe is 0x103, which indicates a SimpleTypeMode
of NearPointer (note the absence of the bitness, normally pointers use a
mode of NearPointer32 or NearPointer64) and a SimpleTypeKind of void.
So this is basically a void*, but without a specified size, which makes
sense given how std::nullptr_t is defined.
clang-cl was actually not emitting *anything* for this. Instead, when we
encountered std::nullptr_t in a DIType, we would actually just emit a
TypeIndex of 0, which is obviously wrong.
std::nullptr_t in DWARF is represented as a DW_TAG_unspecified_type with
a name of "decltype(nullptr)", so we add that logic along with a test,
as well as an update to the dumping code so that we no longer print
void* when dumping 0x103 (which would previously treat Void/NearPointer
no differently than Void/NearPointer64).
Differential Revision: https://reviews.llvm.org/D53957
llvm-svn: 345811
The purpose of this patch is twofold:
- Fold pre-DWARF v5 functionality into v5 to eliminate the need for 2 different
versions of range list handling. We get rid of DWARFDebugRangelist{.cpp,.h}.
- Templatize the handling of range list tables so that location list handling
can take advantage of it as well. Location list and range list tables have the
same basic layout.
A non-NFC version of this patch was previously submitted with r342218, but it caused
errors with some TSan tests. This patch has no functional changes. The difference to
the non-NFC patch is that there are no changes to rangelist dumping in this patch.
Differential Revision: https://reviews.llvm.org/D53545
llvm-svn: 345546
Relocatable content may have overlapping ranges until the sections are
finalized. This reduces the amount of verification that is done on an object
file so that invalid errors are not raised.
llvm-svn: 345441
As was already mentioned in comments for D53364, DWARF 5
spec says about DW_LLE_startx_length:
"This is a form of bounded location description that has two unsigned ULEB operands.
The first value is an address index (into the .debug_addr section) that indicates the beginning of the address range
over which the location is valid. The second value is the length of the range. ")
Currently, the length is always parsed as U32.
Patch change the behavior to parse DW_LLE_startx_length as ULEB128 for DWARF 5
and keeps it as U32 for DWARF4+(pre-DWARF5) for compatibility.
Differential revision: https://reviews.llvm.org/D53564
llvm-svn: 345254
This is mostly some cleanup done in the process of implementing
some basic support for types. I tried to split up the patch a
bit to get some of the NFC portion of the patch out into a separate
commit, and this is the result of that. It moves some code around,
deletes some spurious namespace qualifications, removes some
unnecessary header includes, forward declarations, etc.
llvm-svn: 344913
This teaches llvm-dwarfdump to dump the content of .debug_loclists sections.
It converts the DWARFDebugLocDWO class to DWARFDebugLoclists,
teaches llvm-dwarfdump about .debug_loclists section and
adds the implementation for parsing the DW_LLE_offset_pair entries.
Differential revision: https://reviews.llvm.org/D53364
llvm-svn: 344895
Summary:
This patch just extends the `IPDBSession` interface to allow retrieving
of frame data through it, and adds an implementation over DIA. It is needed
for an implementation (for now with DIA) of the conversion from FPO programs
to DWARF expressions mentioned in D53086.
Reviewers: zturner, asmith, rnk
Reviewed By: asmith
Subscribers: mgorny, aprantl, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D53324
llvm-svn: 344886
Putting addresses in the address pool, even with non-fission, can reduce
relocations - reusing the addresses from debug_info and debug_rnglists
(the latter coming soon)
llvm-svn: 344834
Considers the index when extracting location lists from a .dwp file.
Majority of the patch by David Blaikie.
Reviewers: dblaikie
Differential revision: https://reviews.llvm.org/D53155
llvm-svn: 344807
When we're on the last bucket the computation is tricky.
We were failing when the last bucket contained multiple
matches. Added a new test for this.
llvm-svn: 344081
We changed an ArrayRef<uint8_t> to an ArrayRef<uint32_t>, but
it needs to be an ArrayRef<support::ulittle32_t>.
We also change ArrayRef<> to FixedStreamArray<>. Technically
an ArrayRef<> will work, but it can cause a copy in the underlying
implementation if the memory is not contiguous, and there's no
reason not to use a FixedStreamArray<>.
Thanks to nemanjai@ and thakis@ for helping me track this down
and confirm the fix.
llvm-svn: 344063
Fix the following warning when compiling with clang (caused by commit
rL343951):
GlobalsStream.cpp:61:33: warning: comparison of integers of different
signs: 'int' and 'uint32_t'
This also avoids double evaluation of `GlobalsTable.HashBuckets.size()`.
llvm-svn: 343957
The Globals table is a hash table keyed on symbol name, so
it's possible to lookup symbols by name in O(1) time. Add
a function to the globals stream to do this, and add an option
to llvm-pdbutil to exercise this, then use it to write some
tests to verify correctness.
llvm-svn: 343951
NFC-ish (the parsing of the units is not a functional change - no
errors/warnings are emitted during the shallow parsing - though without
parsing them here, the "max version" would be wrong (still zero) later
on, so in those cases the units do need to be parsed)
llvm-svn: 343884
DWARF v5 introduces DW_AT_call_all_calls, a subprogram attribute which
indicates that all calls (both regular and tail) within the subprogram
have call site entries. The information within these call site entries
can be used by a debugger to populate backtraces with synthetic tail
call frames.
Tail calling frames go missing in backtraces because the frame of the
caller is reused by the callee. Call site entries allow a debugger to
reconstruct a sequence of (tail) calls which led from one function to
another. This improves backtrace quality. There are limitations: tail
recursion isn't handled, variables within synthetic frames may not
survive to be inspected, etc. This approach is not novel, see:
https://gcc.gnu.org/wiki/summit2010?action=AttachFile&do=get&target=jelinek.pdf
This patch adds an IR-level flag (DIFlagAllCallsDescribed) which lowers
to DW_AT_call_all_calls. It adds the minimal amount of DWARF generation
support needed to emit standards-compliant call site entries. For easier
deployment, when the debugger tuning is LLDB, the DWARF requirement is
adjusted to v4.
Testing: Apart from check-{llvm, clang}, I built a stage2 RelWithDebInfo
clang binary. Its dSYM passed verification and grew by 1.4% compared to
the baseline. 151,879 call site entries were added.
rdar://42001377
Differential Revision: https://reviews.llvm.org/D49887
llvm-svn: 343883
Summary:
Before this change, LLVM would always describe locals on the stack as
being relative to some specific register, RSP, ESP, EBP, ESI, etc.
Variables in stack memory are pretty common, so there is a special
S_DEFRANGE_FRAMEPOINTER_REL symbol for them. This change uses it to
reduce the size of our debug info.
On top of the size savings, there are cases on 32-bit x86 where local
variables are addressed from ESP, but ESP changes across the function.
Unlike in DWARF, there is no FPO data to describe the stack adjustments
made to push arguments onto the stack and pop them off after the call,
which makes it hard for the debugger to find the local variables in
frames further up the stack.
To handle this, CodeView has a special VFRAME register, which
corresponds to the $T0 variable set by our FPO data in 32-bit. Offsets
to local variables are instead relative to this value.
This is part of PR38857.
Reviewers: hans, zturner, javed.absar
Subscribers: aprantl, hiraditya, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D52217
llvm-svn: 343543
These work a little differently because they are actually in
the globals stream and are treated as symbol records, even though
DIA presents them as types. So this also adds the necessary
infrastructure to cache records that live somewhere other than
the TPI stream as well.
llvm-svn: 343507
We didn't properly detect when a pointer was a member
pointer, and when that was the case we were not
properly returning class parent info. This caused
member pointers to render incorrectly in pretty mode.
However, we didn't even have pretty tests for pointers
in native mode, so those are also added now to ensure
this.
llvm-svn: 343393
- Add fix so that all code paths that create DWARFContext
with an ObjectFile initialise the target architecture in the context
- Add an assert that the Arch is known in the Dwarf CallFrameString method
llvm-svn: 343317
This caused the DebugInfo/Sparc/gnu-window-save.ll test to fail.
> Functions that have signed return addresses need additional dwarf support:
> - After signing the LR, and before authenticating it, the LR register is in a
> state the is unusable by a debugger or unwinder
> - To account for this a new directive, .cfi_negate_ra_state, is added
> - This directive says the signed state of the LR register has now changed,
> i.e. unsigned -> signed or signed -> unsigned
> - This directive has the same CFA code as the SPARC directive GNU_window_save
> (0x2d), adding a macro to account for multiply defined codes
> - This patch matches the gcc implementation of this support:
> https://patchwork.ozlabs.org/patch/800271/
>
> Differential Revision: https://reviews.llvm.org/D50136
llvm-svn: 343103
Functions that have signed return addresses need additional dwarf support:
- After signing the LR, and before authenticating it, the LR register is in a
state the is unusable by a debugger or unwinder
- To account for this a new directive, .cfi_negate_ra_state, is added
- This directive says the signed state of the LR register has now changed,
i.e. unsigned -> signed or signed -> unsigned
- This directive has the same CFA code as the SPARC directive GNU_window_save
(0x2d), adding a macro to account for multiply defined codes
- This patch matches the gcc implementation of this support:
https://patchwork.ozlabs.org/patch/800271/
Differential Revision: https://reviews.llvm.org/D50136
llvm-svn: 343089
This allows the native reader to find records of class/struct/
union type and dump them. This behavior is tested by using the
diadump subcommand against golden output produced by actual DIA
SDK on the same PDB file, and again using pretty -native to
confirm that we actually dump the classes. We don't find class
members or anything like that yet, for now it's just the class
itself.
llvm-svn: 342779