Previously we would dump the names of enum types, but not their
enumerator values. This adds support for enumerator values. In
doing so, we have to introduce a general purpose mechanism for
caching symbol indices of field list members. Unlike global
types, FieldList members do not have a TypeIndex. So instead,
we identify them by the pair {TypeIndexOfFieldList, IndexInFieldList}.
llvm-svn: 342415
Previously for cv-qualified types, we would just ignore them
and they would never get printed. Now we can enumerate them
and cache them like any other symbol type.
llvm-svn: 342414
Naively computing the hash after the PDB data has been generated is in practice
as fast as other approaches I tried. I also tried online-computing the hash as
parts of the PDB were written out (https://reviews.llvm.org/D51887; that's also
where all the measuring data is) and computing the hash in parallel
(https://reviews.llvm.org/D51957). This approach here is simplest, without
being slower.
Differential Revision: https://reviews.llvm.org/D51956
llvm-svn: 342333
Eventually we need to be able to support nested types, which don't
have an associated CVType record. To handle this, remove the
CVType from all of the record classes, and instead store the
deserialized record. Then move the deserialization up to the thing
that creates the type. This actually makes error handling better
anyway as we can return an invalid symbol instead of asserting false.
llvm-svn: 342284
r342003 added support for emitting FPO data from the
DEBUG_S_FRAMEDATA subsection of the .debug$S section to the PDB
file. However, that is not the end of the story. FPO can end
up in two different destinations in a PDB, each corresponding to
a different FPO data source.
The case handled by r342003 involves copying data from the
DEBUG_S_FRAMEDATA subsection of the .debug$S section to the
"New FPO" stream in the PDB, which is then referred to by the
DBI stream. The case handled by this patch involves copying
records from the .debug$F section of an object file to the "FPO"
stream (or perhaps more aptly, the "Old FPO" stream) in the PDB
file, which is also referred to by the DBI stream.
The formats are largely similar, and the difference is mostly
only visible in masm generated object files, such as some of the
low-level CRT object files like memcpy. MASM doesn't appear to
support writing the DEBUG_S_FRAMEDATA subsection, and instead
just writes these records to the .debug$F section.
Although clang-cl does not emit a .debug$F section ever, lld still
needs to support it so we have good debugging for CRT functions.
Differential Revision: https://reviews.llvm.org/D51958
llvm-svn: 342080
Eliminating some duplication of rangelist dumping code at the expense of
some version-dependent code in dump and extract routines.
Reviewer: dblaikie, JDevlieghere, vleschuk
Differential revision: https://reviews.llvm.org/D51081
llvm-svn: 342048
Summary:
There are two registers encoded in the S_FRAMEPROC flags: one for locals
and one for parameters. The encoding is described by the
ExpandEncodedBasePointerReg function in cvinfo.h. Two bits are used to
indicate one of four possible values:
0: no register - Used when there are no variables.
1: SP / standard - Variables are stored relative to the standard SP
for the ISA.
2: FP - Variables are addressed relative to the ISA frame
pointer, i.e. EBP on x86. If realignment is required, parameters
use this. If a dynamic alloca is used, locals will be EBP relative.
3: Alternative - Variables are stored relative to some alternative
third callee-saved register. This is required to address highly
aligned locals when there are dynamic stack adjustments. In this
case, both the incoming SP saved in the standard FP and the current
SP are at some dynamic offset from the locals. LLVM uses ESI in
this case, MSVC uses EBX.
Most of the changes in this patch are to pass around the CPU so that we
can decode these into real, named architectural registers.
Subscribers: hiraditya
Differential Revision: https://reviews.llvm.org/D51894
llvm-svn: 341999
Makes the produced pdbs more deterministic; before they'd contain 2 arbitary
bytes where this padding was.
Also reorder initialization to match the order of the fields in the struct (nfc)
llvm-svn: 341945
With the merge of TUs and CUs into a single container, some code that
relied on the CU range having an ordered range of contiguous addresses
(for locating a CU at a given offset) broke. But the units from
debug_info (currently only CUs, but CUs and TUs in DWARFv5) are in a
contiguous sub-range of that container - searching only through that
subrange is still valid & so do that.
llvm-svn: 341889
clang-format was getting confused due to the presence of a macro
invocation that was not terminated by a semicolon. Fixed this by
terminating the macro lines with semicolons and re-ran clang-format
on the file.
llvm-svn: 341864
- Log the reason for a PDB or precompiled-OBJ load failure
- Properly handle out-of-date PDB or precompiled-OBJ signature by displaying a corresponding error
- Slightly change behavior on PDB failure: any subsequent load attempt from another OBJ would result in the same error message being logged
- Slightly change behavior on PDB failure: retry with filename only if previous error was ENOENT ("no such file or directory")
- Tests: a. for native PDB errors; b. cover all the cases above
Differential Revision: https://reviews.llvm.org/D51559
llvm-svn: 341825
They were unintentionally calling DIA directly, which requires
Windows. We need to pass the -native flag, and this then required
fixing up one or two tests.
llvm-svn: 341731
In order to start testing this, I've added a new mode to
llvm-pdbutil which is only really useful for writing tests.
It just dumps the value of raw fields in record format.
This isn't really ideal and it won't allow us to test some
important cases, but it's better than nothing for now.
llvm-svn: 341729
Part of the responsibility of the native PDB reader is to cache
symbols the first time they are accessed, so they can then be
looked up by an ID. Furthermore, we need to resolve type indices
to records that we vend to the user, and other things. Previously
this code was all thrown together a bit haphazardly in the native
session class, but it makes sense to collect all of this into a
single class whose sole responsibility is to manage the collection
of known symbols.
llvm-svn: 341608
The way DIA SDK works is that when you request a symbol, it
gets assigned an internal identifier that is unique for the
life of the session. You can then use this identifier to
get back the same symbol, with all of the same internal state
that it had before, even if you "destroyed" the original
copy of the object you had.
This didn't work properly in our native implementation, and
if you destroyed an object for a particular symbol, then
requested the same symbol again, it would get assigned a new
ID and you'd get a fresh copy of the object. In order to fix
this some refactoring had to happen to properly reuse cached
objects. Some unittests are added to verify that symbol
reuse is taking place, making use of the new unittest input
feature.
llvm-svn: 341503
The -diff option makes it easy to diff dwarf by hiding addresses and
offsets. However not all of them were hidden, which should be fixed by
this patch.
Differential revision: https://reviews.llvm.org/D51593
llvm-svn: 341377
According to the standard, for the .debug_names (the "dwarf accelerator
tables"):
> If a subprogram or inlined subroutine is included, and has a
> DW_AT_linkage_name attribute, there will be an additional index entry
> for the linkage name.
For Swift we generate DW_structure_types with a linkage name and the
verifier was incorrectly rejecting this. This patch fixes that by only
considering the linkage name in those particular cases. The test is the
"reduced" debug info of the failing swift test on swift.org.
Differential revision: https://reviews.llvm.org/D51420
llvm-svn: 341311
Following D50807, and heading towards D50664, this intermediary change does the following:
1. Upgrade all custom Error types in llvm/trunk/lib/DebugInfo/ to use the new StringError behavior (D50807).
2. Implement std::is_error_code_enum and make_error_code() for DebugInfo error enumerations.
3. Rename GenericError -> PDBError (the file will be renamed in a subsequent commit)
4. Update custom error messages to follow the same formatting: (\w\s*)+\.
5. Keep generic "file not found" (ENOENT) errors as they are in PDB code. Previously, there used to be a custom enumeration for that purpose.
6. Remove a few extraneous LF in log() implementations. Printing LF is a responsability at a higher level, not at the error level.
Differential Revision: https://reviews.llvm.org/D51499
llvm-svn: 341228
Both DWARFDebugLine and DWARFDebugAddr used the same callback mechanism
for handling recoverable errors. They both implemented similar warn() function
to be used as such callbacks.
In this revision we get rid of code duplication and move this warn() function
to DWARFContext as DWARFContext::dumpWarning().
Reviewers: lhames, jhenderson, aprantl, probinson, dblaikie, JDevlieghere
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D51033
llvm-svn: 340528
DWARF-related classes in lib/DebugInfo/DWARF contained
duplicating code for creating StringError instances, like:
template <typename... Ts>
static Error createError(char const *Fmt, const Ts &... Vals) {
std::string Buffer;
raw_string_ostream Stream(Buffer);
Stream << format(Fmt, Vals...);
return make_error<StringError>(Stream.str(), inconvertibleErrorCode());
}
Similar function was placed in Support lib in https://reviews.llvm.org/D49824
This revision makes DWARF classes use this function
instead of their local implementation of it.
Reviewers: aprantl, dblaikie, probinson, wolfgangp, JDevlieghere, jhenderson
Reviewed By: JDevlieghere, jhenderson
Differential Revision: https://reviews.llvm.org/D49964
llvm-svn: 340163
Summary:
This prefix was added in r333421, and it changed our dumper output to
say things like "CVRegEAX" instead of just "EAX". That's a functional
change that I'd rather avoid.
I tested GCC, Clang, and MSVC, and all of them support #pragma
push_macro. They don't issue warnings whem the macro is not defined
either.
I don't have a Mac so I can't test the real termios.h header, but I
looked at the termios.h sources online and looked for other conflicts.
I saw only the CR* macros, so those are the ones we work around.
Reviewers: zturner, JDevlieghere
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D50851
llvm-svn: 339907
We don't expect module names to be present in the index. This patch adds
DW_TAG_module to the blacklist.
Differential revision: https://reviews.llvm.org/D50237
llvm-svn: 338878
This is patch 4 of 4 NFC refactorings to handle type units and compile
units more consistently and with less concern about the object-file
section that they came from.
Patch 4 combines separate DWARFUnitVectors for compile and type units
into a single DWARFUnitVector that contains both. For now the
implementation distinguishes compile units from type units by putting
all compile units at the front of the vector, reflecting the DWARF v4
distinction between .debug_info and .debug_types sections. A future
patch will change this to allow the free mixing of unit kinds, as is
specified by DWARF v5.
Differential Revision: https://reviews.llvm.org/D49744
llvm-svn: 338633
This is patch 3 of 4 NFC refactorings to handle type units and compile
units more consistently and with less concern about the object-file
section that they came from.
Patch 3 simply renames DWARFUnitSection to DWARFUnitVector, as the
object-file section of a unit is nearly irrelevant now.
Differential Revision: https://reviews.llvm.org/D49743
llvm-svn: 338632
This is patch 2 of 4 NFC refactorings to handle type units and compile
units more consistently and with less concern about the object-file
section that they came from.
Patch 2 takes the existing std::deque<DWARFUnitSection> for type units
and makes it a simple DWARFUnitSection, simplifying the handling of
type units and making it more consistent with compile units.
Differential Revision: https://reviews.llvm.org/D49742
llvm-svn: 338629
This is patch 1 of 4 NFC refactorings to handle type units and compile
units more consistently and with less concern about the object-file
section that they came from.
Patch 1 replaces the templated DWARFUnitSection with a non-templated
version. That is, instead of being a SmallVector of pointers to a
specific unit kind, it is not a SmallVector of pointers to the base
class for both type and compile units. Virtual methods are magic.
Differential Revision: https://reviews.llvm.org/D49741
llvm-svn: 338628
Summary: SmallVector's elements are moved when resizing and cause use-after-free.
Reviewers: probinson, dblaikie
Subscribers: JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D49702
llvm-svn: 337772
The intent is to use it for location list tables as well. Change is almost NFC with the exception
of the spelling of some strings used during dumping (all lowercase now).
Reviewer: JDevlieghere
Differential Revision: https://reviews.llvm.org/D49500
llvm-svn: 337763
For instance, When dumping .apple_types, the second atom represents the
DW_TAG. In addition to printing the raw value, we now also pretty print
the value if the ATOM tells us how.
llvm-svn: 337026
Make the DIE iterator bidirectional so we can move to the previous
sibling of a DIE.
Differential revision: https://reviews.llvm.org/D49173
llvm-svn: 336823
I don't think there's a need to use `const char *`. In most (probably all?)
cases, we need a length of a name later, so discarding a length will
lead to a wasted effort.
Differential Revision: https://reviews.llvm.org/D49046
llvm-svn: 336612
Summary:
If the encoding is not specified in CIE augmentation string, then it
should be DW_EH_PE_absptr instead of DW_EH_PE_omit.
Reviewers: ruiu, MaskRay, plotfi, rafauler
Reviewed By: MaskRay
Subscribers: rafauler, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D49000
llvm-svn: 336577
The reference implementation uses a case-insensitive string
comparison for strings of equal length. This will cause the
string "tEo" to compare less than "VUo". However we were using
a case sensitive comparison, which would generate the opposite
outcome. Switch to a case insensitive comparison. Also, when
one of the strings contains non-ascii characters, fallback to
a straight memcmp.
The only way to really test this is with a DIA test. Before this
patch, the test will fail (but succeed if link.exe is used instead
of lld-link). After the patch, it succeeds even with lld-link.
llvm-svn: 336464
It seems like the debugger first computes a symbol's bucket,
and then does a binary search of entries in the bucket using the
symbol's name in order to find it. If the bucket entries are not
in sorted order, this obviously won't work. After this patch a
couple of simple test cases show that we generate an exactly
identical GSI hash stream, which is very nice.
llvm-svn: 336405
We have a function which switches on the type of a symbol record
to return a hardcoded offset into the record that contains the
symbol name. Not all symbols have names to begin with, and for
those records we return -1 for the offset.
Names are used for various things. Importantly for this particular
bug, a hash of the record name is used as a key for certain hash
tables which are serialied into the PDB file. One of these hash
tables is for the global symbol stream, which is basically a
collection of S_PROCREF symbols which contain the name of the
symbol, a module, and an address offset.
However, for S_PROCREF symbols, the function to return the offset
of the name was returning -1: basically it wasn't implemented.
As a result of this, all global symbols were hashing to the same
value, essentially it was as if every single global symbol's name
was the empty string.
This manifests in the VS debugger when you try to call a function
(global or member, doesn't matter) through the immediate window
and the debugger simply reports an error because it can't find the
function. This makes perfect sense, because it is hashing the name
for real, looking in the global symbol hash table, and there is only
1 entry there which corresponds to a symbol whose name is the empty
string.
Fixing this fixes the MSVC debugger in this case.
llvm-svn: 336024
The code to emit the pieces of the MSF file were actually in
PDBFileBuilder. Move this to MSFBuilder so that we can
theoretically emit an MSF without having a PDB file.
llvm-svn: 335789
Summary:
The NetBSD Operating System installs debuginfo
files into /usr/libdata/debug, rather than other path
like in some other popular distribution.
This change makes llvm-symbolizer functional with
the basesystem executables.
Reviewers: joerg, vitalybuka
Reviewed By: vitalybuka
Subscribers: JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D48525
llvm-svn: 335511
Errors found processing the DW_AT_ranges attribute are propagated by lower level
routines and reported by their callers.
Reviewer: JDevlieghere
Differential Revision: https://reviews.llvm.org/D48344
llvm-svn: 335188
Summary:
This method was not correct for entries in DWO files as it assumed it
could just add up the CU and DIE offsets to get the absolute DIE offset.
This is not correct for the DWO files, as here the CU offset will
reference the skeleton unit, whereas the DIE offset will be the offset
in the full unit in the DWO file.
Unfortunately, this means that we are not able to determine the absolute
DIE offset using the information in the .debug_names section alone,
which means we have to offload some of this work to the users of this
class.
To demonstrate how this can be done, I've added/fixed the ability to
lookup entries using accelerator tables in DWO files in llvm-dwarfdump.
To make this happen, I've needed to make two extra changes in other
classes:
- made the DWARFContext method to lookup a CU based on the section
offset public. I've needed this functionality to lookup a CU, and this
seems like a useful thing in general.
- made DWARFUnit::getDWOId call extractDIEsIfNeeded. Before this, the
DWOId was filled in only if the root DIE happened to be parsed
before we called the accessor. Since the lazy parsing is supposed to
happen under the hood, calling extractDIEsIfNeeded seems appropriate.
Reviewers: JDevlieghere, aprantl, dblaikie
Subscribers: mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D48009
llvm-svn: 334578
Summary:
Back when we were introducing the DWARF v5 name index, there was a
short discussion whether we shouldn't have a nicer api for iterating
over the index. At that time, I did not find it necessary since the
iteration over names was done only from within the index itself (and I
figured the internal implementation can deal with a slightly rough
interface).
However, now I ran into a use for this kind of API in LLDB (for finding
all names matching a regular expression), so it looked like a nice
opportunity to introduce one. To make the API more useful, I've made the
NameTableEntry class a bit smarter: it now stores the string section
reference (so it can return its name) and its position in the name index
(mainly useful for dumping/logging).
I also convert the internal users to use the new API, which also gives
test coverage for the added code.
Reviewers: JDevlieghere, aprantl, dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47590
llvm-svn: 333738
Summary:
Both (Apple and DWARF5) implementations of the iterators had bugs which
resulted in crashes if one attempted to iterate through the accelerator
tables all the way.
For the Apple tables, the issue was that we did not clear the DataOffset
field when we reached the end, which made our iterator compare unequal
to the "end" iterator. For the Dwarf5 tables, the problem was that we
incremented the CurrentIndex pointer and then used the incremented
(possibly invalid) pointer to check whether we have reached the end of
the index list.
The reason these bugs went undetected is because their only user
(dwarfdump) only ever searched for the first match. Besides allowing us
to test this fix, changing llvm-dwarfdump --find to display all matches
seems like a good improvement (it makes the behavior consistent with the
--name option), so I change llvm-dwarfdump to do that.
The existing tests would be sufficient to test this fix with the new
llvm-dwarfdump behavior, but I add a special test that demonstrates that
the tool indeed displays multiple results. The find.test test needed to
be tweaked a bit as the tool now does not print the ".debug_info
contents" header (also consistent with how --name works).
Reviewers: JDevlieghere, aprantl, dblaikie
Subscribers: mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D47543
llvm-svn: 333635
When requesting to dump both the parent chain and children, we used to
print the DIE more than once because we propagated the dump options to
the parent without clearing the respective flags. This commit fixes this
oversight and adds a test.
rdar://39415292
Differential revision: https://reviews.llvm.org/D47263
llvm-svn: 333350
When printing an error for an invalid address range in a DIE, we used to
print the child above the parent, which is counter intuitive. This patch
reverses the order and indents the child to mimic the way we print the
debug info section.
llvm-svn: 333006
In DWARF v5, the DWO ID is in the (split/skeleton) CU header, not an
attribute on the CU DIE.
This changes the size of those headers, so use the parsed size whenever
we have one, for simplicitly.
Differential Revision: https://reviews.llvm.org/D47158
llvm-svn: 333004
Rather than relying on the user to do the address calculating in
DW_AT_location we should just dump the absolute address.
rdar://problem/38513870
Differential revision: https://reviews.llvm.org/D47152
llvm-svn: 332873
Change the "recoverable" error callback to take an Error instaed of a
string.
Reviewed by: JDevlieghere
Differential Revision: https://reviews.llvm.org/D46831
llvm-svn: 332845
Previously we emitted 20-byte SHA1 hashes. This is overkill
for identifying debug info records, and has the negative side
effect of making object files bigger and links slower. By
using only the last 8 bytes of a SHA1, we get smaller object
files and ~10% faster links.
This modifies the format of the .debug$H section by adding a new
value for the hash algorithm field, so that the linker will still
work when its object files have an old format.
Differential Revision: https://reviews.llvm.org/D46855
llvm-svn: 332669
The prefix includes type kind, which is important to preserve. Two
different type leafs can easily have the same interior record contents
as another type.
We ran into this issue in PR37492 where a bitfield type record collided
with a const modifier record. Their contents were bitwise identical, but
their kinds were different.
llvm-svn: 332664
This is a resubmit of r331868 (D46583), which was reverted due to
failures on the PS4 bot.
These have been resolved with r332246/D46748.
llvm-svn: 332349
Extract information related to a "unit header" from DWARFUnit into a
new DWARFUnitHeader class, and add a DWARFUnit member for the header.
This is one step in the direction of allowing type units in the
.debug_info section for DWARF v5.
Differential Revision: https://reviews.llvm.org/D46707
llvm-svn: 332289
Summary:
If we are not emitting a linkage name in the .debug_info sections, we
should not add it into the index either. This makes sure our index is
consistent with the actual debug info.
I am also explicitly setting the --dwarf-linkage-names=All in the
name-collsions test as that one would now fail on targets where this
defaults to "Abstract" (in fact, it would have failed already if there
wasn't a bug in the DWARF verifier, which I fix as well).
Reviewers: probinson, aprantl, JDevlieghere
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46748
llvm-svn: 332246
length excluding the table header. Instead it must encode the contribution length minus the length
field itself.
Reviewer: JDevliegehere
Differential Revision: https://reviews.llvm.org/D45922
llvm-svn: 332030
The print format was causing at least 2 unit-test failures from r331971.
The signed/unsigned comparison warnings only appeared to affect two lines but
it was unclear whether it might just pop up on other lines, so I have been
explicit in all the literals in the tests.
There were other bot unit-test failures that I am still investigating.
llvm-svn: 331978
Reviewed by: dblaikie, JDevlieghere, espindola
Differential Revision: https://reviews.llvm.org/D44560
Summary:
The .debug_line parser previously reported errors by printing to stderr and
return false. This is not particularly helpful for clients of the library code,
as it prevents them from handling the errors in a manner based on the calling
context. This change switches to using llvm::Error and callbacks to indicate
what problems were detected during parsing, and has updated clients to handle
the errors in a location-specific manner. In general, this means that they
continue to do the same thing to external users. Below, I have outlined what
the known behaviour changes are, relating to this change.
There are two levels of "errors" in the new error mechanism, to broadly
distinguish between different fail states of the parser, since not every
failure will prevent parsing of the unit, or of subsequent unit. Malformed
table errors that prevent reading the remainder of the table (reported by
returning them) and other minor issues representing problems with parsing that
do not prevent attempting to continue reading the table (reported by calling a
specified callback funciton). The only example of this currently is when the
last sequence of a unit is unterminated. However, I think it would be good to
change the handling of unrecognised opcodes to report as minor issues as well,
rather than just printing to the stream if --verbose is used (this would be a
subsequent change however).
I have substantially extended the DwarfGenerator to be able to handle
custom-crafted .debug_line sections, allowing for comprehensive unit-testing
of the parser code. For now, I am just adding unit tests to cover the basic
error reporting, and positive cases, and do not currently intend to test every
part of the parser, although the framework should be sufficient to do so at a
later point.
Known behaviour changes:
- The dump function in DWARFContext now does not attempt to read subsequent
tables when searching for a specific offset, if the unit length field of a
table before the specified offset is a reserved value.
- getOrParseLineTable now returns a useful Error if an invalid offset is
encountered, rather than simply a nullptr.
- The parse functions no longer use `WithColor::warning` directly to report
errors, allowing LLD to call its own warning function.
- The existing parse error messages have been updated to not specifically
include "warning" in their message, allowing consumers to determine what
severity the problem is.
- If the line table version field appears to have a value less than 2, an
informative error is returned, instead of just false.
- If the line table unit length field uses a reserved value, an informative
error is returned, instead of just false.
- Dumping of .debug_line.dwo sections is now implemented the same as regular
.debug_line sections.
- Verbose dumping of .debug_line[.dwo] sections now prints the prologue, if
there is a prologue error, just like non-verbose dumping.
As a helper for the generator code, I have re-added emitInt64 to the
AsmPrinter code. This previously existed, but was removed way back in r100296,
presumably because it was dead at the time.
This change also requires a change to LLD, which will be committed separately.
llvm-svn: 331971
The new verifier check has found an error in the
debug-names-name-collisions.ll test on the PS4 bot:
error: Name Index @ 0x0: Entry @ 0xdc: mismatched Name of DIE @ 0x23: index - _ZN3foo3fooE; debug_info - foo.
Reverting while I investigate whether this is a bug in the verifier or
the generator.
This reverts commit r331868.
llvm-svn: 331869
Summary:
This patch implements a check which makes sure all entries required by
the DWARF v5 specification are present in the Name Index. The algorithm
tries to follow the wording of Section 6.1.1.1 of the spec as closely as
possible.
The main deviation from it is that instead of a whitelist-based approach
in the spec "The name index must contain an entry for each debugging
information entry that defines a named subprogram, label, variable,
type, or namespace" I chose a blacklist-based one, where I consider
everything to be "in" and then remove the entries that don't make sense.
I did this because it has more potential for catching interesting cases
and the above is a bit vague (it uses plain words like "variable" and
"subprogram", but the rest of the section speaks about specific TAGs).
This approach has raised some interesting questions, the main one being
whether enumerator values should be indexed. The consensus seems to be
that they should, although it does not follow from section 6.1.1.1.
For the time being I made the verifier ignore these, as LLVM does not do
this yet, and I wanted to get a clean run when verifying generated debug
info.
Another interesting case was the DW_TAG_imported_declaration. It was not
immediately clear to me whether this should go in or not, but currently
it is not indexed, and (unlike the enumerators) in does not seem to cause
problems for LLDB, so I've also ignored it.
Reviewers: JDevlieghere, aprantl, dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46583
llvm-svn: 331868
LLVM always puts function definition DIEs at the top level, but under
some circumstances GCC does not (at least in this case with member
functions of a function-local type).
To ensure that doesn't appear as though the local type's member function
is unduly inlined within the outer function - ensure the inline
discovery DIE parent walk stops at the first DW_TAG_subprogram.
llvm-svn: 331291
This prevents infinite recursion in DWARFDie::findRecursively for
malformed DWARF where a DIE references itself.
This fixes PR36257.
Differential revision: https://reviews.llvm.org/D43092
llvm-svn: 331200
Part of the DBI stream is a list of variable length structures
describing each module that contributes to the final executable.
One member of this structure is a section contribution entry that
describes the first section contribution in the output file for
the given module.
We have been leaving this structure unpopulated until now, so with
this patch it is now filled out correctly.
Differential Revision: https://reviews.llvm.org/D45832
llvm-svn: 330457
Updated two more debug line related warnings to use WithColor. This was
necessary to ensure consistent output order of the warnings on Windows
for debug line tests.
Differential Revision: https://reviews.llvm.org/D45871
llvm-svn: 330440
The DBI stream contains a list of module descriptors. At the
beginning of each descriptor is a structure representing the first
section contribution in the output file for that module. LLD
currently doesn't fill out this structure at all, but link.exe
does. So as a precursor to emitting this data in LLD, we first
need a way to dump it so that it can be checked.
This patch adds support for the dumping, and verifies via a test
that LLD emits bogus information.
llvm-svn: 330208
Using Config->is64() will treat ARM64 as Amd64, which is incorrect.
Furthermore, there are more esoteric architectures that could
theoretically be encountered. Just set it directly to the machine
type, which we already know anyway.
llvm-svn: 330157
When emitting CodeView debug information, compiler-generated thunk routines
should be emitted using S_THUNK32 symbols instead of S_GPROC32_ID symbols so
Visual Studio can properly step into the user code. This initial support only
handles standard thunk ordinals.
Differential Revision: https://reviews.llvm.org/D43838
llvm-svn: 330132
Most of these are pretty trivial and obvious. Setting the toolchain
version to 14.11 is perhaps a little questionable, but we've been bitten
in the past where one of our version fields sidn't match MSVC's, and I
definitely don't want to go through that diagnosis again as it was
pretty time consuming and hard to track down.
I found all of these by using llvm-pdbutil export to dump the dbi and
pdb streams to a file, then using fc followed by llvm-pdbutil explain to
explain the mismatched bytes.
There are still some more, these are just the low hanging fruit.
Differential Revision: https://reviews.llvm.org/D45276
llvm-svn: 330130
r327219 added wrappers to std::sort which randomly shuffle the container before
sorting. This will help in uncovering non-determinism caused due to undefined
sorting order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of
std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to
llvm::sort. Refer the comments section in D44363 for a list of all the
required patches.
llvm-svn: 330061
While reading Codeview records which contain variable-length encoded integers,
such as LF_BCLASS, LF_ENUMERATE, LF_MEMBER, LF_VBCLASS or LF_IVBCLASS,
the record's size would be improperly calculated in cases where the value was
indeed of a variable length (>= LF_NUMERIC). This caused a bad alignement on
the next record, which would/might crash later on.
Differential Revision: https://reviews.llvm.org/D45104
llvm-svn: 329659
Summary:
This patch add checks to verify that the information in the name index
entries is consistent with the debug_info section. Specifically, we
check that entries point to valid DIEs, and their names, tags, and
compile units match the information in the debug_info sections.
These checks are only run if the previous checks did not find any errors
in the name index headers. Attempting to proceed with the checks anyway
would likely produce a lot of spurious errors and the verification code
would need to be very careful to avoid crashing.
I also add a couple of more checks to the abbreviation-validation code
to verify that some attributes are always present (an index without a
DW_IDX_die_offset attribute is fairly useless).
The entry verification works only on indexes without any type units - I
haven't attempted to extend it to type units, as we don't even have a
DWARF v5-compatible type unit generator at the moment.
Reviewers: JDevlieghere, aprantl, dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45323
llvm-svn: 329392
Summary:
The positions of the DwarfVersion and AddressSize arguments were
reversed, which caused parsing for dwarf opcodes which contained
address-size-dependent operands (such as DW_OP_addr). Amusingly enough,
none of the address-size asserts fired, as dwarf version was always 4,
which is a valid address size.
I ran into this when constructing weird inputs for the DWARF verifier. I
I add a test case as hand-written dwarf -- I am not sure how to trigger
this differently, as having a DW_OP_addr inside a location list is a
fairly non-standard thing to do.
Fixing this error exposed a bug in the debug_loc.dwo parser, which was
always being constructed with an address size of 0. I fix that as well
by following the pattern in the non-dwo parser of picking up the address
size from the first compile unit (which is technically not correct, but
probably good enough in practice).
Reviewers: JDevlieghere, aprantl, dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45324
llvm-svn: 329381
Using this, you can use llvm-pdbutil to export the contents of a
stream to a binary file, then run explain on the binary file so
that it treats the offset as an offset into the stream instead
of an offset into a file. This makes it easy to compare the
contents of the same stream from two different files.
llvm-svn: 329207
The missing definitions are from cvconst.h shipped with DIA SDK.
Correct the url to MSDN for MemoryTypeEnum and set the underlying
type of PDB_StackFrameType and PDB_MemoryType to uint16_t.
llvm-svn: 329104
This command can dump the binary contents of a stream to a file.
This is useful when you want to do side-by-side comparisons of
a specific stream from two PDBs to examine the differences between
them. You can export both of them to a file, then open them up
side by side in a hex editor (for example), so as to eliminate any
differences that might arise from the contents being on different
blocks in the PDB.
In subsequent patches I plan to improve the "explain" subcommand
so that you can explain the contents of a binary file that isn't
necessarily a full PDB, but one of these dumped streams, by telling
the subcommand how to interpret the contents.
llvm-svn: 329002
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: echristo, zturner, samsonov
Reviewed By: echristo
Subscribers: JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D45134
llvm-svn: 328935
This will show more detail when using `llvm-pdbutil explain` on an
offset in the DBI or PDB streams. Specifically, it will dig into
individual header fields and substreams to give a more precise
description of what the byte represents.
llvm-svn: 328878
There are two FPMs in an MSF file, the idea being that for
incremental updates you can write to the alternate one and then
atomically swap them on commit. LLVM defaulted to using FPM1
on the first commit, but this differs from Microsoft's behavior
which is to default to using FPM2 on the first commit. To
eliminate some byte-level file differences, this patch changes
LLVM's default to also be FPM2.
Additionally, LLVM was trying to be "smart" about marking FPM
pages allocated. In addition to marking every page belonging
to the alternate FPM as unallocated, LLVM also marked pages at
the end of the main FPM which were not needed as unallocated.
In order to match the behavior of Microsoft-generated PDBs, we
now always mark every FPM block as allocated, regardless of
whether it is in the main FPM or the alt FPM, and regardless of
whether or not it describes blocks which are actually in the file.
This has the side benefit of simplifying our code.
llvm-svn: 328812
We should align the value of the field, not the overall section offset.
This distinction matters if one of the debug_names contributions is not
of size which is a multiple of four. The dwarf producers may choose to
emit rounded contributions, but they are not required to do so. In the
latter case, without this patch we would corrupt the parsing state, as
we would adjust the offset even if subsequent contributions contained
correctly rounded augmentation strings.
llvm-svn: 328796
Before this patch we were parsing the attributes as section offsets, as
that is what apple_names is doing. However, this is not correct as DWARF
v5 specifies that this attribute should use the Reference form class.
This also updates all the testcases (except the ones that deliberately
pass a different form) to use the correct form class.
llvm-svn: 328773
Before this change, using dumpProperties() with PDBSymbolData
would look like this:
get_locationType: 3
1
After this change:
get_locationType: 3
get_value: 1
llvm-svn: 328590
This was reverted several times due to what ultimately turned out
to be incompatibilities in our serialized hash table format.
Several changes went in prior to this to fix those issues since
they were more fundamental and independent of supporting injected
sources, so now that those are fixed this change should hopefully
pass.
llvm-svn: 328363
When investigating bugs in PDB generation, the first step is
often to do the same link with link.exe and then compare PDBs.
But comparing PDBs is hard because two completely different byte
sequences can both be correct, so it hampers the investigation when
you also have to spend time figuring out not just which bytes are
different, but also if the difference is meaningful.
This patch fixes a couple of cases related to string table emission,
hash table emission, and the order in which we emit strings that
makes more of our bytes the same as the bytes generated by MS PDBs.
Differential Revision: https://reviews.llvm.org/D44810
llvm-svn: 328348
NFC, this just renames some methods to better express what they
do, and also adds a few helper methods to add some symmetry to the
API in a few places (for example there was a getStringFromId but not
a getIdFromString method in the string table).
llvm-svn: 328221
Summary:
This commit adds checks of the abbreviation table in a DWARF v5 Name
Index. The most interesting/useful check is the one which checks that
each index attributes is encoded using the correct form class, but it
also checks for the more obvious errors like unknown
forms/tags/attributes and duplicated attributes.
Reviewers: JDevlieghere, aprantl, dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44736
llvm-svn: 328202
To resolve symbol context at a particular address, we need to
determine the compiland for the address. We are able to determine
the parent compiland of PDBSymbolFunc, PDBSymbolTypeUDT,
PDBSymbolTypeEnum symbols indirectly through line information.
However no such information is availabile for PDBSymbolData,
i.e. variables.
The Section Contribution table from PDBs has information about
each compiland's contribution to sections by address. For example,
a piece of a contribution looks like,
VA RelativeVA Sect No. Offset Length Compiland
14000087B0 000087B0 0001 000077B0 000000BB exe_main.obj
So given an address, it's possible to determine its compiland with
this information.
llvm-svn: 328178
The hash table is a list of buckets, and the *value* stored in
the bucket cannot be 0 since that is reserved. However, the code
here was incorrectly skipping over the 0'th bucket entirely.
The 0'th bucket is perfectly fine, just none of these buckets
can contain the value 0.
As a result, whenever there was a string where hash(S) % Size
was equal to 0, we would write the value in the next bucket
instead. We never caught this in our tests due to *another*
bug, which is that we would iterate the entire list of buckets
looking for the value, only using the hash value as a starting
point. However, the real algorithm stops when it finds 0 in
a bucket since it takes that to mean "the item is not in the
hash table".
The unit test is updated to carefully construct a set of hash
values that will cause one item to hash to 0 mod bucket count,
and the reader is also updated to return an error indicating that
the item is not found when it encounters a 0 bucket.
llvm-svn: 328162
This is mostly just plumbing to get a DWARFDataExtractor where we
compute abbr_offset so we can use getRelocatedValue.
This is part of PR36793.
llvm-svn: 328154
Summary:
We have had at least three pieces of code (in DWARFAbbreviationDeclaration,
DWARFAcceleratorTable and DWARFDie) that have hand-rolled support for
dumping unknown dwarf enum values. While not terrible, they are a bit
distracting and enable small differences to creep in (Unknown_ffff vs.
Unknown_0xffff). I ended up needing to add a fourth place
(DWARFVerifier), so it seems it would be a good time to centralize.
This patch creates an alternative to the XXXString dumping functions in
the BinaryFormat library, which formats an unknown value as
DW_TYPE_unknown_1234, instead of just an empty string. It is based on
the formatv function, as that allows us to avoid materializing the
string for unknown values (and because this way I don't have to invent a
name for the new functions :P).
In this patch I add formatters for dwarf attributes, forms, tags, and
index attributes as these are the ones in use currently, but adding
other enums is straight-forward.
Reviewers: dblaikie, JDevlieghere, aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44570
llvm-svn: 328090
This is still failing on a different bot this time due to some
issue related to hashing absolute paths. Reverting until I can
figure it out.
llvm-svn: 328014
The issue causing this to fail in certain configurations
should be fixed.
It was due to the fact that DIA apparently expects there to be
a null string at ID 1 in the string table. I'm not sure why this
is important but it seems to make a difference, so set it.
llvm-svn: 328002
Summary:
Redefine PDBSymbolCompiland::getSourceFileName() to return the filename (w/o directory) of the source file that is used to compile the compiland. This is because the result returned previously is ambiguous. It could be the filename, relative path or full path of the source file.
Move the implementation of SymbolFilePDB::GetSourceFileNameForPDBCompiland() into a new method PDBSymbolCompiland::getSourceFileFullPath().
Reviewers: zturner, rnk, llvm-commits
Reviewed By: zturner
Differential Revision: https://reviews.llvm.org/D44458
llvm-svn: 327910
Summary: This commit adds two methods to the PDBSymboFunc class used in parsing symbols. getLineNumbers() is used to determine a Function symbol's declaration and getCompilandId() is used to initialize the SymbolContext field sc.comp_unit.
Reviewers: zturner, rnk, llvm-commits
Reviewed By: zturner
Differential Revision: https://reviews.llvm.org/D44457
llvm-svn: 327909
Natvis is a debug language supported by Visual Studio for
specifying custom visualizers. The /NATVIS option is an
undocumented link.exe flag which will take a .natvis file
and "inject" it into the PDB. This way, you can ship the
debug visualizers for a program along with the PDB, which
is very useful for postmortem debugging.
This is implemented by adding a new "named stream" to the
PDB with a special name of /src/files/<natvis file name>
and simply copying the contents of the xml into this file.
Additionally, we need to emit a single stream named
/src/headerblock which contains a hash table of embedded
files to records describing them.
This patch adds this functionality, including the /NATVIS
option to lld-link.
Differential Revision: https://reviews.llvm.org/D44328
llvm-svn: 327895
Summary:
This patch adds more checks to the .debug_names validator. Specifically,
they check for:
- buckets claiming to be non-empty but pointing to mismatched hashes
(most consumers would interpret this as an empty bucket, but it
questionable whether the generator meant that)
- hashes that are not reachable from any bucket
- names with incorrect hashes
Together, these checks ensure that any name in the index can be reached
through the hash table using the regular lookup algorithm. We also warn
if we encounter a name index without a hash table.
Reviewers: JDevlieghere, aprantl, dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44433
llvm-svn: 327699
There was some code that tried to calculate the number of 4-byte
words required to hold N bits, but it was instead computing the
number of bytes required to hold N bits. This was leading to
extraneous data being output into the hash table, which would
cause certain operations in DIA (the Microsoft PDB reader) to
fail.
llvm-svn: 327675
It previously only worked when the key and value types were
both 4 byte integers. We now have a use case for a non trivial
value type, so we need to extend it to support arbitrary value
types, which means templatizing it.
llvm-svn: 327647
Summary:
Some PDB symbols do not have a valid VA or RVA but have Addr by Section and Offset. For example, a variable in thread-local storage has the following properties:
get_addressOffset: 0
get_addressSection: 5
get_lexicalParentId: 2
get_name: g_tls
get_symIndexId: 12
get_typeId: 4
get_dataKind: 6
get_symTag: 7
get_locationType: 2
This change provides a new method to locate line numbers by Section and Offset from those symbols.
Reviewers: zturner, rnk, llvm-commits
Subscribers: asmith, JDevlieghere
Differential Revision: https://reviews.llvm.org/D44407
llvm-svn: 327601