Commit Graph

106 Commits

Author SHA1 Message Date
Zachary Turner ebf03f6c46 Refactor the PDB HashTable class.
It previously only worked when the key and value types were
both 4 byte integers.  We now have a use case for a non trivial
value type, so we need to extend it to support arbitrary value
types, which means templatizing it.

llvm-svn: 327647
2018-03-15 17:38:26 +00:00
Zachary Turner 679aeadda1 [PDB] Support dumping injected sources via the DIA reader.
Injected sources are basically a way to add actual source file content
to your PDB. Presumably you could use this for shipping your source code
with your debug information, but in practice I can only find this being
used for embedding natvis files inside of PDBs.

In order to effectively test LLVM's natvis file injection, we need a way
to dump the injected sources of a PDB in a way that is authoritative
(i.e. based on Microsoft's understanding of the PDB format, and not
LLVM's). To this end, I've added support for dumping injected sources
via DIA. I made a PDB file that used the /natvis option to generate a
test case.

Differential Revision: https://reviews.llvm.org/D44405

llvm-svn: 327428
2018-03-13 17:46:06 +00:00
Zachary Turner 49f8674c28 Fix a bug regarding a mis-identified file type in pdbutil.
llvm-svn: 326929
2018-03-07 19:12:36 +00:00
Aaron Smith a27b5e93a3 [llvm-pdbdump] Add guard for null pointers and remove unused code
Summary: This avoids crashing when a user tries to dump a pdb with the `-native` option.

Reviewers: zturner, llvm-commits, rnk

Reviewed By: zturner

Subscribers: mgrang

Differential Revision: https://reviews.llvm.org/D44117

llvm-svn: 326863
2018-03-07 02:23:08 +00:00
Aaron Smith 5ab08cfd23 [llvm-pdbdump] Dump restrict type qualifier
Reviewers: zturner, llvm-commits, rnk

Reviewed By: zturner

Subscribers: majnemer

Differential Revision: https://reviews.llvm.org/D43639

llvm-svn: 326731
2018-03-05 18:29:43 +00:00
Adrian McCarthy 4b1a89fa92 Fix llvm-pdbutil to handle new built-in types
Summary:
The built-in PDB types enum has been extended to include char16_t and char32_t.
llvm-pdbutil was hitting an llvm_unreachable because it didn't know about these
new values.  The new values are not yet in the DIA documentation, but are
listed in the cvconst.h header that comes as part of the DIA SDK.

Reviewers: asmith, zturner, rnk

Subscribers: stella.stamenova, llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D43646

llvm-svn: 325838
2018-02-22 23:16:56 +00:00
Zachary Turner cafd476836 Fix emission of PDB string table.
This was originally reported as a bug with the symptom being "cvdump
crashes when printing an LLD-linked PDB that has an S_FILESTATIC record
in it". After some additional investigation, I determined that this was
a symptom of a larger problem, and in fact the real problem was in the
way we emitted the global PDB string table. As evidence of this, you can
take any lld-generated PDB, run cvdump -stringtable on it, and it would
return no results.

My hypothesis was that cvdump could not *find* the string table to begin
with. Normally it would do this by looking in the "named stream map",
finding the string /names, and using its value as the stream index. If
this lookup fails, then cvdump would fail to load the string table.

To test this hypothesis, I looked at the name stream map generated by a
link.exe PDB, and I emitted exactly those bytes into an LLD-generated
PDB. Suddenly, cvdump could read our string table!

This code has always been hacky and we knew there was something we
didn't understand. After all, there were some comments to the effect of
"we have to emit strings in a specific order, otherwise things don't
work". The key to fixing this was finally understanding this.

The way it works is that it makes use of a generic serializable hash map
that maps integers to other integers. In this case, the "key" is the
offset into a buffer, and the value is the stream number. If you index
into the buffer at the offset specified by a given key, you find the
name. The underlying cause of all these problems is that we were using
the identity function for the hash. i.e. if a string's offset in the
buffer was 12, the hash value was 12. Instead, we need to hash the
string *at that offset*. There is an additional catch, in that we have
to compute the hash as a uint32 and then truncate it to uint16.

Making this work is a little bit annoying, because we use the same hash
table in other places as well, and normally just using the identity
function for the hash function is actually what's desired. I'm not
totally happy with the template goo I came up with, but it works in any
case.

The reason we never found this bug through our own testing is because we
were building a /parallel/ hash table (in the form of an
llvm::StringMap<>) and doing all of our lookups and "real" hash table
work against that. I deleted all of that code and now everything goes
through the real hash table. Then, to test it, I added a unit test which
adds 7 strings and queries the associated values. I test every possible
insertion order permutation of these 7 strings, to verify that it really
does work as expected.

Differential Revision: https://reviews.llvm.org/D43326

llvm-svn: 325386
2018-02-16 20:46:04 +00:00
Simon Pilgrim e01b58f0ed Fix MSVC "not all control paths return a value" warning.
llvm-svn: 322719
2018-01-17 18:16:28 +00:00
Aaron Smith 620a7f765d Fix build error - 'default label in switch which covers all enumeration values'
llvm-svn: 322610
2018-01-17 01:49:01 +00:00
Aaron Smith 53a1a1616c Fix pretty printing the unspecified param of a variadic function
Summary:
 - Fix a bug in PrettyBuiltinDumper that returns "void" as the name for
  an unspecified builtin type. Since the unspecified param of a variadic
  function is considered a builtin of unspecified type in PDBs, we set
  "..." for its name.

  - Provide a method to determine if a PDBSymbolFunc is variadic in
  PrettyFunctionDumper since PDBSymbolFunc::getArgument() doesn't return the
  last unspecified-type param.

  - Add a pretty-func-dumper.test to test pretty dumping of variadic
  functions.

Reviewers: zturner, llvm-commits

Reviewed By: zturner

Differential Revision: https://reviews.llvm.org/D41801

llvm-svn: 322608
2018-01-17 01:22:03 +00:00
Zachary Turner 6047858270 [PDB] Correctly link S_FILESTATIC records.
This is not a record type that clang currently generates,
but it is a record that is encountered in object files generated
by cl.  This record is unusual in that it refers directly to
the string table instead of indirectly to the string table via
the FileChecksums table.  Because of this, it was previously
overlooked and we weren't remapping the string indices at all.
This would lead to crashes in MSVC when trying to display a
variable whose debug info involved an S_FILESTATIC.

Original bug report by Alexander Ganea

Differential Revision: https://reviews.llvm.org/D41718

llvm-svn: 321883
2018-01-05 19:12:40 +00:00
Zachary Turner a1eb9432b1 Don't crash in llvm-pdbutil when dumping TypeIndexes with high bit set.
This is a special code that indicates that it's a function id.
While I'm still not certain how to interpret these, we definitely
should *not* be using these values as indices into an array directly.
For now, when we encounter one of these, just print the numeric value.

llvm-svn: 320775
2017-12-15 00:27:49 +00:00
Michael Zolotukhin 67b04bd8ac Recover some overzealously removed includes.
llvm-svn: 320648
2017-12-13 22:21:02 +00:00
Michael Zolotukhin 62602a476a Remove redundant includes from tools.
llvm-svn: 320631
2017-12-13 21:31:10 +00:00
Zachary Turner 2ed069e63d Fix error in llvm-pdbutil.
A recent change made this print the wrong value, breaking some
tests.  This is now fixed.

llvm-svn: 319862
2017-12-06 00:26:43 +00:00
Zachary Turner 376d437776 Teach llvm-pdbutil to dump types from object files.
llvm-svn: 319859
2017-12-05 23:58:18 +00:00
Zachary Turner ca6dbf1440 Split TypeTableBuilder into two classes.
llvm-svn: 319456
2017-11-30 18:39:50 +00:00
Zachary Turner 3e3936da93 Make TypeTableBuilder inherit from TypeCollection.
A couple of places in LLD were passing references to
TypeTableCollections around, which makes it hard to change the
implementation at runtime.  However, these cases only needed to
iterate over the types in the collection, and TypeCollection
already provides a handy abstract interface for this purpose.

By implementing this interface, we can get rid of the need to
pass TypeTableBuilder references around, which should allow us
to swap the implementation at runtime in subsequent patches.

llvm-svn: 319345
2017-11-29 19:35:21 +00:00
Zachary Turner 85082013e6 Fix line endings in llvm-pdbutil.cpp
llvm-svn: 319340
2017-11-29 19:29:25 +00:00
Zachary Turner 6900de1dfb [CodeView] Refactor / Rewrite TypeSerializer and TypeTableBuilder.
The motivation behind this patch is that future directions require us to
be able to compute the hash value of records independently of actually
using them for de-duplication.

The current structure of TypeSerializer / TypeTableBuilder being a
single entry point that takes an unserialized type record, and then
hashes and de-duplicates it is not flexible enough to allow this.

At the same time, the existing TypeSerializer is already extremely
complex for this very reason -- it tries to be too many things. In
addition to serializing, hashing, and de-duplicating, ti also supports
splitting up field list records and adding continuations. All of this
functionality crammed into this one class makes it very complicated to
work with and hard to maintain.

To solve all of these problems, I've re-written everything from scratch
and split the functionality into separate pieces that can easily be
reused. The end result is that one class TypeSerializer is turned into 3
new classes SimpleTypeSerializer, ContinuationRecordBuilder, and
TypeTableBuilder, each of which in isolation is simple and
straightforward.

A quick summary of these new classes and their responsibilities are:

- SimpleTypeSerializer : Turns a non-FieldList leaf type into a series of
  bytes. Does not do any hashing. Every time you call it, it will
  re-serialize and return bytes again. The same instance can be re-used
  over and over to avoid re-allocations, and in exchange for this
  optimization the bytes returned by the serializer only live until the
  caller attempts to serialize a new record.

- ContinuationRecordBuilder : Turns a FieldList-like record into a series
  of fragments. Does not do any hashing. Like SimpleTypeSerializer,
  returns references to privately owned bytes, so the storage is
  invalidated as soon as the caller tries to re-use the instance. Works
  equally well for LF_FIELDLIST as it does for LF_METHODLIST, solving a
  long-standing theoretical limitation of the previous implementation.

- TypeTableBuilder : Accepts sequences of bytes that the user has already
  serialized, and inserts them by de-duplicating with a hash table. For
  the sake of convenience and efficiency, this class internally stores a
  SimpleTypeSerializer so that it can accept unserialized records. The
  same is not true of ContinuationRecordBuilder. The user is required to
  create their own instance of ContinuationRecordBuilder.

Differential Revision: https://reviews.llvm.org/D40518

llvm-svn: 319198
2017-11-28 18:33:17 +00:00
Aaron Ballman ecf0e95267 Add llvm::for_each as a range-based extensions to <algorithm> and make use of it in some cases where it is a more clear alternative to std::for_each.
llvm-svn: 317356
2017-11-03 20:01:25 +00:00
Reid Kleckner 145090f124 [PDB] Handle an empty globals hash table with no buckets
llvm-svn: 316722
2017-10-27 00:45:51 +00:00
Reid Kleckner 8aa32ffbad [codeview] Fix handling of S_HEAPALLOCSITE
The type index is from the TPI stream, not the IPI stream. Fix the
dumper, fix type index discovery, and add a test in LLD.

Also improve the log message we emit when we fail to rewrite type
indices in LLD. That's how I found this bug.

llvm-svn: 316461
2017-10-24 17:02:40 +00:00
Hans Wennborg dc8d6f2527 Fix -Wcovered-switch-default warnings from r314821
llvm-svn: 314826
2017-10-03 18:44:12 +00:00
Hans Wennborg 660531085a CodeView: Provide a .def file with the register ids
The list of register ids was previously written out in a couple of dirrent
places. This puts it in a .def file and also adds a few more registers (e.g.
the x87 regs) which should lead to more readable dumps, but I didn't include
the whole list since that seems unnecessary.

X86_MC::initLLVMToSEHAndCVRegMapping is pretty ugly, but at least it's not
relying on magic constants anymore. The TODO of using tablegen still stands.

Differential revision: https://reviews.llvm.org/D38480

llvm-svn: 314821
2017-10-03 18:27:22 +00:00
Peter Collingbourne 9e26e97955 COFF: PDB: Allow multiple modules with the same name.
It is possible for two modules to have the same name if they are
archive members with the same name, or if we are doing LTO (in which
case all modules will have the name "lto.tmp").

Differential Revision: https://reviews.llvm.org/D37589

llvm-svn: 312744
2017-09-07 20:39:46 +00:00
Zachary Turner e31b9dcf91 [llvm-pdbutil] Remove unused variables.
llvm-svn: 312395
2017-09-02 00:09:43 +00:00
Zachary Turner 41f0706401 Fix broken test.
llvm-svn: 312359
2017-09-01 20:17:20 +00:00
Zachary Turner abb17cc084 [llvm-pdbutil] Support dumping CodeView from object files.
We have llvm-readobj for dumping CodeView from object files, and
llvm-pdbutil has always been more focused on PDB.  However,
llvm-pdbutil has a lot of useful options for summarizing debug
information in aggregate and presenting high level statistical
views.  Furthermore, it's arguably better as a testing tool since
we don't have to write tests to conform to a state-machine like
structure where you match multiple lines in succession, each
depending on a previous match.  llvm-pdbutil dumps much more
concisely, so it's possible to use single-line matches in many
cases where as with readobj tests you have to use multi-line
matches with an implicit state machine.

Because of this, I'm adding object file support to llvm-pdbutil.
In fact, this mirrors the cvdump tool from Microsoft, which also
supports both object files and pdb files.  In the future we could
perhaps rename this tool llvm-cvutil.

In the meantime, this allows us to deep dive into object files
the same way we already can with PDB files.

llvm-svn: 312358
2017-09-01 20:06:56 +00:00
Zachary Turner 4c80661368 Fix some size_t / uint32_t mismatched comparisons.
llvm-svn: 312278
2017-08-31 20:50:25 +00:00
Zachary Turner 99c6982bcd [llvm-pdbutil] Print detailed S_UDT stats.
This adds a new command line option, -udt-stats, which breaks
down the stats of S_UDT records.  These are one of the biggest
contributors to the size of /DEBUG:FASTLINK PDBs, so they need
some additional tools to be able to analyze their usage.  This
option will dig into each S_UDT record and determine what kind
of record it points to, and then break down the statistics by
the target type.  The goal here is to identify how our object
files differ from MSVC object files in S_UDT records, so that
we can output fewer of them and reach size parity.

llvm-svn: 312276
2017-08-31 20:43:22 +00:00
George Karpenkov 218ea7f69c Remove llvm-pdbutil/fuzzer.
The code does not compile, is not maintained, and does not have a buildbot.

Differential Revision: https://reviews.llvm.org/D37032

llvm-svn: 311512
2017-08-23 00:02:10 +00:00
Zachary Turner d1de2f4f5e [llvm-pdbutil] Add support for dumping detailed module stats.
This adds support for dumping a summary of module symbols
and CodeView debug chunks.  This option prints a table for
each module of all of the symbols that occurred in the module
and the number of times it occurred and total byte size.  Then
at the end it prints the totals for the entire file.

Additionally, this patch adds the -jmc (just my code) option,
which suppresses modules which are from external libraries or
linker imports, so that you can focus only on the object files
and libraries that originate from your own source code.

llvm-svn: 311338
2017-08-21 14:53:25 +00:00
Victor Leschuk 091da14423 Remove useless default case in switch
llvm-svn: 311149
2017-08-18 09:02:06 +00:00
Zachary Turner 4c432b202f Fix warning about covered switch default.
llvm-svn: 311129
2017-08-17 22:20:15 +00:00
Zachary Turner 96bcd6a37a [llvm-pdbutil] Fix some dumping issues.
When dumping, we were treating the S_INLINESITESYM as referring
to a type record, when it actually refers to an id record.  We
had this correct in TypeIndexDiscovery, so our merging algorithm
should be fine, but we had it wrong in the dumper, which means it
would appear to work most of the time, unless the index was out
of bounds in the type stream, when it would fail.  Fixed this, and
audited a few other cases to make them match the behavior in
TypeIndexDiscovery.

Also, I've now observed a new symbol record with kind 0x1168 which
I have no clue what it is, so to avoid crashing we have to just
print "Unknown Symbol Kind".

llvm-svn: 311117
2017-08-17 20:04:51 +00:00
Zachary Turner f401e1102d Fix a few minor issues when dumping symbols.
1) We weren't handling symbol types that weren't able to parse,
   even if we knew what the leaf type was.  This was triggering
   when trying to dump /DEBUG:FASTLINK PDBs, where we expect a
   certain symbol to show up, but we just don't know how to parse
   it.
2) We lost the code for dumping record bytes, so this was added
   back.

llvm-svn: 311116
2017-08-17 20:04:31 +00:00
Zachary Turner 28e31ee45e Output S_SECTION symbols to the Linker module.
PDBs need to contain 1 module for each object file/compiland,
and a special one synthesized by the linker.  This one contains
a symbol record for each output section in the executable with
its address information.  This patch adds such symbols to the
linker module.  Note that we also are supposed to add an
S_COFFGROUP symbol for what appears to be each input section that
contributes to each output section, but it's not entirely clear
how to generate these yet, so I'm leaving that for a separate
patch.

llvm-svn: 310754
2017-08-11 20:46:28 +00:00
Zachary Turner 5448dabbdd [PDB] Fix an issue writing the publics stream.
In the refactor to merge the publics and globals stream, a bug
was introduced that wrote the wrong value for one of the fields
of the PublicsStreamHeader.  This caused debugging in WinDbg
to break.

We had no way of dumping any of these fields, so in addition to
fixing the bug I've added dumping support for them along with a
test that verifies the correct value is written.

llvm-svn: 310439
2017-08-09 04:23:59 +00:00
Zachary Turner 59e3ae827d [PDB] Fix linking of function symbols and local variables.
The compiler outputs PROC32_ID symbols into the object files
for functions, and these symbols have an embedded type index
which, when copied to the PDB, refer to the IPI stream.  However,
the symbols themselves are also converted into regular symbols
(e.g. S_GPROC32_ID -> S_GPROC32), and type indices in the regular
symbol records refer to the TPI stream.  So this patch applies
two fixes to function records.
  1. It converts ID symbols to the proper non-ID record type.
  2. After remapping the type index from the object file's index
     space to the PDB file/IPI stream's index space, it then
     remaps that index to the TPI stream's index space by.

Besides functions, during the remapping process we were also
discarding symbol record types which we did not recognize.
In particular, we were discarding S_BPREL32 records, which is
what MSVC uses to describe local variables on the stack.  So
this patch fixes that as well by copying them to the PDB.

Differential Revision: https://reviews.llvm.org/D36426

llvm-svn: 310394
2017-08-08 18:34:44 +00:00
Zachary Turner 489a7a09ad [llvm-pdbutil] Don't crash when a section contrib's isect is invalid.
llvm-svn: 310298
2017-08-07 20:24:01 +00:00
Adrian McCarthy b41f03e768 Enable llvm-pdbutil to list enumerations using native PDB reader
This extends the native reader to enable llvm-pdbutil to list the enums in a
PDB and it includes a simple test. It does not yet list the values in the
enumerations, which requires an actual implementation of
NativeEnumSymbol::FindChildren.

To exercise this code, use a command like:

    llvm-pdbutil pretty -native -enums foo.pdb

Differential Revision: https://reviews.llvm.org/D35738

llvm-svn: 310144
2017-08-04 22:37:58 +00:00
Zachary Turner 92e584cb48 [pdbutil] When dumping section contribs, show the section name.
llvm-svn: 310128
2017-08-04 21:10:04 +00:00
Zachary Turner fb1cd5090c [llvm-pdbutil] Dump image section headers.
Image section headers are stored in the DBI stream, but we
had no way to dump them.  This patch adds dumping support,
along with some tests that LLD actually dumps them correctly.

Differential Revision: https://reviews.llvm.org/D36332

llvm-svn: 310107
2017-08-04 20:02:38 +00:00
Zachary Turner 5869936844 [llvm-pdbutil] Add an option to only dump specific module indices.
Often something interesting (like a symbol) is in a particular
module, and you don't want to dump symbols from all other 300
modules to see the one you want.  This adds a -modi option so that
we only dump the specified module.

llvm-svn: 310000
2017-08-03 23:11:52 +00:00
Zachary Turner 33eee1911a [llvm-pdbutil] Allow diff to force module equivalencies.
Sometimes the normal module equivalence detection algorithm doesn't
quite work.  For example, you might build the same program with
MSVC and clang-cl, outputting to different object files, exes, and
PDBs, then compare them.  If the object files have different names
though, then they won't be treated as equivalent.  This way we
can force specific module indices to be treated as equivalent.

llvm-svn: 309983
2017-08-03 20:30:09 +00:00
Zachary Turner c3d8eec9e9 [pdbutil] Add a command to dump the FPM.
Recently problems have been discovered in the way we write the FPM
(free page map).  In order to fix this, we first need to establish
a baseline about what a correct FPM looks like using an MSVC
generated PDB, so that we can then make our own generated PDBs
match.  And in order to do this, the dumper needs a mode where it
can dump an FPM so that we can write tests for it.

This patch adds a command to dump the FPM, as well as a test against
a known-good PDB.

llvm-svn: 309894
2017-08-02 22:25:52 +00:00
Reid Kleckner dd853e5406 [llvm-pdbutil] Clean up ExitOnError usage to add ": " to our errors
The banner parameter is supposed to end in a separator, like ": ".
Otherwise, we get ugly errors like:

Error while reading publics streamNative error: blah blah

llvm-svn: 309332
2017-07-27 23:13:18 +00:00
Reid Kleckner 14d90fd05c [PDB] Improve GSI hash table dumping for publics and globals
The PDB "symbol stream" actually contains symbol records for the publics
and the globals stream. The globals and publics streams are essentially
hash tables that point into a single stream of records. In order to
match cvdump's behavior, we need to only dump symbol records referenced
from the hash table. This patch implements that, and then implements
global stream dumping, since it's just a subset of public stream
dumping.

Now we shouldn't see S_PROCREF or S_GDATA32 records when dumping
publics, and instead we should see those record in the globals stream.

llvm-svn: 309066
2017-07-26 00:40:36 +00:00
Reid Kleckner 898ddf61c0 [codeview] Emit 'D' as the cv source language for D code
This matches DMD:
522263965c/src/ddmd/backend/cv8.c (L199)

Fixes PR33899.

llvm-svn: 308890
2017-07-24 16:16:42 +00:00