Commit Graph

62 Commits

Author SHA1 Message Date
Zachary Turner 457cc34e48 [llvm-pdbutil] Dump more info about globals.
We add an option to dump the entire global / public symbol record
stream.  Previously we would dump globals or publics, but not both.
And when we did dump them, we would always dump them in the order
they were referenced by the corresponding hash streams, not in
the order they were serialized in.  This patch adds a lower level
mode that just dumps the whole stream in serialization order.

Additionally, when dumping global-extras, we now dump the hash
bitmap as well as the record offset instead of dumping all zeros
for the offsets.

llvm-svn: 336407
2018-07-06 02:59:25 +00:00
Rui Ueyama 197194b6c9 Define InitLLVM to do common initialization all at once.
We have a few functions that virtually all command wants to run on
process startup/shutdown. This patch adds InitLLVM class to do that
all at once, so that we don't need to copy-n-paste boilerplate code
to each llvm command's main() function.

Differential Revision: https://reviews.llvm.org/D45602

llvm-svn: 330046
2018-04-13 18:26:06 +00:00
Zachary Turner 15b2bdfd8b [llvm-pdbutil] Add the ability to explain binary files.
Using this, you can use llvm-pdbutil to export the contents of a
stream to a binary file, then run explain on the binary file so
that it treats the offset as an offset into the stream instead
of an offset into a file.  This makes it easy to compare the
contents of the same stream from two different files.

llvm-svn: 329207
2018-04-04 17:29:09 +00:00
Zachary Turner d11328a1bb [llvm-pdbutil] Add an export subcommand.
This command can dump the binary contents of a stream to a file.
This is useful when you want to do side-by-side comparisons of
a specific stream from two PDBs to examine the differences between
them.  You can export both of them to a file, then open them up
side by side in a hex editor (for example), so as to eliminate any
differences that might arise from the contents being on different
blocks in the PDB.

In subsequent patches I plan to improve the "explain" subcommand
so that you can explain the contents of a binary file that isn't
necessarily a full PDB, but one of these dumped streams, by telling
the subcommand how to interpret the contents.

llvm-svn: 329002
2018-04-02 18:35:21 +00:00
Mandeep Singh Grang 8db564e033 [tools] Change std::sort to llvm::sort in response to r327219
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.

To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.

Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.

Reviewers: JDevlieghere, zturner, echristo, dberris, friss

Reviewed By: echristo

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D45141

llvm-svn: 328943
2018-04-01 21:24:53 +00:00
Zachary Turner d5cf5cf637 [llvm-pdbutil] Dig deeper into the PDB and DBI streams when explaining.
This will show more detail when using `llvm-pdbutil explain` on an
offset in the DBI or PDB streams.  Specifically, it will dig into
individual header fields and substreams to give a more precise
description of what the byte represents.

llvm-svn: 328878
2018-03-30 17:16:50 +00:00
Zachary Turner ea40f40e1b [PDB] Add an explain subcommand.
When investigating various things, we often have a file offset
and what to know what's in the PDB at that address.  For example
we may be doing a binary comparison of two LLD-generated PDBs
to look for sources of non-determinism, or we may wish to compare
an LLD-generated PDB with a Microsoft generated PDB for sources
of byte-for-byte incompatibility.  In these cases, we can do a
binary diff of the two files, and once we find a mismatched byte
we can use explain to figure out what that byte is, immediately
honining in on the problem.

This patch implements this by trying to narrow the meaning of
a particular file offset down as much as possible.

Differential Revision: https://reviews.llvm.org/D44959

llvm-svn: 328799
2018-03-29 16:28:20 +00:00
Zachary Turner 7b84b678a9 Delete pdbutil diff mode.
This has been made obsolete by the fact that almost all of the
things it previously checked for are no longer relevant since
we can just compare bytes in a lot of places.

llvm-svn: 328562
2018-03-26 18:01:07 +00:00
Zachary Turner a6fb536e5b [PDB] Make our PDBs look more like MS PDBs.
When investigating bugs in PDB generation, the first step is
often to do the same link with link.exe and then compare PDBs.

But comparing PDBs is hard because two completely different byte
sequences can both be correct, so it hampers the investigation when
you also have to spend time figuring out not just which bytes are
different, but also if the difference is meaningful.

This patch fixes a couple of cases related to string table emission,
hash table emission, and the order in which we emit strings that
makes more of our bytes the same as the bytes generated by MS PDBs.

Differential Revision: https://reviews.llvm.org/D44810

llvm-svn: 328348
2018-03-23 18:43:39 +00:00
Zachary Turner fced530650 Revert "Resubmit "Support embedding natvis files in PDBs.""
This is still failing on a different bot this time due to some
issue related to hashing absolute paths.  Reverting until I can
figure it out.

llvm-svn: 328014
2018-03-20 18:37:03 +00:00
Zachary Turner 132d7a134f Resubmit "Support embedding natvis files in PDBs."
The issue causing this to fail in certain configurations
should be fixed.

It was due to the fact that DIA apparently expects there to be
a null string at ID 1 in the string table.  I'm not sure why this
is important but it seems to make a difference, so set it.

llvm-svn: 328002
2018-03-20 17:06:39 +00:00
Zachary Turner a21558897b Revert "Support embedding natvis files in PDBs."
This is causing a test failure on a certain bot, so I'm removing
this temporarily until we can figure out the source of the error.

llvm-svn: 327903
2018-03-19 20:41:59 +00:00
Zachary Turner de53aaf132 Support embedding natvis files in PDBs.
Natvis is a debug language supported by Visual Studio for
specifying custom visualizers.  The /NATVIS option is an
undocumented link.exe flag which will take a .natvis file
and "inject" it into the PDB.  This way, you can ship the
debug visualizers for a program along with the PDB, which
is very useful for postmortem debugging.

This is implemented by adding a new "named stream" to the
PDB with a special name of /src/files/<natvis file name>
and simply copying the contents of the xml into this file.

Additionally, we need to emit a single stream named
/src/headerblock which contains a hash table of embedded
files to records describing them.

This patch adds this functionality, including the /NATVIS
option to lld-link.

Differential Revision: https://reviews.llvm.org/D44328

llvm-svn: 327895
2018-03-19 19:53:51 +00:00
Zachary Turner 679aeadda1 [PDB] Support dumping injected sources via the DIA reader.
Injected sources are basically a way to add actual source file content
to your PDB. Presumably you could use this for shipping your source code
with your debug information, but in practice I can only find this being
used for embedding natvis files inside of PDBs.

In order to effectively test LLVM's natvis file injection, we need a way
to dump the injected sources of a PDB in a way that is authoritative
(i.e. based on Microsoft's understanding of the PDB format, and not
LLVM's). To this end, I've added support for dumping injected sources
via DIA. I made a PDB file that used the /natvis option to generate a
test case.

Differential Revision: https://reviews.llvm.org/D44405

llvm-svn: 327428
2018-03-13 17:46:06 +00:00
Aaron Smith a27b5e93a3 [llvm-pdbdump] Add guard for null pointers and remove unused code
Summary: This avoids crashing when a user tries to dump a pdb with the `-native` option.

Reviewers: zturner, llvm-commits, rnk

Reviewed By: zturner

Subscribers: mgrang

Differential Revision: https://reviews.llvm.org/D44117

llvm-svn: 326863
2018-03-07 02:23:08 +00:00
Michael Zolotukhin 62602a476a Remove redundant includes from tools.
llvm-svn: 320631
2017-12-13 21:31:10 +00:00
Zachary Turner ca6dbf1440 Split TypeTableBuilder into two classes.
llvm-svn: 319456
2017-11-30 18:39:50 +00:00
Zachary Turner 3e3936da93 Make TypeTableBuilder inherit from TypeCollection.
A couple of places in LLD were passing references to
TypeTableCollections around, which makes it hard to change the
implementation at runtime.  However, these cases only needed to
iterate over the types in the collection, and TypeCollection
already provides a handy abstract interface for this purpose.

By implementing this interface, we can get rid of the need to
pass TypeTableBuilder references around, which should allow us
to swap the implementation at runtime in subsequent patches.

llvm-svn: 319345
2017-11-29 19:35:21 +00:00
Zachary Turner 85082013e6 Fix line endings in llvm-pdbutil.cpp
llvm-svn: 319340
2017-11-29 19:29:25 +00:00
Zachary Turner 6900de1dfb [CodeView] Refactor / Rewrite TypeSerializer and TypeTableBuilder.
The motivation behind this patch is that future directions require us to
be able to compute the hash value of records independently of actually
using them for de-duplication.

The current structure of TypeSerializer / TypeTableBuilder being a
single entry point that takes an unserialized type record, and then
hashes and de-duplicates it is not flexible enough to allow this.

At the same time, the existing TypeSerializer is already extremely
complex for this very reason -- it tries to be too many things. In
addition to serializing, hashing, and de-duplicating, ti also supports
splitting up field list records and adding continuations. All of this
functionality crammed into this one class makes it very complicated to
work with and hard to maintain.

To solve all of these problems, I've re-written everything from scratch
and split the functionality into separate pieces that can easily be
reused. The end result is that one class TypeSerializer is turned into 3
new classes SimpleTypeSerializer, ContinuationRecordBuilder, and
TypeTableBuilder, each of which in isolation is simple and
straightforward.

A quick summary of these new classes and their responsibilities are:

- SimpleTypeSerializer : Turns a non-FieldList leaf type into a series of
  bytes. Does not do any hashing. Every time you call it, it will
  re-serialize and return bytes again. The same instance can be re-used
  over and over to avoid re-allocations, and in exchange for this
  optimization the bytes returned by the serializer only live until the
  caller attempts to serialize a new record.

- ContinuationRecordBuilder : Turns a FieldList-like record into a series
  of fragments. Does not do any hashing. Like SimpleTypeSerializer,
  returns references to privately owned bytes, so the storage is
  invalidated as soon as the caller tries to re-use the instance. Works
  equally well for LF_FIELDLIST as it does for LF_METHODLIST, solving a
  long-standing theoretical limitation of the previous implementation.

- TypeTableBuilder : Accepts sequences of bytes that the user has already
  serialized, and inserts them by de-duplicating with a hash table. For
  the sake of convenience and efficiency, this class internally stores a
  SimpleTypeSerializer so that it can accept unserialized records. The
  same is not true of ContinuationRecordBuilder. The user is required to
  create their own instance of ContinuationRecordBuilder.

Differential Revision: https://reviews.llvm.org/D40518

llvm-svn: 319198
2017-11-28 18:33:17 +00:00
Aaron Ballman ecf0e95267 Add llvm::for_each as a range-based extensions to <algorithm> and make use of it in some cases where it is a more clear alternative to std::for_each.
llvm-svn: 317356
2017-11-03 20:01:25 +00:00
Peter Collingbourne 9e26e97955 COFF: PDB: Allow multiple modules with the same name.
It is possible for two modules to have the same name if they are
archive members with the same name, or if we are doing LTO (in which
case all modules will have the name "lto.tmp").

Differential Revision: https://reviews.llvm.org/D37589

llvm-svn: 312744
2017-09-07 20:39:46 +00:00
Zachary Turner abb17cc084 [llvm-pdbutil] Support dumping CodeView from object files.
We have llvm-readobj for dumping CodeView from object files, and
llvm-pdbutil has always been more focused on PDB.  However,
llvm-pdbutil has a lot of useful options for summarizing debug
information in aggregate and presenting high level statistical
views.  Furthermore, it's arguably better as a testing tool since
we don't have to write tests to conform to a state-machine like
structure where you match multiple lines in succession, each
depending on a previous match.  llvm-pdbutil dumps much more
concisely, so it's possible to use single-line matches in many
cases where as with readobj tests you have to use multi-line
matches with an implicit state machine.

Because of this, I'm adding object file support to llvm-pdbutil.
In fact, this mirrors the cvdump tool from Microsoft, which also
supports both object files and pdb files.  In the future we could
perhaps rename this tool llvm-cvutil.

In the meantime, this allows us to deep dive into object files
the same way we already can with PDB files.

llvm-svn: 312358
2017-09-01 20:06:56 +00:00
Zachary Turner 99c6982bcd [llvm-pdbutil] Print detailed S_UDT stats.
This adds a new command line option, -udt-stats, which breaks
down the stats of S_UDT records.  These are one of the biggest
contributors to the size of /DEBUG:FASTLINK PDBs, so they need
some additional tools to be able to analyze their usage.  This
option will dig into each S_UDT record and determine what kind
of record it points to, and then break down the statistics by
the target type.  The goal here is to identify how our object
files differ from MSVC object files in S_UDT records, so that
we can output fewer of them and reach size parity.

llvm-svn: 312276
2017-08-31 20:43:22 +00:00
Zachary Turner d1de2f4f5e [llvm-pdbutil] Add support for dumping detailed module stats.
This adds support for dumping a summary of module symbols
and CodeView debug chunks.  This option prints a table for
each module of all of the symbols that occurred in the module
and the number of times it occurred and total byte size.  Then
at the end it prints the totals for the entire file.

Additionally, this patch adds the -jmc (just my code) option,
which suppresses modules which are from external libraries or
linker imports, so that you can focus only on the object files
and libraries that originate from your own source code.

llvm-svn: 311338
2017-08-21 14:53:25 +00:00
Zachary Turner fb1cd5090c [llvm-pdbutil] Dump image section headers.
Image section headers are stored in the DBI stream, but we
had no way to dump them.  This patch adds dumping support,
along with some tests that LLD actually dumps them correctly.

Differential Revision: https://reviews.llvm.org/D36332

llvm-svn: 310107
2017-08-04 20:02:38 +00:00
Zachary Turner 5869936844 [llvm-pdbutil] Add an option to only dump specific module indices.
Often something interesting (like a symbol) is in a particular
module, and you don't want to dump symbols from all other 300
modules to see the one you want.  This adds a -modi option so that
we only dump the specified module.

llvm-svn: 310000
2017-08-03 23:11:52 +00:00
Zachary Turner 33eee1911a [llvm-pdbutil] Allow diff to force module equivalencies.
Sometimes the normal module equivalence detection algorithm doesn't
quite work.  For example, you might build the same program with
MSVC and clang-cl, outputting to different object files, exes, and
PDBs, then compare them.  If the object files have different names
though, then they won't be treated as equivalent.  This way we
can force specific module indices to be treated as equivalent.

llvm-svn: 309983
2017-08-03 20:30:09 +00:00
Zachary Turner c3d8eec9e9 [pdbutil] Add a command to dump the FPM.
Recently problems have been discovered in the way we write the FPM
(free page map).  In order to fix this, we first need to establish
a baseline about what a correct FPM looks like using an MSVC
generated PDB, so that we can then make our own generated PDBs
match.  And in order to do this, the dumper needs a mode where it
can dump an FPM so that we can write tests for it.

This patch adds a command to dump the FPM, as well as a test against
a known-good PDB.

llvm-svn: 309894
2017-08-02 22:25:52 +00:00
Reid Kleckner 14d90fd05c [PDB] Improve GSI hash table dumping for publics and globals
The PDB "symbol stream" actually contains symbol records for the publics
and the globals stream. The globals and publics streams are essentially
hash tables that point into a single stream of records. In order to
match cvdump's behavior, we need to only dump symbol records referenced
from the hash table. This patch implements that, and then implements
global stream dumping, since it's just a subset of public stream
dumping.

Now we shouldn't see S_PROCREF or S_GDATA32 records when dumping
publics, and instead we should see those record in the globals stream.

llvm-svn: 309066
2017-07-26 00:40:36 +00:00
Reid Kleckner 686f121a5d [PDB] Dump extra info about the publics stream
This includes the hash table, the address map, and the thunk table and
section offset table. The last two are only used for incremental
linking, which LLD doesn't support, so they are less interesting. The
hash table is particularly important to get right, since this is the one
of the streams that debuggers use to translate addresses to symbols.

llvm-svn: 308764
2017-07-21 18:28:55 +00:00
Reid Kleckner a842cd75e2 [codeview] Remove TypeServerHandler and PDBTypeServerHandler
Summary:
Instead of wiring these through the CVTypeVisitor interface, clients
should inspect the CVTypeArray before visiting it and potentially load
up the type server's TPI stream if they need it.

No tests relied on this functionality because LLD was the only client.

Reviewers: ruiu

Subscribers: mgorny, hiraditya, zturner, llvm-commits

Differential Revision: https://reviews.llvm.org/D35394

llvm-svn: 308212
2017-07-17 20:28:06 +00:00
Zachary Turner a9d944fd6f Resubmit "Add pdb-diff test."
This was originally reverted because of two issues.
  1) Printing ANSI color escape codes even when outputting to
     a file
  2) Module name comparisons were failing when comparing a PDB
     generated on one machine to a PDB generated on another
     machine.

I attempted to fix #2 by adding command line options which let
you specify prefixes to strip from the beginning of embedded
paths, which effectively lets us specify a path to "base" each
PDB from and only compare the parts under the base.  But this is
tricky because PDB paths always use Windows path syntax, even
when they are created on non-Windows hosts.  A problem still
existed when constructing the prefix to strip, where we were
accidentally using a host-specific path separator instead of
a Windows path separator.

This resubmission fixes the issue on Linux (and I have verified
that the test now passes on Linux).

llvm-svn: 307571
2017-07-10 19:16:49 +00:00
Zachary Turner ba3836bc78 Revert "Build fixes for pdb-diff test."
This reverts commit 180af3fdbdb17ec35b45ec1f925fd743b28d37e1.

This is still breaking due to linux-specific path differences.

llvm-svn: 307559
2017-07-10 17:32:47 +00:00
Zachary Turner 6da7a3058e Fix pdb-diff test.
A test was checked in on Friday that worked by checking in an
object file and PDB generated locally by MSVC, and then having
the test run lld-link on the object file and diffing LLD's PDB
against the checked in PDB.

This failed because part of the diffing algorithm involves
determining if two modules are the same, and if so drilling into
the module and diffing individual fields of the module.  The
only thing we can use to make this determination though is the
"name" of the module, which is a path to where the module (obj
file) was read from on the machine where it was linked.  This
fails for obvious reasons when comparing a PDB generated on one
machine to a PDB on another machine.

The fix employed here is to add two command line options to the
diff subcommand, which allow the user to specify a "binary root
path".  The bin root path, if specified, is stripped from the
beginning of any embedded PDB paths.  The test is updated to
specify the user's local test output directory for the left
PDB, and is hardcoded to the location where the original PDB
was created for the right PDB.  This way all the equivalence
comparisons should succeed.

llvm-svn: 307555
2017-07-10 16:52:15 +00:00
Zachary Turner c1e93e5fa4 Fix some differences between lld and MSVC generated PDBs.
A couple of things were different about our generated PDBs.

1) We were outputting the wrong Version on the PDB Stream.
   The version we were setting was newer than what MSVC is setting.
   It's not clear what the implications are, but we change LLD
   to use PdbImplVC70, as MSVC does.
2) For the optional debug stream indices in the DBI Stream, we
   were outputting 0 to mean "the stream is not present".  MSVC
   outputs uint16_t(-1), which is the "correct" way to specify
   that a stream is not present.  So we fix that as well.
3) We were setting the PDB Stream signature to 0.  This is supposed
   to be the result of calling time(nullptr).  Although this leads
   to non-deterministic builds, a better way to solve that is by
   having a command line option explicitly for generating a
   reproducible build, and have the default behavior of lld-link
   match the default behavior of link.

To test this, I'm making use of the new and improved `pdb diff`
sub command.  To make it suitable for writing tests against, I had
to modify the diff subcommand slightly to print less verbose output.
Previously it would always print | <column> | <value1> | <value2> |
which is quite verbose, and the values are fragile.  All we really
want to know is "did we produce the same value as link?"  So I added
command line options to print a single character representing the
result status (different, identical, equivalent), and another to
hide the value display.  Note that just inspecting the diff output
used to write the test, you can see some things that are obviously
wrong.  That is just reflective of the fact that this is the state
of affairs today, not that we're asserting that this is "correct".
We can use this as a starting point to discover differences, fix
them, and update the test.

Differential Revision: https://reviews.llvm.org/D35086

llvm-svn: 307422
2017-07-07 18:45:56 +00:00
Zachary Turner eae44dfee9 [PDB] Add a test that verifies every known type record.
We had a lot of one-off tests for this type and that type,
or "every type that happens to be generated by this program
I built".  Eventually I got a bug report filed where we were
crashing on a type that was not covered by any of these tests.
So this test carefully constructs a minimal C++ program that
will cause every type we support to be emitted.  This ensures
full coverage for type records.

Differential Revision: https://reviews.llvm.org/D34915

llvm-svn: 307187
2017-07-05 18:43:25 +00:00
Zachary Turner 02a267758e [llvm-pdbutil] Add the ability to dump the dependency tree for a type
Previously we had the -type-index option which would dump the record of
a single, but we had no way to follow the dependency graph backwards and
also dump all dependent types.

Having this option makes test-writing better, because we can limit the
test to only those records that are of importance for the thing we're
trying to test, which allows us to use things like CHECK-NEXT to reduce
fragility.

Differential Revision: https://reviews.llvm.org/D34899

llvm-svn: 306852
2017-06-30 18:15:47 +00:00
Zachary Turner e79b07e41e [llvm-pdbutil] Add a mode to `bytes` for dumping split debug chunks.
llvm-svn: 306309
2017-06-26 17:22:36 +00:00
Zachary Turner fa33282774 [llvm-pdbutil] Dump raw bytes of module symbols and debug chunks.
llvm-svn: 306179
2017-06-23 23:08:57 +00:00
Zachary Turner c2f5b4bfd9 [llvm-pdbutil] Dump raw bytes of type and id records.
llvm-svn: 306167
2017-06-23 21:50:54 +00:00
Zachary Turner dd73968256 [llvm-pdbutil] Dump raw bytes of various DBI stream subsections.
llvm-svn: 306160
2017-06-23 21:11:54 +00:00
Zachary Turner 5f09852dfb [llvm-pdbutil] Show what blocks a stream occupies.
This is useful when you want to look at a specific chunk of a
stream or look for discontinuities, and you need to know the
list of blocks occupied by a stream.

llvm-svn: 306150
2017-06-23 20:28:14 +00:00
Zachary Turner 6c3e41bbd3 [llvm-pdbutil] Dump raw bytes of pdb name map.
This patch dumps the raw bytes of the pdb name map which contains
the mapping of stream name to stream index for the string table
and other reserved streams.

llvm-svn: 306148
2017-06-23 20:18:38 +00:00
Zachary Turner 6b124f29e7 [llvm-pdbutil] Add the ability to dump raw bytes from the file.
Normally we can only make sense of the content of a PDB in terms
of streams and blocks, but in some cases it may be useful to dump
bytes at a specific absolute file offset.  For example, if you
know that some interesting data is at a particular location and
you want to see some surrounding data.

llvm-svn: 306146
2017-06-23 19:54:44 +00:00
Zachary Turner 9940203a2c [llvm-pdbutil] Create a "bytes" subcommand.
This idea originally came about when I was doing some deep
investigation of why certain bytes in a PDB that we round-tripped
differed from their original bytes in the source PDB.  I found
myself having to hack up the code in many places to dump the
bytes of this substream, or that record.  It would be nice if
we could just do this for every possible stream, substream,
debug chunk type, etc.

It doesn't make sense to put this under dump because there's just
so many options that would detract from the more common use case
of just dumping deserialized records.  So making a new subcommand
seems like the most logical course of action.  In doing so, we
already have two command line options that are suitable for this
new subcommand, so start out by moving them there.

llvm-svn: 306056
2017-06-22 20:58:11 +00:00
Zachary Turner 7df69958f8 [llvm-pdbutil] Rename "raw" to "dump".
Now you run llvm-pdbutil dump <options>.  This is a followup
after having renamed the tool, whereas before raw was obviously
just the style of dumping, whereas now "dump" is the action to
perform with the "util".

llvm-svn: 306055
2017-06-22 20:57:39 +00:00
Zachary Turner ed130b6ac0 Remove diff pedantic mode.
llvm-svn: 305818
2017-06-20 18:50:30 +00:00
Zachary Turner 59224cba2e Remove some dead code / includes.
I'm trying to get rid of the TypeDatabase class, so the first
step is to minimize its footprint.

llvm-svn: 305611
2017-06-16 23:42:15 +00:00
Zachary Turner 47d9a560de [llvm-pdbutil] Add support for dumping cross module imports/exports.
llvm-svn: 305532
2017-06-16 00:04:24 +00:00