========================================
The PDB Info Stream (aka the PDB Stream)
========================================

.. contents::
   :local:

.. _pdb_stream_header:

Stream Header
=============
At offset 0 of the PDB Stream is a header with the following layout:


.. code-block:: c++

  struct PdbStreamHeader {
    ulittle32_t Version;
    ulittle32_t Signature;
    ulittle32_t Age;
    Guid UniqueId;
  };

- **Version** - A value from the following enum:

.. code-block:: c++

  enum class PdbStreamVersion : uint32_t {
    VC2 = 19941610,
    VC4 = 19950623,
    VC41 = 19950814,
    VC50 = 19960307,
    VC98 = 19970604,
    VC70Dep = 19990604,
    VC70 = 20000404,
    VC80 = 20030901,
    VC110 = 20091201,
    VC140 = 20140508,
  };

While the meaning of this field appears to be obvious, in practice we have
never observed a value other than ``VC70``, even with modern versions of
the toolchain, and it is unclear why the other values exist. It is assumed
that certain aspects of the PDB stream's layout, and perhaps even that of
the other streams, will change if the value is something other than ``VC70``.

- **Signature** - A 32-bit time-stamp generated with a call to ``time()`` at
  the time the PDB file is written. Because a timestamp with 1-second
  granularity cannot guarantee uniqueness, this field does not really serve
  its intended purpose, and it is typically ignored in favor of the ``Guid``
  field, described below.

- **Age** - The number of times the PDB file has been written. This can be used
  along with ``Guid`` to match the PDB to its corresponding executable.

- **Guid** - A 128-bit identifier guaranteed to be unique across space and time,
  stored in the ``UniqueId`` member of the header shown above. In general, this
  can be thought of as the result of calling the Win32 API
  `UuidCreate <https://msdn.microsoft.com/en-us/library/windows/desktop/aa379205(v=vs.85).aspx>`__,
  although LLVM cannot rely on that, as it must work on non-Windows platforms.

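
The header layout described above can be read with a few lines of code. The
following is only a minimal sketch, not LLVM's implementation: the names
``ParsedPdbStreamHeader``, ``readLE32`` and ``parsePdbStreamHeader`` are
hypothetical, and the sketch assumes the whole PDB Stream has already been
extracted from the MSF file into a contiguous byte buffer.

.. code-block:: c++

  #include <algorithm>
  #include <array>
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // Hypothetical in-memory form of the header; field order mirrors
  // PdbStreamHeader above.
  struct ParsedPdbStreamHeader {
    uint32_t Version;
    uint32_t Signature;
    uint32_t Age;
    std::array<uint8_t, 16> Guid; // the UniqueId field, kept as raw bytes
  };

  // Decode a little-endian 32-bit value regardless of host byte order.
  inline uint32_t readLE32(const uint8_t *P) {
    return uint32_t(P[0]) | (uint32_t(P[1]) << 8) | (uint32_t(P[2]) << 16) |
           (uint32_t(P[3]) << 24);
  }

  // Three 32-bit fields followed by a 128-bit GUID.
  const size_t PdbStreamHeaderSize = 4 + 4 + 4 + 16;

  // Returns false if the stream is too short to contain a header.
  inline bool parsePdbStreamHeader(const std::vector<uint8_t> &Stream,
                                   ParsedPdbStreamHeader &Out) {
    if (Stream.size() < PdbStreamHeaderSize)
      return false;
    Out.Version = readLE32(&Stream[0]);
    Out.Signature = readLE32(&Stream[4]);
    Out.Age = readLE32(&Stream[8]);
    std::copy(Stream.begin() + 12, Stream.begin() + 28, Out.Guid.begin());
    return true;
  }

A caller would typically compare ``Version`` against ``VC70`` (20000404) before
trusting the rest of the stream, since that is the only value the text above
reports having observed.
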
.. _pdb_named_stream_map:

Named Stream Map
================

Following the header is a serialized hash table whose key type is a string, and
whose value type is an integer. The existence of a mapping ``X -> Y`` means
that the stream with the name ``X`` has stream index ``Y`` in the underlying MSF
file. Note that not all streams are named (for example, the
:doc:`TPI Stream <TpiStream>` has a fixed index and as such there is no need to
look up its index by name). In practice, there are usually only a small number
of named streams and these are enumerated in the table of streams in :doc:`index`.
A corollary of this is that if a stream does have a name (and as such is in the
named stream map), then consulting the Named Stream Map is likely to be the only
way to discover the stream's MSF stream index. Several important streams (such
as the global string table, which is called ``/names``) can only be located this
way, so it is important to both produce and consume this map correctly; tools
will not function correctly without it.

.. important::
  Some streams are located by fixed indices (e.g. the TPI Stream has index 2),
  but other streams are located by fixed names (e.g. the string table is called
  ``/names``) and can only be located by consulting the Named Stream Map.

The on-disk layout of the Named Stream Map consists of 2 components. The first is
a buffer of string data prefixed by a 32-bit length. The second is a serialized
hash table whose key and value types are both ``uint32_t``. The key is the offset
of a null-terminated string in the string data buffer specifying the name of the
stream, and the value is the MSF stream index of the stream with said name.
Note that although the key is an integer, the hash function used to find the right
bucket hashes the string at the corresponding offset in the string data buffer.

The on-disk layout of the serialized hash table is described at :doc:`HashTable`.

Note that the Named Stream Map as a whole is not length-prefixed, so the only way
to get to the data following it is to de-serialize it in its entirety.

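
To illustrate how the two components fit together, the sketch below reads the
length-prefixed string buffer and then resolves a stream name to its MSF stream
index. It is a simplified illustration under stated assumptions:
``readStringBuffer`` and ``findNamedStream`` are hypothetical names, the
(offset, stream index) pairs are assumed to have been de-serialized from the
hash table already, and a linear scan stands in for the real bucket lookup
described in :doc:`HashTable`.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <cstring>
  #include <string>
  #include <utility>
  #include <vector>

  // Decode a little-endian 32-bit value regardless of host byte order.
  inline uint32_t readLE32(const uint8_t *P) {
    return uint32_t(P[0]) | (uint32_t(P[1]) << 8) | (uint32_t(P[2]) << 16) |
           (uint32_t(P[3]) << 24);
  }

  // Reads the 32-bit length and the string data that follows it, starting at
  // Offset. On success, Offset is advanced to the start of the hash table.
  inline bool readStringBuffer(const std::vector<uint8_t> &Data, size_t &Offset,
                               std::vector<char> &Strings) {
    if (Offset > Data.size() || Data.size() - Offset < 4)
      return false;
    uint32_t Len = readLE32(&Data[Offset]);
    if (Data.size() - Offset - 4 < Len)
      return false;
    Strings.assign(Data.begin() + Offset + 4, Data.begin() + Offset + 4 + Len);
    Offset += 4 + Len;
    return true;
  }

  // Entries holds (key, value) pairs: key is an offset into Strings, value is
  // the MSF stream index. Linear scan; a real reader would hash the name.
  inline bool
  findNamedStream(const std::vector<char> &Strings,
                  const std::vector<std::pair<uint32_t, uint32_t> > &Entries,
                  const std::string &Name, uint32_t &StreamIndex) {
    for (size_t I = 0; I < Entries.size(); ++I) {
      uint32_t Key = Entries[I].first;
      if (Key >= Strings.size())
        continue; // offset points outside the string buffer
      const char *Begin = Strings.data() + Key;
      const void *Nul = std::memchr(Begin, 0, Strings.size() - Key);
      if (!Nul)
        continue; // string is not null-terminated within the buffer
      size_t Len = static_cast<const char *>(Nul) - Begin;
      if (Len == Name.size() && std::memcmp(Begin, Name.data(), Len) == 0) {
        StreamIndex = Entries[I].second;
        return true;
      }
    }
    return false;
  }

Keeping the keys as offsets rather than copying each name mirrors the on-disk
form, where the strings live only in the shared buffer.
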
.. _pdb_stream_features:

PDB Feature Codes
=================
Following the Named Stream Map, and consuming all remaining bytes of the PDB
Stream, is a list of values from the following enumeration:

.. code-block:: c++

  enum class PdbRaw_FeatureSig : uint32_t {
    VC110 = 20091201,
    VC140 = 20140508,
    NoTypeMerge = 0x4D544F4E,
    MinimalDebugInfo = 0x494E494D,
  };

The meaning of these values is summarized by the following table:

+------------------+-------------------------------------------------+
| Flag             | Meaning                                         |
+==================+=================================================+
| VC110            | - No other feature flags are present            |
|                  | - PDB contains an :doc:`IPI Stream <TpiStream>` |
+------------------+-------------------------------------------------+
| VC140            | - Other feature flags may be present            |
|                  | - PDB contains an :doc:`IPI Stream <TpiStream>` |
+------------------+-------------------------------------------------+
| NoTypeMerge      | - Presumably duplicate types can appear in the  |
|                  |   TPI Stream, although it's unclear why this    |
|                  |   might happen.                                 |
+------------------+-------------------------------------------------+
| MinimalDebugInfo | - Program was linked with /DEBUG:FASTLINK       |
|                  | - There is no TPI / IPI stream; all type info   |
|                  |   is contained in the original object files.    |
+------------------+-------------------------------------------------+

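
The table can be turned into a small decoding loop. The sketch below is one
possible reading of it, not a definitive implementation: ``PdbFeatures`` and
``parseFeatureCodes`` are hypothetical names, stopping at ``VC110`` is an
interpretation of "no other feature flags are present", and silently skipping
unrecognized values is an assumption.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <vector>

  enum class PdbRaw_FeatureSig : uint32_t {
    VC110 = 20091201,
    VC140 = 20140508,
    NoTypeMerge = 0x4D544F4E,
    MinimalDebugInfo = 0x494E494D,
  };

  // Hypothetical summary of the feature codes found in a PDB Stream.
  struct PdbFeatures {
    bool ContainsIdStream = false; // an IPI Stream is present
    bool NoTypeMerge = false;
    bool MinimalDebugInfo = false;
  };

  // Decode a little-endian 32-bit value regardless of host byte order.
  inline uint32_t readLE32(const uint8_t *P) {
    return uint32_t(P[0]) | (uint32_t(P[1]) << 8) | (uint32_t(P[2]) << 16) |
           (uint32_t(P[3]) << 24);
  }

  // Tail holds the bytes of the PDB Stream that follow the Named Stream Map.
  inline PdbFeatures parseFeatureCodes(const std::vector<uint8_t> &Tail) {
    PdbFeatures F;
    for (size_t I = 0; I + 4 <= Tail.size(); I += 4) {
      switch (static_cast<PdbRaw_FeatureSig>(readLE32(&Tail[I]))) {
      case PdbRaw_FeatureSig::VC110:
        F.ContainsIdStream = true;
        return F; // per the table, no other feature flags follow
      case PdbRaw_FeatureSig::VC140:
        F.ContainsIdStream = true;
        break;
      case PdbRaw_FeatureSig::NoTypeMerge:
        F.NoTypeMerge = true;
        break;
      case PdbRaw_FeatureSig::MinimalDebugInfo:
        F.MinimalDebugInfo = true;
        break;
      default:
        break; // assumption: ignore values not listed in the enumeration
      }
    }
    return F;
  }

Because the list has no length prefix of its own, the loop simply runs until
the remaining bytes of the stream are exhausted.
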
Matching a PDB to its executable
================================
The linker is responsible for writing both the PDB and the final executable, and
as a result is the only entity capable of writing the information necessary to
match the PDB to the executable.

In order to accomplish this, the linker generates a guid for the PDB (or
re-uses the existing guid if it is linking incrementally) and increments the Age
field.

The executable is a PE/COFF file, and a PE/COFF file contains a number of
"directories". For our purposes here, we are interested in the "debug
directory". The exact format of a debug directory is described by the
`IMAGE_DEBUG_DIRECTORY structure <https://msdn.microsoft.com/en-us/library/windows/desktop/ms680307(v=vs.85).aspx>`__.
For this particular case, the linker emits a debug directory of type
``IMAGE_DEBUG_TYPE_CODEVIEW``. The format of this record is defined in
``llvm/DebugInfo/CodeView/CVDebugRecord.h``, but it suffices to say here only
that it includes the same ``Guid`` and ``Age`` fields. At runtime, a
debugger or tool can scan the COFF executable image for the presence of
a debug directory of the correct type and verify that the Guid and Age match.

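
The final check amounts to comparing two (Guid, Age) pairs. A minimal sketch
follows; ``PdbIdentity`` and ``pdbMatchesExecutable`` are hypothetical names,
and extracting the pair from the PDB Stream header and from the CodeView debug
directory record is assumed to have happened elsewhere.

.. code-block:: c++

  #include <array>
  #include <cstdint>

  // Hypothetical holder for the two fields used for matching.
  struct PdbIdentity {
    std::array<uint8_t, 16> Guid;
    uint32_t Age;
  };

  // True only when both the Guid and the Age agree.
  inline bool pdbMatchesExecutable(const PdbIdentity &FromPdb,
                                   const PdbIdentity &FromDebugDirectory) {
    return FromPdb.Guid == FromDebugDirectory.Guid &&
           FromPdb.Age == FromDebugDirectory.Age;
  }
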