Convert PDB docs to unix line endings. No other changes.
llvm-svn: 359712
parent
a0df4d37b0
commit
0a4aeec16e
|
@ -1,3 +1,3 @@
|
|||
=====================================
|
||||
The PDB Global Symbol Stream
|
||||
=====================================
|
||||
|
|
|
@ -1,103 +1,103 @@
|
|||
The PDB Serialized Hash Table Format
|
||||
====================================
|
||||
|
||||
.. contents::
|
||||
:local:
|
||||
|
||||
.. _hash_intro:
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
One of the design goals of the PDB format is to provide accelerated access to
debug information, and for this reason there are several occasions where hash
tables are serialized and embedded directly in the file, rather than requiring
a consumer to read a list of values and reconstruct the hash table on the fly.

The serialization format supports hash tables of arbitrarily large size and
capacity, as well as arbitrary value types and hash functions. The only
supported key type is a uint32. The only requirement is that the producer and
consumer agree on the hash function. As such, the hash function is not
discussed further in this document; it is assumed that for a particular
instance of a PDB file hash table, the appropriate hash function is being used.
|
||||
|
||||
On-Disk Format
|
||||
==============
|
||||
|
||||
.. code-block:: none

  .--------------------.-- +0
  |        Size        |
  .--------------------.-- +4
  |      Capacity      |
  .--------------------.-- +8
  | Present Bit Vector |
  .--------------------.-- +N
  | Deleted Bit Vector |
  .--------------------.-- +M                  ─╮
  |        Key         |                        │
  .--------------------.-- +M+4                 │
  |       Value        |                        │
  .--------------------.-- +M+4+sizeof(Value)   │
           ...                                  ├─ |Capacity| Bucket entries
  .--------------------.                        │
  |        Key         |                        │
  .--------------------.                        │
  |       Value        |                        │
  .--------------------.                       ─╯
|
||||
|
||||
- **Size** - The number of values contained in the hash table.

- **Capacity** - The number of buckets in the hash table. Producers should
  keep ``Size`` no greater than ``2/3*Capacity+1``, which bounds the load
  factor at roughly two thirds.

- **Present Bit Vector** - A serialized bit vector which contains information
  about which buckets have valid values. If the bucket has a value, the
  corresponding bit will be set, and if the bucket doesn't have a value (either
  because the bucket is empty or because the value is a tombstone value) the bit
  will be unset.

- **Deleted Bit Vector** - A serialized bit vector which contains information
  about which buckets have tombstone values. If the entry in this bucket is
  deleted, the bit will be set; otherwise it will be unset.

- **Keys and Values** - A list of ``Capacity`` hash buckets, where the first
  entry is the key (always a uint32), and the second entry is the value. The
  state of each bucket (valid, empty, deleted) can be determined by examining
  the present and deleted bit vectors.
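
The ``2/3*Capacity+1`` bound given for **Capacity** can be read as a limit on
``Size``. The following is a minimal sketch of that reading, not a definitive
implementation: the names ``maxLoad`` and ``needsGrow`` are hypothetical, and
the use of integer division is an assumption.

.. code-block:: c++

  #include <cstdint>

  // Largest Size a table with the given Capacity should hold, following the
  // 2/3*Capacity+1 bound (integer division is an assumption of this sketch).
  // The multiplication is done in 64 bits so it cannot overflow.
  constexpr uint32_t maxLoad(uint32_t Capacity) {
    return static_cast<uint32_t>(static_cast<uint64_t>(Capacity) * 2 / 3 + 1);
  }

  // True if adding one more entry would push Size past the bound, so a
  // producer would want to grow the table first.
  constexpr bool needsGrow(uint32_t Size, uint32_t Capacity) {
    return Size >= maxLoad(Capacity);
  }

Under these assumptions a table with 8 buckets would hold at most 6 entries
before a producer grows it.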
|
||||
|
||||
|
||||
.. _hash_bit_vectors:
|
||||
|
||||
Present and Deleted Bit Vectors
|
||||
===============================
|
||||
|
||||
The bit vectors indicating the status of each bucket are serialized as follows:
|
||||
|
||||
.. code-block:: none

  .--------------------.-- +0
  |     Word Count     |
  .--------------------.-- +4
  |       Word_0       |        ─╮
  .--------------------.-- +8    │
  |       Word_1       |         │
  .--------------------.-- +12   ├─ |Word Count| values
           ...                   │
  .--------------------.         │
  |       Word_N       |         │
  .--------------------.        ─╯
|
||||
|
||||
The words, when viewed as a contiguous block of bytes, represent a bit vector with
|
||||
the following layout:
|
||||
|
||||
.. code-block:: none

    .------------.         .------------.------------.
    |   Word_N   |   ...   |   Word_1   |   Word_0   |
    .------------.         .------------.------------.
    |            |         |            |            |
  +N*32      +(N-1)*32    +64          +32          +0
|
||||
|
||||
where the k'th bit of this bit vector represents the status of the k'th bucket
|
||||
in the hash table.
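
The lookup described above can be sketched as follows. This is an illustrative
sketch rather than the reference implementation: ``testBit``, ``bucketState``
and ``BucketState`` are hypothetical names. It assumes that bit 0 is the least
significant bit of ``Word_0`` (as the layout diagram suggests), that the words
have already been decoded to host byte order, and that bits beyond the stored
words read as unset.

.. code-block:: c++

  #include <cstdint>
  #include <vector>

  enum class BucketState { Empty, Valid, Deleted };

  // Returns bit K of a bit vector stored as 32-bit words, where bit 0 is
  // assumed to be the least significant bit of Word_0. Bits past the end of
  // the stored words are treated as unset (an assumption of this sketch).
  inline bool testBit(const std::vector<uint32_t> &Words, uint32_t K) {
    uint32_t WordIndex = K / 32;
    if (WordIndex >= Words.size())
      return false;
    return ((Words[WordIndex] >> (K % 32)) & 1u) != 0;
  }

  // Classifies a bucket from the present and deleted bit vectors.
  inline BucketState bucketState(const std::vector<uint32_t> &Present,
                                 const std::vector<uint32_t> &Deleted,
                                 uint32_t Bucket) {
    if (testBit(Present, Bucket))
      return BucketState::Valid;
    if (testBit(Deleted, Bucket))
      return BucketState::Deleted;
    return BucketState::Empty;
  }

A bucket whose bit is set in neither vector is reported as empty.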
|
||||
|
|
|
@ -1,80 +1,80 @@
|
|||
=====================================
|
||||
The Module Information Stream
|
||||
=====================================
|
||||
|
||||
.. contents::
|
||||
:local:
|
||||
|
||||
.. _modi_stream_intro:
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
The Module Info Stream (henceforth referred to as the Modi stream) contains
information about a single module (object file, import library, etc.) that
contributes to the binary this PDB contains debug information about. There
is one modi stream for each module, and the mapping between modi stream index
and module is contained in the :doc:`DBI Stream <DbiStream>`. The modi stream
for a single module contains line information for the compiland, as well as
all CodeView information for the symbols defined in the compiland. Finally,
there is a "global refs" substream which is not well understood.
|
||||
|
||||
.. _modi_stream_layout:
|
||||
|
||||
Stream Layout
|
||||
=============
|
||||
|
||||
A modi stream is laid out as follows:
|
||||
|
||||
|
||||
.. code-block:: c++

  struct ModiStream {
    uint32_t Signature;
    uint8_t Symbols[SymbolSize-4];
    uint8_t C11LineInfo[C11Size];
    uint8_t C13LineInfo[C13Size];

    uint32_t GlobalRefsSize;
    uint8_t GlobalRefs[GlobalRefsSize];
  };
|
||||
|
||||
- **Signature** - Unknown. In practice only the value of ``4`` has been
  observed. It is hypothesized that this value corresponds to the set of
  ``CV_SIGNATURE_xx`` defines in ``cvinfo.h``, with the value of ``4``
  meaning that this module has C13 line information (as opposed to C11 line
  information). A corollary of this is that we expect to only ever see
  C13 line info, and that we do not understand the format of C11 line info.

- **Symbols** - The :ref:`CodeView Symbol Substream <modi_symbol_substream>`.
  ``SymbolSize`` is equal to the value of ``SymByteSize`` for the
  corresponding module's entry in the :ref:`Module Info Substream <dbi_mod_info_substream>`
  of the :doc:`DBI Stream <DbiStream>`.

- **C11LineInfo** - A block containing CodeView line information in C11
  format. ``C11Size`` is equal to the value of ``C11ByteSize`` from the
  :ref:`Module Info Substream <dbi_mod_info_substream>` of the
  :doc:`DBI Stream <DbiStream>`. If this value is ``0``, then C11 line
  information is not present. As mentioned previously, the format of
  C11 line info is not understood and we assume all line information in
  modern PDBs to be in C13 format.

- **C13LineInfo** - A block containing CodeView line information in C13
  format. ``C13Size`` is equal to the value of ``C13ByteSize`` from the
  :ref:`Module Info Substream <dbi_mod_info_substream>` of the
  :doc:`DBI Stream <DbiStream>`. If this value is ``0``, then C13 line
  information is not present.

- **GlobalRefs** - The meaning of this substream is not understood.
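
Because every field size comes from the DBI stream, the byte offsets of the
substreams within a modi stream follow directly from the layout above. The
sketch below illustrates that arithmetic. ``ModiLayout`` and
``computeModiLayout`` are hypothetical names introduced for this example, and
the sketch assumes ``SymByteSize`` is at least 4 so that it covers the
``Signature`` field.

.. code-block:: c++

  #include <cstdint>

  // Byte offsets of the substreams inside one modi stream.
  struct ModiLayout {
    uint32_t SymbolsOffset;        // first byte after the 4-byte Signature
    uint32_t SymbolsSize;          // SymbolSize - 4
    uint32_t C11Offset;            // start of C11LineInfo
    uint32_t C13Offset;            // start of C13LineInfo
    uint32_t GlobalRefsSizeOffset; // start of the GlobalRefsSize field
  };

  // Precondition (an assumption of this sketch): SymByteSize >= 4.
  inline ModiLayout computeModiLayout(uint32_t SymByteSize,
                                      uint32_t C11ByteSize,
                                      uint32_t C13ByteSize) {
    ModiLayout L;
    L.SymbolsOffset = 4;
    L.SymbolsSize = SymByteSize - 4;
    L.C11Offset = L.SymbolsOffset + L.SymbolsSize;
    L.C13Offset = L.C11Offset + C11ByteSize;
    L.GlobalRefsSizeOffset = L.C13Offset + C13ByteSize;
    return L;
  }

With no C11 data, ``C11Offset`` and ``C13Offset`` coincide, which matches the
zero-length ``C11LineInfo`` array in the struct.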
|
||||
|
||||
.. _modi_symbol_substream:
|
||||
|
||||
The CodeView Symbol Substream
|
||||
=============================
|
||||
|
||||
The CodeView Symbol Substream is an array of variable-length
records describing the functions, variables, inlining information,
and other symbols defined in the compiland. The entire array consumes
``SymbolSize-4`` bytes. The format of a CodeView Symbol Record (and
thus of an array of CodeView Symbol Records) is described in
:doc:`CodeViewSymbols`.
|
||||
|
|
|
@ -1,179 +1,179 @@
|
|||
=====================================
|
||||
The MSF File Format
|
||||
=====================================
|
||||
|
||||
.. contents::
|
||||
:local:
|
||||
|
||||
.. _msf_layout:
|
||||
|
||||
File Layout
|
||||
===========
|
||||
|
||||
The MSF file format consists of the following components:
|
||||
|
||||
1. :ref:`msf_superblock`
|
||||
2. :ref:`msf_freeblockmap` (also known as Free Page Map, or FPM)
|
||||
3. Data
|
||||
|
||||
Each component is stored as an indexed block, the length of which is specified
|
||||
in ``SuperBlock::BlockSize``. The file consists of 1 or more iterations of the
|
||||
following pattern (sometimes referred to as an "interval"):
|
||||
|
||||
1. 1 block of data
|
||||
2. Free Block Map 1 (corresponds to ``SuperBlock::FreeBlockMapBlock`` 1)
|
||||
3. Free Block Map 2 (corresponds to ``SuperBlock::FreeBlockMapBlock`` 2)
|
||||
4. ``SuperBlock::BlockSize - 3`` blocks of data
|
||||
|
||||
In the first interval, the first data block is used to store
|
||||
:ref:`msf_superblock`.
|
||||
|
||||
The following diagram demonstrates the general layout of the file (\| denotes
|
||||
the end of an interval, and is for visualization purposes only):
|
||||
|
||||
+-------------+-----------------------+------------------+------------------+----------+----+------+------+------+-------------+----+-----+
| Block Index | 0                     | 1                | 2                | 3 - 4095 | \| | 4096 | 4097 | 4098 | 4099 - 8191 | \| | ... |
+=============+=======================+==================+==================+==========+====+======+======+======+=============+====+=====+
| Meaning     | :ref:`msf_superblock` | Free Block Map 1 | Free Block Map 2 | Data     | \| | Data | FPM1 | FPM2 | Data        | \| | ... |
+-------------+-----------------------+------------------+------------------+----------+----+------+------+------+-------------+----+-----+
|
||||
|
||||
The file may end after any block, including immediately after an FPM1.
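
The interval pattern means the role of a block can be derived from its index
alone. The following is a minimal sketch of that rule, assuming each interval
spans ``BlockSize`` blocks as in the table above. ``BlockRole`` and
``blockRole`` are hypothetical names used only for illustration.

.. code-block:: c++

  #include <cstdint>

  enum class BlockRole { SuperBlock, Fpm1, Fpm2, Data };

  // Classifies a block by its position within its interval. Block 0 is the
  // SuperBlock; the blocks at positions 1 and 2 of every interval are
  // reserved for the two Free Block Maps; everything else is data.
  inline BlockRole blockRole(uint32_t BlockIndex, uint32_t BlockSize) {
    if (BlockIndex == 0)
      return BlockRole::SuperBlock;
    uint32_t PosInInterval = BlockIndex % BlockSize;
    if (PosInInterval == 1)
      return BlockRole::Fpm1;
    if (PosInInterval == 2)
      return BlockRole::Fpm2;
    return BlockRole::Data;
  }

For a block size of 4096 this classifies blocks 4097 and 4098 as FPM blocks
and block 4096 as data, in agreement with the table.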
|
||||
|
||||
.. note::
  LLVM only supports 4096 byte blocks (sometimes referred to as the "BigMsf"
  variant), so the rest of this document will assume a block size of 4096.
|
||||
|
||||
.. _msf_superblock:
|
||||
|
||||
The Superblock
|
||||
==============
|
||||
At file offset 0 in an MSF file is the MSF *SuperBlock*, which is laid out as
|
||||
follows:
|
||||
|
||||
.. code-block:: c++

  struct SuperBlock {
    char FileMagic[sizeof(Magic)];
    ulittle32_t BlockSize;
    ulittle32_t FreeBlockMapBlock;
    ulittle32_t NumBlocks;
    ulittle32_t NumDirectoryBytes;
    ulittle32_t Unknown;
    ulittle32_t BlockMapAddr;
  };
|
||||
|
||||
- **FileMagic** - Must be equal to ``"Microsoft C / C++ MSF 7.00\\r\\n"``
  followed by the bytes ``1A 44 53 00 00 00``.
- **BlockSize** - The block size of the internal file system. Valid values are
  512, 1024, 2048, and 4096 bytes. Certain aspects of the MSF file layout vary
  depending on the block sizes. For the purposes of LLVM, we handle only block
  sizes of 4KiB, and all further discussion assumes a block size of 4KiB.
- **FreeBlockMapBlock** - The index of a block within the file, at which begins
  a bitfield representing the set of all blocks within the file which are "free"
  (i.e. the data within that block is not used). See :ref:`msf_freeblockmap` for
  more information.
  **Important**: ``FreeBlockMapBlock`` can only be ``1`` or ``2``!
- **NumBlocks** - The total number of blocks in the file. ``NumBlocks * BlockSize``
  should equal the size of the file on disk.
- **NumDirectoryBytes** - The size of the stream directory, in bytes. The stream
  directory contains information about each stream's size and the set of blocks
  that it occupies. It will be described in more detail later.
- **BlockMapAddr** - The index of a block within the MSF file. At this block is
  an array of ``ulittle32_t``'s listing the blocks that the stream directory
  resides on. For large MSF files, the stream directory (which describes the
  block layout of each stream) may not fit entirely on a single block. As a
  result, this extra layer of indirection is introduced, whereby this block
  contains the list of blocks that the stream directory occupies, and the stream
  directory itself can be stitched together accordingly. The number of
  ``ulittle32_t``'s in this array is given by ``ceil(NumDirectoryBytes / BlockSize)``.
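
Several of the constraints above are simple arithmetic checks. The sketch below
shows one way a consumer might express them. All function names are
hypothetical and this is not LLVM's actual validation code. It assumes the
``SuperBlock`` fields have already been decoded into host-order integers.

.. code-block:: c++

  #include <cstdint>

  // BlockSize must be one of the four values listed above.
  inline bool isValidBlockSize(uint32_t BlockSize) {
    return BlockSize == 512 || BlockSize == 1024 || BlockSize == 2048 ||
           BlockSize == 4096;
  }

  // FreeBlockMapBlock can only be 1 or 2.
  inline bool isValidFreeBlockMapBlock(uint32_t Block) {
    return Block == 1 || Block == 2;
  }

  // ceil(NumDirectoryBytes / BlockSize) in integer arithmetic: the number of
  // ulittle32_t entries in the array found at BlockMapAddr.
  // Precondition: BlockSize != 0.
  inline uint32_t numDirectoryBlocks(uint32_t NumDirectoryBytes,
                                     uint32_t BlockSize) {
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(NumDirectoryBytes) + BlockSize - 1) / BlockSize);
  }

  // NumBlocks * BlockSize, which should equal the size of the file on disk.
  inline uint64_t expectedFileSize(uint32_t NumBlocks, uint32_t BlockSize) {
    return static_cast<uint64_t>(NumBlocks) * BlockSize;
  }

The 64-bit intermediates are a defensive choice of this sketch, so that large
field values cannot overflow the computation.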
|
||||
|
||||
.. _msf_freeblockmap:
|
||||
|
||||
The Free Block Map
|
||||
==================
|
||||
|
||||
The Free Block Map (sometimes referred to as the Free Page Map, or FPM) is a
|
||||
series of blocks which contains a bit flag for every block in the file. The
|
||||
flag will be set to 0 if the block is in use, and 1 if the block is unused.
|
||||
|
||||
Each file contains two FPMs, one of which is active at any given time. This
|
||||
feature is designed to support incremental and atomic updates of the underlying
|
||||
MSF file. While writing to an MSF file, if the active FPM is FPM1, you can
|
||||
write your new modified bitfield to FPM2, and vice versa. Only when you commit
|
||||
the file to disk do you need to swap the value in the SuperBlock to point to
|
||||
the new ``FreeBlockMapBlock``.
|
||||
|
||||
The Free Block Maps are stored as a series of single blocks throughout the file
at intervals of ``BlockSize``. Because each FPM block is of size ``BlockSize``
bytes, it contains 8 times as many bits as an interval has blocks. This means
that the first block of each FPM refers to the first 8 intervals of the file
(the first 32768 blocks), the second block of each FPM refers to the next 8
intervals, and so on. This results in far more FPM blocks being present than are
required, but in order to maintain backwards compatibility the format must stay
this way.
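
Since each FPM block holds ``BlockSize * 8`` flags and the FPM blocks of one
map are spaced ``BlockSize`` blocks apart, the flag for a given block can be
located with two divisions. The following is a hedged sketch of that
calculation. ``FpmLocation`` and ``locateFpmBit`` are hypothetical names, and
the order of bits within each byte is deliberately left out because the text
above does not specify it.

.. code-block:: c++

  #include <cstdint>

  struct FpmLocation {
    uint32_t FpmFileBlock; // file index of the FPM block holding the flag
    uint32_t BitInBlock;   // position of the flag within that FPM block
  };

  // FreeBlockMapBlock is the SuperBlock field (1 or 2) that selects the
  // active map.
  inline FpmLocation locateFpmBit(uint32_t BlockIndex,
                                  uint32_t FreeBlockMapBlock,
                                  uint32_t BlockSize) {
    uint32_t BitsPerFpmBlock = BlockSize * 8;
    // Which FPM block of the active map covers this block index.
    uint32_t FpmChunk = BlockIndex / BitsPerFpmBlock;
    return {FreeBlockMapBlock + FpmChunk * BlockSize,
            BlockIndex % BitsPerFpmBlock};
  }

With 4096-byte blocks, block 32768 is the first block described by the second
FPM block of the active map.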
|
||||
|
||||
The Stream Directory
|
||||
====================
|
||||
The Stream Directory is the root of all access to the other streams in an MSF
|
||||
file. Beginning at byte 0 of the stream directory is the following structure:
|
||||
|
||||
.. code-block:: c++

  struct StreamDirectory {
    ulittle32_t NumStreams;
    ulittle32_t StreamSizes[NumStreams];
    ulittle32_t StreamBlocks[NumStreams][];
  };
|
||||
|
||||
And this structure occupies exactly ``SuperBlock->NumDirectoryBytes`` bytes.
|
||||
Note that each of the last two arrays is of variable length, and in particular
|
||||
that the second array is jagged.
|
||||
|
||||
**Example:** Suppose a hypothetical PDB file with a 4KiB block size, and 4
|
||||
streams of lengths {1000 bytes, 8000 bytes, 16000 bytes, 9000 bytes}.
|
||||
|
||||
Stream 0: ceil(1000 / 4096) = 1 block
|
||||
|
||||
Stream 1: ceil(8000 / 4096) = 2 blocks
|
||||
|
||||
Stream 2: ceil(16000 / 4096) = 4 blocks
|
||||
|
||||
Stream 3: ceil(9000 / 4096) = 3 blocks
|
||||
|
||||
In total, 10 blocks are used. Let's see what the stream directory might look
|
||||
like:
|
||||
|
||||
.. code-block:: c++

  struct StreamDirectory {
    ulittle32_t NumStreams = 4;
    ulittle32_t StreamSizes[] = {1000, 8000, 16000, 9000};
    ulittle32_t StreamBlocks[][] = {
      {4},
      {5, 6},
      {11, 9, 7, 8},
      {10, 15, 12}
    };
  };
|
||||
|
||||
In total, this occupies ``15 * 4 = 60`` bytes, so ``SuperBlock->NumDirectoryBytes``
|
||||
would equal ``60``, and ``SuperBlock->BlockMapAddr`` would be an array of one
|
||||
``ulittle32_t``, since ``60 <= SuperBlock->BlockSize``.
|
||||
|
||||
Note also that the streams are discontiguous, and that part of stream 3 is in the
|
||||
middle of part of stream 2. You cannot assume anything about the layout of the
|
||||
blocks!
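
The jagged ``StreamBlocks`` array can only be decoded after ``StreamSizes`` is
known, because the length of each row is ``ceil(StreamSize / BlockSize)``. The
sketch below illustrates that dependency. ``ParsedDirectory`` and
``parseStreamDirectory`` are hypothetical names. The input is assumed to be the
reassembled directory already decoded into host-order 32-bit words, and every
stream size is treated as a plain byte count.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <vector>

  struct ParsedDirectory {
    std::vector<uint32_t> StreamSizes;
    std::vector<std::vector<uint32_t>> StreamBlocks;
  };

  // Returns false if the input is too short for the layout it describes.
  inline bool parseStreamDirectory(const std::vector<uint32_t> &Words,
                                   uint32_t BlockSize, ParsedDirectory &Out) {
    if (Words.empty() || BlockSize == 0)
      return false;
    uint32_t NumStreams = Words[0];
    std::size_t Pos = 1;
    if (Words.size() - Pos < NumStreams)
      return false;
    Out.StreamSizes.assign(Words.begin() + Pos,
                           Words.begin() + Pos + NumStreams);
    Pos += NumStreams;
    Out.StreamBlocks.clear();
    for (uint32_t Size : Out.StreamSizes) {
      // Row length is ceil(Size / BlockSize).
      std::size_t NumBlocks = static_cast<std::size_t>(
          (static_cast<uint64_t>(Size) + BlockSize - 1) / BlockSize);
      if (Words.size() - Pos < NumBlocks)
        return false;
      Out.StreamBlocks.emplace_back(Words.begin() + Pos,
                                    Words.begin() + Pos + NumBlocks);
      Pos += NumBlocks;
    }
    return true;
  }

Fed the 15 words of the example directory with a block size of 4096, this
yields four rows containing 1, 2, 4 and 3 block indices respectively.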
|
||||
|
||||
Alignment and Block Boundaries
|
||||
==============================
|
||||
As may be clear by now, it is possible for a single field (whether it be a
high-level record, a long string field, or even a single ``uint16``) to begin
and end in separate blocks. For example, if the block size is 4096 bytes, and a
``uint16`` field begins at the last byte of the current block, then it would
need to end on the first byte of the next block. Since blocks are not
necessarily contiguously laid out in the file, this means that both the consumer
and the producer of an MSF file must be prepared to split data apart
accordingly. In the aforementioned example, given the little-endian byte order
implied by the ``ulittle32_t`` fields above, the low byte of the ``uint16``
would be written to the last byte of block N, and the high byte would be written
to the first byte of block N+1, which could be tens of thousands of bytes later
(or even earlier!) in the file, depending on what the stream directory says.
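
A reader therefore has to translate every stream offset into a file offset
through the stream's block list. The following is a minimal, unoptimized sketch
of such a read. ``readStreamBytes`` is a hypothetical helper, not an LLVM API,
and it copies one byte at a time for clarity; it leaves byte-order
interpretation to the caller.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // Copies Length bytes starting at stream offset Offset into Out, following
  // the stream's block list. Returns false if the request runs past the
  // stream's blocks or past the end of the file contents.
  inline bool readStreamBytes(const std::vector<uint8_t> &File,
                              const std::vector<uint32_t> &StreamBlocks,
                              uint32_t BlockSize, uint64_t Offset,
                              std::size_t Length, std::vector<uint8_t> &Out) {
    Out.clear();
    if (BlockSize == 0)
      return false;
    for (std::size_t I = 0; I < Length; ++I, ++Offset) {
      uint64_t BlockInStream = Offset / BlockSize;
      if (BlockInStream >= StreamBlocks.size())
        return false;
      uint64_t FileOffset =
          static_cast<uint64_t>(StreamBlocks[BlockInStream]) * BlockSize +
          Offset % BlockSize;
      if (FileOffset >= File.size())
        return false;
      Out.push_back(File[FileOffset]);
    }
    return true;
  }

A two-byte read that starts on the last byte of one block transparently picks
up its second byte from the next block in the stream's list, wherever that
block lives in the file.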
|
||||
=====================================
|
||||
The MSF File Format
|
||||
=====================================
|
||||
|
||||
.. contents::
|
||||
:local:
|
||||
|
||||
.. _msf_layout:
|
||||
|
||||
File Layout
|
||||
===========
|
||||
|
||||
The MSF file format consists of the following components:
|
||||
|
||||
1. :ref:`msf_superblock`
|
||||
2. :ref:`msf_freeblockmap` (also know as Free Page Map, or FPM)
|
||||
3. Data
|
||||
|
||||
Each component is stored as an indexed block, the length of which is specified
|
||||
in ``SuperBlock::BlockSize``. The file consists of 1 or more iterations of the
|
||||
following pattern (sometimes referred to as an "interval"):
|
||||
|
||||
1. 1 block of data
|
||||
2. Free Block Map 1 (corresponds to ``SuperBlock::FreeBlockMapBlock`` 1)
|
||||
3. Free Block Map 2 (corresponds to ``SuperBlock::FreeBlockMapBlock`` 2)
|
||||
4. ``SuperBlock::BlockSize - 3`` blocks of data
|
||||
|
||||
In the first interval, the first data block is used to store
|
||||
:ref:`msf_superblock`.
|
||||
|
||||
The following diagram demonstrates the general layout of the file (\| denotes
|
||||
the end of an interval, and is for visualization purposes only):
|
||||
|
||||
+-------------+-----------------------+------------------+------------------+----------+----+------+------+------+-------------+----+-----+
|
||||
| Block Index | 0 | 1 | 2 | 3 - 4095 | \| | 4096 | 4097 | 4098 | 4099 - 8191 | \| | ... |
|
||||
+=============+=======================+==================+==================+==========+====+======+======+======+=============+====+=====+
|
||||
| Meaning | :ref:`msf_superblock` | Free Block Map 1 | Free Block Map 2 | Data | \| | Data | FPM1 | FPM2 | Data | \| | ... |
|
||||
+-------------+-----------------------+------------------+------------------+----------+----+------+------+------+-------------+----+-----+
|
||||
|
||||
The file may end after any block, including immediately after a FPM1.
|
||||
|
||||
.. note::
|
||||
LLVM only supports 4096 byte blocks (sometimes referred to as the "BigMsf"
|
||||
variant), so the rest of this document will assume a block size of 4096.
|
||||
|
||||
.. _msf_superblock:
|
||||
|
||||
The Superblock
|
||||
==============
|
||||
At file offset 0 in an MSF file is the MSF *SuperBlock*, which is laid out as
|
||||
follows:
|
||||
|
||||
.. code-block:: c++
|
||||
|
||||
struct SuperBlock {
|
||||
char FileMagic[sizeof(Magic)];
|
||||
ulittle32_t BlockSize;
|
||||
ulittle32_t FreeBlockMapBlock;
|
||||
ulittle32_t NumBlocks;
|
||||
ulittle32_t NumDirectoryBytes;
|
||||
ulittle32_t Unknown;
|
||||
ulittle32_t BlockMapAddr;
|
||||
};
|
||||
|
||||
- **FileMagic** - Must be equal to ``"Microsoft C / C++ MSF 7.00\\r\\n"``
|
||||
followed by the bytes ``1A 44 53 00 00 00``.
|
||||
- **BlockSize** - The block size of the internal file system. Valid values are
|
||||
512, 1024, 2048, and 4096 bytes. Certain aspects of the MSF file layout vary
|
||||
depending on the block sizes. For the purposes of LLVM, we handle only block
|
||||
sizes of 4KiB, and all further discussion assumes a block size of 4KiB.
|
||||
- **FreeBlockMapBlock** - The index of a block within the file, at which begins
|
||||
a bitfield representing the set of all blocks within the file which are "free"
|
||||
(i.e. the data within that block is not used). See :ref:`msf_freeblockmap` for
|
||||
more information.
|
||||
**Important**: ``FreeBlockMapBlock`` can only be ``1`` or ``2``!
|
||||
- **NumBlocks** - The total number of blocks in the file. ``NumBlocks * BlockSize``
|
||||
should equal the size of the file on disk.
|
||||
- **NumDirectoryBytes** - The size of the stream directory, in bytes. The stream
|
||||
directory contains information about each stream's size and the set of blocks
|
||||
that it occupies. It will be described in more detail later.
|
||||
- **BlockMapAddr** - The index of a block within the MSF file. At this block is
|
||||
an array of ``ulittle32_t``'s listing the blocks that the stream directory
|
||||
resides on. For large MSF files, the stream directory (which describes the
|
||||
block layout of each stream) may not fit entirely on a single block. As a
|
||||
result, this extra layer of indirection is introduced, whereby this block
|
||||
contains the list of blocks that the stream directory occupies, and the stream
|
||||
directory itself can be stitched together accordingly. The number of
|
||||
``ulittle32_t``'s in this array is given by ``ceil(NumDirectoryBytes / BlockSize)``.
|
||||
|
||||
.. _msf_freeblockmap:
|
||||
|
||||
The Free Block Map
|
||||
==================
|
||||
|
||||
The Free Block Map (sometimes referred to as the Free Page Map, or FPM) is a
|
||||
series of blocks which contains a bit flag for every block in the file. The
|
||||
flag will be set to 0 if the block is in use, and 1 if the block is unused.
|
||||
|
||||
Each file contains two FPMs, one of which is active at any given time. This
|
||||
feature is designed to support incremental and atomic updates of the underlying
|
||||
MSF file. While writing to an MSF file, if the active FPM is FPM1, you can
|
||||
write your new modified bitfield to FPM2, and vice versa. Only when you commit
|
||||
the file to disk do you need to swap the value in the SuperBlock to point to
|
||||
the new ``FreeBlockMapBlock``.
|
||||
|
||||
The Free Block Maps are stored as a series of single blocks thoughout the file
|
||||
at intervals of BlockSize. Because each FPM block is of size ``BlockSize``
|
||||
bytes, it contains 8 times as many bits as an interval has blocks. This means
|
||||
that the first block of each FPM refers to the first 8 intervals of the file
|
||||
(the first 32768 blocks), the second block of each FPM refers to the next 8
|
||||
blocks, and so on. This results in far more FPM blocks being present than are
|
||||
required, but in order to maintain backwards compatibility the format must stay
|
||||
this way.
|
||||
|
||||
The Stream Directory
|
||||
====================
|
||||
The Stream Directory is the root of all access to the other streams in an MSF
|
||||
file. Beginning at byte 0 of the stream directory is the following structure:
|
||||
|
||||
.. code-block:: c++
|
||||
|
||||
struct StreamDirectory {
|
||||
ulittle32_t NumStreams;
|
||||
ulittle32_t StreamSizes[NumStreams];
|
||||
ulittle32_t StreamBlocks[NumStreams][];
|
||||
};
|
||||
|
||||
And this structure occupies exactly ``SuperBlock->NumDirectoryBytes`` bytes.
|
||||
Note that each of the last two arrays is of variable length, and in particular
|
||||
that the second array is jagged.
|
||||
|
||||
**Example:** Suppose a hypothetical PDB file with a 4KiB block size, and 4
|
||||
streams of lengths {1000 bytes, 8000 bytes, 16000 bytes, 9000 bytes}.
|
||||
|
||||
Stream 0: ceil(1000 / 4096) = 1 block
|
||||
|
||||
Stream 1: ceil(8000 / 4096) = 2 blocks
|
||||
|
||||
Stream 2: ceil(16000 / 4096) = 4 blocks
|
||||
|
||||
Stream 3: ceil(9000 / 4096) = 3 blocks
|
||||
|
||||
In total, 10 blocks are used. Let's see what the stream directory might look
|
||||
like:
|
||||
|
||||
.. code-block:: c++
|
||||
|
||||
struct StreamDirectory {
|
||||
ulittle32_t NumStreams = 4;
|
||||
ulittle32_t StreamSizes[] = {1000, 8000, 16000, 9000};
|
||||
ulittle32_t StreamBlocks[][] = {
|
||||
{4},
|
||||
{5, 6},
|
||||
{11, 9, 7, 8},
|
||||
{10, 15, 12}
|
||||
};
|
||||
};
|
||||
|
||||
In total, this occupies ``15 * 4 = 60`` bytes, so ``SuperBlock->NumDirectoryBytes``
|
||||
would equal ``60``, and ``SuperBlock->BlockMapAddr`` would be an array of one
|
||||
``ulittle32_t``, since ``60 <= SuperBlock->BlockSize``.
|
||||
|
||||
Note also that the streams are discontiguous, and that part of stream 3 is in the
|
||||
middle of part of stream 2. You cannot assume anything about the layout of the
|
||||
blocks!
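
The layout in the example can be walked mechanically. The following is a minimal sketch, assuming the directory bytes have already been gathered from their blocks and converted to host-endian 32-bit words; ``ParsedDirectory`` and ``parseStreamDirectory`` are hypothetical names used only for illustration, and special stream sizes are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical in-memory form of the stream directory (illustrative names).
struct ParsedDirectory {
  std::vector<uint32_t> StreamSizes;
  std::vector<std::vector<uint32_t>> StreamBlocks;
};

// Words: the directory contents as host-endian 32-bit values.
ParsedDirectory parseStreamDirectory(const std::vector<uint32_t> &Words,
                                     uint32_t BlockSize) {
  assert(BlockSize != 0 && "block size must be non-zero");
  if (Words.empty())
    throw std::runtime_error("empty stream directory");

  const size_t NumStreams = Words[0];
  if (NumStreams > Words.size() - 1)
    throw std::runtime_error("truncated StreamSizes array");

  const uint32_t *P = Words.data();
  ParsedDirectory Dir;
  Dir.StreamSizes.assign(P + 1, P + 1 + NumStreams);

  // The jagged StreamBlocks array follows: ceil(Size / BlockSize) entries
  // for each stream, in stream order.
  size_t Pos = 1 + NumStreams;
  for (uint32_t Size : Dir.StreamSizes) {
    const uint64_t NumBlocks =
        (static_cast<uint64_t>(Size) + BlockSize - 1) / BlockSize;
    if (NumBlocks > Words.size() - Pos)
      throw std::runtime_error("truncated StreamBlocks array");
    Dir.StreamBlocks.emplace_back(P + Pos, P + Pos + NumBlocks);
    Pos += static_cast<size_t>(NumBlocks);
  }
  return Dir;
}
```

Fed the 15 words of the example directory with a block size of 4096, this sketch yields block lists of lengths 1, 2, 4, and 3.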
|
||||
|
||||
Alignment and Block Boundaries
|
||||
==============================
|
||||
As may be clear by now, it is possible for a single field (whether it be a high
|
||||
level record, a long string field, or even a single ``uint16``) to begin and
|
||||
end in separate blocks. For example, if the block size is 4096 bytes, and a
|
||||
``uint16`` field begins at the last byte of the current block, then it would
|
||||
need to end on the first byte of the next block. Since blocks are not
|
||||
necessarily contiguously laid out in the file, this means that both the consumer
|
||||
and the producer of an MSF file must be prepared to split data apart
|
||||
accordingly. In the aforementioned example, the low byte of the ``uint16``
|
||||
would be written to the last byte of block N, and the high byte would be written
|
||||
to the first byte of block N+1, which could be tens of thousands of bytes later
|
||||
(or even earlier!) in the file, depending on what the stream directory says.
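
A reader can hide block boundaries by translating every stream offset to a file offset one byte at a time. The sketch below assumes the whole file is in memory and that the stream's block list has already been taken from the stream directory; ``readStreamByte`` and ``readStreamU16`` are hypothetical helpers. Because numeric values are little endian, the byte at the lower stream offset is treated as the low byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the byte at stream offset Off. StreamBlocks lists the stream's
// blocks in stream order; File holds the bytes of the whole MSF file.
uint8_t readStreamByte(const std::vector<uint8_t> &File,
                       const std::vector<uint32_t> &StreamBlocks,
                       uint32_t BlockSize, uint64_t Off) {
  assert(BlockSize != 0 && "block size must be non-zero");
  const uint32_t Block = StreamBlocks.at(static_cast<size_t>(Off / BlockSize));
  const uint64_t FileOff =
      static_cast<uint64_t>(Block) * BlockSize + Off % BlockSize;
  return File.at(static_cast<size_t>(FileOff));
}

// Reads a little-endian uint16 whose two bytes may live in different,
// non-adjacent blocks of the file.
uint16_t readStreamU16(const std::vector<uint8_t> &File,
                       const std::vector<uint32_t> &StreamBlocks,
                       uint32_t BlockSize, uint64_t Off) {
  const unsigned Lo = readStreamByte(File, StreamBlocks, BlockSize, Off);
  const unsigned Hi = readStreamByte(File, StreamBlocks, BlockSize, Off + 1);
  return static_cast<uint16_t>(Lo | (Hi << 8));
}
```

Reading byte by byte is slow but makes the split explicit; a production reader would more likely copy whole runs of contiguous blocks.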
|
||||
|
|
|
@ -1,3 +1,3 @@
|
|||
=====================================
|
||||
The PDB Public Symbol Stream
|
||||
=====================================
|
||||
=====================================
|
||||
The PDB Public Symbol Stream
|
||||
=====================================
|
||||
|
|
|
@ -1,312 +1,312 @@
|
|||
=====================================
|
||||
The PDB TPI and IPI Streams
|
||||
=====================================
|
||||
|
||||
.. contents::
|
||||
:local:
|
||||
|
||||
.. _tpi_intro:
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
The PDB TPI Stream (Index 2) and IPI Stream (Index 4) contain information about
|
||||
all types used in the program. It is organized as a :ref:`header <tpi_header>`
|
||||
followed by a list of :doc:`CodeView Type Records <CodeViewTypes>`. Types are
|
||||
referenced from various streams and records throughout the PDB by their
|
||||
:ref:`type index <type_indices>`. In general, the sequence of type records
|
||||
following the :ref:`header <tpi_header>` forms a topologically sorted DAG
|
||||
(directed acyclic graph), which means that a type record B can only refer to
|
||||
the type A if ``A.TypeIndex < B.TypeIndex``. While there are rare cases where
|
||||
this property will not hold (particularly when dealing with object files
|
||||
compiled with MASM), an implementation should try very hard to make this
|
||||
property hold, as it means the entire type graph can be constructed in a single
|
||||
pass.
|
||||
|
||||
.. important::
|
||||
Type records form a topologically sorted DAG (directed acyclic graph).
|
||||
|
||||
.. _tpi_ipi:
|
||||
|
||||
TPI vs IPI Stream
|
||||
=================
|
||||
|
||||
Recent versions of the PDB format (aka all versions covered by this document)
|
||||
have 2 streams with identical layout, henceforth referred to as the TPI stream
|
||||
and IPI stream. Subsequent contents of this document describing the on-disk
|
||||
format apply equally whether it is for the TPI Stream or the IPI Stream. The
|
||||
only difference between the two is in *which* CodeView records are allowed to
|
||||
appear in each one, summarized by the following table:
|
||||
|
||||
+----------------------+---------------------+
|
||||
| TPI Stream | IPI Stream |
|
||||
+======================+=====================+
|
||||
| LF_POINTER | LF_FUNC_ID |
|
||||
+----------------------+---------------------+
|
||||
| LF_MODIFIER | LF_MFUNC_ID |
|
||||
+----------------------+---------------------+
|
||||
| LF_PROCEDURE | LF_BUILDINFO |
|
||||
+----------------------+---------------------+
|
||||
| LF_MFUNCTION | LF_SUBSTR_LIST |
|
||||
+----------------------+---------------------+
|
||||
| LF_LABEL | LF_STRING_ID |
|
||||
+----------------------+---------------------+
|
||||
| LF_ARGLIST | LF_UDT_SRC_LINE |
|
||||
+----------------------+---------------------+
|
||||
| LF_FIELDLIST | LF_UDT_MOD_SRC_LINE |
|
||||
+----------------------+---------------------+
|
||||
| LF_ARRAY | |
|
||||
+----------------------+---------------------+
|
||||
| LF_CLASS | |
|
||||
+----------------------+---------------------+
|
||||
| LF_STRUCTURE | |
|
||||
+----------------------+---------------------+
|
||||
| LF_INTERFACE | |
|
||||
+----------------------+---------------------+
|
||||
| LF_UNION | |
|
||||
+----------------------+---------------------+
|
||||
| LF_ENUM | |
|
||||
+----------------------+---------------------+
|
||||
| LF_TYPESERVER2 | |
|
||||
+----------------------+---------------------+
|
||||
| LF_VFTABLE | |
|
||||
+----------------------+---------------------+
|
||||
| LF_VTSHAPE | |
|
||||
+----------------------+---------------------+
|
||||
| LF_BITFIELD | |
|
||||
+----------------------+---------------------+
|
||||
| LF_METHODLIST | |
|
||||
+----------------------+---------------------+
|
||||
| LF_PRECOMP | |
|
||||
+----------------------+---------------------+
|
||||
| LF_ENDPRECOMP | |
|
||||
+----------------------+---------------------+
|
||||
|
||||
The usage of these records is described in more detail in
|
||||
:doc:`CodeView Type Records <CodeViewTypes>`.
|
||||
|
||||
.. _type_indices:
|
||||
|
||||
Type Indices
|
||||
============
|
||||
|
||||
A type index is a 32-bit integer that uniquely identifies a type inside of an
|
||||
object file's ``.debug$T`` section or a PDB file's TPI or IPI stream. The
|
||||
value of the type index for the first type record from the TPI stream is given
|
||||
by the ``TypeIndexBegin`` member of the :ref:`TPI Stream Header <tpi_header>`
|
||||
although in practice this value is always equal to 0x1000 (4096).
|
||||
|
||||
Any type index with a high bit set is considered to come from the IPI stream,
|
||||
although this appears to be more of a hack, and LLVM does not generate type
|
||||
indices of this nature. They can, however, be observed in Microsoft PDBs
|
||||
occasionally, so one should be prepared to handle them. Note that having the
|
||||
high bit set is not a necessary condition to determine whether a type index
|
||||
comes from the IPI stream; it is only sufficient.
|
||||
|
||||
Once the high bit is cleared, any type index >= ``TypeIndexBegin`` is presumed
|
||||
to come from the appropriate stream, and any type index less than this is a
|
||||
bitmask which can be decomposed as follows:
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
.---------------------------.------.----------.
|
||||
| Unused | Mode | Kind |
|
||||
'---------------------------'------'----------'
|
||||
|+32 |+12 |+8 |+0
|
||||
|
||||
|
||||
- **Kind** - A value from the following enum:
|
||||
|
||||
.. code-block:: c++
|
||||
|
||||
enum class SimpleTypeKind : uint32_t {
|
||||
None = 0x0000, // uncharacterized type (no type)
|
||||
Void = 0x0003, // void
|
||||
NotTranslated = 0x0007, // type not translated by cvpack
|
||||
HResult = 0x0008, // OLE/COM HRESULT
|
||||
|
||||
SignedCharacter = 0x0010, // 8 bit signed
|
||||
UnsignedCharacter = 0x0020, // 8 bit unsigned
|
||||
NarrowCharacter = 0x0070, // really a char
|
||||
WideCharacter = 0x0071, // wide char
|
||||
Character16 = 0x007a, // char16_t
|
||||
Character32 = 0x007b, // char32_t
|
||||
|
||||
SByte = 0x0068, // 8 bit signed int
|
||||
Byte = 0x0069, // 8 bit unsigned int
|
||||
Int16Short = 0x0011, // 16 bit signed
|
||||
UInt16Short = 0x0021, // 16 bit unsigned
|
||||
Int16 = 0x0072, // 16 bit signed int
|
||||
UInt16 = 0x0073, // 16 bit unsigned int
|
||||
Int32Long = 0x0012, // 32 bit signed
|
||||
UInt32Long = 0x0022, // 32 bit unsigned
|
||||
Int32 = 0x0074, // 32 bit signed int
|
||||
UInt32 = 0x0075, // 32 bit unsigned int
|
||||
Int64Quad = 0x0013, // 64 bit signed
|
||||
UInt64Quad = 0x0023, // 64 bit unsigned
|
||||
Int64 = 0x0076, // 64 bit signed int
|
||||
UInt64 = 0x0077, // 64 bit unsigned int
|
||||
Int128Oct = 0x0014, // 128 bit signed int
|
||||
UInt128Oct = 0x0024, // 128 bit unsigned int
|
||||
Int128 = 0x0078, // 128 bit signed int
|
||||
UInt128 = 0x0079, // 128 bit unsigned int
|
||||
|
||||
Float16 = 0x0046, // 16 bit real
|
||||
Float32 = 0x0040, // 32 bit real
|
||||
Float32PartialPrecision = 0x0045, // 32 bit PP real
|
||||
Float48 = 0x0044, // 48 bit real
|
||||
Float64 = 0x0041, // 64 bit real
|
||||
Float80 = 0x0042, // 80 bit real
|
||||
Float128 = 0x0043, // 128 bit real
|
||||
|
||||
Complex16 = 0x0056, // 16 bit complex
|
||||
Complex32 = 0x0050, // 32 bit complex
|
||||
Complex32PartialPrecision = 0x0055, // 32 bit PP complex
|
||||
Complex48 = 0x0054, // 48 bit complex
|
||||
Complex64 = 0x0051, // 64 bit complex
|
||||
Complex80 = 0x0052, // 80 bit complex
|
||||
Complex128 = 0x0053, // 128 bit complex
|
||||
|
||||
Boolean8 = 0x0030, // 8 bit boolean
|
||||
Boolean16 = 0x0031, // 16 bit boolean
|
||||
Boolean32 = 0x0032, // 32 bit boolean
|
||||
Boolean64 = 0x0033, // 64 bit boolean
|
||||
Boolean128 = 0x0034, // 128 bit boolean
|
||||
};
|
||||
|
||||
- **Mode** - A value from the following enum:
|
||||
|
||||
.. code-block:: c++
|
||||
|
||||
enum class SimpleTypeMode : uint32_t {
|
||||
Direct = 0, // Not a pointer
|
||||
NearPointer = 1, // Near pointer
|
||||
FarPointer = 2, // Far pointer
|
||||
HugePointer = 3, // Huge pointer
|
||||
NearPointer32 = 4, // 32 bit near pointer
|
||||
FarPointer32 = 5, // 32 bit far pointer
|
||||
NearPointer64 = 6, // 64 bit near pointer
|
||||
NearPointer128 = 7 // 128 bit near pointer
|
||||
};
|
||||
|
||||
Note that for pointers, the bitness is represented in the mode. So a ``void*``
|
||||
would have a type index with ``Mode=NearPointer32, Kind=Void`` if built for 32-bits
|
||||
but a type index with ``Mode=NearPointer64, Kind=Void`` if built for 64-bits.
|
||||
|
||||
By convention, the type index for ``std::nullptr_t`` is constructed the same way
|
||||
as the type index for ``void*``, but using the bitless enumeration value
|
||||
``NearPointer``.
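
The decomposition above amounts to two shifts and masks. This is a small sketch, assuming the usual ``TypeIndexBegin`` of 0x1000; ``SimpleTypeParts``, ``isSimpleTypeIndex``, and ``decomposeSimpleTypeIndex`` are hypothetical names, and the raw numeric values stand in for the ``SimpleTypeMode`` and ``SimpleTypeKind`` enumerators.

```cpp
#include <cassert>
#include <cstdint>

// Raw Mode and Kind fields of a simple (reserved) type index.
struct SimpleTypeParts {
  uint32_t Mode; // bits 8..11, a SimpleTypeMode value
  uint32_t Kind; // bits 0..7, a SimpleTypeKind value
};

// True if, once the high bit is cleared, TI lies below TypeIndexBegin and is
// therefore a Mode/Kind bitmask rather than a reference to a type record.
bool isSimpleTypeIndex(uint32_t TI, uint32_t TypeIndexBegin = 0x1000) {
  return (TI & 0x7FFFFFFFu) < TypeIndexBegin;
}

SimpleTypeParts decomposeSimpleTypeIndex(uint32_t TI) {
  assert(isSimpleTypeIndex(TI) && "not a simple type index");
  return SimpleTypeParts{(TI >> 8) & 0xFu, TI & 0xFFu};
}
```

For instance, 0x0403 decomposes to ``Mode=NearPointer32, Kind=Void`` (a 32-bit ``void*``), 0x0603 to ``Mode=NearPointer64, Kind=Void``, and 0x0074 to ``Mode=Direct, Kind=Int32``.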
|
||||
|
||||
|
||||
|
||||
.. _tpi_header:
|
||||
|
||||
Stream Header
|
||||
=============
|
||||
At offset 0 of the TPI Stream is a header with the following layout:
|
||||
|
||||
|
||||
.. code-block:: c++
|
||||
|
||||
struct TpiStreamHeader {
|
||||
uint32_t Version;
|
||||
uint32_t HeaderSize;
|
||||
uint32_t TypeIndexBegin;
|
||||
uint32_t TypeIndexEnd;
|
||||
uint32_t TypeRecordBytes;
|
||||
|
||||
uint16_t HashStreamIndex;
|
||||
uint16_t HashAuxStreamIndex;
|
||||
uint32_t HashKeySize;
|
||||
uint32_t NumHashBuckets;
|
||||
|
||||
int32_t HashValueBufferOffset;
|
||||
uint32_t HashValueBufferLength;
|
||||
|
||||
int32_t IndexOffsetBufferOffset;
|
||||
uint32_t IndexOffsetBufferLength;
|
||||
|
||||
int32_t HashAdjBufferOffset;
|
||||
uint32_t HashAdjBufferLength;
|
||||
};
|
||||
|
||||
- **Version** - A value from the following enum.
|
||||
|
||||
.. code-block:: c++
|
||||
|
||||
enum class TpiStreamVersion : uint32_t {
|
||||
V40 = 19950410,
|
||||
V41 = 19951122,
|
||||
V50 = 19961031,
|
||||
V70 = 19990903,
|
||||
V80 = 20040203,
|
||||
};
|
||||
|
||||
Similar to the :doc:`PDB Stream <PdbStream>`, this value always appears to be
|
||||
``V80``, and no other values have been observed. It is assumed that should
|
||||
another value be observed, the layout described by this document may not be
|
||||
accurate.
|
||||
|
||||
- **HeaderSize** - ``sizeof(TpiStreamHeader)``
|
||||
|
||||
- **TypeIndexBegin** - The numeric value of the type index representing the
|
||||
first type record in the TPI stream. This is usually the value 0x1000 as type
|
||||
indices lower than this are reserved (see :ref:`Type Indices <type_indices>` for
|
||||
a discussion of reserved type indices).
|
||||
|
||||
- **TypeIndexEnd** - One greater than the numeric value of the type index
|
||||
representing the last type record in the TPI stream. The total number of type
|
||||
records in the TPI stream can be computed as ``TypeIndexEnd - TypeIndexBegin``.
|
||||
|
||||
- **TypeRecordBytes** - The number of bytes of type record data following the header.
|
||||
|
||||
- **HashStreamIndex** - The index of a stream which contains a list of hashes for
|
||||
every type record. This value may be -1, indicating that hash information is not
|
||||
present. In practice a valid stream index is always observed, so any producer
|
||||
implementation should be prepared to emit this stream to ensure compatibility with
|
||||
tools which may expect it to be present.
|
||||
|
||||
- **HashAuxStreamIndex** - Presumably the index of a stream which contains a separate
|
||||
hash table, although this has not been observed in practice and it's unclear what it
|
||||
might be used for.
|
||||
|
||||
- **HashKeySize** - The size of a hash value (usually 4 bytes).
|
||||
|
||||
- **NumHashBuckets** - The number of buckets used to generate the hash values in the
|
||||
aforementioned hash streams.
|
||||
|
||||
- **HashValueBufferOffset / HashValueBufferLength** - The offset and size within
|
||||
the TPI Hash Stream of the list of hash values. It should be assumed that there
|
||||
are either 0 hash values, or a number equal to the number of type records in the
|
||||
TPI stream (``TypeIndexEnd - TypeIndexBegin``). Thus, if ``HashValueBufferLength`` is
|
||||
neither 0 nor equal to ``(TypeIndexEnd - TypeIndexBegin) * HashKeySize``, we can consider the
|
||||
PDB malformed.
|
||||
|
||||
- **IndexOffsetBufferOffset / IndexOffsetBufferLength** - The offset and size
|
||||
within the TPI Hash Stream of the Type Index Offsets Buffer. This is a list of
|
||||
pairs of uint32_t's where the first value is a :ref:`Type Index <type_indices>`
|
||||
and the second value is the offset in the type record data of the type with this
|
||||
index. This can be used to do a binary search followed by a linear search to
|
||||
get amortized O(log n) lookup by type index.
|
||||
|
||||
- **HashAdjBufferOffset / HashAdjBufferLength** - The offset and size within
|
||||
the TPI hash stream of a serialized hash table whose keys are the hash values
|
||||
in the hash value buffer and whose values are type indices. This appears to
|
||||
be useful in incremental linking scenarios, so that if a type is modified an
|
||||
entry can be created mapping the old hash value to the new type index so that
|
||||
a PDB file consumer can always have the most up to date version of the type
|
||||
without forcing the incremental linker to garbage collect and update
|
||||
references that point to the old version to now point to the new version.
|
||||
The layout of this hash table is described in :doc:`HashTable`.
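
The consistency rule for the hash value buffer can be checked directly from the header. The following is a sketch under the assumption that the header has already been read into host-endian fields; ``TpiCounts`` is a hypothetical struct holding only the fields this check needs.

```cpp
#include <cassert>
#include <cstdint>

// Subset of TpiStreamHeader used by the checks below (illustrative struct).
struct TpiCounts {
  uint32_t TypeIndexBegin;
  uint32_t TypeIndexEnd;
  uint32_t HashKeySize;
  uint32_t HashValueBufferLength;
};

// Number of type records described by the header.
uint32_t numTypeRecords(const TpiCounts &H) {
  assert(H.TypeIndexEnd >= H.TypeIndexBegin && "inverted type index range");
  return H.TypeIndexEnd - H.TypeIndexBegin;
}

// The buffer must hold either no hash values or exactly one per type record.
bool hashValueBufferLooksValid(const TpiCounts &H) {
  if (H.TypeIndexEnd < H.TypeIndexBegin)
    return false;
  if (H.HashValueBufferLength == 0)
    return true;
  const uint64_t Expected =
      static_cast<uint64_t>(H.TypeIndexEnd - H.TypeIndexBegin) * H.HashKeySize;
  return H.HashValueBufferLength == Expected;
}
```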
|
||||
|
||||
.. _tpi_records:
|
||||
|
||||
CodeView Type Record List
|
||||
=========================
|
||||
Following the header, there are ``TypeRecordBytes`` bytes of data that represent a
|
||||
variable length array of :doc:`CodeView type records <CodeViewTypes>`. The number
|
||||
of such records (i.e. the length of the array) can be determined by computing the
|
||||
value ``Header.TypeIndexEnd - Header.TypeIndexBegin``.
|
||||
|
||||
log(n) random access is provided by way of the Type Index Offsets array (if present)
|
||||
described previously.
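
The lookup through the Type Index Offsets array might be sketched as follows. This assumes the array has been loaded as pairs sorted by type index, and that each type record starts with a 2-byte little-endian length counting the bytes that follow the length field (the usual CodeView record framing, stated here as an assumption). ``IndexOffset`` and ``findTypeRecordOffset`` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One entry of the Type Index Offsets buffer.
struct IndexOffset {
  uint32_t TypeIndex;
  uint32_t Offset; // byte offset into the type record data
};

// Returns the offset of the record with type index TI inside Records, or -1
// if it cannot be found. Index must be sorted by TypeIndex.
int64_t findTypeRecordOffset(const std::vector<IndexOffset> &Index,
                             const std::vector<uint8_t> &Records,
                             uint32_t TI) {
  auto ByIndex = [](const IndexOffset &A, const IndexOffset &B) {
    return A.TypeIndex < B.TypeIndex;
  };
  assert(std::is_sorted(Index.begin(), Index.end(), ByIndex));

  // Binary search for the last entry whose TypeIndex is <= TI.
  auto It = std::upper_bound(
      Index.begin(), Index.end(), TI,
      [](uint32_t V, const IndexOffset &E) { return V < E.TypeIndex; });
  if (It == Index.begin())
    return -1;
  --It;

  // Linear scan forward, hopping over one record at a time.
  uint32_t Cur = It->TypeIndex;
  size_t Off = It->Offset;
  while (Off + 2 <= Records.size()) {
    if (Cur == TI)
      return static_cast<int64_t>(Off);
    const size_t Len = Records[Off] | (Records[Off + 1] << 8);
    Off += 2 + Len;
    ++Cur;
  }
  return -1;
}
```

The cost of the scan depends on how densely the producer samples the array, so the O(log n) figure holds only when consecutive entries are a bounded distance apart.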
|
||||
=====================================
|
||||
The PDB TPI and IPI Streams
|
||||
=====================================
|
||||
|
||||
.. contents::
|
||||
:local:
|
||||
|
||||
.. _tpi_intro:
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
The PDB TPI Stream (Index 2) and IPI Stream (Index 4) contain information about
|
||||
all types used in the program. It is organized as a :ref:`header <tpi_header>`
|
||||
followed by a list of :doc:`CodeView Type Records <CodeViewTypes>`. Types are
|
||||
referenced from various streams and records throughout the PDB by their
|
||||
:ref:`type index <type_indices>`. In general, the sequence of type records
|
||||
following the :ref:`header <tpi_header>` forms a topologically sorted DAG
|
||||
(directed acyclic graph), which means that a type record B can only refer to
|
||||
the type A if ``A.TypeIndex < B.TypeIndex``. While there are rare cases where
|
||||
this property will not hold (particularly when dealing with object files
|
||||
compiled with MASM), an implementation should try very hard to make this
|
||||
property hold, as it means the entire type graph can be constructed in a single
|
||||
pass.
|
||||
|
||||
.. important::
|
||||
Type records form a topologically sorted DAG (directed acyclic graph).
|
||||
|
||||
.. _tpi_ipi:
|
||||
|
||||
TPI vs IPI Stream
|
||||
=================
|
||||
|
||||
Recent versions of the PDB format (aka all versions covered by this document)
|
||||
have 2 streams with identical layout, henceforth referred to as the TPI stream
|
||||
and IPI stream. Subsequent contents of this document describing the on-disk
|
||||
format apply equally whether it is for the TPI Stream or the IPI Stream. The
|
||||
only difference between the two is in *which* CodeView records are allowed to
|
||||
appear in each one, summarized by the following table:
|
||||
|
||||
+----------------------+---------------------+
|
||||
| TPI Stream | IPI Stream |
|
||||
+======================+=====================+
|
||||
| LF_POINTER | LF_FUNC_ID |
|
||||
+----------------------+---------------------+
|
||||
| LF_MODIFIER | LF_MFUNC_ID |
|
||||
+----------------------+---------------------+
|
||||
| LF_PROCEDURE | LF_BUILDINFO |
|
||||
+----------------------+---------------------+
|
||||
| LF_MFUNCTION | LF_SUBSTR_LIST |
|
||||
+----------------------+---------------------+
|
||||
| LF_LABEL | LF_STRING_ID |
|
||||
+----------------------+---------------------+
|
||||
| LF_ARGLIST | LF_UDT_SRC_LINE |
|
||||
+----------------------+---------------------+
|
||||
| LF_FIELDLIST | LF_UDT_MOD_SRC_LINE |
|
||||
+----------------------+---------------------+
|
||||
| LF_ARRAY | |
|
||||
+----------------------+---------------------+
|
||||
| LF_CLASS | |
|
||||
+----------------------+---------------------+
|
||||
| LF_STRUCTURE | |
|
||||
+----------------------+---------------------+
|
||||
| LF_INTERFACE | |
|
||||
+----------------------+---------------------+
|
||||
| LF_UNION | |
|
||||
+----------------------+---------------------+
|
||||
| LF_ENUM | |
|
||||
+----------------------+---------------------+
|
||||
| LF_TYPESERVER2 | |
|
||||
+----------------------+---------------------+
|
||||
| LF_VFTABLE | |
|
||||
+----------------------+---------------------+
|
||||
| LF_VTSHAPE | |
|
||||
+----------------------+---------------------+
|
||||
| LF_BITFIELD | |
|
||||
+----------------------+---------------------+
|
||||
| LF_METHODLIST | |
|
||||
+----------------------+---------------------+
|
||||
| LF_PRECOMP | |
|
||||
+----------------------+---------------------+
|
||||
| LF_ENDPRECOMP | |
|
||||
+----------------------+---------------------+
|
||||
|
||||
The usage of these records is described in more detail in
|
||||
:doc:`CodeView Type Records <CodeViewTypes>`.
|
||||
|
||||
.. _type_indices:
|
||||
|
||||
Type Indices
|
||||
============
|
||||
|
||||
A type index is a 32-bit integer that uniquely identifies a type inside of an
|
||||
object file's ``.debug$T`` section or a PDB file's TPI or IPI stream. The
|
||||
value of the type index for the first type record from the TPI stream is given
|
||||
by the ``TypeIndexBegin`` member of the :ref:`TPI Stream Header <tpi_header>`
|
||||
although in practice this value is always equal to 0x1000 (4096).
|
||||
|
||||
Any type index with a high bit set is considered to come from the IPI stream,
|
||||
although this appears to be more of a hack, and LLVM does not generate type
|
||||
indices of this nature. They can, however, be observed in Microsoft PDBs
|
||||
occasionally, so one should be prepared to handle them. Note that having the
|
||||
high bit set is not a necessary condition to determine whether a type index
|
||||
comes from the IPI stream; it is only sufficient.
|
||||
|
||||
Once the high bit is cleared, any type index >= ``TypeIndexBegin`` is presumed
|
||||
to come from the appropriate stream, and any type index less than this is a
|
||||
bitmask which can be decomposed as follows:
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
.---------------------------.------.----------.
|
||||
| Unused | Mode | Kind |
|
||||
'---------------------------'------'----------'
|
||||
|+32 |+12 |+8 |+0
|
||||
|
||||
|
||||
- **Kind** - A value from the following enum:
|
||||
|
||||
.. code-block:: c++
|
||||
|
||||
enum class SimpleTypeKind : uint32_t {
|
||||
None = 0x0000, // uncharacterized type (no type)
|
||||
Void = 0x0003, // void
|
||||
NotTranslated = 0x0007, // type not translated by cvpack
|
||||
HResult = 0x0008, // OLE/COM HRESULT
|
||||
|
||||
SignedCharacter = 0x0010, // 8 bit signed
|
||||
UnsignedCharacter = 0x0020, // 8 bit unsigned
|
||||
NarrowCharacter = 0x0070, // really a char
|
||||
WideCharacter = 0x0071, // wide char
|
||||
Character16 = 0x007a, // char16_t
|
||||
Character32 = 0x007b, // char32_t
|
||||
|
||||
SByte = 0x0068, // 8 bit signed int
|
||||
Byte = 0x0069, // 8 bit unsigned int
|
||||
Int16Short = 0x0011, // 16 bit signed
|
||||
UInt16Short = 0x0021, // 16 bit unsigned
|
||||
Int16 = 0x0072, // 16 bit signed int
|
||||
UInt16 = 0x0073, // 16 bit unsigned int
|
||||
Int32Long = 0x0012, // 32 bit signed
|
||||
UInt32Long = 0x0022, // 32 bit unsigned
|
||||
Int32 = 0x0074, // 32 bit signed int
|
||||
UInt32 = 0x0075, // 32 bit unsigned int
|
||||
Int64Quad = 0x0013, // 64 bit signed
|
||||
UInt64Quad = 0x0023, // 64 bit unsigned
|
||||
Int64 = 0x0076, // 64 bit signed int
|
||||
UInt64 = 0x0077, // 64 bit unsigned int
|
||||
Int128Oct = 0x0014, // 128 bit signed int
|
||||
UInt128Oct = 0x0024, // 128 bit unsigned int
|
||||
Int128 = 0x0078, // 128 bit signed int
|
||||
UInt128 = 0x0079, // 128 bit unsigned int
|
||||
|
||||
Float16 = 0x0046, // 16 bit real
|
||||
Float32 = 0x0040, // 32 bit real
|
||||
Float32PartialPrecision = 0x0045, // 32 bit PP real
|
||||
Float48 = 0x0044, // 48 bit real
|
||||
Float64 = 0x0041, // 64 bit real
|
||||
Float80 = 0x0042, // 80 bit real
|
||||
Float128 = 0x0043, // 128 bit real
|
||||
|
||||
Complex16 = 0x0056, // 16 bit complex
|
||||
Complex32 = 0x0050, // 32 bit complex
|
||||
Complex32PartialPrecision = 0x0055, // 32 bit PP complex
|
||||
Complex48 = 0x0054, // 48 bit complex
|
||||
Complex64 = 0x0051, // 64 bit complex
|
||||
Complex80 = 0x0052, // 80 bit complex
|
||||
Complex128 = 0x0053, // 128 bit complex
|
||||
|
||||
Boolean8 = 0x0030, // 8 bit boolean
|
||||
Boolean16 = 0x0031, // 16 bit boolean
|
||||
Boolean32 = 0x0032, // 32 bit boolean
|
||||
Boolean64 = 0x0033, // 64 bit boolean
|
||||
Boolean128 = 0x0034, // 128 bit boolean
|
||||
};
|
||||
|
||||
- **Mode** - A value from the following enum:
|
||||
|
||||
.. code-block:: c++
|
||||
|
||||
enum class SimpleTypeMode : uint32_t {
|
||||
Direct = 0, // Not a pointer
|
||||
NearPointer = 1, // Near pointer
|
||||
FarPointer = 2, // Far pointer
|
||||
HugePointer = 3, // Huge pointer
|
||||
NearPointer32 = 4, // 32 bit near pointer
|
||||
FarPointer32 = 5, // 32 bit far pointer
|
||||
NearPointer64 = 6, // 64 bit near pointer
|
||||
NearPointer128 = 7 // 128 bit near pointer
|
||||
};
|
||||
|
||||
Note that for pointers, the bitness is represented in the mode. So a ``void*``
|
||||
would have a type index with ``Mode=NearPointer32, Kind=Void`` if built for 32-bits
|
||||
but a type index with ``Mode=NearPointer64, Kind=Void`` if built for 64-bits.
|
||||
|
||||
By convention, the type index for ``std::nullptr_t`` is constructed the same way
|
||||
as the type index for ``void*``, but using the bitless enumeration value
|
||||
``NearPointer``.
|
||||
|
||||
|
||||
|
||||
.. _tpi_header:
|
||||
|
||||
Stream Header
|
||||
=============
|
||||
At offset 0 of the TPI Stream is a header with the following layout:
|
||||
|
||||
|
||||
.. code-block:: c++
|
||||
|
||||
struct TpiStreamHeader {
|
||||
uint32_t Version;
|
||||
uint32_t HeaderSize;
|
||||
uint32_t TypeIndexBegin;
|
||||
uint32_t TypeIndexEnd;
|
||||
uint32_t TypeRecordBytes;
|
||||
|
||||
uint16_t HashStreamIndex;
|
||||
uint16_t HashAuxStreamIndex;
|
||||
uint32_t HashKeySize;
|
||||
uint32_t NumHashBuckets;
|
||||
|
||||
int32_t HashValueBufferOffset;
|
||||
uint32_t HashValueBufferLength;
|
||||
|
||||
int32_t IndexOffsetBufferOffset;
|
||||
uint32_t IndexOffsetBufferLength;
|
||||
|
||||
int32_t HashAdjBufferOffset;
|
||||
uint32_t HashAdjBufferLength;
|
||||
};
|
||||
|
||||
- **Version** - A value from the following enum.
|
||||
|
||||
.. code-block:: c++
|
||||
|
||||
enum class TpiStreamVersion : uint32_t {
|
||||
V40 = 19950410,
|
||||
V41 = 19951122,
|
||||
V50 = 19961031,
|
||||
V70 = 19990903,
|
||||
V80 = 20040203,
|
||||
};
|
||||
|
||||
Similar to the :doc:`PDB Stream <PdbStream>`, this value always appears to be
|
||||
``V80``, and no other values have been observed. It is assumed that should
|
||||
another value be observed, the layout described by this document may not be
|
||||
accurate.
|
||||
|
||||
- **HeaderSize** - ``sizeof(TpiStreamHeader)``
|
||||
|
||||
- **TypeIndexBegin** - The numeric value of the type index representing the
|
||||
first type record in the TPI stream. This is usually the value 0x1000 as type
|
||||
indices lower than this are reserved (see :ref:`Type Indices <type_indices>` for
|
||||
a discussion of reserved type indices).
|
||||
|
||||
- **TypeIndexEnd** - One greater than the numeric value of the type index
|
||||
representing the last type record in the TPI stream. The total number of type
|
||||
records in the TPI stream can be computed as ``TypeIndexEnd - TypeIndexBegin``.
|
||||
|
||||
- **TypeRecordBytes** - The number of bytes of type record data following the header.
|
||||
|
||||
- **HashStreamIndex** - The index of a stream which contains a list of hashes for
|
||||
every type record. This value may be -1, indicating that hash information is not
|
||||
present. In practice a valid stream index is always observed, so any producer
|
||||
implementation should be prepared to emit this stream to ensure compatibility with
|
||||
tools which may expect it to be present.
|
||||
|
||||
- **HashAuxStreamIndex** - Presumably the index of a stream which contains a separate
|
||||
hash table, although this has not been observed in practice and it's unclear what it
|
||||
might be used for.
|
||||
|
||||
- **HashKeySize** - The size of a hash value (usually 4 bytes).
|
||||
|
||||
- **NumHashBuckets** - The number of buckets used to generate the hash values in the
|
||||
aforementioned hash streams.
|
||||
|
||||
- **HashValueBufferOffset / HashValueBufferLength** - The offset and size within
|
||||
the TPI Hash Stream of the list of hash values. It should be assumed that there
|
||||
are either 0 hash values, or a number equal to the number of type records in the
|
||||
TPI stream (``TypeIndexEnd - TypeIndexBegin``). Thus, if ``HashValueBufferLength`` is
|
||||
neither 0 nor equal to ``(TypeIndexEnd - TypeIndexBegin) * HashKeySize``, we can consider the
|
||||
PDB malformed.
|
||||
|
||||
- **IndexOffsetBufferOffset / IndexOffsetBufferLength** - The offset and size
|
||||
within the TPI Hash Stream of the Type Index Offsets Buffer. This is a list of
|
||||
pairs of uint32_t's where the first value is a :ref:`Type Index <type_indices>`
|
||||
and the second value is the offset in the type record data of the type with this
|
||||
index. This can be used to do a binary search followed by a linear search to
|
||||
get amortized O(log n) lookup by type index.
|
||||
|
||||
- **HashAdjBufferOffset / HashAdjBufferLength** - The offset and size within
|
||||
the TPI hash stream of a serialized hash table whose keys are the hash values
|
||||
in the hash value buffer and whose values are type indices. This appears to
|
||||
be useful in incremental linking scenarios, so that if a type is modified an
|
||||
entry can be created mapping the old hash value to the new type index so that
|
||||
a PDB file consumer can always have the most up to date version of the type
|
||||
without forcing the incremental linker to garbage collect and update
|
||||
references that point to the old version to now point to the new version.
|
||||
The layout of this hash table is described in :doc:`HashTable`.
|
||||
|
||||
.. _tpi_records:
|
||||
|
||||
CodeView Type Record List
|
||||
=========================
|
||||
Following the header, there are ``TypeRecordBytes`` bytes of data that represent a
|
||||
variable length array of :doc:`CodeView type records <CodeViewTypes>`. The number
|
||||
of such records (i.e. the length of the array) can be determined by computing the
|
||||
value ``Header.TypeIndexEnd - Header.TypeIndexBegin``.
|
||||
|
||||
log(n) random access is provided by way of the Type Index Offsets array (if present)
|
||||
described previously.
|
||||
|
|
|
@ -1,168 +1,168 @@
|
|||
=====================================
|
||||
The PDB File Format
|
||||
=====================================
|
||||
|
||||
.. contents::
|
||||
:local:
|
||||
|
||||
.. _pdb_intro:
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
PDB (Program Database) is a file format invented by Microsoft that contains
|
||||
debug information that can be consumed by debuggers and other tools. Since
|
||||
officially supported APIs exist on Windows for querying debug information from
|
||||
PDBs even without the user understanding the internals of the file format, a
|
||||
large ecosystem of tools has been built for Windows to consume this format. In
|
||||
order for Clang to be able to generate programs that can interoperate with these
|
||||
tools, it is necessary for us to generate PDB files ourselves.
|
||||
|
||||
At the same time, LLVM has a long history of being able to cross-compile from
|
||||
any platform to any platform, and we wish for the same to be true here. So it
|
||||
is necessary for us to understand the PDB file format at the byte-level so that
|
||||
we can generate PDB files entirely on our own.
|
||||
|
||||
This manual describes what we know about the PDB file format today: the layout
|
||||
of the file, the various streams contained within, the format of individual
|
||||
records within, and more.
|
||||
|
||||
We would like to extend our heartfelt gratitude to Microsoft, without whom we
|
||||
would not be where we are today. Much of the knowledge contained within this
|
||||
manual was learned through reading code published by Microsoft on their `GitHub
|
||||
repo <https://github.com/Microsoft/microsoft-pdb>`__.
|
||||
|
||||
.. _pdb_layout:
|
||||
|
||||
File Layout
|
||||
===========
|
||||
|
||||
.. important::
|
||||
Unless otherwise specified, all numeric values are encoded in little endian.
|
||||
If you see a type such as ``uint16_t`` or ``uint64_t`` going forward, always
|
||||
assume it is little endian!
|
||||
|
||||
.. toctree::
|
||||
:hidden:
|
||||
|
||||
MsfFile
|
||||
PdbStream
|
||||
TpiStream
|
||||
DbiStream
|
||||
ModiStream
|
||||
PublicStream
|
||||
GlobalStream
|
||||
HashTable
|
||||
CodeViewSymbols
|
||||
CodeViewTypes
|
||||
|
||||
.. _msf:
|
||||
|
||||
The MSF Container
|
||||
-----------------
|
||||
A PDB file is really just a special case of an MSF (Multi-Stream Format) file.
|
||||
An MSF file is actually a miniature "file system within a file". It contains
|
||||
multiple streams (aka files) which can represent arbitrary data, and these
|
||||
streams are divided into blocks which may not necessarily be contiguously
|
||||
laid out within the file (aka fragmented). Additionally, the MSF contains a
|
||||
stream directory (aka MFT) which describes how the streams (files) are laid
|
||||
out within the MSF.
|
||||
|
||||
For more information about the MSF container format, stream directory, and
|
||||
block layout, see :doc:`MsfFile`.
|
||||
|
||||
.. _streams:

Streams
-------
The PDB format contains a number of streams which describe information such as
the types, symbols, source files, and compilands (e.g. object files) of a
program. Additional streams contain hash tables that debuggers and other tools
use to look up records and types quickly by name, and others record how the
program was compiled, such as the specific toolchain used. The streams
contained in a PDB file are summarized below:

+--------------------+------------------------------+-------------------------------------------+
| Name               | Stream Index                 | Contents                                  |
+====================+==============================+===========================================+
| Old Directory      | - Fixed Stream Index 0       | - Previous MSF Stream Directory           |
+--------------------+------------------------------+-------------------------------------------+
| PDB Stream         | - Fixed Stream Index 1       | - Basic File Information                  |
|                    |                              | - Fields to match EXE to this PDB         |
|                    |                              | - Map of named streams to stream indices  |
+--------------------+------------------------------+-------------------------------------------+
| TPI Stream         | - Fixed Stream Index 2       | - CodeView Type Records                   |
|                    |                              | - Index of TPI Hash Stream                |
+--------------------+------------------------------+-------------------------------------------+
| DBI Stream         | - Fixed Stream Index 3       | - Module/Compiland Information            |
|                    |                              | - Indices of individual module streams    |
|                    |                              | - Indices of public / global streams      |
|                    |                              | - Section Contribution Information        |
|                    |                              | - Source File Information                 |
|                    |                              | - References to streams containing        |
|                    |                              |   FPO / PGO Data                          |
+--------------------+------------------------------+-------------------------------------------+
| IPI Stream         | - Fixed Stream Index 4       | - CodeView Type Records                   |
|                    |                              | - Index of IPI Hash Stream                |
+--------------------+------------------------------+-------------------------------------------+
| /LinkInfo          | - Contained in PDB Stream    | - Unknown                                 |
|                    |   Named Stream map           |                                           |
+--------------------+------------------------------+-------------------------------------------+
| /src/headerblock   | - Contained in PDB Stream    | - Summary of embedded source file content |
|                    |   Named Stream map           |   (e.g. natvis files)                     |
+--------------------+------------------------------+-------------------------------------------+
| /names             | - Contained in PDB Stream    | - PDB-wide global string table used for   |
|                    |   Named Stream map           |   string de-duplication                   |
+--------------------+------------------------------+-------------------------------------------+
| Module Info Stream | - Contained in DBI Stream    | - CodeView Symbol Records for this module |
|                    | - One for each compiland     | - Line Number Information                 |
+--------------------+------------------------------+-------------------------------------------+
| Public Stream      | - Contained in DBI Stream    | - Public (Exported) Symbol Records        |
|                    |                              | - Index of Public Hash Stream             |
+--------------------+------------------------------+-------------------------------------------+
| Global Stream      | - Contained in DBI Stream    | - Single combined master symbol-table     |
|                    |                              | - Index of Global Hash Stream             |
+--------------------+------------------------------+-------------------------------------------+
| TPI Hash Stream    | - Contained in TPI Stream    | - Hash table for looking up TPI records   |
|                    |                              |   by name                                 |
+--------------------+------------------------------+-------------------------------------------+
| IPI Hash Stream    | - Contained in IPI Stream    | - Hash table for looking up IPI records   |
|                    |                              |   by name                                 |
+--------------------+------------------------------+-------------------------------------------+

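The two ways of locating a stream in the table above, by fixed index or through
the named stream map, might be modelled like this. The names ``FixedStream`` and
``lookup_named_stream`` are hypothetical, and the indices given to the named
streams are invented for the example; in a real file they would be read from
the PDB Stream.

```python
from enum import IntEnum
from typing import Dict, Optional


class FixedStream(IntEnum):
    """Streams that live at a fixed index, per the table above."""
    OLD_DIRECTORY = 0
    PDB = 1
    TPI = 2
    DBI = 3
    IPI = 4


def lookup_named_stream(named_stream_map: Dict[str, int],
                        name: str) -> Optional[int]:
    """Return the stream index for a named stream, or None if it is absent."""
    return named_stream_map.get(name)


# Hypothetical contents of a named stream map; the indices are made up.
named_stream_map = {"/LinkInfo": 5, "/src/headerblock": 6, "/names": 7}

dbi_index = int(FixedStream.DBI)
names_index = lookup_named_stream(named_stream_map, "/names")
```

Streams such as the Module Info, Public, and Global streams are found a third
way, through indices stored in the DBI Stream, which this sketch does not
cover.
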
More information about the structure of each of these streams can be found on
the following pages:

:doc:`PdbStream`
   Information about the PDB Info Stream and how it is used to match PDBs to EXEs.

:doc:`TpiStream`
   Information about the TPI stream and the CodeView records contained within.

:doc:`DbiStream`
   Information about the DBI stream and relevant substreams including the Module Substreams,
   source file information, and CodeView symbol records contained within.

:doc:`ModiStream`
   Information about the Module Information Stream, of which there is one for each compilation
   unit, and the format of symbols contained within.

:doc:`PublicStream`
   Information about the Public Symbol Stream.

:doc:`GlobalStream`
   Information about the Global Symbol Stream.

:doc:`HashTable`
   Information about the serialized hash table format used internally to represent things such
   as the Named Stream Map and the Hash Adjusters in the :doc:`TPI/IPI Stream <TpiStream>`.

CodeView
========
CodeView is another format which comes into the picture. While MSF defines
the structure of the overall file, and PDB defines the set of streams that
appear within the MSF file and the format of those streams, CodeView defines
the format of **symbol and type records** that appear within specific streams.
Refer to the pages on :doc:`CodeViewSymbols` and :doc:`CodeViewTypes` for
more information about the CodeView format.