=====================================
The MSF File Format
=====================================

.. contents::
   :local:

.. _msf_superblock:

The Superblock
==============
At file offset 0 in an MSF file is the MSF *SuperBlock*, which is laid out as
follows:

.. code-block:: c++

  struct SuperBlock {
    char FileMagic[sizeof(Magic)];
    ulittle32_t BlockSize;
    ulittle32_t FreeBlockMapBlock;
    ulittle32_t NumBlocks;
    ulittle32_t NumDirectoryBytes;
    ulittle32_t Unknown;
    ulittle32_t BlockMapAddr;
  };

- **FileMagic** - Must be equal to ``"Microsoft C/C++ MSF 7.00\r\n"``
  followed by the bytes ``1A 44 53 00 00 00``.
- **BlockSize** - The block size of the internal file system. Valid values are
  512, 1024, 2048, and 4096 bytes. Certain aspects of the MSF file layout vary
  depending on the block size. For the purposes of LLVM, we handle only a block
  size of 4KiB, and all further discussion assumes a block size of 4KiB.
- **FreeBlockMapBlock** - The index of a block within the file, at which begins
  a bitfield representing the set of all blocks within the file which are "free"
  (i.e. the data within that block is not used). This bitfield is spread across
  the MSF file at ``BlockSize`` intervals.

  **Important**: ``FreeBlockMapBlock`` can only be ``1`` or ``2``! This field
  is designed to support incremental and atomic updates of the underlying MSF
  file. While writing to an MSF file, if the value of this field is ``1``, you
  can write your new modified bitfield to block 2, and vice versa. Only when
  you commit the file to disk do you need to swap the value in the SuperBlock
  to point to the new ``FreeBlockMapBlock``.
- **NumBlocks** - The total number of blocks in the file. ``NumBlocks * BlockSize``
  should equal the size of the file on disk.
- **NumDirectoryBytes** - The size of the stream directory, in bytes. The stream
  directory contains information about each stream's size and the set of blocks
  that it occupies. It will be described in more detail later.
- **BlockMapAddr** - The index of a block within the MSF file. At this block is
  an array of ``ulittle32_t`` values listing the blocks that the stream
  directory resides on. For large MSF files, the stream directory (which
  describes the block layout of each stream) may not fit entirely on a single
  block. As a result, this extra layer of indirection is introduced, whereby
  this block contains the list of blocks that the stream directory occupies, and
  the stream directory itself can be stitched together accordingly. The number
  of ``ulittle32_t`` values in this array is given by
  ``ceil(NumDirectoryBytes / BlockSize)``.

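As a rough illustration of the constraints listed above, the following sketch
checks the numeric SuperBlock fields against a known file size. The names
``SuperBlockFields``, ``numDirectoryBlocks``, and ``looksValid`` are
hypothetical and are not LLVM APIs. The sketch assumes the fields have already
been decoded from little-endian, omits the magic comparison, and treats a file
size mismatch as an error, which is an assumption (the text only says the sizes
*should* match).

.. code-block:: c++

  #include <cstdint>

  // Hypothetical host-order copy of the numeric SuperBlock fields.
  // Assumption: values were already converted from little-endian.
  struct SuperBlockFields {
    uint32_t BlockSize;
    uint32_t FreeBlockMapBlock;
    uint32_t NumBlocks;
    uint32_t NumDirectoryBytes;
    uint32_t BlockMapAddr;
  };

  // Block sizes listed as valid in the BlockSize description.
  inline bool isListedBlockSize(uint32_t Size) {
    return Size == 512 || Size == 1024 || Size == 2048 || Size == 4096;
  }

  // Number of entries in the array at BlockMapAddr:
  // ceil(NumDirectoryBytes / BlockSize), using integer arithmetic only.
  // The caller must ensure BlockSize is nonzero.
  inline uint32_t numDirectoryBlocks(const SuperBlockFields &SB) {
    const uint64_t Bytes = SB.NumDirectoryBytes;
    return static_cast<uint32_t>((Bytes + SB.BlockSize - 1) / SB.BlockSize);
  }

  // Returns true if the fields are consistent with a file of FileSize bytes.
  inline bool looksValid(const SuperBlockFields &SB, uint64_t FileSize) {
    if (!isListedBlockSize(SB.BlockSize))
      return false;
    // The free block map lives at block 1 or block 2, nowhere else.
    if (SB.FreeBlockMapBlock != 1 && SB.FreeBlockMapBlock != 2)
      return false;
    // Assumption: a size mismatch is treated as a hard error here.
    if (static_cast<uint64_t>(SB.NumBlocks) * SB.BlockSize != FileSize)
      return false;
    // BlockMapAddr is a block index, so it must name a block in the file.
    return SB.BlockMapAddr < SB.NumBlocks;
  }

For example, with 4KiB blocks a 60-byte stream directory needs a single entry
in the block map, while a 4097-byte directory needs two.
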
The Stream Directory
====================
The Stream Directory is the root of all access to the other streams in an MSF
file. Beginning at byte 0 of the stream directory is the following structure:

.. code-block:: c++

  struct StreamDirectory {
    ulittle32_t NumStreams;
    ulittle32_t StreamSizes[NumStreams];
    ulittle32_t StreamBlocks[NumStreams][];
  };

This structure occupies exactly ``SuperBlock->NumDirectoryBytes`` bytes.
Note that each of the last two arrays is of variable length, and in particular
that ``StreamBlocks`` is jagged: the entry for stream ``i`` lists
``ceil(StreamSizes[i] / BlockSize)`` block indices.

**Example:** Consider a hypothetical PDB file with a 4KiB block size and 4
streams of lengths {1000 bytes, 8000 bytes, 16000 bytes, 9000 bytes}.

Stream 0: ceil(1000 / 4096) = 1 block

Stream 1: ceil(8000 / 4096) = 2 blocks

Stream 2: ceil(16000 / 4096) = 4 blocks

Stream 3: ceil(9000 / 4096) = 3 blocks

In total, 10 blocks are used. Let's see what the stream directory might look
like:

.. code-block:: c++

  struct StreamDirectory {
    ulittle32_t NumStreams = 4;
    ulittle32_t StreamSizes[] = {1000, 8000, 16000, 9000};
    ulittle32_t StreamBlocks[][] = {
      {4},
      {5, 6},
      {11, 9, 7, 8},
      {10, 15, 12}
    };
  };

The directory holds 15 ``ulittle32_t`` values (1 for ``NumStreams``, 4 stream
sizes, and 10 block indices), so it occupies ``15 * 4 = 60`` bytes and
``SuperBlock->NumDirectoryBytes`` would equal ``60``. The block at
``SuperBlock->BlockMapAddr`` would hold an array of one ``ulittle32_t``, since
``60 <= SuperBlock->BlockSize``.

Note also that the streams are discontiguous, and that part of stream 3 sits in
the middle of stream 2: block 10 (stream 3) lies between blocks 9 and 11
(stream 2) in the file. You cannot assume anything about the layout of the
blocks!

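The layout described in this section can be decoded along the following lines.
This is only a sketch: ``ParsedDirectory``, ``blocksFor``, and
``parseDirectory`` are hypothetical names, and the input is assumed to be the
stream directory already stitched together from its blocks and converted to
host-order 32-bit words.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // Hypothetical decoded form of the stream directory.
  struct ParsedDirectory {
    std::vector<uint32_t> StreamSizes;
    std::vector<std::vector<uint32_t>> StreamBlocks; // jagged array
  };

  // ceil(Bytes / BlockSize) in integer arithmetic.
  inline size_t blocksFor(uint32_t Bytes, uint32_t BlockSize) {
    return static_cast<size_t>(
        (static_cast<uint64_t>(Bytes) + BlockSize - 1) / BlockSize);
  }

  // Splits the flat word array into NumStreams, StreamSizes, and StreamBlocks.
  // Returns false if the words do not describe a complete directory.
  inline bool parseDirectory(const std::vector<uint32_t> &Words,
                             uint32_t BlockSize, ParsedDirectory &Out) {
    if (Words.empty() || BlockSize == 0)
      return false;
    const size_t NumStreams = Words[0];
    if (NumStreams > Words.size() - 1)
      return false;
    Out.StreamSizes.assign(Words.data() + 1, Words.data() + 1 + NumStreams);
    Out.StreamBlocks.clear();
    size_t Pos = 1 + NumStreams;
    for (uint32_t Size : Out.StreamSizes) {
      // Each stream contributes one block index per block it occupies.
      const size_t Count = blocksFor(Size, BlockSize);
      if (Count > Words.size() - Pos)
        return false;
      Out.StreamBlocks.emplace_back(Words.data() + Pos,
                                    Words.data() + Pos + Count);
      Pos += Count;
    }
    // The directory occupies exactly NumDirectoryBytes, so nothing may remain.
    return Pos == Words.size();
  }

Feeding it the 15 words of the example directory (``4``, the four sizes, then
the ten block indices) with a block size of 4096 yields four block lists of
lengths 1, 2, 4, and 3.
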
Alignment and Block Boundaries
==============================
As may be clear by now, it is possible for a single field (whether it be a
high-level record, a long string field, or even a single ``uint16``) to begin
and end in separate blocks. For example, if the block size is 4096 bytes, and a
``uint16`` field begins at the last byte of the current block, then it would
need to end on the first byte of the next block. Since blocks are not
necessarily laid out contiguously in the file, both the consumer and the
producer of an MSF file must be prepared to split data apart accordingly. In
the aforementioned example, because integers are stored little-endian, the low
byte of the ``uint16`` would be written to the last byte of the stream's block
N, and the high byte would be written to the first byte of the stream's block
N+1, which could be tens of thousands of bytes later (or even earlier!) in the
file, depending on what the stream directory says.

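A reader can handle such straddling fields by translating every stream offset
to a file offset one byte at a time. The sketch below illustrates the idea under
stated assumptions: ``readStreamByte`` and ``readStreamU16`` are hypothetical
helpers, the whole file is assumed to be in memory, the stream's block list is
assumed to come from the stream directory, integers are assumed to be
little-endian (as the ``ulittle32_t`` fields suggest), and bounds checking is
left out for brevity.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // Reads the byte at stream offset Offset.
  // File:   the entire MSF file image (assumption: fully loaded in memory).
  // Blocks: the stream's block indices, in stream order.
  // No bounds checks are performed; callers must pass a valid offset.
  inline uint8_t readStreamByte(const std::vector<uint8_t> &File,
                                const std::vector<uint32_t> &Blocks,
                                uint32_t BlockSize, uint64_t Offset) {
    // Which of the stream's blocks holds this offset, and where in that block.
    const uint64_t FileBlock = Blocks[static_cast<size_t>(Offset / BlockSize)];
    const uint64_t InBlock = Offset % BlockSize;
    return File[static_cast<size_t>(FileBlock * BlockSize + InBlock)];
  }

  // Reads a little-endian uint16 that may straddle two stream blocks.
  inline uint16_t readStreamU16(const std::vector<uint8_t> &File,
                                const std::vector<uint32_t> &Blocks,
                                uint32_t BlockSize, uint64_t Offset) {
    const uint16_t Lo = readStreamByte(File, Blocks, BlockSize, Offset);
    const uint16_t Hi = readStreamByte(File, Blocks, BlockSize, Offset + 1);
    return static_cast<uint16_t>(Lo | (Hi << 8));
  }

Reading byte by byte is slow but keeps the mapping obvious. A practical reader
would more likely copy whole runs within a block and only fall back to this
path when a field crosses a block boundary.
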