=====================================
The PDB File Format
=====================================

.. contents::
   :local:

.. _pdb_intro:

Introduction
============

PDB (Program Database) is a file format invented by Microsoft that contains
debug information for consumption by debuggers and other tools.  Because Windows
provides officially supported APIs for querying debug information from PDBs
without requiring any knowledge of the file format's internals, a large
ecosystem of Windows tools has been built around this format.  For Clang to
generate programs that interoperate with these tools, we must be able to
generate PDB files ourselves.

At the same time, LLVM has a long history of being able to cross-compile from
any platform to any platform, and we wish for the same to be true here.  We
therefore need to understand the PDB file format at the byte level so that we
can generate PDB files entirely on our own.

This manual describes what we know about the PDB file format today: the layout
of the file, the various streams contained within it, the format of individual
records within those streams, and more.

We would like to extend our heartfelt gratitude to Microsoft, without whom we
would not be where we are today.  Much of the knowledge contained within this
manual was learned through reading code published by Microsoft on their `GitHub
repo <https://github.com/Microsoft/microsoft-pdb>`__.

.. _pdb_layout:

File Layout
===========

.. important::
   Unless otherwise specified, all numeric values are encoded in little-endian
   byte order.  If you see a type such as ``uint16_t`` or ``uint64_t`` going
   forward, always assume it is little-endian!
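
As a minimal sketch of what this byte order means in practice, the Python
snippet below decodes a ``uint16_t`` followed by a ``uint32_t`` from a
hand-written buffer.  The buffer contents are purely illustrative and do not
correspond to any real PDB structure.

.. code-block:: python

   import struct

   # Hypothetical 6-byte buffer: a uint16_t followed by a uint32_t.
   raw = bytes([0x34, 0x12, 0x78, 0x56, 0x34, 0x12])

   # "<" selects little-endian with no padding; "H" is uint16_t, "I" is uint32_t.
   u16, u32 = struct.unpack("<HI", raw)

   # The least significant byte comes first in the buffer.
   print(hex(u16), hex(u32))
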

.. toctree::
   :hidden:

   MsfFile
   PdbStream
   TpiStream
   DbiStream
   ModiStream
   PublicStream
   GlobalStream
   HashTable
   CodeViewSymbols
   CodeViewTypes

.. _msf:

The MSF Container
-----------------
A PDB file is an MSF (Multi-Stream Format) file.  An MSF file is a "file system
within a file".  It contains multiple streams (aka files) that can represent
arbitrary data.  Each stream is divided into blocks, and those blocks are not
necessarily laid out contiguously within the MSF container file.
Additionally, the MSF contains a stream directory (aka MFT) that describes how
the streams (files) are laid out within the MSF.

For more information about the MSF container format, stream directory, and
block layout, see :doc:`MsfFile`.
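
To illustrate the idea of a stream scattered across non-contiguous blocks, the
sketch below reassembles one stream from a toy container.  It assumes that
block ``N`` starts at byte offset ``N * block_size`` and that the ordered list
of block indices and the stream length have already been obtained from the
stream directory.  The function name ``read_stream`` and the toy data are
hypothetical and are not part of any LLVM API.

.. code-block:: python

   from typing import Sequence

   def read_stream(container: bytes, block_size: int,
                   block_indices: Sequence[int], stream_size: int) -> bytes:
       """Concatenate a stream's blocks in order, then trim to its length."""
       chunks = []
       for index in block_indices:
           start = index * block_size  # assumed block-to-offset mapping
           chunks.append(container[start:start + block_size])
       # The last block may be only partially used by the stream.
       return b"".join(chunks)[:stream_size]

   # Toy container with 4-byte blocks; this is not a real MSF file.
   block_size = 4
   container = b"AAAA" b"rld!" b"BBBB" b"hell" b"o wo"

   # The stream occupies blocks 3, 4, and 1, in that order, and is 11 bytes long.
   stream = read_stream(container, block_size, [3, 4, 1], 11)
   print(stream)

Keeping the block list separate from the data is what allows a stream to grow
without moving the blocks that other streams already occupy.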

.. _streams:

Streams
-------
The PDB format contains a number of streams that describe the types, symbols,
source files, and compilands (e.g. object files) of a program.  Additional
streams contain hash tables that debuggers and other tools use for fast lookup
of records and types by name, and still others record how the program was
compiled, such as the specific toolchain used.  A summary of the streams
contained in a PDB file is as follows:

+--------------------+------------------------------+-------------------------------------------+
| Name               | Stream Index                 | Contents                                  |
+====================+==============================+===========================================+
| Old Directory      | - Fixed Stream Index 0       | - Previous MSF Stream Directory           |
+--------------------+------------------------------+-------------------------------------------+
| PDB Stream         | - Fixed Stream Index 1       | - Basic File Information                  |
|                    |                              | - Fields to match EXE to this PDB         |
|                    |                              | - Map of named streams to stream indices  |
+--------------------+------------------------------+-------------------------------------------+
| TPI Stream         | - Fixed Stream Index 2       | - CodeView Type Records                   |
|                    |                              | - Index of TPI Hash Stream                |
+--------------------+------------------------------+-------------------------------------------+
| DBI Stream         | - Fixed Stream Index 3       | - Module/Compiland Information            |
|                    |                              | - Indices of individual module streams    |
|                    |                              | - Indices of public / global streams      |
|                    |                              | - Section Contribution Information        |
|                    |                              | - Source File Information                 |
|                    |                              | - References to streams containing        |
|                    |                              |   FPO / PGO Data                          |
+--------------------+------------------------------+-------------------------------------------+
| IPI Stream         | - Fixed Stream Index 4       | - CodeView Type Records                   |
|                    |                              | - Index of IPI Hash Stream                |
+--------------------+------------------------------+-------------------------------------------+
| /LinkInfo          | - Contained in PDB Stream    | - Unknown                                 |
|                    |   Named Stream map           |                                           |
+--------------------+------------------------------+-------------------------------------------+
| /src/headerblock   | - Contained in PDB Stream    | - Summary of embedded source file content |
|                    |   Named Stream map           |   (e.g. natvis files)                     |
+--------------------+------------------------------+-------------------------------------------+
| /names             | - Contained in PDB Stream    | - PDB-wide global string table used for   |
|                    |   Named Stream map           |   string de-duplication                   |
+--------------------+------------------------------+-------------------------------------------+
| Module Info Stream | - Contained in DBI Stream    | - CodeView Symbol Records for this module |
|                    | - One for each compiland     | - Line Number Information                 |
+--------------------+------------------------------+-------------------------------------------+
| Public Stream      | - Contained in DBI Stream    | - Public (Exported) Symbol Records        |
|                    |                              | - Index of Public Hash Stream             |
+--------------------+------------------------------+-------------------------------------------+
| Global Stream      | - Contained in DBI Stream    | - Single combined master symbol-table     |
|                    |                              | - Index of Global Hash Stream             |
+--------------------+------------------------------+-------------------------------------------+
| TPI Hash Stream    | - Contained in TPI Stream    | - Hash table for looking up TPI records   |
|                    |                              |   by name                                 |
+--------------------+------------------------------+-------------------------------------------+
| IPI Hash Stream    | - Contained in IPI Stream    | - Hash table for looking up IPI records   |
|                    |                              |   by name                                 |
+--------------------+------------------------------+-------------------------------------------+
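
The table distinguishes streams at fixed indices from streams that must be
located through another stream.  A minimal sketch of that distinction follows.
The enum values come from the table, while the names ``FixedStream`` and
``describe_stream`` and the indices in ``named_streams`` are made up for
illustration.  In a real file, the indices of named streams come from the
Named Stream map in the PDB Stream.

.. code-block:: python

   from enum import IntEnum

   class FixedStream(IntEnum):
       """Streams that always live at the same index (see the table above)."""
       OLD_DIRECTORY = 0
       PDB = 1
       TPI = 2
       DBI = 3
       IPI = 4

   def describe_stream(index, named_streams):
       """Return a label for a stream index: fixed, named, or 'other'."""
       if index in set(FixedStream):
           return FixedStream(index).name
       for name, named_index in named_streams.items():
           if named_index == index:
               return name
       # Module, public, global, and hash streams are found via DBI/TPI/IPI.
       return "other"

   # Hypothetical Named Stream map contents; real indices vary per file.
   named_streams = {"/LinkInfo": 5, "/names": 6}

   print(describe_stream(3, named_streams))
   print(describe_stream(6, named_streams))
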

More information about the structure of each of these streams can be found on
the following pages:

:doc:`PdbStream`
   Information about the PDB Info Stream and how it is used to match PDBs to EXEs.

:doc:`TpiStream`
   Information about the TPI stream and the CodeView records contained within.

:doc:`DbiStream`
   Information about the DBI stream and relevant substreams including the
   Module Substreams, source file information, and CodeView symbol records
   contained within.

:doc:`ModiStream`
   Information about the Module Information Stream, of which there is one for
   each compilation unit, and the format of the symbols contained within.

:doc:`PublicStream`
   Information about the Public Symbol Stream.

:doc:`GlobalStream`
   Information about the Global Symbol Stream.

:doc:`HashTable`
   Information about the serialized hash table format used internally to
   represent things such as the Named Stream Map and the Hash Adjusters in the
   :doc:`TPI/IPI Stream <TpiStream>`.

CodeView
========
CodeView is another format which comes into the picture.  While MSF defines
the structure of the overall file, and PDB defines the set of streams that
appear within the MSF file and the format of those streams, CodeView defines
the format of **symbol and type records** that appear within specific streams.
Refer to the pages on :doc:`CodeViewSymbols` and :doc:`CodeViewTypes` for
more information about the CodeView format.