=========================
Driver Design & Internals
=========================

.. contents::
   :local:

Introduction
============

This document describes the Clang driver: the motivation and design
goals behind it, as well as details of its internal implementation.

Features and Goals
==================

The Clang driver is intended to be a production quality compiler driver
providing access to the Clang compiler and tools, with a command line
interface which is compatible with the gcc driver.

Although the driver is part of and driven by the Clang project, it is
logically a separate tool which shares many of the same goals as Clang:

.. contents:: Features
   :local:

GCC Compatibility
-----------------

The number one goal of the driver is to ease the adoption of Clang by
allowing users to drop Clang into a build system which was designed to
call GCC. Although this makes the driver much more complicated than
might otherwise be necessary, we decided that being very compatible with
the gcc command line interface was worth it in order to allow users to
quickly test clang on their projects.

Flexible
--------

The driver was designed to be flexible and easily accommodate new uses
as we grow the clang and LLVM infrastructure. As one example, the driver
can easily support the introduction of tools which have an integrated
assembler, something we hope to add to LLVM in the future.

Similarly, most of the driver functionality is kept in a library which
can be used to build other tools which want to implement or accept a
gcc-like interface.

Low Overhead
------------

The driver should have as little overhead as possible. In practice, we
found that the gcc driver by itself incurred a small but meaningful
overhead when compiling many small files. The driver doesn't do much
work compared to a compilation, but we have tried to keep it as
efficient as possible by following a few simple principles:

-  Avoid memory allocation and string copying when possible.
-  Don't parse arguments more than once.
-  Provide a few simple interfaces for efficiently searching arguments.

Simple
------

Finally, the driver was designed to be "as simple as possible", given
the other goals. Notably, trying to be completely compatible with the
gcc driver adds a significant amount of complexity. However, the design
of the driver attempts to mitigate this complexity by dividing the
process into a number of independent stages instead of a single
monolithic task.

Internal Design and Implementation
==================================

.. contents::
   :local:
   :depth: 1

Internals Introduction
----------------------

In order to satisfy the stated goals, the driver was designed to
completely subsume the functionality of the gcc executable; that is, the
driver should not need to delegate to gcc to perform subtasks. On
Darwin, this implies that the Clang driver also subsumes the gcc
driver-driver, which is used to implement support for building universal
images (binaries and object files). This also implies that the driver
should be able to call the language specific compilers (e.g. cc1)
directly, which means that it must have enough information to forward
command line arguments to child processes correctly.

Design Overview
---------------

The diagram below shows the significant components of the driver
architecture and how they relate to one another. The orange components
represent concrete data structures built by the driver, the green
components indicate conceptually distinct stages which manipulate these
data structures, and the blue components are important helper classes.

.. image:: DriverArchitecture.png
   :align: center
   :alt: Driver Architecture Diagram

Driver Stages
-------------

The driver functionality is conceptually divided into five stages:

#. **Parse: Option Parsing**

   The command line argument strings are decomposed into arguments
   (``Arg`` instances). The driver expects to understand all available
   options, although there is some facility for just passing certain
   classes of options through (like ``-Wl,``).

   Each argument corresponds to exactly one abstract ``Option``
   definition, which describes how the option is parsed along with some
   additional metadata. The Arg instances themselves are lightweight and
   merely contain enough information for clients to determine which
   option they correspond to and their values (if they have additional
   parameters).

   For example, a command line like "-Ifoo -I foo" would parse to two
   Arg instances (a JoinedArg and a SeparateArg instance), but each
   would refer to the same Option.

   Options are lazily created in order to avoid populating all Option
   classes when the driver is loaded. Most of the driver code only needs
   to deal with options by their unique ID (e.g., ``options::OPT_I``).

   Arg instances themselves do not generally store the values of
   parameters. In many cases, this would simply result in creating
   unnecessary string copies. Instead, Arg instances are always embedded
   inside an ArgList structure, which contains the original vector of
   argument strings. Each Arg itself only needs to contain an index into
   this vector instead of storing its values directly.

   The clang driver can dump the results of this stage using the
   ``-###`` flag (which must precede any actual command
   line arguments). For example:

   .. code-block:: console

      $ clang -### -Xarch_i386 -fomit-frame-pointer -Wa,-fast -Ifoo -I foo t.c
      Option 0 - Name: "-Xarch_", Values: {"i386", "-fomit-frame-pointer"}
      Option 1 - Name: "-Wa,", Values: {"-fast"}
      Option 2 - Name: "-I", Values: {"foo"}
      Option 3 - Name: "-I", Values: {"foo"}
      Option 4 - Name: "<input>", Values: {"t.c"}

   After this stage is complete the command line should be broken down
   into well defined option objects with their appropriate parameters.
   Subsequent stages should rarely, if ever, need to do any string
   processing.
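
   The index-based storage described above can be illustrated with a
   minimal Python sketch. It is not the driver's C++ implementation: the
   ``Option``, ``Arg``, ``ArgList`` and ``parse`` names below are
   simplified, hypothetical stand-ins, and only a joined or separate
   ``-I`` option and plain inputs are handled.

   .. code-block:: python

      from dataclasses import dataclass

      @dataclass(frozen=True)
      class Option:
          name: str            # spelling of the option, e.g. "-I"

      @dataclass(frozen=True)
      class Arg:
          option: Option       # shared Option definition
          kind: str            # "joined", "separate" or "input"
          index: int           # position of the value's string in argv
          offset: int = 0      # where the value starts inside that string

      class ArgList:
          def __init__(self, argv):
              self.argv = list(argv)   # the only copy of the argument strings
              self.args = []

          def value(self, arg):
              # Values are looked up on demand rather than stored in the Arg.
              return self.argv[arg.index][arg.offset:]

      OPT_I = Option("-I")
      OPT_INPUT = Option("<input>")

      def parse(argv):
          arg_list = ArgList(argv)
          i = 0
          while i < len(argv):
              s = argv[i]
              if s == "-I":                  # separate form: "-I foo"
                  # (a real parser would diagnose a missing value here)
                  arg_list.args.append(Arg(OPT_I, "separate", i + 1))
                  i += 2
              elif s.startswith("-I"):       # joined form: "-Ifoo"
                  arg_list.args.append(Arg(OPT_I, "joined", i, offset=2))
                  i += 1
              else:                          # anything else is an input
                  arg_list.args.append(Arg(OPT_INPUT, "input", i))
                  i += 1
          return arg_list

      parsed = parse(["-Ifoo", "-I", "foo", "t.c"])
      for a in parsed.args:
          print(a.option.name, a.kind, parsed.value(a))

   Both ``-I`` arguments in the sketch refer to the same ``Option``
   object, and neither one holds a copy of the string ``foo``.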

#. **Pipeline: Compilation Action Construction**

   Once the arguments are parsed, the tree of subprocess jobs needed for
   the desired compilation sequence is constructed. This involves
   determining the input files and their types, what work is to be done
   on them (preprocess, compile, assemble, link, etc.), and constructing
   a list of Action instances for each task. The result is a list of one
   or more top-level actions, each of which generally corresponds to a
   single output (for example, an object or linked executable).

   The majority of Actions correspond to actual tasks; however, there are
   two special Actions. The first is InputAction, which simply serves to
   adapt an input argument for use as an input to other Actions. The
   second is BindArchAction, which conceptually alters the architecture
   to be used for all of its input Actions.

   The clang driver can dump the results of this stage using the
   ``-ccc-print-phases`` flag. For example:

   .. code-block:: console

      $ clang -ccc-print-phases -x c t.c -x assembler t.s
      0: input, "t.c", c
      1: preprocessor, {0}, cpp-output
      2: compiler, {1}, assembler
      3: assembler, {2}, object
      4: input, "t.s", assembler
      5: assembler, {4}, object
      6: linker, {3, 5}, image

   Here the driver is constructing seven distinct actions: four to
   compile the "t.c" input into an object file, two to assemble the
   "t.s" input, and one to link them together.

   A rather different compilation pipeline is shown here; in this
   example there are two top level actions to compile the input files
   into two separate object files, where each object file is built using
   ``lipo`` to merge results built for two separate architectures.

   .. code-block:: console

      $ clang -ccc-print-phases -c -arch i386 -arch x86_64 t0.c t1.c
      0: input, "t0.c", c
      1: preprocessor, {0}, cpp-output
      2: compiler, {1}, assembler
      3: assembler, {2}, object
      4: bind-arch, "i386", {3}, object
      5: bind-arch, "x86_64", {3}, object
      6: lipo, {4, 5}, object
      7: input, "t1.c", c
      8: preprocessor, {7}, cpp-output
      9: compiler, {8}, assembler
      10: assembler, {9}, object
      11: bind-arch, "i386", {10}, object
      12: bind-arch, "x86_64", {10}, object
      13: lipo, {11, 12}, object

   After this stage is complete the compilation process is divided into
   a simple set of actions which need to be performed to produce
   intermediate or final outputs (in some cases, like ``-fsyntax-only``,
   there is no "real" final output). Phases are well known compilation
   steps, such as "preprocess", "compile", "assemble", "link", etc.
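
   As a rough illustration of how such an action list could be derived
   from typed inputs, the Python sketch below rebuilds the seven actions
   of the first example. The ``PHASES`` table and the ``build_actions``
   and ``format_action`` helpers are hypothetical and cover only the
   ``c`` and ``assembler`` input types with a final link step; the
   driver's real logic handles many more cases.

   .. code-block:: python

      # Phases each input type still has to go through before linking.
      # Each entry is (action kind, type of the output it produces).
      PHASES = {
          "c": [("preprocessor", "cpp-output"),
                ("compiler", "assembler"),
                ("assembler", "object")],
          "assembler": [("assembler", "object")],
      }

      def build_actions(inputs):
          """inputs: list of (filename, type). Returns a numbered action list."""
          actions = []        # each entry: (kind, inputs, output_type)
          link_inputs = []
          for filename, file_type in inputs:
              actions.append(("input", filename, file_type))
              for kind, output_type in PHASES[file_type]:
                  previous = len(actions) - 1   # consume the previous action
                  actions.append((kind, [previous], output_type))
              link_inputs.append(len(actions) - 1)
          actions.append(("linker", link_inputs, "image"))
          return actions

      def format_action(index, action):
          kind, inputs, output_type = action
          if kind == "input":
              return f'{index}: input, "{inputs}", {output_type}'
          ids = ", ".join(str(i) for i in inputs)
          return f"{index}: {kind}, {{{ids}}}, {output_type}"

      actions = build_actions([("t.c", "c"), ("t.s", "assembler")])
      for i, action in enumerate(actions):
          print(format_action(i, action))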

#. **Bind: Tool & Filename Selection**

   This stage (in conjunction with the Translate stage) turns the tree
   of Actions into a list of actual subprocesses to run. Conceptually, the
   driver performs a top down matching to assign Action(s) to Tools. The
   ToolChain is responsible for selecting the tool to perform a
   particular action; once selected the driver interacts with the tool
   to see if it can match additional actions (for example, by having an
   integrated preprocessor).

   Once Tools have been selected for all actions, the driver determines
   how the tools should be connected (for example, using an in-process
   module, pipes, temporary files, or user provided filenames). If an
   output file is required, the driver also computes the appropriate
   file name (the suffix and file location depend on the input types and
   options such as ``-save-temps``).

   The driver interacts with a ToolChain to perform the Tool bindings.
   Each ToolChain contains information about all the tools needed for
   compilation for a particular architecture, platform, and operating
   system. A single driver invocation may query multiple ToolChains
   during one compilation in order to interact with tools for separate
   architectures.

   The results of this stage are not computed directly, but the driver
   can print the results via the ``-ccc-print-bindings`` option. For
   example:

   .. code-block:: console

      $ clang -ccc-print-bindings -arch i386 -arch ppc t0.c
      # "i386-apple-darwin9" - "clang", inputs: ["t0.c"], output: "/tmp/cc-Sn4RKF.s"
      # "i386-apple-darwin9" - "darwin::Assemble", inputs: ["/tmp/cc-Sn4RKF.s"], output: "/tmp/cc-gvSnbS.o"
      # "i386-apple-darwin9" - "darwin::Link", inputs: ["/tmp/cc-gvSnbS.o"], output: "/tmp/cc-jgHQxi.out"
      # "ppc-apple-darwin9" - "gcc::Compile", inputs: ["t0.c"], output: "/tmp/cc-Q0bTox.s"
      # "ppc-apple-darwin9" - "gcc::Assemble", inputs: ["/tmp/cc-Q0bTox.s"], output: "/tmp/cc-WCdicw.o"
      # "ppc-apple-darwin9" - "gcc::Link", inputs: ["/tmp/cc-WCdicw.o"], output: "/tmp/cc-HHBEBh.out"
      # "i386-apple-darwin9" - "darwin::Lipo", inputs: ["/tmp/cc-jgHQxi.out", "/tmp/cc-HHBEBh.out"], output: "a.out"

   This shows the tool chain, tool, inputs and outputs which have been
   bound for this compilation sequence. Here clang is being used to
   compile t0.c on the i386 architecture and darwin specific versions of
   the tools are being used to assemble and link the result, but generic
   gcc versions of the tools are being used on PowerPC.
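
   The per-architecture tool selection shown above can be mimicked with
   a small Python sketch. The ``ToolChain`` class, the ``TOOLCHAINS``
   table and the ``bind`` function are hypothetical, the output names
   are simple stand-ins for the temporary files the driver would
   compute, and matching additional actions to one tool is left out.

   .. code-block:: python

      class ToolChain:
          def __init__(self, triple, tools):
              self.triple = triple
              self.tools = tools          # maps an action kind to a tool name

          def select_tool(self, action_kind):
              return self.tools[action_kind]

      # One ToolChain per architecture, mirroring the example output.
      TOOLCHAINS = {
          "i386": ToolChain("i386-apple-darwin9", {
              "compile": "clang",
              "assemble": "darwin::Assemble",
              "link": "darwin::Link",
          }),
          "ppc": ToolChain("ppc-apple-darwin9", {
              "compile": "gcc::Compile",
              "assemble": "gcc::Assemble",
              "link": "gcc::Link",
          }),
      }

      def bind(arch, source, stem):
          """Return (triple, tool, inputs, output) tuples for one architecture."""
          toolchain = TOOLCHAINS[arch]
          steps = [("compile", ".s"), ("assemble", ".o"), ("link", ".out")]
          bindings, current = [], source
          for kind, suffix in steps:
              output = f"{stem}-{arch}{suffix}"   # stand-in for a temp file
              tool = toolchain.select_tool(kind)
              bindings.append((toolchain.triple, tool, [current], output))
              current = output                    # feeds the next step
          return bindings

      bindings = bind("i386", "t0.c", "t0") + bind("ppc", "t0.c", "t0")
      for triple, tool, inputs, output in bindings:
          print(f'# "{triple}" - "{tool}", inputs: {inputs}, output: "{output}"')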

#. **Translate: Tool Specific Argument Translation**

   Once a Tool has been selected to perform a particular Action, the
   Tool must construct concrete Commands which will be executed during
   compilation. The main work is in translating from the gcc style
   command line options to whatever options the subprocess expects.

   Some tools, such as the assembler, only interact with a handful of
   arguments and just determine the path of the executable to call and
   pass on their input and output arguments. Others, like the compiler
   or the linker, may translate a large number of arguments in addition.

   The ArgList class provides a number of simple helper methods to
   assist with translating arguments; for example, to pass on only the
   last argument corresponding to some option, or all arguments for
   an option.

   The result of this stage is a list of Commands (executable paths and
   argument strings) to execute.
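
   The two helper styles mentioned above ("last argument wins" and "all
   arguments for an option") might look as follows in a Python sketch.
   The function names, the ``(name, value)`` pair representation and the
   ``cc1-sketch`` executable are all hypothetical; they are not the
   ArgList interface itself.

   .. code-block:: python

      def get_last_arg(args, *names):
          """Return the last (name, value) pair whose name matches, or None."""
          for name, value in reversed(args):
              if name in names:
                  return (name, value)
          return None

      def get_all_values(args, name):
          """Return the values of every occurrence of an option, in order."""
          return [value for n, value in args if n == name]

      def construct_compile_command(args, input_file, output_file):
          command = ["cc1-sketch"]            # hypothetical executable name
          opt_level = get_last_arg(args, "-O")
          if opt_level is not None:           # only the last -O is forwarded
              command.append("-O" + opt_level[1])
          for directory in get_all_values(args, "-I"):
              command += ["-I", directory]    # every -I is forwarded
          command += ["-o", output_file, input_file]
          return command

      parsed = [("-O", "0"), ("-I", "a"), ("-O", "2"), ("-I", "b")]
      command = construct_compile_command(parsed, "t.c", "t.s")
      print(" ".join(command))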

#. **Execute**

   Finally, the compilation pipeline is executed. This is mostly
   straightforward, although there is some interaction with options like
   ``-pipe``, ``-pass-exit-codes`` and ``-time``.

Additional Notes
----------------

The Compilation Object
^^^^^^^^^^^^^^^^^^^^^^

The driver constructs a Compilation object for each set of command line
arguments. The Driver itself is intended to be invariant during
construction of a Compilation; an IDE should be able to construct a
single long lived driver instance to use for an entire build, for
example.

The Compilation object holds information that is particular to each
compilation sequence, for example the list of used temporary files
(which must be removed once compilation is finished) and result files
(which should be removed if compilation fails).
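
The difference between the two file lists can be sketched in Python as
follows. The ``Compilation`` class here is a hypothetical, much reduced
stand-in: its method names and the ``touch`` helper are assumptions made
for the example, not the driver's interface.

.. code-block:: python

   import os
   import tempfile

   class Compilation:
       """Per-invocation state; a reduced stand-in for the real class."""

       def __init__(self):
           self.temp_files = []      # always removed when compilation ends
           self.result_files = []    # removed only if compilation failed

       def add_temp_file(self, path):
           self.temp_files.append(path)
           return path

       def add_result_file(self, path):
           self.result_files.append(path)
           return path

       def cleanup(self, failed):
           doomed = list(self.temp_files)
           if failed:
               doomed += self.result_files
           for path in doomed:
               if os.path.exists(path):
                   os.remove(path)

   def touch(path):
       # Create an empty file to play the role of a tool's output.
       with open(path, "w"):
           pass
       return path

   workdir = tempfile.mkdtemp()
   compilation = Compilation()
   asm = compilation.add_temp_file(touch(os.path.join(workdir, "t.s")))
   obj = compilation.add_result_file(touch(os.path.join(workdir, "t.o")))

   compilation.cleanup(failed=False)
   print(os.path.exists(asm), os.path.exists(obj))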

Unified Parsing & Pipelining
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Parsing and pipelining both occur without reference to a Compilation
instance. This is by design; the driver expects that both of these
phases are platform neutral, with a few very well defined exceptions
such as whether the platform uses a driver driver.

ToolChain Argument Translation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In order to match gcc very closely, the clang driver currently allows
tool chains to perform their own translation of the argument list (into
a new ArgList data structure). Although this allows the clang driver to
match gcc easily, it also makes the driver operation much harder to
understand (since the Tools stop seeing some arguments the user
provided, and see new ones instead).

For example, on Darwin ``-gfull`` gets translated into two separate
arguments, ``-g`` and ``-fno-eliminate-unused-debug-symbols``. Trying to
write Tool logic to do something with ``-gfull`` will not work, because
Tool argument translation is done after the arguments have been
translated.
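
The effect can be shown with a short Python sketch. The
``translate_args_darwin`` and ``tool_handles_gfull`` functions are
hypothetical and model only the ``-gfull`` rewrite described above.

.. code-block:: python

   def translate_args_darwin(args):
       """Tool chain level translation, run before any Tool sees the args."""
       translated = []
       for arg in args:
           if arg == "-gfull":
               translated += ["-g", "-fno-eliminate-unused-debug-symbols"]
           else:
               translated.append(arg)
       return translated

   def tool_handles_gfull(args):
       # Tool logic keyed on "-gfull" never fires on translated arguments.
       return "-gfull" in args

   user_args = ["-gfull", "-O2"]
   tool_args = translate_args_darwin(user_args)
   print(tool_args)
   print(tool_handles_gfull(user_args), tool_handles_gfull(tool_args))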

A long term goal is to remove this tool chain specific translation, and
instead force each tool to change its own logic to do the right thing on
the untranslated original arguments.

Unused Argument Warnings
^^^^^^^^^^^^^^^^^^^^^^^^

The driver operates by parsing all arguments but giving Tools the
opportunity to choose which arguments to pass on. One downside of this
infrastructure is that if the user misspells some option, or is confused
about which options to use, some command line arguments the user really
cared about may go unused. This problem is particularly important when
using clang as a compiler, since the clang compiler does not support
anywhere near all the options that gcc does, and we want to make sure
users know which ones are being used.

To support this, the driver maintains a bit for each argument that
records whether it has been used (at all) during the compilation.
This bit usually doesn't need to be set by hand, as the key ArgList
accessors will set it automatically.

When a compilation is successful (there are no errors), the driver
checks the bit and emits an "unused argument" warning for any arguments
which were never accessed. This is conservative (the argument may not
have been used to do what the user wanted) but still catches the most
obvious cases.
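
A minimal Python sketch of this bookkeeping is shown below. The ``Arg``
and ``ArgList`` classes are hypothetical simplifications, as is the
choice to mark every matching argument as used in the accessor; the
warning text is only illustrative.

.. code-block:: python

   class Arg:
       def __init__(self, name, value=None):
           self.name = name
           self.value = value
           self.claimed = False      # the per-argument "used" bit

   class ArgList:
       def __init__(self, args):
           self.args = list(args)

       def get_last_arg(self, name):
           """Return the last matching Arg and mark all matches as used."""
           found = None
           for arg in self.args:
               if arg.name == name:
                   arg.claimed = True    # set automatically by the accessor
                   found = arg
           return found

       def unused(self):
           return [arg for arg in self.args if not arg.claimed]

   args = ArgList([Arg("-O", "2"), Arg("-fmisspelled"), Arg("-O", "3")])
   level = args.get_last_arg("-O")       # a Tool queries the option it needs

   # After a successful compilation, report whatever nobody asked about.
   for arg in args.unused():
       print(f"warning: argument unused during compilation: '{arg.name}'")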

Relation to GCC Driver Concepts
-------------------------------

For those familiar with the gcc driver, this section provides a brief
overview of how things from the gcc driver map to the clang driver.

-  **Driver Driver**

   The driver driver is fully integrated into the clang driver. The
   driver simply constructs additional Actions to bind the architecture
   during the *Pipeline* phase. The tool chain specific argument
   translation is responsible for handling ``-Xarch_``.

   The one caveat is that this approach requires ``-Xarch_`` not be used
   to alter the compilation itself (for example, one cannot provide
   ``-S`` as an ``-Xarch_`` argument). The driver attempts to reject
   such invocations, and overall there isn't a good reason to abuse
   ``-Xarch_`` to that end in practice.

   The upside is that the clang driver is more efficient and does little
   extra work to support universal builds. It also provides better error
   reporting and UI consistency.

-  **Specs**

   The clang driver has no direct counterpart to "specs". The
   majority of the functionality that is embedded in specs is in the
   Tool specific argument translation routines. The parts of specs which
   control the compilation pipeline are generally part of the *Pipeline*
   stage.

-  **Toolchains**

   The gcc driver has no direct understanding of tool chains. Each gcc
   binary roughly corresponds to the information which is embedded
   inside a single ToolChain.

   The clang driver is intended to be portable and support complex
   compilation environments. All platform and tool chain specific code
   should be protected behind either abstract or well defined interfaces
   (such as whether the platform supports use as a driver driver).