Inlining
========

There are several options that control which calls the analyzer will consider for
inlining. The major one is ``-analyzer-config ipa``:

* ``-analyzer-config ipa=none`` - All inlining is disabled. This is the only mode
  available in LLVM 3.1 and earlier and in Xcode 4.3 and earlier.

* ``-analyzer-config ipa=basic-inlining`` - Turns on inlining for C functions, C++
  static member functions, and blocks -- essentially, the calls that behave
  like simple C function calls. This is the mode used in Xcode 4.4.

* ``-analyzer-config ipa=inlining`` - Turns on inlining when we can confidently find
  the function/method body corresponding to the call. (C functions, static
  functions, devirtualized C++ methods, Objective-C class methods, Objective-C
  instance methods when ExprEngine is confident about the dynamic type of the
  instance).

* ``-analyzer-config ipa=dynamic`` - Inline instance methods for which the type is
  determined at runtime and we are not 100% sure that our type info is
  correct. For virtual calls, inline the most plausible definition.

* ``-analyzer-config ipa=dynamic-bifurcate`` - Same as ``-analyzer-config ipa=dynamic``,
  but the path is split. We inline on one branch and do not inline on the
  other. This mode does not drop the coverage in cases when the parent class
  has code that is only exercised when some of its methods are overridden.

Currently, ``-analyzer-config ipa=dynamic-bifurcate`` is the default mode.
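
To make the effect of inlining concrete, here is a small hypothetical example
(the function names are made up): with inlining enabled, the analyzer simulates
the body of ``getPointer()`` and can report the null dereference in the caller;
under ``-analyzer-config ipa=none`` the call is treated as opaque and the bug is
missed.

.. code-block:: cpp

  int *getPointer(bool flag) {
    if (flag)
      return nullptr;      // one path returns null
    static int value = 42;
    return &value;
  }

  int use(bool flag) {
    int *p = getPointer(flag);
    return *p;             // reported only when getPointer() is inlined
  }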

While ``-analyzer-config ipa`` determines in general how aggressively the analyzer 
will try to inline functions, several additional options control which types of 
functions can be inlined, in an all-or-nothing way. These options use the
analyzer's configuration table, so they are all specified as follows:

    ``-analyzer-config OPTION=VALUE``

c++-inlining
------------

This option controls which C++ member functions may be inlined.

    ``-analyzer-config c++-inlining=[none | methods | constructors | destructors]``

Each of these modes implies that all the previous member function kinds will be
inlined as well; it doesn't make sense to inline destructors without inlining
constructors, for example.

The default c++-inlining mode is 'destructors', meaning that all member
functions with visible definitions will be considered for inlining. In some
cases the analyzer may still choose not to inline the function.

Note that under 'constructors', constructors for types with non-trivial
destructors will not be inlined. Additionally, no C++ member functions will be 
inlined under -analyzer-config ipa=none or -analyzer-config ipa=basic-inlining,
regardless of the setting of the c++-inlining mode.
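
As a sketch of what each level covers (the ``Buffer`` type here is purely
illustrative), consider:

.. code-block:: cpp

  struct Buffer {
    Buffer() {}                     // considered under 'constructors' and above
    ~Buffer() {}                    // considered only under 'destructors' (the default)
    int size() const { return n; }  // considered under 'methods' and above
    int n = 0;
  };

  void test() {
    Buffer b;          // constructor call
    int s = b.size();  // ordinary member function call
    (void)s;
  }                    // implicit destructor call for 'b'

Note that, per the rule above, under 'constructors' the ``Buffer()`` call would
still not be inlined, because ``~Buffer()`` is non-trivial.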

c++-template-inlining
^^^^^^^^^^^^^^^^^^^^^

This option controls whether C++ templated functions may be inlined.

    ``-analyzer-config c++-template-inlining=[true | false]``

Currently, template functions are considered for inlining by default.

The motivation behind this option is that very generic code can be a source
of false positives, either by considering paths that the caller considers
impossible (by some unstated precondition), or by inlining some but not all
of a deep implementation of a function.
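
For instance (a hypothetical template, not taken from any real code base), the
caller may guarantee ``count != 0`` through some invariant the analyzer cannot
see; once the template is inlined, the ``count == 0`` path may still be explored
and reported as a division by zero.

.. code-block:: cpp

  template <typename T>
  T average(T sum, unsigned count) {
    return sum / count;  // unstated precondition: count != 0
  }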

c++-stdlib-inlining
^^^^^^^^^^^^^^^^^^^

This option controls whether functions from the C++ standard library, including
methods of the container classes in the Standard Template Library, should be
considered for inlining.

    ``-analyzer-config c++-stdlib-inlining=[true | false]``

Currently, C++ standard library functions are considered for inlining by 
default.

The standard library functions and the STL in particular are used ubiquitously
enough that our tolerance for false positives is even lower here. A false
positive due to poor modeling of the STL leads to a poor user experience, since
most users would not be comfortable adding assertions to system headers in order
to silence analyzer warnings.

c++-container-inlining
^^^^^^^^^^^^^^^^^^^^^^

This option controls whether constructors and destructors of "container" types
should be considered for inlining.

    ``-analyzer-config c++-container-inlining=[true | false]``

Currently, these constructors and destructors are NOT considered for inlining
by default.

The current implementation of this setting checks whether a type has a member
named 'iterator' or a member named 'begin'; these names are idiomatic in C++,
with the latter specified in the C++11 standard. The analyzer currently does a
fairly poor job of modeling certain data structure invariants of container-like
objects. For example, these three expressions should be equivalent:


.. code-block:: cpp

  std::distance(c.begin(), c.end()) == 0
  c.begin() == c.end()
  c.empty()

Many of these issues are avoided if containers always have unknown, symbolic
state, which is what happens when their constructors are treated as opaque.
In the future, we may decide specific containers are "safe" to model through
inlining, or choose to model them directly using checkers instead.
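
For reference, here is a minimal (hypothetical) type that the current
member-name heuristic would classify as a container, because it declares a
nested 'iterator' type and a 'begin' member:

.. code-block:: cpp

  struct SmallSet {
    using iterator = int *;
    iterator begin() { return data; }
    iterator end() { return data + count; }
    bool empty() const { return count == 0; }
    int data[4];
    unsigned count = 0;
  };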


Basics of Implementation
------------------------

The low-level mechanism of inlining a function is handled in
ExprEngine::inlineCall and ExprEngine::processCallExit.

If the conditions are right for inlining, a CallEnter node is created and added
to the analysis work list. The CallEnter node marks the change to a new
LocationContext representing the called function, and its state includes the
contents of the new stack frame. When the CallEnter node is actually processed,
its single successor will be an edge to the first CFG block in the function.

Exiting an inlined function is a bit more work, fortunately broken up into
reasonable steps:

1. The CoreEngine realizes we're at the end of an inlined call and generates a
   CallExitBegin node.

2. ExprEngine takes over (in processCallExit) and finds the return value of the
   function, if it has one. This is bound to the expression that triggered the
   call. (In the case of calls without origin expressions, such as destructors,
   this step is skipped.)

3. Dead symbols and bindings are cleaned out from the state, including any local
   bindings.

4. A CallExitEnd node is generated, which marks the transition back to the
   caller's LocationContext.

5. Custom post-call checks are processed and the final nodes are pushed back
   onto the work list, so that evaluation of the caller can continue.
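
Putting the pieces together, here is a rough sketch (with made-up function
names) of where these transitions occur from the analyzed program's point of
view:

.. code-block:: cpp

  int callee(int x) {   // CallEnter: new LocationContext and stack frame
    return x + 1;       // evaluated in the callee's frame
  }

  void caller() {
    int r = callee(1);  // CallExitBegin; the return value is bound to the
                        // expression 'callee(1)'; dead bindings are removed;
                        // CallExitEnd switches back to caller()'s context
    (void)r;
  }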

Retry Without Inlining
^^^^^^^^^^^^^^^^^^^^^^

In some cases, we would like to retry analysis without inlining a particular
call.

Currently, we use this technique to recover coverage in case we stop
analyzing a path due to exceeding the maximum block count inside an inlined
function.

When this situation is detected, we walk up the path to find the first node
before inlining was started and enqueue it on the WorkList with a special
ReplayWithoutInlining bit added to it (ExprEngine::replayWithoutInlining).  The
path is then re-analyzed from that point without inlining that particular call.

Deciding When to Inline
^^^^^^^^^^^^^^^^^^^^^^^

In general, the analyzer attempts to inline as much as possible, since it
provides a better summary of what actually happens in the program.  There are
some cases, however, where the analyzer chooses not to inline:

- If there is no definition available for the called function or method.  In
  this case, there is no opportunity to inline.

- If the CFG cannot be constructed for a called function, or the liveness
  cannot be computed.  These are prerequisites for analyzing a function body,
  with or without inlining.

- If the LocationContext chain for a given ExplodedNode reaches a maximum cutoff
  depth.  This prevents unbounded analysis due to infinite recursion, but also
  serves as a useful cutoff for performance reasons.

- If the function is variadic.  This is not a hard limitation, but an
  engineering one.

  Tracked by: <rdar://problem/12147064> Support inlining of variadic functions

- In C++, constructors are not inlined unless the destructor call will be
  processed by the ExprEngine. Thus, if the CFG was built without nodes for
  implicit destructors, or if the destructors for the given object are not
  represented in the CFG, the constructor will not be inlined. (As an exception,
  constructors for objects with trivial destructors can still be inlined.)
  See "C++ Caveats" below.

- In C++, ExprEngine does not inline custom implementations of operator 'new'
  or operator 'delete', nor does it inline the constructors and destructors
  associated with these. See "C++ Caveats" below.

- Calls resulting in "dynamic dispatch" are specially handled.  See more below.

- The FunctionSummaries map stores additional information about declarations,
  some of which is collected at runtime based on previous analyses.
  We do not inline functions which were not profitable to inline in a different
  context (for example, if the maximum block count was exceeded; see
  "Retry Without Inlining").


Dynamic Calls and Devirtualization
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

"Dynamic" calls are those that are resolved at runtime, such as C++ virtual
method calls and Objective-C message sends. Due to the path-sensitive nature of
the analysis, the analyzer may be able to reason about the dynamic type of the
object whose method is being called and thus "devirtualize" the call. 

This path-sensitive devirtualization occurs when the analyzer can determine what
method would actually be called at runtime.  This is possible when the type
information for the simulated C++/Objective-C object is constrained enough for
the analyzer to make such a decision.

DynamicTypeInfo
^^^^^^^^^^^^^^^

As the analyzer analyzes a path, it may accrue information to refine the
knowledge about the type of an object.  This can then be used to make better
decisions about the target method of a call.

Such type information is tracked as DynamicTypeInfo.  This is path-sensitive
data that is stored in ProgramState, which defines a mapping from MemRegions to
an (optional) DynamicTypeInfo.

If no DynamicTypeInfo has been explicitly set for a MemRegion, it will be lazily
inferred from the region's type or associated symbol. Information from symbolic
regions is weaker than from true typed regions.

  EXAMPLE: A C++ object declared "A obj" is known to have the class 'A', but a
           reference "A &ref" may dynamically be a subclass of 'A'.

The DynamicTypePropagation checker gathers and propagates DynamicTypeInfo,
updating it as information is observed along a path that can refine that type
information for a region.

  WARNING: Not all of the existing analyzer code has been retrofitted to use
           DynamicTypeInfo, nor is it universally appropriate. In particular,
           DynamicTypeInfo always applies to a region with all casts stripped
           off, but sometimes the information provided by casts can be useful.


RuntimeDefinition
^^^^^^^^^^^^^^^^^

The basis of devirtualization is CallEvent's getRuntimeDefinition() method,
which returns a RuntimeDefinition object.  When asked to provide a definition,
the CallEvents for dynamic calls will use the DynamicTypeInfo in their
ProgramState to attempt to devirtualize the call.  In the case of no dynamic
dispatch, or perfectly constrained devirtualization, the resulting
RuntimeDefinition contains a Decl corresponding to the definition of the called
function, and RuntimeDefinition::mayHaveOtherDefinitions will return FALSE.

In the case of dynamic dispatch where our information is not perfect, CallEvent
can make a guess, but RuntimeDefinition::mayHaveOtherDefinitions will return
TRUE. The RuntimeDefinition object will then also include a MemRegion
corresponding to the object being called (i.e., the "receiver" in Objective-C
parlance), which ExprEngine uses to decide whether or not the call should be
inlined.
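
A simplified sketch of how this interface is consulted (the real decision logic
lives in ExprEngine; the helper function below is hypothetical):

.. code-block:: cpp

  #include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"

  using namespace clang;
  using namespace ento;

  static bool shouldConsiderInlining(const CallEvent &Call) {
    RuntimeDefinition RD = Call.getRuntimeDefinition();
    const Decl *D = RD.getDecl();
    if (!D)
      return false;                  // no definition: evaluate conservatively
    if (!RD.mayHaveOtherDefinitions())
      return true;                   // perfectly resolved: inline D
    // Otherwise D is only a guess; RD.getDispatchRegion() identifies the
    // receiver region used to decide whether to inline or bifurcate.
    return false;
  }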

Inlining Dynamic Calls
^^^^^^^^^^^^^^^^^^^^^^ 

The -analyzer-config ipa option has five different modes: none, basic-inlining,
inlining, dynamic, and dynamic-bifurcate. Under -analyzer-config ipa=dynamic,
all dynamic calls are inlined, whether we are certain or not that this will
actually be the definition used at runtime. Under -analyzer-config ipa=inlining,
only "near-perfect" devirtualized calls are inlined*, and other dynamic calls
are evaluated conservatively (as if no definition were available). 

* Currently, Objective-C messages are not inlined under
  -analyzer-config ipa=inlining, even if we are reasonably confident of the type
  of the receiver. We plan to enable this once we have tested our heuristics
  more thoroughly.

The last option, -analyzer-config ipa=dynamic-bifurcate, behaves similarly to
"dynamic", but performs a conservative invalidation in the general virtual case
in *addition* to inlining. The details of this are discussed below.

As stated above, -analyzer-config ipa=basic-inlining does not inline any C++ 
member functions or Objective-C method calls, even if they are non-virtual or 
can be safely devirtualized.


Bifurcation
^^^^^^^^^^^

ExprEngine::BifurcateCall implements the ``-analyzer-config ipa=dynamic-bifurcate``
mode.

When a call is made on an object with imprecise dynamic type information 
(RuntimeDefinition::mayHaveOtherDefinitions() evaluates to TRUE), ExprEngine
bifurcates the path and marks the object's region (retrieved from the
RuntimeDefinition object) with a path-sensitive "mode" in the ProgramState.

Currently, there are 2 modes: 

* ``DynamicDispatchModeInlined`` - Models the case where the dynamic type information
  of the receiver (MemoryRegion) is assumed to be perfectly constrained so
  that a given definition of a method is expected to be the code actually
  called. When this mode is set, ExprEngine uses the Decl from
  RuntimeDefinition to inline any dynamically dispatched call sent to this
  receiver because the function definition is considered to be fully resolved.

* ``DynamicDispatchModeConservative`` - Models the case where the dynamic type
  information is assumed to be incorrect; for example, the method definition
  might be overridden in a subclass. In such cases, ExprEngine does not
  inline the methods sent to the receiver (MemoryRegion), even if a candidate
  definition is available. This mode is conservative about simulating the
  effects of a call.

Going forward along the symbolic execution path, ExprEngine consults the mode 
of the receiver's MemRegion to make decisions on whether the calls should be 
inlined or not, which ensures that there is at most one split per region.

At a high level, "bifurcation mode" allows for increased semantic coverage in
cases where the parent method contains code which is only executed when the
class is subclassed. The disadvantages of this mode are a potentially
considerable performance hit and the possibility of false positives on the
path where the conservative mode is used.
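
The coverage benefit can be seen in a small sketch like the following
(illustrative code): under bifurcation, the call to ``check()`` is explored
both with ``Base::check()`` inlined and conservatively, so the ``else`` branch
is not lost even though ``Base::check()`` itself always returns true.

.. code-block:: cpp

  struct Base {
    virtual bool check() const { return true; }
    void process() {
      if (check()) {
        // explored on the branch where Base::check() is inlined
      } else {
        // still reachable on the conservative branch, which accounts for
        // subclasses that override check()
      }
    }
  };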

Objective-C Message Heuristics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

ExprEngine relies on a set of heuristics to partition the set of Objective-C 
method calls into those that require bifurcation and those that do not. Below 
are the cases when the DynamicTypeInfo of the object is considered precise
(cannot be a subclass):

 - If the object was created with +alloc or +new and initialized with an -init
   method.

 - If the calls are property accesses using dot syntax. This is based on the
   assumption that children rarely override properties, or do so in an
   essentially compatible way.

 - If the class interface is declared inside the main source file. In this case
   it is unlikely that it will be subclassed.

 - If the method is not declared outside of the main source file, either by the
   receiver's class or by any superclasses.

C++ Caveats
^^^^^^^^^^^

C++11 [class.cdtor]p4 describes how the vtable of an object is modified as it is
being constructed or destructed; that is, the type of the object depends on
which base constructors have been completed. This is tracked using
DynamicTypeInfo in the DynamicTypePropagation checker.
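
For example (an illustrative hierarchy), while ``Base``'s constructor is
running the object's dynamic type is ``Base``, so the virtual call resolves to
``Base::init()`` even if the object being constructed is a ``Derived``:

.. code-block:: cpp

  struct Base {
    Base() { init(); }     // calls Base::init(), not Derived::init()
    virtual void init() {}
  };

  struct Derived : Base {
    void init() override {}
  };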

There are several limitations in the current implementation:

* Temporaries are poorly modeled right now because we're not confident in the
  placement of their destructors in the CFG. We currently won't inline their
  constructors unless the destructor is trivial, and don't process their
  destructors at all, not even to invalidate the region.

* 'new' is poorly modeled due to some nasty CFG/design issues.  This is tracked
  in PR12014.  'delete' is not modeled at all.

* Arrays of objects are modeled very poorly right now.  ExprEngine currently
  only simulates the first constructor and first destructor. Because of this,
  ExprEngine does not inline any constructors or destructors for arrays.


CallEvent
^^^^^^^^^

A CallEvent represents a specific call to a function, method, or other body of
code. It is path-sensitive, containing both the current state (ProgramStateRef)
and stack space (LocationContext), and provides uniform access to the argument
values and return type of a call, no matter how the call is written in the
source or what sort of code body is being invoked.

  NOTE: For those familiar with Cocoa, CallEvent is roughly equivalent to
        NSInvocation.

CallEvent should be used whenever there is logic dealing with function calls
that does not care how the call occurred.

Examples include checking that arguments satisfy preconditions (such as
__attribute__((nonnull))), and attempting to inline a call.
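
A minimal sketch of a (hypothetical) checker using this uniform interface; the
point is that argument access looks the same whether the call is a C function
call, a C++ method call, a block call, or an Objective-C message:

.. code-block:: cpp

  #include "clang/StaticAnalyzer/Core/Checker.h"
  #include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
  #include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"

  using namespace clang;
  using namespace ento;

  class ArgumentSketchChecker : public Checker<check::PreCall> {
  public:
    void checkPreCall(const CallEvent &Call, CheckerContext &C) const {
      if (Call.getNumArgs() == 0)
        return;
      SVal Arg0 = Call.getArgSVal(0);  // uniform argument access
      (void)Arg0;
      (void)C;
    }
  };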

CallEvents are reference-counted objects managed by a CallEventManager. While
there is no inherent issue with persisting them (say, in a ProgramState's GDM),
they are intended for short-lived use, and can be recreated from CFGElements or
non-top-level StackFrameContexts fairly easily.