reference, declarationdefinition
definition → references, declarations, derived classes, virtual overrides
reference to multiple definitions → definitions
unreferenced
    1
    2
    3
    4
    5
    6
    7
    8
    9
   10
   11
   12
   13
   14
   15
   16
   17
   18
   19
   20
   21
   22
   23
   24
   25
   26
   27
   28
   29
   30
   31
   32
   33
   34
   35
   36
   37
   38
   39
   40
   41
   42
   43
   44
   45
   46
   47
   48
   49
   50
   51
   52
   53
   54
   55
   56
   57
   58
   59
   60
   61
   62
   63
   64
   65
   66
   67
   68
   69
   70
   71
   72
   73
   74
   75
   76
   77
   78
   79
   80
   81
   82
   83
   84
   85
   86
   87
   88
   89
   90
   91
   92
   93
   94
   95
   96
   97
   98
   99
  100
  101
  102
  103
  104
  105
  106
  107
  108
  109
  110
  111
  112
  113
  114
  115
  116
  117
  118
  119
  120
  121
  122
  123
  124
  125
  126
  127
  128
  129
  130
  131
  132
  133
  134
  135
  136
  137
  138
  139
  140
  141
  142
  143
  144
  145
  146
  147
  148
  149
  150
  151
  152
  153
  154
  155
  156
  157
  158
  159
  160
  161
  162
  163
  164
  165
  166
  167
  168
  169
  170
  171
  172
  173
  174
  175
  176
  177
  178
  179
  180
  181
  182
  183
  184
  185
  186
  187
  188
  189
  190
  191
  192
  193
  194
  195
  196
  197
  198
  199
  200
  201
  202
  203
  204
  205
  206
  207
  208
  209
  210
  211
  212
  213
  214
  215
  216
  217
  218
  219
  220
  221
  222
  223
  224
  225
  226
  227
  228
  229
  230
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Open Projects</title>
  <link type="text/css" rel="stylesheet" href="menu.css">
  <link type="text/css" rel="stylesheet" href="content.css">
  <script type="text/javascript" src="scripts/menu.js"></script>
</head>
<body>

<div id="page">
<!--#include virtual="menu.html.incl"-->
<div id="content">

<h1>Open Projects</h1>

<p>This page lists several projects that would boost analyzer's usability and
power. Most of the projects listed here are infrastructure-related so this list
is an addition to the <a href="potential_checkers.html">potential checkers
list</a>. If you are interested in tackling one of these, please send an email
to the <a href=https://lists.llvm.org/mailman/listinfo/cfe-dev>cfe-dev
mailing list</a> to notify other members of the community.</p>

<ul>
  <li>Release checkers from "alpha"
    <p>New checkers which were contributed to the analyzer,
    but have not passed a rigorous evaluation process,
    are committed as "alpha checkers" (from "alpha version"),
    and are not enabled by default.</p>

    <p>Ideally, only the checkers which are actively being worked on should be in
    "alpha",
    but over the years the development of many of those has stalled.
    Such checkers should either be improved
    up to a point where they can be enabled by default,
    or removed from the analyzer entirely.

    <ul>
      <li><code>alpha.security.ArrayBound</code> and
      <code>alpha.security.ArrayBoundV2</code>
      <p>Array bounds checking is a desired feature,
      but having an acceptable rate of false positives might not be possible
      without a proper
      <a href="https://en.wikipedia.org/wiki/Widening_(computer_science)">loop widening</a> support.
      Additionally, it might be more promising to perform index checking based on
      <a href="https://en.wikipedia.org/wiki/Taint_checking">tainted</a> index values.
      <p><i>(Difficulty: Medium)</i></p></p>
      </li>

      <li><code>alpha.unix.StreamChecker</code>
        <p>A SimpleStreamChecker has been presented in the Building a Checker in 24
        Hours talk
        (<a href="https://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">slides</a>
        <a href="https://youtu.be/kdxlsP5QVPw">video</a>).</p>

        <p>This alpha checker is an attempt to write a production grade stream checker.
        However, it was found to have an unacceptably high false positive rate.
        One of the found problems was that eagerly splitting the state
        based on whether the system call may fail leads to too many reports.
        A <em>delayed</em> split where the implication is stored in the state
        (similarly to nullability implications in <code>TrustNonnullChecker</code>)
        may produce much better results.</p>
        <p><i>(Difficulty: Medium)</i></p>
      </li>
    </ul>
  </li>

  <li>Improve C++ support
  <ul>
    <li>Handle aggregate construction.
      <p><a href="https://en.cppreference.com/w/cpp/language/aggregate_initialization">Aggregates</a>
      are objects that can be brace-initialized without calling a
      constructor (that is, <code><a href="https://clang.llvm.org/doxygen/classclang_1_1CXXConstructExpr.html">
      CXXConstructExpr</a></code> does not occur in the AST),
      but potentially calling
      constructors for their fields and base classes
      These
      constructors of sub-objects need to know what object they are constructing.
      Moreover, if the aggregate contains
      references, lifetime extension needs to be properly modeled.

      One can start untangling this problem by trying to replace the
      current ad-hoc <code><a href="https://clang.llvm.org/doxygen/classclang_1_1ParentMap.html">
      ParentMap</a></code> lookup in <a href="https://clang.llvm.org/doxygen/ExprEngineCXX_8cpp_source.html#l00430">
      <code>CXXConstructExpr::CK_NonVirtualBase</code></a> branch of
      <code>ExprEngine::VisitCXXConstructExpr()</code>
      with proper support for the feature.
      <p><i>(Difficulty: Medium) </i></p></p>
    </li>

    <li>Handle constructors within <code>new[]</code>
      <p>When an array of objects is allocated using the <code>operator new[]</code>,
         constructors for all elements of the array are called.
         We should model (potentially some of) such evaluations,
         and the same applies for destructors called from
         <code>operator delete[]</code>.
      </p>
    </li>

    <li>Handle constructors that can be elided due to Named Return Value Optimization (NRVO)
      <p>Local variables which are returned by values on all return statements
         may be stored directly at the address for the return value,
         eliding the copy or move constructor call.
         Such variables can be identified using the AST call <code>VarDecl::isNRVOVariable</code>.
      </p>
    </li>

    <li>Handle constructors of lambda captures
      <p>Variables which are captured by value into a lambda require a call to
         a copy constructor.
         This call is not currently modeled.
      </p>
    </li>

    <li>Handle constructors for default arguments
      <p>Default arguments in C++ are recomputed at every call,
         and are therefore local, and not static, variables.
      </p>
    </li>

    <li>Enhance the modeling of the standard library.
      <p>The analyzer needs a better understanding of STL in order to be more
      useful on C++ codebases.
      While full library modeling is not an easy task,
      large gains can be achieved by supporting only a few cases:
      e.g. calling <code>.length()</code> on an empty
      <code>std::string</code> always yields zero.
    <p><i>(Difficulty: Medium)</i></p><p>
    </li>

    <li>Enhance CFG to model exception-handling.
      <p>Currently exceptions are treated as "black holes", and exception-handling
      control structures are poorly modeled in order to be conservative.
      This could be improved for both C++ and Objective-C exceptions.
      <p><i>(Difficulty: Hard)</i></p></p>
    </li>
  </ul>
  </li>

  <li>Core Analyzer Infrastructure
  <ul>
    <li>Handle unions.
      <p>Currently in the analyzer the value of a union is always regarded as
      an unknown.
      This problem was
      previously <a href="https://lists.llvm.org/pipermail/cfe-dev/2017-March/052864.html">discussed</a>
      on the mailing list, but no solution was implemented.
      <p><i> (Difficulty: Medium) </i></p></p>
    </li>

    <li>Floating-point support.
      <p>Currently, the analyzer treats all floating-point values as unknown.
      This project would involve adding a new <code>SVal</code> kind
      for constant floats, generalizing the constraint manager to handle floats,
      and auditing existing code to make sure it doesn't
      make incorrect assumptions (most notably, that <code>X == X</code>
      is always true, since it does not hold for <code>NaN</code>).
      <p><i> (Difficulty: Medium)</i></p></p>
    </li>

    <li>Improved loop execution modeling.
      <p>The analyzer simply unrolls each loop <tt>N</tt> times before
      dropping the path, for a fixed constant <tt>N</tt>.
      However, that results in lost coverage in cases where the loop always
      executes more than <tt>N</tt> times.
      A Google Summer Of Code
      <a href="https://summerofcode.withgoogle.com/archive/2017/projects/6071606019358720/">project</a>
      was completed to make the loop bound parameterizable,
      but the <a href="https://en.wikipedia.org/wiki/Widening_(computer_science)">widening</a>
      problem still remains open.

      <p><i> (Difficulty: Hard)</i></p></p>
    </li>

    <li>Basic function summarization support
      <p>The analyzer performs inter-procedural analysis using
      either inlining or "conservative evaluation" (invalidating all data
      passed to the function).
      Often, a very simple summary
      (e.g. "this function is <a href="https://en.wikipedia.org/wiki/Pure_function">pure</a>") would be
      enough to be a large improvement over conservative evaluation.
      Such summaries could be obtained either syntactically,
      or using a dataflow framework.
      <p><i>(Difficulty: Hard)</i></p><p>
    </li>

    <li>Implement a dataflow flamework.
      <p>The analyzer core
      implements a <a href="https://en.wikipedia.org/wiki/Symbolic_execution">symbolic execution</a>
      engine, which performs checks
      (use-after-free, uninitialized value read, etc.)
      over a <em>single</em> program path.
      However, many useful properties
      (dead code, check-after-use, etc.) require
      reasoning over <em>all</em> possible in a program.
      Such reasoning requires a
      <a href="https://en.wikipedia.org/wiki/Data-flow_analysis">dataflow analysis</a> framework.
      Clang already implements
      a few dataflow analyses (most notably, liveness),
      but they implemented in an ad-hoc fashion.
      A proper framework would enable us writing many more useful checkers.
      <p><i> (Difficulty: Hard) </i></p></p>
    </li>

    <li>Track type information through casts more precisely.
      <p>The <code>DynamicTypePropagation</code>
      checker is in charge of inferring a region's
      dynamic type based on what operations the code is performing.
      Casts are a rich source of type information that the analyzer currently ignores.
      <p><i>(Difficulty: Medium)</i></p></p>
    </li>

  </ul>
  </li>

  <li>Fixing miscellaneous bugs
    <p>Apart from the open projects listed above,
       contributors are welcome to fix any of the outstanding
       <a href="https://bugs.llvm.org/buglist.cgi?component=Static%20Analyzer&list_id=147756&product=clang&resolution=---">bugs</a>
       in the Bugzilla.
       <p><i>(Difficulty: Anything)</i></p></p>
  </li>

</ul>

</div>
</div>
</body>
</html>