llvm-project/clang/www/analyzer/open_projects.html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Open Projects</title>
  <link type="text/css" rel="stylesheet" href="menu.css">
  <link type="text/css" rel="stylesheet" href="content.css">
  <script type="text/javascript" src="scripts/menu.js"></script>
</head>
<body>

<div id="page">
<!--#include virtual="menu.html.incl"-->
<div id="content">

<h1>Open Projects</h1>

<p>This page lists several projects that would boost analyzer's usability and
power. Most of the projects listed here are infrastructure-related so this list
is an addition to the <a href="potential_checkers.html">potential checkers
list</a>. If you are interested in tackling one of these, please send an email
to the <a href=https://lists.llvm.org/mailman/listinfo/cfe-dev>cfe-dev
mailing list</a> to notify other members of the community.</p>

<ul>
  <li>Release checkers from "alpha"
    <p>New checkers which were contributed to the analyzer,
    but have not passed a rigorous evaluation process,
    are committed as "alpha checkers" (from "alpha version"),
    and are not enabled by default.</p>

    <p>Ideally, only the checkers which are actively being worked on should be in
    "alpha",
    but over the years the development of many of those has stalled.
    Such checkers should either be improved
    up to a point where they can be enabled by default,
    or removed from the analyzer entirely.

    <ul>
      <li><code>alpha.security.ArrayBound</code> and
      <code>alpha.security.ArrayBoundV2</code>
      <p>Array bounds checking is a desired feature,
      but having an acceptable rate of false positives might not be possible
      without a proper
      <a href="https://en.wikipedia.org/wiki/Widening_(computer_science)">loop widening</a> support.
      Additionally, it might be more promising to perform index checking based on
      <a href="https://en.wikipedia.org/wiki/Taint_checking">tainted</a> index values.
      <p><i>(Difficulty: Medium)</i></p></p>
      </li>

      <li><code>alpha.unix.StreamChecker</code>
        <p>A SimpleStreamChecker has been presented in the Building a Checker in 24
        Hours talk
        (<a href="https://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">slides</a>
        <a href="https://youtu.be/kdxlsP5QVPw">video</a>).</p>

        <p>This alpha checker is an attempt to write a production grade stream checker.
        However, it was found to have an unacceptably high false positive rate.
        One of the found problems was that eagerly splitting the state
        based on whether the system call may fail leads to too many reports.
        A <em>delayed</em> split where the implication is stored in the state
        (similarly to nullability implications in <code>TrustNonnullChecker</code>)
        may produce much better results.</p>
        <p><i>(Difficulty: Medium)</i></p>
      </li>
    </ul>
  </li>

  <li>Improve C++ support
  <ul>
    <li>Handle construction as part of aggregate initialization.
      <p><a href="https://en.cppreference.com/w/cpp/language/aggregate_initialization">Aggregates</a>
      are objects that can be brace-initialized without calling a
      constructor (that is, <code><a href="https://clang.llvm.org/doxygen/classclang_1_1CXXConstructExpr.html">
      CXXConstructExpr</a></code> does not occur in the AST),
      but potentially calling
      constructors for their fields and base classes
      These
      constructors of sub-objects need to know what object they are constructing.
      Moreover, if the aggregate contains
      references, lifetime extension needs to be properly modeled.

      One can start untangling this problem by trying to replace the
      current ad-hoc <code><a href="https://clang.llvm.org/doxygen/classclang_1_1ParentMap.html">
      ParentMap</a></code> lookup in <a href="https://clang.llvm.org/doxygen/ExprEngineCXX_8cpp_source.html#l00430">
      <code>CXXConstructExpr::CK_NonVirtualBase</code></a> branch of
      <code>ExprEngine::VisitCXXConstructExpr()</code>
      with proper support for the feature.
      <p><i>(Difficulty: Medium) </i></p></p>
    </li>

    <li>Handle array constructors.
      <p>When an array of objects is allocated (say, using the
         <code>operator new[]</code> or defining a stack array),
         constructors for all elements of the array are called.
         We should model (potentially some of) such evaluations,
         and the same applies for destructors called from
         <code>operator delete[]</code>.
         See tests cases in <a href="https://github.com/llvm/llvm-project/tree/master/clang/test/Analysis/handle_constructors_with_new_array.cpp">handle_constructors_with_new_array.cpp</a>.
      </p>
      <p>
      Constructing an array requires invoking multiple (potentially unknown)
      amount of constructors with the same construct-expression.
      Apart from the technical difficulties of juggling program points around
      correctly to avoid accidentally merging paths together, we'll have to
      be a judge on when to exit the loop and how to widen it.
      Given that the constructor is going to be a default constructor,
      a nice 95% solution might be to execute exactly one constructor and
      then default-bind the resulting LazyCompoundVal to the whole array;
      it'll work whenever the default constructor doesn't touch global state
      but only initializes the object to various default values.
      But if, say, we're making an array of strings,
      depending on the implementation you might have to allocate a new buffer
      for each string, and in this case default-binding won't cut it.
      We might want to come up with an auxiliary analysis in order to perform
      widening of these simple loops more precisely.
      </p>
    </li>

    <li>Handle constructors that can be elided due to Named Return Value Optimization (NRVO)
      <p>Local variables which are returned by values on all return statements
         may be stored directly at the address for the return value,
         eliding the copy or move constructor call.
         Such variables can be identified using the AST call <code>VarDecl::isNRVOVariable</code>.
      </p>
    </li>

    <li>Handle constructors of lambda captures
      <p>Variables which are captured by value into a lambda require a call to
         a copy constructor.
         This call is not currently modeled.
      </p>
    </li>

    <li>Handle constructors for default arguments
      <p>Default arguments in C++ are recomputed at every call,
         and are therefore local, and not static, variables.
         See tests cases in <a href="https://github.com/llvm/llvm-project/tree/master/clang/test/Analysis/handle_constructors_for_default_arguments.cpp">handle_constructors_for_default_arguments.cpp</a>.
      </p>
      <p>
      Default arguments are annoying because the initializer expression is
      evaluated at the call site but doesn't syntactically belong to the
      caller's AST; instead it belongs to the ParmVarDecl for the default
      parameter. This can lead to situations when the same expression has to
      carry different values simultaneously -
      when multiple instances of the same function are evaluated as part of the
      same full-expression without specifying the default arguments.
      Even simply calling the function twice (not necessarily within the
      same full-expression) may lead to program points agglutinating because
      it's the same expression. There are some nasty test cases already
      in temporaries.cpp (struct DefaultParam and so on). I recommend adding a
      new LocationContext kind specifically to deal with this problem. It'll
      also help you figure out the construction context when you evaluate the
      construct-expression (though you might still need to do some additional
      CFG work to get construction contexts right).
      </p>
    </li>

    <li>Enhance the modeling of the standard library.
      <p>The analyzer needs a better understanding of STL in order to be more
      useful on C++ codebases.
      While full library modeling is not an easy task,
      large gains can be achieved by supporting only a few cases:
      e.g. calling <code>.length()</code> on an empty
      <code>std::string</code> always yields zero.
    <p><i>(Difficulty: Medium)</i></p><p>
    </li>

    <li>Enhance CFG to model exception-handling.
      <p>Currently exceptions are treated as "black holes", and exception-handling
      control structures are poorly modeled in order to be conservative.
      This could be improved for both C++ and Objective-C exceptions.
      <p><i>(Difficulty: Hard)</i></p></p>
    </li>
  </ul>
  </li>

  <li>Core Analyzer Infrastructure
  <ul>
    <li>Handle unions.
      <p>Currently in the analyzer the value of a union is always regarded as
      an unknown.
      This problem was
      previously <a href="https://lists.llvm.org/pipermail/cfe-dev/2017-March/052864.html">discussed</a>
      on the mailing list, but no solution was implemented.
      <p><i> (Difficulty: Medium) </i></p></p>
    </li>

    <li>Floating-point support.
      <p>Currently, the analyzer treats all floating-point values as unknown.
      This project would involve adding a new <code>SVal</code> kind
      for constant floats, generalizing the constraint manager to handle floats,
      and auditing existing code to make sure it doesn't
      make incorrect assumptions (most notably, that <code>X == X</code>
      is always true, since it does not hold for <code>NaN</code>).
      <p><i> (Difficulty: Medium)</i></p></p>
    </li>

    <li>Improved loop execution modeling.
      <p>The analyzer simply unrolls each loop <tt>N</tt> times before
      dropping the path, for a fixed constant <tt>N</tt>.
      However, that results in lost coverage in cases where the loop always
      executes more than <tt>N</tt> times.
      A Google Summer Of Code
      <a href="https://summerofcode.withgoogle.com/archive/2017/projects/6071606019358720/">project</a>
      was completed to make the loop bound parameterizable,
      but the <a href="https://en.wikipedia.org/wiki/Widening_(computer_science)">widening</a>
      problem still remains open.

      <p><i> (Difficulty: Hard)</i></p></p>
    </li>

    <li>Basic function summarization support
      <p>The analyzer performs inter-procedural analysis using
      either inlining or "conservative evaluation" (invalidating all data
      passed to the function).
      Often, a very simple summary
      (e.g. "this function is <a href="https://en.wikipedia.org/wiki/Pure_function">pure</a>") would be
      enough to be a large improvement over conservative evaluation.
      Such summaries could be obtained either syntactically,
      or using a dataflow framework.
      <p><i>(Difficulty: Hard)</i></p><p>
    </li>

    <li>Implement a dataflow flamework.
      <p>The analyzer core
      implements a <a href="https://en.wikipedia.org/wiki/Symbolic_execution">symbolic execution</a>
      engine, which performs checks
      (use-after-free, uninitialized value read, etc.)
      over a <em>single</em> program path.
      However, many useful properties
      (dead code, check-after-use, etc.) require
      reasoning over <em>all</em> possible in a program.
      Such reasoning requires a
      <a href="https://en.wikipedia.org/wiki/Data-flow_analysis">dataflow analysis</a> framework.
      Clang already implements
      a few dataflow analyses (most notably, liveness),
      but they implemented in an ad-hoc fashion.
      A proper framework would enable us writing many more useful checkers.
      <p><i> (Difficulty: Hard) </i></p></p>
    </li>

    <li>Track type information through casts more precisely.
      <p>The <code>DynamicTypePropagation</code>
      checker is in charge of inferring a region's
      dynamic type based on what operations the code is performing.
      Casts are a rich source of type information that the analyzer currently ignores.
      <p><i>(Difficulty: Medium)</i></p></p>
    </li>

  </ul>
  </li>

  <li>Fixing miscellaneous bugs
    <p>Apart from the open projects listed above,
       contributors are welcome to fix any of the outstanding
       <a href="https://bugs.llvm.org/buglist.cgi?component=Static%20Analyzer&list_id=147756&product=clang&resolution=---">bugs</a>
       in the Bugzilla.
       <p><i>(Difficulty: Anything)</i></p></p>
  </li>

</ul>

</div>
</div>
</body>
</html>