[analyzer] Add overview and checker registration to the checker developer manual.

llvm-svn: 143911
Anna Zaks 2011-11-07 05:36:29 +00:00
parent ffc8ca2d84
commit 5259086b8e
1 changed files with 121 additions and 27 deletions


@ -28,8 +28,9 @@ for general developer guidelines and information. </p>
<ul>
<li><a href="#start">Getting Started</a></li>
<li><a href="#analyzer">Static Analyzer Overview</a></li>
<li><a href="#analyzer">Analyzer Overview</a></li>
<li><a href="#idea">Idea for a Checker</a></li>
<li><a href="#registration">Checker Registration</a></li>
<li><a href="#skeleton">Checker Skeleton</a></li>
<li><a href="#node">Exploded Node</a></li>
<li><a href="#bugs">Bug Reports</a></li>
@ -40,16 +41,19 @@ for general developer guidelines and information. </p>
<h2 id=start>Getting Started</h2>
<ul>
<li>To check out the source code and build the project, follow steps 1-4 of the <a href="http://clang.llvm.org/get_started.html">Clang Getting Started</a>
<li>To check out the source code and build the project, follow steps 1-4 of
the <a href="http://clang.llvm.org/get_started.html">Clang Getting Started</a>
page.</li>
<li>The analyzer source code is located under the Clang source tree:
<br><tt>
$ <b>cd llvm/tools/clang</b>
</tt>
<br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>, <tt>test/Analysis</tt>.</li>
<br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>,
<tt>test/Analysis</tt>.</li>
<li>The analyzer regression tests can be executed from Clang's build directory:
<li>The analyzer regression tests can be executed from Clang's build
directory:
<br><tt>
$ <b>cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test</b>
</tt></li>
@ -64,7 +68,8 @@ for general developer guidelines and information. </p>
$ <b>clang -cc1 -analyzer-checker-help</b>
</tt></li>
<li>See the analyzer help for different output formats, fine tuning, and debug options:
<li>See the analyzer help for different output formats, fine tuning, and
debug options:
<br><tt>
$ <b>clang -cc1 -help | grep "analyzer"</b>
</tt></li>
@ -72,45 +77,126 @@ for general developer guidelines and information. </p>
</ul>
<h2 id=analyzer>Static Analyzer Overview</h2>
ExplodedGraph, ExplodedNode (ProgramPoint, State)<br>
Engine-Checker Interaction<br>
Symbols<br>
<h2 id=idea>Idea for a Checker</h2>
Here are several questions which you should consider when evaluating your checker idea:
The analyzer core performs symbolic execution of the given program. All the
input values are represented with symbolic values; further, the engine deduces
the values of all the expressions in the program based on the input symbols
and the path. The execution is path-sensitive, and every possible path through
the program is explored. The explored execution traces are represented with an
<a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedGraph.html">ExplodedGraph</a> object.
Each node of the graph is an
<a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedNode.html">ExplodedNode</a>,
which consists of a <tt>ProgramPoint</tt> and a <tt>ProgramState</tt>.
<p>
<a href="http://clang.llvm.org/doxygen/classclang_1_1ProgramPoint.html">ProgramPoint</a>
represents the corresponding location in the program (or in the CFG).
<tt>ProgramPoint</tt> is also used to record additional information on
when and how the state was added. For example, the <tt>PostPurgeDeadSymbolsKind</tt>
kind means that the state is the result of purging dead symbols, which is the
analyzer's equivalent of garbage collection.
<p>
<a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ProgramState.html">ProgramState</a>
represents the abstract state of the program. It consists of:
<ul>
<li>Can the check be effectively implemented without path-sensitive analysis? See <a href="#ast">AST Visitors</a>.</li>
<li><tt>Environment</tt> - a mapping from source code expressions to symbolic
values
<li><tt>Store</tt> - a mapping from memory locations to symbolic values
<li><tt>GenericDataMap</tt> - constraints on symbolic values, plus any
checker-defined part of the state
</ul>
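<p>
The structure described above can be pictured with a toy model. This is only a
sketch: <tt>ToyState</tt>, <tt>ToyNode</tt>, <tt>ToyGraph</tt> and
<tt>exploreBranch</tt> are hypothetical names invented for illustration and are
not analyzer classes. The sketch assumes a single branch on an unknown input
<tt>x</tt> and shows how a path-sensitive exploration produces one successor
node per outcome, each carrying its own copy of the state.
</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy stand-ins for the concepts above; none of these are Clang classes.
struct ToyState {
  std::map<std::string, std::string> Environment; // expression -> symbolic value
  std::map<std::string, std::string> Store;       // memory location -> symbolic value
  std::map<std::string, std::string> Constraints; // symbol -> assumed constraint
};

struct ToyNode {
  int ProgramPoint; // stands in for a location in the CFG
  ToyState State;   // abstract program state at that location
  int Pred;         // index of the predecessor node, -1 for the root
};

struct ToyGraph {
  std::vector<ToyNode> Nodes;

  // Appends a node and returns its index.
  int addNode(int Point, const ToyState &S, int Pred) {
    assert(Pred >= -1 && Pred < static_cast<int>(Nodes.size()));
    ToyNode N = {Point, S, Pred};
    Nodes.push_back(N);
    return static_cast<int>(Nodes.size()) - 1;
  }
};

// Explores `if (x) {...} else {...}` where x is an unknown input.
ToyGraph exploreBranch() {
  ToyGraph G;

  ToyState Entry;
  Entry.Store["x"] = "$x"; // the input is unknown, so it is bound to a symbol
  int Root = G.addNode(0, Entry, -1);

  // True branch: the same symbol, now constrained to be non-zero.
  ToyState Taken = Entry;
  Taken.Constraints["$x"] = "!= 0";
  G.addNode(1, Taken, Root);

  // False branch: the same symbol, constrained to be zero.
  ToyState NotTaken = Entry;
  NotTaken.Constraints["$x"] = "== 0";
  G.addNode(2, NotTaken, Root);

  return G;
}
```

<p>
Calling <tt>exploreBranch()</tt> yields a graph whose root has two successors
that differ only in the constraint recorded for <tt>$x</tt>. States are copied
rather than mutated, so earlier nodes stay valid.
</p>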
<p>
Checkers are not merely passive receivers of the analyzer core changes: they
actively participate in the <tt>ProgramState</tt> construction through the
<tt>GenericDataMap</tt>, which can be used to store the checker-defined part
of the state. Each time the analyzer engine explores a new statement, it
notifies each checker registered to listen for that statement, giving it an
opportunity to either report a bug or modify the state. (As a rule of thumb,
the checker itself should be stateless.) The checkers are called one after
another in a predefined order; thus, calling all the checkers adds a chain of
nodes to the <tt>ExplodedGraph</tt>.
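<p>
The interaction described above can be sketched as follows. This is a toy
model under stated assumptions, not the analyzer's implementation:
<tt>CheckerState</tt>, <tt>ToyChecker</tt>, <tt>visitStatement</tt> and the two
example callbacks are hypothetical. It shows stateless callbacks that keep
their data in the state they are handed, and an engine loop that appends one
new state per checker, forming a chain.
</p>

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy types; not the Clang API.
struct CheckerState {
  std::map<std::string, int> GenericData; // checker-defined part of the state
};

// A checker callback receives the current state and returns the state to
// continue with. It owns no mutable data of its own.
using ToyChecker = std::function<CheckerState(const CheckerState &)>;

// Calls the checkers in the given order. Every call appends one state, so the
// result is a chain: input state, state after checker 1, after checker 2, ...
std::vector<CheckerState>
visitStatement(const CheckerState &In, const std::vector<ToyChecker> &Checkers) {
  std::vector<CheckerState> Chain;
  Chain.push_back(In);
  for (const ToyChecker &C : Checkers) {
    assert(C && "checker callback must be set");
    CheckerState Next = C(Chain.back());
    Chain.push_back(Next);
  }
  return Chain;
}

// Example callback: counts how many statements it was notified about.
CheckerState countCalls(const CheckerState &S) {
  CheckerState Next = S;
  ++Next.GenericData["calls-seen"];
  return Next;
}

// Example callback: records a fact for later callbacks on this path.
CheckerState markOpened(const CheckerState &S) {
  CheckerState Next = S;
  Next.GenericData["file-open"] = 1;
  return Next;
}
```

<p>
Passing <tt>{countCalls, markOpened}</tt> to <tt>visitStatement</tt> returns
three states: the input, the state after the first callback, and the state
after the second. Because each callback returns a modified copy, the same
callback can be reused on every path without carrying data between paths.
</p>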
<!--
TODO: Add a picture.
<br>
Symbols<br>
FunctionalObjects are used throughout.
-->
<h2 id=idea>Idea for a Checker</h2>
Here are several questions which you should consider when evaluating your
checker idea:
<ul>
<li>Can the check be effectively implemented without path-sensitive
analysis? See <a href="#ast">AST Visitors</a>.</li>
<li>How high is the false positive rate going to be? Looking at the occurrences
of the issue you want to write a checker for in the existing code bases might give you some
ideas. </li>
of the issue you want to write a checker for in the existing code bases might
give you some ideas. </li>
<li>How will the current limitations of the analysis affect the false alarm
rate? Currently, the analyzer only reasons about one procedure at a time (no
inter-procedural analysis). Also, it uses a simple range tracking based solver to model symbolic
execution.</li>
inter-procedural analysis). Also, it uses a simple range tracking based
solver to model symbolic execution.</li>
<li>Consult the <a href="http://llvm.org/bugs/buglist.cgi?query_format=advanced&bug_status=NEW&bug_status=REOPENED&version=trunk&component=Static%20Analyzer&product=clang">Bugzilla database</a>
to get some ideas for new checkers and consider starting with improving/fixing
bugs in the existing checkers.</li>
</ul>
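<p>
To get a feel for what a range-tracking solver can and cannot decide, consider
the toy sketch below. It is an illustration only, not the analyzer's
constraint solver: <tt>ToyRange</tt>, <tt>fullRange</tt>,
<tt>assumeGreater</tt> and <tt>assumeLess</tt> are hypothetical names, and the
sketch assumes a single integer symbol tracked as one closed interval. Each
branch condition narrows the interval; an empty interval means the path is
infeasible and no warning should be reported on it.
</p>

```cpp
#include <algorithm>
#include <cassert>
#include <climits>

// Hypothetical toy type: the values a symbol may take, as a closed interval.
struct ToyRange {
  int Lo;
  int Hi;
  bool isEmpty() const { return Lo > Hi; }
};

// An unconstrained symbol may hold any int.
ToyRange fullRange() {
  ToyRange R = {INT_MIN, INT_MAX};
  return R;
}

// Narrows the range under the assumption `sym > Bound`.
ToyRange assumeGreater(ToyRange R, int Bound) {
  assert(!R.isEmpty() && "cannot refine an infeasible path");
  if (Bound == INT_MAX) { // nothing is greater than INT_MAX
    ToyRange Empty = {1, 0};
    return Empty;
  }
  R.Lo = std::max(R.Lo, Bound + 1);
  return R;
}

// Narrows the range under the assumption `sym < Bound`.
ToyRange assumeLess(ToyRange R, int Bound) {
  assert(!R.isEmpty() && "cannot refine an infeasible path");
  if (Bound == INT_MIN) { // nothing is less than INT_MIN
    ToyRange Empty = {1, 0};
    return Empty;
  }
  R.Hi = std::min(R.Hi, Bound - 1);
  return R;
}
```

<p>
For example, narrowing <tt>fullRange()</tt> with <tt>assumeGreater(..., 5)</tt>
and then <tt>assumeLess(..., 3)</tt> produces an empty range, so code nested
under <tt>if (x > 5)</tt> and then <tt>if (x < 3)</tt> is unreachable. A
relation between two symbols, such as <tt>x < y</tt>, cannot be expressed in
this model at all, which is the kind of limitation that raises the false alarm
rate.
</p>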
<h2 id=registration>Checker Registration</h2>
All checker implementation files are located in the <tt>clang/lib/StaticAnalyzer/Checkers</tt>
folder. Follow the steps below to register a new checker with the analyzer.
<ol>
<li>Create a new checker implementation file, for example <tt>./lib/StaticAnalyzer/Checkers/NewChecker.cpp</tt>
<pre class="code_example">
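using namespace clang;
using namespace ento;
namespace {
// Subscribes to the callback that fires before each CallExpr is processed.
class NewChecker: public Checker< check::PreStmt<CallExpr> > {
public:
  void checkPreStmt(const CallExpr *CE, CheckerContext &Ctx) const {}
};
} // end anonymous namespace
// Registration hook: creates the checker instance in the CheckerManager.
void ento::registerNewChecker(CheckerManager &mgr) {
  mgr.registerChecker<NewChecker>();
}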
</pre>
<li>Pick the package name for your checker and add the registration code to
<tt>./lib/StaticAnalyzer/Checkers/Checkers.td</tt>. Note that all checkers should
first be developed as experimental. For example, if the new checker performs
security-related checks, add the following lines under the
<tt>SecurityExperimental</tt> package:
<pre class="code_example">
let ParentPackage = SecurityExperimental in {
...
def NewChecker : Checker<"NewChecker">,
HelpText<"This text should give a short description of the checks performed.">,
DescFile<"NewChecker.cpp">;
...
} // end "security.experimental"
</pre>
<li>Make the source code file visible to CMake by adding it to
<tt>./lib/StaticAnalyzer/Checkers/CMakeLists.txt</tt>.
<li>Compile and see your checker in the list of available checkers by running:<br>
<tt>$ <b>clang -cc1 -analyzer-checker-help</b></tt>
</ol>
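<p>
How the pieces from the steps above fit together can be sketched with a toy
registry. Everything here is hypothetical and only mimics the shape of the
steps: <tt>ToyCheckerBase</tt>, <tt>ToyCheckerManager</tt>,
<tt>ToyNewChecker</tt>, <tt>checkerTable</tt> and <tt>enableChecker</tt> are
invented names, and the assumption that a table maps the full checker name to
its <tt>register...</tt> function is a simplification made for illustration,
not a description of the generated code.
</p>

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy types; they only mimic the registration steps.
struct ToyCheckerBase {
  virtual ~ToyCheckerBase() {}
  virtual std::string name() const = 0;
};

class ToyCheckerManager {
  std::vector<std::unique_ptr<ToyCheckerBase> > Checkers;

public:
  // Mirrors the shape of mgr.registerChecker<NewChecker>() in step 1.
  template <typename CheckerT> void registerChecker() {
    Checkers.push_back(std::unique_ptr<ToyCheckerBase>(new CheckerT()));
  }

  // Names of all checkers registered so far, in registration order.
  std::vector<std::string> enabled() const {
    std::vector<std::string> Names;
    for (const std::unique_ptr<ToyCheckerBase> &C : Checkers)
      Names.push_back(C->name());
    return Names;
  }
};

struct ToyNewChecker : ToyCheckerBase {
  std::string name() const override { return "NewChecker"; }
};

// Counterpart of the register function written in the checker's .cpp file.
void registerToyNewChecker(ToyCheckerManager &Mgr) {
  Mgr.registerChecker<ToyNewChecker>();
}

typedef void (*RegisterFn)(ToyCheckerManager &);

// Stand-in for the entry added in step 2: package-qualified name -> function.
std::map<std::string, RegisterFn> checkerTable() {
  std::map<std::string, RegisterFn> Table;
  Table["security.experimental.NewChecker"] = &registerToyNewChecker;
  return Table;
}

// Looks the checker up by name and registers it; false if the name is unknown.
bool enableChecker(const std::string &FullName, ToyCheckerManager &Mgr) {
  std::map<std::string, RegisterFn> Table = checkerTable();
  std::map<std::string, RegisterFn>::const_iterator It = Table.find(FullName);
  if (It == Table.end())
    return false;
  It->second(Mgr);
  return true;
}
```

<p>
In this sketch a checker that has a <tt>.cpp</tt> file but no table entry can
never be enabled by name, which is why both the implementation file and the
<tt>Checkers.td</tt> entry are needed before the checker shows up in the list.
</p>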
<h2 id=skeleton>Checker Skeleton</h2>
The source code for all the checkers goes into <tt>clang/lib/StaticAnalyzer/Checkers</tt>.<p>
There are two main decisions you need to make:
<ul>
<li> Which events the checker should be tracking.</li>
<li> What data you want to store as part of the checker-specific program state. Try to minimize the checker state as much as possible. </li>
<li> What data you want to store as part of the checker-specific program
state. Try to minimize the checker state as much as possible. </li>
</ul>
Describe the registration process.
<h2 id=bugs>Bug Reports</h2>
<h2 id=ast>AST Visitors</h2>
Some checks might not require path-sensitivity to be effective. A simple AST walk
might be sufficient. If that is the case, consider implementing a Clang compiler warning.
On the other hand, a check might not be acceptable as a compiler
might be sufficient. If that is the case, consider implementing a Clang
compiler warning. On the other hand, a check might not be acceptable as a compiler
warning; for example, because of a relatively high false positive rate. In this
situation, AST callbacks <tt><b>checkASTDecl</b></tt> and
<tt><b>checkASTCodeBody</b></tt> are your best friends.
@ -126,7 +212,8 @@ for general developer guidelines and information. </p>
<h2 id=commands>Useful Commands/Debugging Hints</h2>
<ul>
<li>
While investigating a checker-related issue, instruct the analyzer to only execute a single checker:
While investigating a checker-related issue, instruct the analyzer to only
execute a single checker:
<br><tt>
$ <b>clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c</b>
</tt>
@ -150,19 +237,26 @@ $ <b>clang -cc1 -analyzer-checker-help | grep "debug"</b>
</tt>
</li>
<li>
To see which function is failing while processing a large file, use the <tt>-analyzer-display-progress</tt> option.
To see which function is failing while processing a large file, use the
<tt>-analyzer-display-progress</tt> option.
</li>
<li>
While debugging, execute <tt>clang -cc1 -analyze -analyzer-checker=core</tt> instead of <tt>clang --analyze</tt>, as the latter would call the compiler in a separate process.
While debugging, execute <tt>clang -cc1 -analyze -analyzer-checker=core</tt>
instead of <tt>clang --analyze</tt>, as the latter would call the compiler
in a separate process.
</li>
<li>
To view the <tt>ExplodedGraph</tt> (the state graph explored by the analyzer) while debugging, go to a frame that has a <tt>clang::ento::ExprEngine</tt> object and execute:
To view the <tt>ExplodedGraph</tt> (the state graph explored by the analyzer) while
debugging, go to a frame that has a <tt>clang::ento::ExprEngine</tt> object and
execute:
<br><tt>
(gdb) <b>p ViewGraph(0)</b>
</tt>
</li>
<li>
To see a <tt>clang::Expr</tt> while debugging, use the following command. If you pass in a <tt>SourceManager</tt> object, it will also dump the corresponding line in the source code.
To see a <tt>clang::Expr</tt> while debugging, use the following command. If you
pass in a <tt>SourceManager</tt> object, it will also dump the corresponding line in the
source code.
<br><tt>
(gdb) <b>p E->dump()</b>
</tt>