llvm-project/clang/lib/StaticAnalyzer
Jordan Rose cc0b1bfa56 [analyzer] Ensure that PathDiagnostics profile the same regardless of path.
PathDiagnostics are actually profiled and uniqued independently of the
path on which the bug occurred. This is used to merge diagnostics that
refer to the same issue along different paths, as well as by the plist
diagnostics to reference files created by the HTML diagnostics.

However, there are two problems with the current implementation:

1) The bug description is included in the profile, but some
   PathDiagnosticConsumers prefer abbreviated descriptions and some
   prefer verbose descriptions. Fixed by including both descriptions in
   the PathDiagnostic objects and always using the verbose one in the profile.

2) The "minimal" path generation scheme provides extra information about
   which events came from macros that the "extensive" scheme does not.
   This resulted not only in different locations for the plist and HTML
   diagnostics, but also in diagnostics being uniqued in the plist output
   but not in the HTML output. Fixed by storing the "end path" location
   explicitly in the PathDiagnostic object, rather than trying to find the
   last piece of the path when the diagnostic is requested.

This should hopefully finish unsticking our internal buildbot.

llvm-svn: 162965
2012-08-31 00:36:26 +00:00
..
Checkers [analyzer] Remove cast inside dyn_cast. 2012-08-30 22:55:32 +00:00
Core [analyzer] Ensure that PathDiagnostics profile the same regardless of path. 2012-08-31 00:36:26 +00:00
Frontend [analyzer] Ensure that PathDiagnostics profile the same regardless of path. 2012-08-31 00:36:26 +00:00
CMakeLists.txt [analyzer] Introduce libclangStaticAnalyzerFrontend and move Checkers/AnalysisConsumer.cpp into Frontend lib. 2011-02-14 18:13:01 +00:00
Makefile [analyzer] AnalyzerFrontend is dependent on AnalyzerCheckers. 2011-02-16 01:40:55 +00:00
README.txt Rename GRState to ProgramState, and cleanup some code formatting along the way. 2011-08-15 22:09:50 +00:00

README.txt

//===----------------------------------------------------------------------===//
// Clang Static Analyzer
//===----------------------------------------------------------------------===//

= Library Structure =

The analyzer library has two layers: a (low-level) static analysis
engine (GRExprEngine.cpp and friends), and some static checkers
(*Checker.cpp).  The latter are built on top of the former via the
Checker and CheckerVisitor interfaces (Checker.h and
CheckerVisitor.h).  The Checker interface is designed to be minimal
and simple for checker writers, and attempts to isolate them from much
of the gore of the internal analysis engine.

= How It Works =

The analyzer is inspired by several foundational research papers ([1],
[2]).  (FIXME: kremenek to add more links)

In a nutshell, the analyzer is basically a source code simulator that
traces out possible paths of execution.  The state of the program
(values of variables and expressions) is encapsulated by the state
(ProgramState).  A location in the program is called a program point
(ProgramPoint), and the combination of state and program point is a
node in an exploded graph (ExplodedGraph).  The term "exploded" comes
from exploding the control-flow edges in the control-flow graph (CFG).

Conceptually the analyzer does a reachability analysis through the
ExplodedGraph.  We start at a root node, which has the entry program
point and initial state, and then simulate transitions by analyzing
individual expressions.  The analysis of an expression can cause the
state to change, resulting in a new node in the ExplodedGraph with an
updated program point and an updated state.  A bug is found by hitting
a node that satisfies some "bug condition" (basically a violation of a
checking invariant).

The analyzer traces out multiple paths by reasoning about branches and
then bifurcating the state: on the true branch the conditions of the
branch are assumed to be true and on the false branch the conditions
of the branch are assumed to be false.  Such "assumptions" create
constraints on the values of the program, and those constraints are
recorded in the ProgramState object (and are manipulated by the
ConstraintManager).  If assuming the conditions of a branch would
cause the constraints to be unsatisfiable, the branch is considered
infeasible and that path is not taken.  This is how we get
path-sensitivity.  We reduce exponential blow-up by caching nodes.  If
a new node with the same state and program point as an existing node
would get generated, the path "caches out" and we simply reuse the
existing node.  Thus the ExplodedGraph is not a DAG; it can contain
cycles as paths loop back onto each other and cache out.

ProgramState and ExplodedNodes are basically immutable once created.  Once
one creates a ProgramState, you need to create a new one to get a new
ProgramState.  This immutability is key since the ExplodedGraph represents
the behavior of the analyzed program from the entry point.  To
represent these efficiently, we use functional data structures (e.g.,
ImmutableMaps) which share data between instances.

Finally, individual Checkers work by also manipulating the analysis
state.  The analyzer engine talks to them via a visitor interface.
For example, the PreVisitCallExpr() method is called by GRExprEngine
to tell the Checker that we are about to analyze a CallExpr, and the
checker is asked to check for any preconditions that might not be
satisfied.  The checker can do nothing, or it can generate a new
ProgramState and ExplodedNode which contains updated checker state.  If it
finds a bug, it can tell the BugReporter object about the bug,
providing it an ExplodedNode which is the last node in the path that
triggered the problem.

= Notes about C++ =

Since now constructors are seen before the variable that is constructed 
in the CFG, we create a temporary object as the destination region that 
is constructed into. See ExprEngine::VisitCXXConstructExpr().

In ExprEngine::processCallExit(), we always bind the object region to the
evaluated CXXConstructExpr. Then in VisitDeclStmt(), we compute the
corresponding lazy compound value if the variable is not a reference, and
bind the variable region to the lazy compound value. If the variable
is a reference, just use the object region as the initilizer value.

Before entering a C++ method (or ctor/dtor), the 'this' region is bound
to the object region. In ctors, we synthesize 'this' region with  
CXXRecordDecl*, which means we do not use type qualifiers. In methods, we
synthesize 'this' region with CXXMethodDecl*, which has getThisType() 
taking type qualifiers into account. It does not matter we use qualified
'this' region in one method and unqualified 'this' region in another
method, because we only need to ensure the 'this' region is consistent 
when we synthesize it and create it directly from CXXThisExpr in a single
method call.

= Working on the Analyzer =

If you are interested in bringing up support for C++ expressions, the
best place to look is the visitation logic in GRExprEngine, which
handles the simulation of individual expressions.  There are plenty of
examples there of how other expressions are handled.

If you are interested in writing checkers, look at the Checker and
CheckerVisitor interfaces (Checker.h and CheckerVisitor.h).  Also look
at the files named *Checker.cpp for examples on how you can implement
these interfaces.

= Debugging the Analyzer =

There are some useful command-line options for debugging.  For example:

$ clang -cc1 -help | grep analyze
 -analyze-function <value>
 -analyzer-display-progress
 -analyzer-viz-egraph-graphviz
 ...

The first allows you to specify only analyzing a specific function.
The second prints to the console what function is being analyzed.  The
third generates a graphviz dot file of the ExplodedGraph.  This is
extremely useful when debugging the analyzer and viewing the
simulation results.

Of course, viewing the CFG (Control-Flow Graph) is also useful:

$ clang -cc1 -help | grep cfg
 -cfg-add-implicit-dtors Add C++ implicit destructors to CFGs for all analyses
 -cfg-add-initializers   Add C++ initializers to CFGs for all analyses
 -cfg-dump               Display Control-Flow Graphs
 -cfg-view               View Control-Flow Graphs using GraphViz
 -unoptimized-cfg        Generate unoptimized CFGs for all analyses

-cfg-dump dumps a textual representation of the CFG to the console,
and -cfg-view creates a GraphViz representation.

= References =

[1] Precise interprocedural dataflow analysis via graph reachability,
    T Reps, S Horwitz, and M Sagiv, POPL '95,
    http://portal.acm.org/citation.cfm?id=199462

[2] A memory model for static analysis of C programs, Z Xu, T
    Kremenek, and J Zhang, http://lcs.ios.ac.cn/~xzx/memmodel.pdf