2007-12-10 13:20:47 +08:00
|
|
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
|
|
|
|
"http://www.w3.org/TR/html4/strict.dtd">
|
2007-10-06 13:23:00 +08:00
|
|
|
<html>
|
|
|
|
<head>
|
2007-12-10 13:20:47 +08:00
|
|
|
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
|
2007-12-10 13:52:05 +08:00
|
|
|
<title>Clang - Features and Goals</title>
|
2007-12-10 13:20:47 +08:00
|
|
|
<link type="text/css" rel="stylesheet" href="menu.css" />
|
|
|
|
<link type="text/css" rel="stylesheet" href="content.css" />
|
|
|
|
<style type="text/css">
|
2007-10-06 13:23:00 +08:00
|
|
|
</style>
|
|
|
|
</head>
|
|
|
|
<body>
|
2007-12-10 13:20:47 +08:00
|
|
|
|
2007-10-06 13:23:00 +08:00
|
|
|
<!--#include virtual="menu.html.incl"-->
|
2007-12-10 13:20:47 +08:00
|
|
|
|
2007-10-06 13:23:00 +08:00
|
|
|
<div id="content">
|
|
|
|
|
2007-12-10 16:12:49 +08:00
|
|
|
<!--*************************************************************************-->
|
2007-12-10 13:52:05 +08:00
|
|
|
<h1>Clang - Features and Goals</h1>
|
2007-12-10 16:12:49 +08:00
|
|
|
<!--*************************************************************************-->
|
|
|
|
|
2007-12-10 13:52:05 +08:00
|
|
|
<p>
|
|
|
|
This page describes the <a href="index.html#goals">features and goals</a> of
|
|
|
|
Clang in more detail and gives a more broad explanation about what we mean.
|
|
|
|
These features are:
|
|
|
|
</p>
|
2007-10-06 13:23:00 +08:00
|
|
|
|
2007-12-10 15:14:08 +08:00
|
|
|
<p>End-User Features:</p>
|
|
|
|
|
|
|
|
<ul>
|
2007-12-13 13:42:27 +08:00
|
|
|
<li><a href="#performance">Fast compiles and low memory use</a></li>
|
2007-12-10 16:19:29 +08:00
|
|
|
<li><a href="#expressivediags">Expressive diagnostics</a></li>
|
2007-12-10 15:23:52 +08:00
|
|
|
<li><a href="#gcccompat">GCC compatibility</a></li>
|
2007-12-10 15:14:08 +08:00
|
|
|
</ul>
|
|
|
|
|
2007-12-10 16:12:49 +08:00
|
|
|
<p>Utility and Applications:</p>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
<li><a href="#libraryarch">Library based architecture</a></li>
|
|
|
|
<li><a href="#diverseclients">Support diverse clients</a></li>
|
|
|
|
<li><a href="#ideintegration">Integration with IDEs</a></li>
|
|
|
|
<li><a href="#license">Use the LLVM 'BSD' License</a></li>
|
|
|
|
</ul>
|
|
|
|
|
|
|
|
<p>Internal Design and Implementation:</p>
|
|
|
|
|
2007-12-10 13:52:05 +08:00
|
|
|
<ul>
|
|
|
|
<li><a href="#real">A real-world, production quality compiler</a></li>
|
2007-12-10 15:23:52 +08:00
|
|
|
<li><a href="#simplecode">A simple and hackable code base</a></li>
|
2007-12-10 13:52:05 +08:00
|
|
|
<li><a href="#unifiedparser">A single unified parser for C, Objective C, C++,
|
|
|
|
and Objective C++</a></li>
|
|
|
|
<li><a href="#conformance">Conformance with C/C++/ObjC and their
|
|
|
|
variants</a></li>
|
|
|
|
</ul>
|
|
|
|
|
2007-12-10 16:12:49 +08:00
|
|
|
<!--*************************************************************************-->
|
2008-06-17 14:35:36 +08:00
|
|
|
<h2><a name="enduser">End-User Features</a></h2>
|
2007-12-10 16:12:49 +08:00
|
|
|
<!--*************************************************************************-->
|
2007-12-10 15:14:08 +08:00
|
|
|
|
|
|
|
|
|
|
|
<!--=======================================================================-->
|
2008-06-17 14:35:36 +08:00
|
|
|
<h3><a name="performance">Fast compiles and Low Memory Use</a></h3>
|
2007-12-10 15:14:08 +08:00
|
|
|
<!--=======================================================================-->
|
|
|
|
|
|
|
|
<p>A major focus of our work on clang is to make it fast, light and scalable.
|
|
|
|
The library-based architecture of clang makes it straight-forward to time and
|
|
|
|
profile the cost of each layer of the stack, and the driver has a number of
|
|
|
|
options for performance analysis.</p>
|
|
|
|
|
|
|
|
<p>While there is still much that can be done, we find that the clang front-end
|
|
|
|
is significantly quicker than gcc and uses less memory For example, when
|
|
|
|
compiling "Carbon.h" on Mac OS/X, we see that clang is 2.5x faster than GCC:</p>
|
|
|
|
|
|
|
|
<img class="img_slide" src="feature-compile1.png" width="400" height="300" />
|
|
|
|
|
|
|
|
<p>Carbon.h is a monster: it transitively includes 558 files, 12.3M of code,
|
|
|
|
declares 10000 functions, has 2000 struct definitions, 8000 fields, 20000 enum
|
|
|
|
constants, etc (see slide 25+ of the <a href="clang_video-07-25-2007.html">clang
|
|
|
|
talk</a> for more information). It is also #include'd into almost every C file
|
|
|
|
in a GUI app on the Mac, so its compile time is very important.</p>
|
|
|
|
|
|
|
|
<p>From the slide above, you can see that we can measure the time to preprocess
|
|
|
|
the file independently from the time to parse it, and independently from the
|
|
|
|
time to build the ASTs for the code. GCC doesn't provide a way to measure the
|
|
|
|
parser without AST building (it only provides -fsyntax-only). In our
|
|
|
|
measurements, we find that clang's preprocessor is consistently 40% faster than
|
|
|
|
GCCs, and the parser + AST builder is ~4x faster than GCC's. If you have
|
|
|
|
sources that do not depend as heavily on the preprocessor (or if you
|
|
|
|
use Precompiled Headers) you may see a much bigger speedup from clang.
|
|
|
|
</p>
|
|
|
|
|
|
|
|
<p>Compile time performance is important, but when using clang as an API, often
|
|
|
|
memory use is even moreso: the less memory the code takes the more code you can
|
|
|
|
fit into memory at a time (useful for whole program analysis tools, for
|
|
|
|
example).</p>
|
|
|
|
|
|
|
|
<img class="img_slide" src="feature-memory1.png" width="400" height="300" />
|
|
|
|
|
|
|
|
<p>Here we see a huge advantage of clang: its ASTs take <b>5x less memory</b>
|
|
|
|
than GCC's syntax trees, despite the fact that clang's ASTs capture far more
|
|
|
|
source-level information than GCC's trees do. This feat is accomplished through
|
|
|
|
the use of carefully designed APIs and efficient representations.</p>
|
|
|
|
|
|
|
|
<p>In addition to being efficient when pitted head-to-head against GCC in batch
|
|
|
|
mode, clang is built with a <a href="#libraryarch">library based
|
|
|
|
architecture</a> that makes it relatively easy to adapt it and build new tools
|
|
|
|
with it. This means that it is often possible to apply out-of-the-box thinking
|
|
|
|
and novel techniques to improve compilation in various ways.</p>
|
|
|
|
|
|
|
|
<img class="img_slide" src="feature-compile2.png" width="400" height="300" />
|
|
|
|
|
|
|
|
<p>This slide shows how the clang preprocessor can be used to make "distcc"
|
|
|
|
parallelization <b>3x</b> more scalable than when using the GCC preprocessor.
|
|
|
|
"distcc" quickly bottlenecks on the preprocessor running on the central driver
|
|
|
|
machine, so a fast preprocessor is very useful. Comparing the first two bars
|
|
|
|
of each group shows how a ~40% faster preprocessor can reduce preprocessing time
|
|
|
|
of these large C++ apps by about 40% (shocking!).</p>
|
|
|
|
|
|
|
|
<p>The third bar on the slide is the interesting part: it shows how trivial
|
|
|
|
caching of file system accesses across invocations of the preprocessor allows
|
|
|
|
clang to reduce time spent in the kernel by 10x, making distcc over 3x more
|
|
|
|
scalable. This is obviously just one simple hack, doing more interesting things
|
|
|
|
(like caching tokens across preprocessed files) would yield another substantial
|
|
|
|
speedup.</p>
|
|
|
|
|
|
|
|
<p>The clean framework-based design of clang means that many things are possible
|
|
|
|
that would be very difficult in other systems, for example incremental
|
|
|
|
compilation, multithreading, intelligent caching, etc. We are only starting
|
|
|
|
to tap the full potential of the clang design.</p>
|
|
|
|
|
|
|
|
|
|
|
|
<!--=======================================================================-->
|
2008-06-17 14:35:36 +08:00
|
|
|
<h3><a name="expressivediags">Expressive Diagnostics</a></h3>
|
2007-12-10 15:14:08 +08:00
|
|
|
<!--=======================================================================-->
|
|
|
|
|
2009-03-19 14:52:51 +08:00
|
|
|
<p>In addition to being fast and functional, we aim to make Clang extremely user
|
|
|
|
friendly. As far as a command-line compiler goes, this basically boils down to
|
|
|
|
making the diagnostics (error and warning messages) generated by the compiler
|
|
|
|
be as useful as possible. There are several ways that we do this. This section
|
|
|
|
talks about the experience provided by the command line compiler, contrasting
|
|
|
|
Clang output to GCC 4.2's output in several examples.
|
|
|
|
<!--
|
|
|
|
Other clients
|
|
|
|
that embed Clang and extract equivalent information through internal APIs.-->
|
|
|
|
</p>
|
|
|
|
|
|
|
|
<h4>Column Numbers and Caret Diagnostics</h4>
|
|
|
|
|
|
|
|
<p>First, all diagnostics produced by clang include full column number
|
|
|
|
information, and use this to print "caret diagnostics". This is a feature
|
|
|
|
provided by many commercial compilers, but is generally missing from open source
|
|
|
|
compilers. This is nice because it makes it very easy to understand exactly
|
|
|
|
what is wrong in a particular piece of code, an example is:</p>
|
|
|
|
|
|
|
|
<pre>
|
|
|
|
$ <b>gcc-4.2 -fsyntax-only -Wformat format-strings.c</b>
|
|
|
|
format-strings.c:91: warning: too few arguments for format
|
|
|
|
$ <b>clang -fsyntax-only format-strings.c</b>
|
|
|
|
format-strings.c:91:13: warning: '.*' specified field precision is missing a matching 'int' argument
|
|
|
|
<font color="darkgreen"> printf("%.*d");</font>
|
|
|
|
<font color="blue"> ^</font>
|
|
|
|
</pre>
|
|
|
|
|
|
|
|
<p>The caret (the blue "^" character) exactly shows where the problem is, even
|
|
|
|
inside of the string. This makes it really easy to jump to the problem and
|
|
|
|
helps when multiple instances of the same character occur on a line. We'll
|
|
|
|
revisit this more in following examples.</p>
|
|
|
|
|
|
|
|
<h4>Range Highlighting for Related Text</h4>
|
|
|
|
|
|
|
|
<p>Clang captures and accurately tracks range information for expressions,
|
|
|
|
statements, and other constructs in your program and uses this to make
|
|
|
|
diagnostics highlight related information. For example, here's a somewhat
|
|
|
|
nonsensical example to illustrate this:</p>
|
|
|
|
|
|
|
|
<pre>
|
|
|
|
$ <b>gcc-4.2 -fsyntax-only t.c</b>
|
|
|
|
t.c:7: error: invalid operands to binary + (have 'int' and 'struct A')
|
|
|
|
$ <b>clang -fsyntax-only t.c</b>
|
|
|
|
t.c:7:39: error: invalid operands to binary expression ('int' and 'struct A')
|
|
|
|
<font color="darkgreen"> return y + func(y ? ((SomeA.X + 40) + SomeA) / 42 + SomeA.X : SomeA.X);</font>
|
|
|
|
<font color="blue"> ~~~~~~~~~~~~~~ ^ ~~~~~</font>
|
|
|
|
</pre>
|
|
|
|
|
|
|
|
<p>Here you can see that you don't even need to see the original source code to
|
|
|
|
understand what is wrong based on the Clang error: Because clang prints a
|
|
|
|
caret, you know exactly <em>which</em> plus it is complaining about. The range
|
|
|
|
information highlights the left and right side of the plus which makes it
|
|
|
|
immediately obvious what the compiler is talking about, which is very useful for
|
|
|
|
cases involving precedence issues and many other cases.</p>
|
|
|
|
|
|
|
|
<h4>Precision in Wording</h4>
|
|
|
|
|
|
|
|
<p>A detail is that we have tried really hard to make the diagnostics that come
|
|
|
|
out of clang contain exactly the pertinent information about what is wrong and
|
|
|
|
why. In the example above, we tell you what the inferred types are for
|
|
|
|
the left and right hand sides, and we don't repeat what is obvious from the
|
|
|
|
caret (that this is a "binary +"). Many other examples abound, here is a simple
|
|
|
|
one:</p>
|
|
|
|
|
|
|
|
<pre>
|
|
|
|
$ <b>gcc-4.2 -fsyntax-only t.c</b>
|
|
|
|
t.c:5: error: invalid type argument of 'unary *'
|
|
|
|
$ <b>clang -fsyntax-only t.c</b>
|
|
|
|
t.c:5:11: error: indirection requires pointer operand ('int' invalid)
|
|
|
|
<font color="darkgreen"> int y = *SomeA.X;</font>
|
|
|
|
<font color="blue"> ^~~~~~~~</font>
|
|
|
|
</pre>
|
|
|
|
|
|
|
|
<p>In this example, not only do we tell you that there is a problem with the *
|
|
|
|
and point to it, we say exactly why and tell you what the type is (in case it is
|
|
|
|
a complicated subexpression, such as a call to an overloaded function). This
|
|
|
|
sort of attention to detail makes it much easier to understand and fix problems
|
|
|
|
quickly.</p>
|
|
|
|
|
|
|
|
<h4>No Pretty Printing of Expressions in Diagnostics</h4>
|
|
|
|
|
|
|
|
<p>Since Clang has range highlighting, it never needs to pretty print your code
|
|
|
|
back out to you. This is particularly bad in G++ (which often emits errors
|
|
|
|
containing lowered vtable references), but even GCC can produce
|
|
|
|
inscrutible error messages in some cases when it tries to do this. In this
|
|
|
|
example P and Q have type "int*":</p>
|
|
|
|
|
|
|
|
<pre>
|
|
|
|
$ <b>gcc-4.2 -fsyntax-only t.c</b>
|
|
|
|
#'exact_div_expr' not supported by pp_c_expression#'t.c:12: error: called object is not a function
|
|
|
|
$ <b>clang -fsyntax-only t.c</b>
|
|
|
|
t.c:12:8: error: called object type 'int' is not a function or function pointer
|
|
|
|
<font color="darkgreen"> (P-Q)();</font>
|
|
|
|
<font color="blue"> ~~~~~^</font>
|
|
|
|
</pre>
|
|
|
|
|
|
|
|
|
|
|
|
<h4>Typedef Preservation and Selective Unwrapping</h4>
|
|
|
|
|
|
|
|
<p>Many programmers use high-level user defined types, typedefs, and other
|
|
|
|
syntactic sugar to refer to types in their program. This is useful because they
|
|
|
|
can abbreviate otherwise very long types and it is useful to preserve the
|
|
|
|
typename in diagnostics. However, sometimes very simple typedefs can wrap
|
|
|
|
trivial types and it is important to strip off the typedef to understand what
|
|
|
|
is going on. Clang aims to handle both cases well.<p>
|
|
|
|
|
|
|
|
<p>For example, here is an example that shows where it is important to preserve
|
|
|
|
a typedef in C:</p>
|
|
|
|
|
|
|
|
<pre>
|
|
|
|
$ <b>gcc-4.2 -fsyntax-only t.c</b>
|
|
|
|
t.c:15: error: invalid operands to binary / (have 'float __vector__' and 'const int *')
|
|
|
|
$ <b>clang -fsyntax-only t.c</b>
|
|
|
|
t.c:15:11: error: can't convert between vector values of different size ('__m128' and 'int const *')
|
|
|
|
<font color="darkgreen"> myvec[1]/P;</font>
|
|
|
|
<font color="blue"> ~~~~~~~~^~</font>
|
|
|
|
</pre>
|
|
|
|
|
|
|
|
<p>Here the type printed by GCC isn't even valid, but if the error were about a
|
|
|
|
very long and complicated type (as often happens in C++) the error message would
|
|
|
|
be ugly just because it was long and hard to read. Here's an example where it
|
|
|
|
is useful for the compiler to expose underlying details of a typedef:</p>
|
|
|
|
|
|
|
|
<pre>
|
|
|
|
$ <b>gcc-4.2 -fsyntax-only t.c</b>
|
|
|
|
t.c:13: error: request for member 'x' in something not a structure or union
|
|
|
|
$ <b>clang -fsyntax-only t.c</b>
|
|
|
|
t.c:13:9: error: member reference base type 'pid_t' (aka 'int') is not a structure or union
|
|
|
|
<font color="darkgreen"> myvar = myvar.x;</font>
|
|
|
|
<font color="blue"> ~~~~~ ^</font>
|
|
|
|
</pre>
|
|
|
|
|
|
|
|
<p>If the user was somehow confused about how the system "pid_t" typedef is
|
|
|
|
defined, Clang helpfully displays it with "aka".</p>
|
|
|
|
|
|
|
|
<h4>Automatic Macro Expansion</h4>
|
|
|
|
|
|
|
|
<p>Many errors happen in macros that are sometimes deeply nested. With
|
|
|
|
traditional compilers, you need to dig deep into the definition of the macro to
|
|
|
|
understand how you got into trouble. Here's a simple example that shows how
|
|
|
|
Clang helps you out:</p>
|
|
|
|
|
|
|
|
<pre>
|
|
|
|
$ <b>gcc-4.2 -fsyntax-only t.c</b>
|
|
|
|
t.c: In function 'test':
|
|
|
|
t.c:80: error: invalid operands to binary < (have 'struct mystruct' and 'float')
|
|
|
|
$ <b>clang -fsyntax-only t.c</b>
|
|
|
|
t.c:80:3: error: invalid operands to binary expression ('typeof(P)' (aka 'struct mystruct') and 'typeof(F)' (aka 'float'))
|
|
|
|
<font color="darkgreen"> X = MYMAX(P, F);</font>
|
|
|
|
<font color="blue"> ^~~~~~~~~~~</font>
|
|
|
|
t.c:76:94: note: instantiated from:
|
|
|
|
<font color="darkgreen">#define MYMAX(A,B) __extension__ ({ __typeof__(A) __a = (A); __typeof__(B) __b = (B); __a < __b ? __b : __a; })</font>
|
|
|
|
<font color="blue"> ~~~ ^ ~~~</font>
|
|
|
|
</pre>
|
|
|
|
|
|
|
|
<p>This shows how clang automatically prints instantiation information and
|
|
|
|
nested range information for diagnostics as they are instantiated through macros
|
|
|
|
and also shows how some of the other pieces work in a bigger example. Here's
|
|
|
|
another real world warning that occurs in the "window" Unix package (which
|
|
|
|
implements the "wwopen" class of APIs):</p>
|
|
|
|
|
|
|
|
<pre>
|
|
|
|
$ <b>clang -fsyntax-only t.c</b>
|
|
|
|
t.c:22:2: warning: type specifier missing, defaults to 'int'
|
|
|
|
<font color="darkgreen"> ILPAD();</font>
|
|
|
|
<font color="blue"> ^</font>
|
|
|
|
t.c:17:17: note: instantiated from:
|
|
|
|
<font color="darkgreen">#define ILPAD() PAD((NROW - tt.tt_row) * 10) /* 1 ms per char */</font>
|
|
|
|
<font color="blue"> ^</font>
|
|
|
|
t.c:14:2: note: instantiated from:
|
|
|
|
<font color="darkgreen"> register i; \</font>
|
|
|
|
<font color="blue"> ^</font>
|
|
|
|
</pre>
|
|
|
|
|
|
|
|
<p>In practice, we've found that this is actually more useful in multiply nested
|
|
|
|
macros that in simple ones.</p>
|
|
|
|
|
2009-03-19 15:06:44 +08:00
|
|
|
<h4>Fix-it Hints</h4>
|
|
|
|
|
|
|
|
<p>simple example + template<> example</p>
|
2009-03-19 14:52:51 +08:00
|
|
|
|
|
|
|
<h4>C++ Fun Examples</h4>
|
|
|
|
|
|
|
|
<p>...</p>
|
2007-12-10 15:14:08 +08:00
|
|
|
|
2007-12-10 15:23:52 +08:00
|
|
|
<!--=======================================================================-->
|
2008-06-17 14:35:36 +08:00
|
|
|
<h3><a name="gcccompat">GCC Compatibility</a></h3>
|
2007-12-10 15:23:52 +08:00
|
|
|
<!--=======================================================================-->
|
|
|
|
|
|
|
|
<p>GCC is currently the defacto-standard open source compiler today, and it
|
|
|
|
routinely compiles a huge volume of code. GCC supports a huge number of
|
|
|
|
extensions and features (many of which are undocumented) and a lot of
|
|
|
|
code and header files depend on these features in order to build.</p>
|
|
|
|
|
|
|
|
<p>While it would be nice to be able to ignore these extensions and focus on
|
|
|
|
implementing the language standards to the letter, pragmatics force us to
|
|
|
|
support the GCC extensions that see the most use. Many users just want their
|
|
|
|
code to compile, they don't care to argue about whether it is pedantically C99
|
|
|
|
or not.</p>
|
|
|
|
|
|
|
|
<p>As mentioned above, all
|
|
|
|
extensions are explicitly recognized as such and marked with extension
|
|
|
|
diagnostics, which can be mapped to warnings, errors, or just ignored.
|
|
|
|
</p>
|
|
|
|
|
2007-12-10 15:14:08 +08:00
|
|
|
|
2007-12-10 16:12:49 +08:00
|
|
|
<!--*************************************************************************-->
|
2008-06-17 14:35:36 +08:00
|
|
|
<h2><a name="applications">Utility and Applications</a></h2>
|
2007-12-10 16:12:49 +08:00
|
|
|
<!--*************************************************************************-->
|
|
|
|
|
2007-12-10 15:14:08 +08:00
|
|
|
<!--=======================================================================-->
|
2008-06-17 14:35:36 +08:00
|
|
|
<h3><a name="libraryarch">Library Based Architecture</a></h3>
|
2007-12-10 15:14:08 +08:00
|
|
|
<!--=======================================================================-->
|
|
|
|
|
2007-12-10 16:12:49 +08:00
|
|
|
<p>A major design concept for clang is its use of a library-based
|
|
|
|
architecture. In this design, various parts of the front-end can be cleanly
|
|
|
|
divided into separate libraries which can then be mixed up for different needs
|
|
|
|
and uses. In addition, the library-based approach encourages good interfaces
|
|
|
|
and makes it easier for new developers to get involved (because they only need
|
|
|
|
to understand small pieces of the big picture).</p>
|
|
|
|
|
|
|
|
<blockquote>
|
|
|
|
"The world needs better compiler tools, tools which are built as libraries.
|
|
|
|
This design point allows reuse of the tools in new and novel ways. However,
|
|
|
|
building the tools as libraries isn't enough: they must have clean APIs, be as
|
|
|
|
decoupled from each other as possible, and be easy to modify/extend. This
|
|
|
|
requires clean layering, decent design, and keeping the libraries independent of
|
|
|
|
any specific client."</blockquote>
|
|
|
|
|
|
|
|
<p>
|
|
|
|
Currently, clang is divided into the following libraries and tool:
|
|
|
|
</p>
|
|
|
|
|
|
|
|
<ul>
|
|
|
|
<li><b>libsupport</b> - Basic support library, from LLVM.</li>
|
|
|
|
<li><b>libsystem</b> - System abstraction library, from LLVM.</li>
|
|
|
|
<li><b>libbasic</b> - Diagnostics, SourceLocations, SourceBuffer abstraction,
|
|
|
|
file system caching for input source files.</li>
|
|
|
|
<li><b>libast</b> - Provides classes to represent the C AST, the C type system,
|
|
|
|
builtin functions, and various helpers for analyzing and manipulating the
|
|
|
|
AST (visitors, pretty printers, etc).</li>
|
|
|
|
<li><b>liblex</b> - Lexing and preprocessing, identifier hash table, pragma
|
|
|
|
handling, tokens, and macro expansion.</li>
|
|
|
|
<li><b>libparse</b> - Parsing. This library invokes coarse-grained 'Actions'
|
|
|
|
provided by the client (e.g. libsema builds ASTs) but knows nothing about
|
|
|
|
ASTs or other client-specific data structures.</li>
|
|
|
|
<li><b>libsema</b> - Semantic Analysis. This provides a set of parser actions
|
|
|
|
to build a standardized AST for programs.</li>
|
|
|
|
<li><b>libcodegen</b> - Lower the AST to LLVM IR for optimization & code
|
|
|
|
generation.</li>
|
|
|
|
<li><b>librewrite</b> - Editing of text buffers (important for code rewriting
|
|
|
|
transformation, like refactoring).</li>
|
|
|
|
<li><b>libanalysis</b> - Static analysis support.</li>
|
|
|
|
<li><b>clang</b> - A driver program, client of the libraries at various
|
|
|
|
levels.</li>
|
|
|
|
</ul>
|
|
|
|
|
|
|
|
<p>As an example of the power of this library based design.... If you wanted to
|
|
|
|
build a preprocessor, you would take the Basic and Lexer libraries. If you want
|
|
|
|
an indexer, you would take the previous two and add the Parser library and
|
|
|
|
some actions for indexing. If you want a refactoring, static analysis, or
|
|
|
|
source-to-source compiler tool, you would then add the AST building and
|
|
|
|
semantic analyzer libraries.</p>
|
|
|
|
|
|
|
|
<p>For more information about the low-level implementation details of the
|
|
|
|
various clang libraries, please see the <a href="docs/InternalsManual.html">
|
|
|
|
clang Internals Manual</a>.</p>
|
|
|
|
|
|
|
|
<!--=======================================================================-->
|
2008-06-17 14:35:36 +08:00
|
|
|
<h3><a name="diverseclients">Support Diverse Clients</a></h3>
|
2007-12-10 16:12:49 +08:00
|
|
|
<!--=======================================================================-->
|
|
|
|
|
|
|
|
<p>Clang is designed and built with many grand plans for how we can use it. The
|
|
|
|
driving force is the fact that we use C and C++ daily, and have to suffer due to
|
|
|
|
a lack of good tools available for it. We believe that the C and C++ tools
|
|
|
|
ecosystem has been significantly limited by how difficult it is to parse and
|
|
|
|
represent the source code for these languages, and we aim to rectify this
|
|
|
|
problem in clang.</p>
|
|
|
|
|
|
|
|
<p>The problem with this goal is that different clients have very different
|
|
|
|
requirements. Consider code generation, for example: a simple front-end that
|
|
|
|
parses for code generation must analyze the code for validity and emit code
|
|
|
|
in some intermediate form to pass off to a optimizer or backend. Because
|
|
|
|
validity analysis and code generation can largely be done on the fly, there is
|
|
|
|
not hard requirement that the front-end actually build up a full AST for all
|
|
|
|
the expressions and statements in the code. TCC and GCC are examples of
|
|
|
|
compilers that either build no real AST (in the former case) or build a stripped
|
|
|
|
down and simplified AST (in the later case) because they focus primarily on
|
|
|
|
codegen.</p>
|
|
|
|
|
|
|
|
<p>On the opposite side of the spectrum, some clients (like refactoring) want
|
|
|
|
highly detailed information about the original source code and want a complete
|
|
|
|
AST to describe it with. Refactoring wants to have information about macro
|
|
|
|
expansions, the location of every paren expression '(((x)))' vs 'x', full
|
|
|
|
position information, and much more. Further, refactoring wants to look
|
|
|
|
<em>across the whole program</em> to ensure that it is making transformations
|
|
|
|
that are safe. Making this efficient and getting this right requires a
|
|
|
|
significant amount of engineering and algorithmic work that simply are
|
|
|
|
unnecessary for a simple static compiler.</p>
|
|
|
|
|
|
|
|
<p>The beauty of the clang approach is that it does not restrict how you use it.
|
|
|
|
In particular, it is possible to use the clang preprocessor and parser to build
|
|
|
|
an extremely quick and light-weight on-the-fly code generator (similar to TCC)
|
|
|
|
that does not build an AST at all. As an intermediate step, clang supports
|
|
|
|
using the current AST generation and semantic analysis code and having a code
|
|
|
|
generation client free the AST for each function after code generation. Finally,
|
|
|
|
clang provides support for building and retaining fully-fledged ASTs, and even
|
|
|
|
supports writing them out to disk.</p>
|
|
|
|
|
|
|
|
<p>Designing the libraries with clean and simple APIs allows these high-level
|
|
|
|
policy decisions to be determined in the client, instead of forcing "one true
|
|
|
|
way" in the implementation of any of these libraries. Getting this right is
|
|
|
|
hard, and we don't always get it right the first time, but we fix any problems
|
|
|
|
when we realize we made a mistake.</p>
|
|
|
|
|
|
|
|
<!--=======================================================================-->
|
2008-06-17 14:35:36 +08:00
|
|
|
<h3><a name="ideintegration">Integration with IDEs</h3>
|
2007-12-10 16:12:49 +08:00
|
|
|
<!--=======================================================================-->
|
|
|
|
|
|
|
|
<p>
|
|
|
|
We believe that Integrated Development Environments (IDE's) are a great way
|
|
|
|
to pull together various pieces of the development puzzle, and aim to make clang
|
|
|
|
work well in such an environment. The chief advantage of an IDE is that they
|
|
|
|
typically have visibility across your entire project and are long-lived
|
|
|
|
processes, whereas stand-alone compiler tools are typically invoked on each
|
|
|
|
individual file in the project, and thus have limited scope.</p>
|
|
|
|
|
|
|
|
<p>There are many implications of this difference, but a significant one has to
|
|
|
|
do with efficiency and caching: sharing an address space across different files
|
|
|
|
in a project, means that you can use intelligent caching and other techniques to
|
|
|
|
dramatically reduce analysis/compilation time.</p>
|
|
|
|
|
|
|
|
<p>A further difference between IDEs and batch compiler is that they often
|
|
|
|
impose very different requirements on the front-end: they depend on high
|
|
|
|
performance in order to provide a "snappy" experience, and thus really want
|
|
|
|
techniques like "incremental compilation", "fuzzy parsing", etc. Finally, IDEs
|
|
|
|
often have very different requirements than code generation, often requiring
|
|
|
|
information that a codegen-only frontend can throw away. Clang is
|
|
|
|
specifically designed and built to capture this information.
|
|
|
|
</p>
|
|
|
|
|
|
|
|
|
|
|
|
<!--=======================================================================-->
|
2008-06-17 14:35:36 +08:00
|
|
|
<h3><a name="license">Use the LLVM 'BSD' License</a></h3>
|
2007-12-10 16:12:49 +08:00
|
|
|
<!--=======================================================================-->
|
|
|
|
|
|
|
|
<p>We actively indend for clang (and a LLVM as a whole) to be used for
|
|
|
|
commercial projects, and the BSD license is the simplest way to allow this. We
|
|
|
|
feel that the license encourages contributors to pick up the source and work
|
|
|
|
with it, and believe that those individuals and organizations will contribute
|
|
|
|
back their work if they do not want to have to maintain a fork forever (which is
|
|
|
|
time consuming and expensive when merges are involved). Further, nobody makes
|
|
|
|
money on compilers these days, but many people need them to get bigger goals
|
|
|
|
accomplished: it makes sense for everyone to work together.</p>
|
|
|
|
|
|
|
|
<p>For more information about the LLVM/clang license, please see the <a
|
|
|
|
href="http://llvm.org/docs/DeveloperPolicy.html#license">LLVM License
|
|
|
|
Description</a> for more information.</p>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<!--*************************************************************************-->
|
2008-06-17 14:35:36 +08:00
|
|
|
<h2><a name="design">Internal Design and Implementation</a></h2>
|
2007-12-10 16:12:49 +08:00
|
|
|
<!--*************************************************************************-->
|
|
|
|
|
2007-12-10 13:52:05 +08:00
|
|
|
<!--=======================================================================-->
|
2008-06-17 14:35:36 +08:00
|
|
|
<h3><a name="real">A real-world, production quality compiler</a></h3>
|
2007-12-10 13:52:05 +08:00
|
|
|
<!--=======================================================================-->
|
|
|
|
|
|
|
|
<p>
|
2007-12-11 02:56:37 +08:00
|
|
|
Clang is designed and built by experienced compiler developers who
|
2007-12-10 13:52:05 +08:00
|
|
|
are increasingly frustrated with the problems that <a
|
|
|
|
href="comparison.html">existing open source compilers</a> have. Clang is
|
|
|
|
carefully and thoughtfully designed and built to provide the foundation of a
|
|
|
|
whole new generation of C/C++/Objective C development tools, and we intend for
|
2007-12-11 02:56:37 +08:00
|
|
|
it to be production quality.</p>
|
2007-12-10 13:52:05 +08:00
|
|
|
|
|
|
|
<p>Being a production quality compiler means many things: it means being high
|
|
|
|
performance, being solid and (relatively) bug free, and it means eventually
|
|
|
|
being used and depended on by a broad range of people. While we are still in
|
|
|
|
the early development stages, we strongly believe that this will become a
|
|
|
|
reality.</p>
|
|
|
|
|
2007-12-10 15:23:52 +08:00
|
|
|
<!--=======================================================================-->
|
2008-06-17 14:35:36 +08:00
|
|
|
<h3><a name="simplecode">A simple and hackable code base</a></h3>
|
2007-12-10 15:23:52 +08:00
|
|
|
<!--=======================================================================-->
|
|
|
|
|
|
|
|
<p>Our goal is to make it possible for anyone with a basic understanding
|
|
|
|
of compilers and working knowledge of the C/C++/ObjC languages to understand and
|
|
|
|
extend the clang source base. A large part of this falls out of our decision to
|
|
|
|
make the AST mirror the languages as closely as possible: you have your friendly
|
|
|
|
if statement, for statement, parenthesis expression, structs, unions, etc, all
|
|
|
|
represented in a simple and explicit way.</p>
|
|
|
|
|
|
|
|
<p>In addition to a simple design, we work to make the source base approachable
|
|
|
|
by commenting it well, including citations of the language standards where
|
|
|
|
appropriate, and designing the code for simplicity. Beyond that, clang offers
|
|
|
|
a set of AST dumpers, printers, and visualizers that make it easy to put code in
|
|
|
|
and see how it is represented.</p>
|
|
|
|
|
2007-12-10 13:52:05 +08:00
|
|
|
<!--=======================================================================-->
|
2008-06-17 14:35:36 +08:00
|
|
|
<h3><a name="unifiedparser">A single unified parser for C, Objective C, C++,
|
|
|
|
and Objective C++</a></h3>
|
2007-12-10 13:52:05 +08:00
|
|
|
<!--=======================================================================-->
|
2007-10-06 13:23:00 +08:00
|
|
|
|
2007-12-10 13:52:05 +08:00
|
|
|
<p>Clang is the "C Language Family Front-end", which means we intend to support
|
|
|
|
the most popular members of the C family. We are convinced that the right
|
|
|
|
parsing technology for this class of languages is a hand-built recursive-descent
|
|
|
|
parser. Because it is plain C++ code, recursive descent makes it very easy for
|
|
|
|
new developers to understand the code, it easily supports ad-hoc rules and other
|
|
|
|
strange hacks required by C/C++, and makes it straight-forward to implement
|
|
|
|
excellent diagnostics and error recovery.</p>
|
2007-10-06 13:23:00 +08:00
|
|
|
|
2007-12-10 13:52:05 +08:00
|
|
|
<p>We believe that implementing C/C++/ObjC in a single unified parser makes the
|
|
|
|
end result easier to maintain and evolve than maintaining a separate C and C++
|
|
|
|
parser which must be bugfixed and maintained independently of each other.</p>
|
|
|
|
|
|
|
|
<!--=======================================================================-->
|
2008-06-17 14:35:36 +08:00
|
|
|
<h3><a name="conformance">Conformance with C/C++/ObjC and their
|
|
|
|
variants</a></h3>
|
2007-12-10 13:52:05 +08:00
|
|
|
<!--=======================================================================-->
|
|
|
|
|
|
|
|
<p>When you start work on implementing a language, you find out that there is a
|
|
|
|
huge gap between how the language works and how most people understand it to
|
|
|
|
work. This gap is the difference between a normal programmer and a (scary?
|
|
|
|
super-natural?) "language lawyer", who knows the ins and outs of the language
|
|
|
|
and can grok standardese with ease.</p>
|
|
|
|
|
|
|
|
<p>In practice, being conformant with the languages means that we aim to support
|
|
|
|
the full language, including the dark and dusty corners (like trigraphs,
|
|
|
|
preprocessor arcana, C99 VLAs, etc). Where we support extensions above and
|
|
|
|
beyond what the standard officially allows, we make an effort to explicitly call
|
|
|
|
this out in the code and emit warnings about it (which are disabled by default,
|
|
|
|
but can optionally be mapped to either warnings or errors), allowing you to use
|
|
|
|
clang in "strict" mode if you desire.</p>
|
|
|
|
|
|
|
|
<p>We also intend to support "dialects" of these languages, such as C89, K&R
|
|
|
|
C, C++'03, Objective-C 2, etc.</p>
|
|
|
|
|
2007-10-06 13:23:00 +08:00
|
|
|
</div>
|
|
|
|
</body>
|
2007-10-06 13:48:57 +08:00
|
|
|
</html>
|