__debug overflow_stack'.
- For testing crash reporting stuff... you'd think I could just use some C++
code but Doug keeps fixing stuff!
llvm-svn: 109587
reparsing an ASTUnit. When saving a preamble, create a buffer larger
than the actual file we're working with but fill everything from the
end of the preamble to the end of the file with spaces (so the lexer
will quickly skip them). When we load the file, create a buffer of the
same size, filling it with the file and then spaces. Then, instruct
the lexer to start lexing after the preamble, therefore continuing the
parse from the spot where the preamble left off.
It's now possible to perform a simple preamble build + parse (+
reparse) with ASTUnit. However, one has to disable a bunch of checking
in the PCH reader to do so. That part isn't committed; it will likely
be handled with some other kind of flag (e.g., -fno-validate-pch).
As part of this, fix some issues with null termination of the memory
buffers created for the preamble; we were trying to explicitly
NULL-terminate them, even though they were also getting implicitly
NULL terminated, leading to excess warnings about NULL characters in
source files.
llvm-svn: 109445
is present.
Rather than using clang_getCursorExtent(), which requires
us to lex the token at the ending position to determine its
length. Then, we'd be comparing [a, b) source ranges that cover the
characters in the range rather than the normal behavior for Clang's
source ranges, which covers the tokens in the range. However, relexing
causes us to read the source file (which may come from a precompiled
header), which is rather unfortunate and affects performance.
In the new scheme, we only use Clang-style source ranges that cover
the tokens in the range. At the entry points where this matters
(clang_annotateTokens, clang_getCursor), we make sure to move source
locations to the start of the token.
Addresses most of <rdar://problem/8049381>.
llvm-svn: 109134
which is the part of the file that contains all of the initial
comments, includes, and preprocessor directives that occur before any
of the actual code. Added a new -print-preamble cc1 action that is
only used for testing.
llvm-svn: 108913
When loading the PCH, IdentifierInfos that are associated with pragmas cause declarations that use these identifiers to be deserialized (e.g. the "clang" pragma causes the "clang" namespace to be loaded).
We can avoid this if we just use StringRefs for the pragmas.
As a bonus, since we don't have to create and pass IdentifierInfos, the pragma interfaces get a bit more simplified.
llvm-svn: 108237
In the case of backtracking, the cached token lexer will be the only
lexer on the stack, without this the token stack will be empty and EOF
won't be returned.
This fixes PR7072.
llvm-svn: 108124
Fix string concatenation to treat escapes in concatenated strings that
are wide because of other string chunks to process the escapes as wide
themselves. Before we would warn about and miscompile the attached testcase.
This fixes rdar://8040728 - miscompile + warning: hex escape sequence out of range
llvm-svn: 106012
diagnostics. That would be while we're parsing string literals for the
sole purpose of producing a diagnostic about them. Fixes
<rdar://problem/8026030>.
llvm-svn: 104684
1) Suppress diagnostics as soon as we form the code-completion
token, so we don't get any error/warning spew from the early
end-of-file.
2) If we consume a code-completion token when we weren't expecting
one, go into a code-completion recovery path that produces the best
results it can based on the context that the parser is in.
llvm-svn: 104585
make it miss (invalid) things like:
<<<<<<<
>>>>>>>
and crash if
<<<<<<<
was at the end of the line. When we find a >>>>>>> that is not at the
end of the line, make sure to reset Pos so we don't crash on something
like:
<<<<<<< >>>>>>>
This isn't worth making testcases for, since each would require a new file.
rdar://7987078 - signal 11 compiling "<<<<<<<<<<"
llvm-svn: 103968
in an input file like this:
# 42
int x;
we were emitting:
# <something>
int x;
(with a space before the int) because we weren't clearing the leading
whitespace flag properly after the \n from the directive was handled.
llvm-svn: 101084
instantiations when we have the corresponding macro definition and by
removing macro definition information from our table when the macro is
undefined.
llvm-svn: 99004
record (which includes all macro instantiations and definitions). As
with all lay deserialization, this introduces a new external source
(here, an external preprocessing record source) that loads all of the
preprocessed entities prior to iterating over the entities.
The preprocessing record is an optional part of the precompiled header
that is disabled by default (enabled with
-detailed-preprocessing-record). When the preprocessor given to the
PCH writer has a preprocessing record, that record is written into the
PCH file. When the PCH reader is given a PCH file that contains a
preprocessing record, it will be lazily loaded (which, effectively,
implicitly adds -detailed-preprocessing-record). This is the first
case where we have sections of the precompiled header that are
added/removed based on a compilation flag, which is
unfortunate. However, this data consumes ~550k in the PCH file for
Cocoa.h (out of ~9.9MB), and there is a non-trivial cost to gathering
this detailed preprocessing information, so it's too expensive to turn
on by default. In the future, we should investigate a better encoding
of this information.
llvm-svn: 99002
eliminating the extra PopulatePreprocessingRecord object. This will
become useful once we start writing the preprocessing record to
precompiled headers.
llvm-svn: 98966
preprocessing record. Use that link with clang_getCursorReferenced()
and clang_getCursorDefinition() to match instantiations of a macro to
the definition of the macro.
llvm-svn: 98842
the macro definitions and macro instantiations that are found
during preprocessing. Preprocessing records are *not* generated by
default; rather, we provide a PPCallbacks subclass that hooks into the
existing callback mechanism to record this activity.
The only client of preprocessing records is CIndex, which keeps track
of macro definitions and instantations so that they can be exposed via
cursors. At present, only token annotation uses these facilities, and
only for macro instantiations; both will change in the near
future. However, with this change, token annotation properly annotates
macro instantiations that do not produce any tokens and instantiations
of macros that are later undef'd, improving our consistency.
Preprocessing directives that are not macro definitions are still
handled by clang_annotateTokens() via re-lexing, so that we don't have
to track every preprocessing directive in the preprocessing record.
Performance impact of preprocessing records is still TBD, although it
is limited to CIndex and therefore out of the path of the main compiler.
llvm-svn: 98836
SourceManager's getBuffer() and, therefore, could fail, along with
Preprocessor::getSpelling(). Use the Invalid parameters in the literal
parsers (string, floating point, integral, character) to make them
robust against errors that stem from, e.g., PCH files that are not
consistent with the underlying file system.
I still need to audit every use caller to all of these routines, to
determine which ones need specific handling of error conditions.
llvm-svn: 98608
SourceManager's getBuffer() (and similar) operations. This abstract
can be used to force callers to cope with errors in getBuffer(), such
as missing files and changed files. Fix a bunch of callers to use the
new interface.
Add some very basic checks for file consistency (file size,
modification time) into ContentCache::getBuffer(), although these
checks don't help much until we've updated the main callers (e.g.,
SourceManager::getSpelling()).
llvm-svn: 98585
guard macro is already defined for the first occurrence of the
header. If it is, the body will be skipped and not be properly
analyzed for the include guard optimization.
llvm-svn: 95972
doing so invalidates the file guard optimization and is not
in the spirit of "#if 0" because it is supposed to completely
skip everything, even if it isn't lexically valid. Patch by
Abramo Bagnara!
llvm-svn: 95253
region of interest (if provided). Implement clang_getCursor() in terms
of this traversal rather than using the Index library; the unified
cursor visitor is more complete, and will be The Way Forward.
Minor other tweaks needed to make this work:
- Extend Preprocessor::getLocForEndOfToken() to accept an offset
from the end, making it easy to move to the last character in the
token (rather than just past the end of the token).
- In Lexer::MeasureTokenLength(), the length of whitespace is zero.
llvm-svn: 94200
disabled with the intent that users can start with them now and not have to change
a thing to have them work when we implement the features.
llvm-svn: 93312
incompatible with user-defined literals, specifically with the following form:
0x1p+1
The preprocessing-number token extends only as far as the 'p'; the '+' is not
included. Previously we could get away with this extension as p was an invalid
suffix, but now with user-defined literals, 'p' might well be a valid suffix
and we are forced to consider it as such.
This patch also adds a warning in non-0x C++ modes telling the user that
this extension is incompatible with C++0x that is enabled by default
(previously and with other languages, we warn only with a compliance
option such as -pedantic).
llvm-svn: 93135
definitions from a precompiled header. This ensures that
code-completion with macro names behaves the same with or without
precompiled headers.
llvm-svn: 92497
1. Don't make a copy of LangOptions every time a lexer is created.
2. Don't make CharInfo global mutable state.
3. Fix the implementation to properly treat ^Z as EOF instead of as
horizontal whitespace, which matches the semantic implemented by VC++.
llvm-svn: 91586
on PR5610 (2.185 -> 2.130s). The big issue is that this is making
insanely huge macro argument lists with over a million tokens in it.
The reason that mallco and free are so expensive is that we are
actually going to the kernel to get it, and switching to a bump
pointer allocator won't change this in an interesting way.
llvm-svn: 91449
We creating and free thousands of MacroArgs objects (and the related
std::vectors hanging off them) for the testcase in PR5610 even though
there are only ~20 live at a time. This doesn't actually use the
cache yet.
llvm-svn: 91391
expanding directives withing macro expansions. This is undefined behavior
according to 6.10.3p11, so we might as well be undefined in ways similar to
GCC.
llvm-svn: 91266
files with the contents of an arbitrary memory buffer. Use this new
functionality to drastically clean up the way in which we handle file
truncation for code-completion: all of the truncation/completion logic
is now encapsulated in the preprocessor where it belongs
(<rdar://problem/7434737>).
llvm-svn: 90300
in diagnostics when we fail to open a file. This allows us to
report things like:
$ clang test.c -I.
test.c:2:10: fatal error: error opening file './foo.h': Permission denied
#include "foo.h"
^
llvm-svn: 90276
stat a file but where mmaping it fails. In this case, we emit an
error like:
t.c:1:10: fatal error: error opening file '../../foo.h'
instead of "cannot find file".
llvm-svn: 90110
annotation token, because some of the tokens we're annotating might
not be in the set of cached tokens (we could have consumed them
unconditionally).
Also, move the tentative parsing from ParseTemplateTemplateArgument
into the one caller that needs it, improving recovery.
llvm-svn: 86904
a little fuzzy, but conceptually it's just uniquing the identifier.
Chris, please review. I debated splitting into const/non-const versions where
the const one propogated constness to the resulting IdentifierInfo*.
llvm-svn: 86106
only supporting a single stat cache. The immediate benefit of this
change is that we can now generate a PCH/AST file when including
another PCH file; in the future, the chain of stat caches will likely
be useful with multiple levels of PCH files.
llvm-svn: 84263
-code-completion-at=filename:line:column
which performs code completion at the specified location by truncating
the file at that position and enabling code completion. This approach
makes it possible to run multiple tests from a single test file, and
gives a more natural command-line interface.
llvm-svn: 82571
essence, code completion is triggered by a magic "code completion"
token produced by the lexer [*], which the parser recognizes at
certain points in the grammar. The parser then calls into the Action
object with the appropriate CodeCompletionXXX action.
Sema implements the CodeCompletionXXX callbacks by performing minimal
translation, then forwarding them to a CodeCompletionConsumer
subclass, which uses the results of semantic analysis to provide
code-completion results. At present, only a single, "printing" code
completion consumer is available, for regression testing and
debugging. However, the design is meant to permit other
code-completion consumers.
This initial commit contains two code-completion actions: one for
member access, e.g., "x." or "p->", and one for
nested-name-specifiers, e.g., "std::". More code-completion actions
will follow, along with improved gathering of code-completion results
for the various contexts.
[*] In the current -code-completion-dump testing/debugging mode, the
file is truncated at the completion point and EOF is translated into
"code completion".
llvm-svn: 82166
declaration in the AST.
The new ASTContext::getCommentForDecl function searches for a comment
that is attached to the given declaration, and returns that comment,
which may be composed of several comment blocks.
Comments are always available in an AST. However, to avoid harming
performance, we don't actually parse the comments. Rather, we keep the
source ranges of all of the comments within a large, sorted vector,
then lazily extract comments via a binary search in that vector only
when needed (which never occurs in a "normal" compile).
Comments are written to a precompiled header/AST file as a blob of
source ranges. That blob is only lazily loaded when one requests a
comment for a declaration (this never occurs in a "normal" compile).
The indexer testbed now supports comment extraction. When the
-point-at location points to a declaration with a Doxygen-style
comment, the indexer testbed prints the associated comment
block(s). See test/Index/comments.c for an example.
Some notes:
- We don't actually attempt to parse the comment blocks themselves,
beyond identifying them as Doxygen comment blocks to associate them
with a declaration.
- We won't find comment blocks that aren't adjacent to the
declaration, because we start our search based on the location of
the declaration.
- We don't go through the necessary hops to find, for example,
whether some redeclaration of a declaration has comments when our
current declaration does not. Similarly, we don't attempt to
associate a \param Foo marker in a function body comment with the
parameter named Foo (although that is certainly possible).
- Verification of my "no performance impact" claims is still "to be
done".
llvm-svn: 74704
with dos style newlines. I have a trivial test for this:
// RUN: clang-cc %s -verify
#define test(x, y) \
x ## y
but I don't know how to get svn to not change newlines and testrunner
doesn't work with dos style newlines either, so "not worth it". :)
rdar://6994000
llvm-svn: 73945
line, and when the pragma is at the end of a file. In this case, the last
token consumed could pop the lexer, invalidating CurPPLexer. Thanks to
Peter Thoman for pointing it out.
llvm-svn: 73689
registered when PCH wasn't being used. We should always install (in BuiltinInfo)
information about target-specific builtins, but we shouldn't register any builtin
identifier infos. This fixes the build of apps that use PCH and target specific
builtins together.
llvm-svn: 73492
builtin preprocessor macro. This appears to work with two caveats:
1) builtins are registered in -E mode, and 2) target-specific builtins
are unconditionally registered even if they aren't supported by the
target (e.g. SSE4 builtin when only SSE1 is enabled).
llvm-svn: 73289
diagnostic to include the full instantiation location for the
invalid paste. For:
#define foo(a, b) a ## b
#define bar(x) foo(x, ])
bar(a)
bar(zdy)
Instead of:
t.c:3:22: error: pasting formed 'a]', an invalid preprocessing token
#define foo(a, b) a ## b
^
t.c:3:22: error: pasting formed 'zdy]', an invalid preprocessing token
we now produce:
t.c:7:1: error: pasting formed 'a]', an invalid preprocessing token
bar(a)
^
t.c:4:16: note: instantiated from:
#define bar(x) foo(x, ])
^
t.c:3:22: note: instantiated from:
#define foo(a, b) a ## b
^
t.c:8:1: error: pasting formed 'zdy]', an invalid preprocessing token
bar(zdy)
^
t.c:4:16: note: instantiated from:
#define bar(x) foo(x, ])
^
t.c:3:22: note: instantiated from:
#define foo(a, b) a ## b
^
llvm-svn: 72519
1. When we accept "#garbage" in asm-with-cpp mode, change the token kind
of the # to unknown so that the preprocessor won't try to process it as
a real #. This fixes a crash on the attached example
2. Fix macro definition extents processing to handle #foo at the end of a
macro to say the definition ends with the foo, not the #.
This is a follow-on fix to r72283, and rdar://6916026
llvm-svn: 72388
two empty arguments. Also, add an assert so that this bug
manifests as an assertion failure, not a valgrind problem.
This fixes rdar://6880648 - [cpp] crash in ArgNeedsPreexpansion
llvm-svn: 71616