138 Commits

Author SHA1 Message Date
Argyrios Kyrtzidis 41fb2d95a3 Make the Preprocessor more memory efficient and improve macro instantiation diagnostics.
When a macro instantiation occurs, reserve an SLocEntry chunk whose length is the
full length of the macro definition source. Set the spelling location of this chunk
to point to the start of the macro definition and any tokens that are lexed directly
from the macro definition will get a location from this chunk with the appropriate offset.

Any tokens that come from argument expansion, the '##' paste operator, etc. have their
instantiation location point at the appropriate place in the instantiated macro definition
(the argument identifier and the '##' token, respectively).
This improves macro instantiation diagnostics:

Before:

t.c:5:9: error: invalid operands to binary expression ('struct S' and 'int')
int y = M(/);
        ^~~~
t.c:5:11: note: instantiated from:
int y = M(/);
          ^

After:

t.c:5:9: error: invalid operands to binary expression ('struct S' and 'int')
int y = M(/);
        ^~~~
t.c:3:20: note: instantiated from:
#define M(op) (foo op 3);
               ~~~ ^  ~
t.c:5:11: note: instantiated from:
int y = M(/);
          ^

The memory savings for a candidate Boost library that abuses the preprocessor are:

- 32% fewer SLocEntries (37M -> 25M)
- 30% reduction in PCH file size (900M -> 635M)
- 50% reduction in memory usage for the SLocEntry table (1.6G -> 800M)

llvm-svn: 134587
2011-07-07 03:40:34 +00:00
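The chunk scheme described in this commit can be modeled with a small location table, sketched here in C++ under simplifying assumptions. LocationTable, ExpansionChunk, and all member names are hypothetical and are not clang's SourceManager API. The point is that one reserved range per instantiation replaces one entry per token, and a token's spelling location is the definition start plus the same offset. Argument-expansion and '##' tokens are not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record for one macro instantiation.
struct ExpansionChunk {
  std::uint32_t start;          // first location ID reserved for this instantiation
  std::uint32_t length;         // full length of the macro definition source
  std::uint32_t spellingStart;  // location of the start of the macro definition
};

class LocationTable {
public:
  // Reserve one contiguous chunk covering the whole macro definition,
  // instead of creating one entry per expanded token.
  std::uint32_t reserveExpansion(std::uint32_t defStart, std::uint32_t defLength) {
    assert(defLength > 0);
    std::uint32_t start = next_;
    chunks_.push_back({start, defLength, defStart});
    next_ += defLength;
    return start;
  }

  // Location of a token lexed `offset` bytes into the macro definition.
  static std::uint32_t tokenLoc(std::uint32_t chunkStart, std::uint32_t offset) {
    return chunkStart + offset;
  }

  // Map an instantiation location back to its spelling location.
  std::uint32_t spellingLoc(std::uint32_t loc) const {
    for (const ExpansionChunk& c : chunks_)
      if (loc >= c.start && loc < c.start + c.length)
        return c.spellingStart + (loc - c.start);
    return loc;  // not inside any instantiation chunk
  }

private:
  std::uint32_t next_ = 1u << 31;  // assumption: instantiation IDs live above file IDs
  std::vector<ExpansionChunk> chunks_;
};
```

Two instantiations of the same macro get adjacent, non-overlapping chunks, and both map back to the same definition text.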
Argyrios Kyrtzidis 2cfce18645 Allow Lexer::getLocForEndOfToken to return the location just past the macro instantiation
if the location given points at the last token of the macro instantiation.

Fixes rdar://9045701.

llvm-svn: 133804
2011-06-24 17:58:59 +00:00
Eli Friedman 86a5101c27 Don't strlen() every file before parsing it.
llvm-svn: 131132
2011-05-10 17:11:21 +00:00
Chris Lattner 57540c5be0 fix a bunch of comment typos found by codespell. Patch by
Luis Felipe Strano Moraes!

llvm-svn: 129559
2011-04-15 05:22:18 +00:00
Richard Smith f7b6202e6c Implement C++0x [lex.pptoken]p3's handling of <::.
llvm-svn: 129525
2011-04-14 18:36:27 +00:00
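C++0x [lex.pptoken]p3 says that if the next three characters are <:: and the following character is neither : nor >, the < is a preprocessing token by itself rather than the start of the digraph <:. A minimal sketch of that decision, assuming a hypothetical helper lexLessOrDigraph that only distinguishes < from <:. It ignores <<, <=, and the language mode, all of which a real lexer must check.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: `text` starts at a '<'. Returns how many characters
// the next punctuator spans, considering only '<' and the digraph '<:'.
std::size_t lexLessOrDigraph(std::string_view text) {
  assert(!text.empty() && text[0] == '<');
  if (text.size() < 2 || text[1] != ':')
    return 1;  // plain '<'
  // [lex.pptoken]p3: "<::" not followed by ':' or '>' lexes as '<' then "::".
  if (text.size() >= 3 && text[2] == ':') {
    char after = text.size() >= 4 ? text[3] : '\0';
    if (after != ':' && after != '>')
      return 1;  // e.g. std::vector<::Foo>
  }
  return 2;  // digraph "<:", the alternative spelling of '['
}
```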
Eric Christopher 7f36a79ee9 Eat the UTF-8 BOM at the beginning of a file since it's ignored anyhow.
Nom Nom Nom.

Patch by Anton Korobeynikov!

llvm-svn: 129174
2011-04-09 00:01:04 +00:00
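Skipping the BOM amounts to dropping the three bytes EF BB BF when they start the buffer. A minimal sketch; skipUtf8Bom is a hypothetical name, not the function this patch touches.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper: drop a leading UTF-8 byte order mark (EF BB BF), if
// present, so lexing starts at the first real character of the file.
std::string_view skipUtf8Bom(std::string_view buffer) {
  constexpr std::string_view kBom = "\xEF\xBB\xBF";
  if (buffer.substr(0, kBom.size()) == kBom)
    buffer.remove_prefix(kBom.size());
  return buffer;
}
```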
John McCall 75ca6d72c2 Fix getLocForEndOfToken to not double-count spurious internal characters
within a token, like trigraphs and escaped newlines.
Patch by Marcin Kowalczyk!

llvm-svn: 128978
2011-04-06 01:50:22 +00:00
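Escaped newlines and trigraphs are why a token's length in the source can exceed the length of its spelling. The sketch below, with hypothetical helper names, counts how many source bytes the first N spelled characters of a token occupy. It assumes a backslash (or the trigraph ??/) directly followed by \n, \r, or \r\n forms an escaped newline, and it ignores other lexer details such as whitespace between the backslash and the newline.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Length in bytes of an escaped newline starting at s[i], or 0 if none.
// "?\?/" is written with \? so the literal itself is not read as a trigraph.
std::size_t escapedNewlineLength(std::string_view s, std::size_t i) {
  std::size_t slash;
  if (i < s.size() && s[i] == '\\')
    slash = 1;
  else if (s.substr(i, 3) == "?\?/")
    slash = 3;
  else
    return 0;
  std::size_t j = i + slash;
  if (s.substr(j, 2) == "\r\n") return slash + 2;
  if (j < s.size() && (s[j] == '\n' || s[j] == '\r')) return slash + 1;
  return 0;
}

// True if s[i] starts one of the nine trigraphs "??x".
bool isTrigraph(std::string_view s, std::size_t i) {
  return s.substr(i, 2) == "?\?" && i + 2 < s.size() &&
         std::string_view("=/'()!<>-").find(s[i + 2]) != std::string_view::npos;
}

// Source bytes spanned by the first `logicalChars` spelled characters:
// escaped newlines contribute nothing, and each trigraph is one character.
std::size_t physicalLength(std::string_view s, std::size_t logicalChars) {
  std::size_t i = 0;
  for (std::size_t n = 0; n < logicalChars; ++n) {
    while (std::size_t len = escapedNewlineLength(s, i))
      i += len;  // invisible in the spelling; skip without counting
    if (i >= s.size())
      break;
    i += isTrigraph(s, i) ? 3 : 1;
  }
  return i;
}
```

Counting spelled characters instead of source bytes is exactly the kind of mismatch that makes an end-of-token location land in the wrong place.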
Daniel Dunbar 1057f86d0e Lexer: Add extremely limited support for -traditional-cpp, ignoring BCPL
comments.

llvm-svn: 127910
2011-03-18 21:23:38 +00:00
John McCall 462c055d85 Fix my earlier commit to work with escaped newlines and leave breadcrumbs
in case we want to make a world where we can check intermediate instantiations
for this kind of breadcrumb.

llvm-svn: 127221
2011-03-08 07:59:04 +00:00
Peter Collingbourne 2f1e36bfd0 Rename tok::eom to tok::eod.
The previous name was inaccurate as this token in fact appears at
the end of every preprocessing directive, not just macro definitions.
No functionality change, except for a diagnostic tweak.

llvm-svn: 126631
2011-02-28 02:37:51 +00:00
Argyrios Kyrtzidis c541ade850 Warn for missing terminating " or ' instead of error for gcc compatibility. Fixed rdar://8914293.
llvm-svn: 125616
2011-02-15 23:45:31 +00:00
Peter Collingbourne c1270f51fa Lexer: add CUDA kernel call tokens
llvm-svn: 125218
2011-02-09 21:08:21 +00:00
Douglas Gregor 86af98444f Harden Lexer::GetBeginningOfToken() against bogus source locations and
the disappearance/alteration of files.

llvm-svn: 124616
2011-01-31 22:42:36 +00:00
Abramo Bagnara ea4f7c7761 Introduced raw_identifier token kind.
llvm-svn: 122394
2010-12-22 08:23:18 +00:00
Chris Lattner 39720111e0 move getSpelling from Preprocessor to Lexer, which it is more conceptually related to.
llvm-svn: 119479
2010-11-17 07:26:20 +00:00
Chris Lattner 2a6ee91619 move AdvanceToTokenCharacter and getLocForEndOfToken from
Preprocessor to Lexer where they make more sense.

llvm-svn: 119474
2010-11-17 07:05:50 +00:00
Chandler Carruth c3ce5840af Update remaining attribute macros to new style.
llvm-svn: 117204
2010-10-23 08:44:57 +00:00
Sebastian Redl 517523014d In MeasureTokenLength, the FileLoc supplied to the lexer must point to the start of the buffer, or we risk overflow.
llvm-svn: 115117
2010-09-30 01:03:03 +00:00
Chris Lattner 0f0492e69c improve isHexaLiteral to work with escaped newlines and trigraphs,
patch by Francois Pichet!

llvm-svn: 112602
2010-08-31 16:42:00 +00:00
Chris Lattner dec7334218 silence a warning
llvm-svn: 112549
2010-08-30 23:11:03 +00:00
Alexis Hunt 3b7918625c Revert my user-defined literal commits - r1124{58,60,67} pending
some issues being sorted out.

llvm-svn: 112493
2010-08-30 17:47:05 +00:00
Chris Lattner 5f183aa592 add a fixme.
llvm-svn: 112491
2010-08-30 17:11:14 +00:00
Chris Lattner 7a9e9e7d76 use 'features' instead of 'PP->getLangOptions'.
llvm-svn: 112490
2010-08-30 17:09:08 +00:00
Douglas Gregor 759ef23bb8 In Microsoft compatibility mode, don't parse the exponent as part of
the pp-number in a hexadecimal floating point literal, from Francois
Pichet! Fixes PR7968.

llvm-svn: 112481
2010-08-30 14:50:47 +00:00
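One reading of this change, assumed here, is that in Microsoft mode a sign following e or E ends a hexadecimal pp-number, so 0x1e+1 lexes as 0x1e, +, 1 rather than as one invalid pp-number. A sketch under that assumption, with a hypothetical ppNumberLength helper that follows the C99 pp-number grammar and omits universal character names. It is not clang's LexNumericConstant.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Length of the pp-number at the start of `s` (which must begin with a digit).
// A sign after 'p'/'P' always continues the pp-number; a sign after 'e'/'E'
// continues it unless msCompat is set and the number is hexadecimal.
std::size_t ppNumberLength(std::string_view s, bool msCompat) {
  assert(!s.empty() && std::isdigit(static_cast<unsigned char>(s[0])));
  bool isHex = s.size() >= 2 && s[0] == '0' && (s[1] == 'x' || s[1] == 'X');
  std::size_t i = 1;
  while (i < s.size()) {
    char c = s[i];
    if (std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == '.') {
      ++i;
      continue;
    }
    if (c == '+' || c == '-') {
      char prev = s[i - 1];
      bool afterE = prev == 'e' || prev == 'E';
      bool afterP = prev == 'p' || prev == 'P';
      if (afterP || (afterE && !(msCompat && isHex))) {
        ++i;
        continue;
      }
    }
    break;
  }
  return i;
}
```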
Alexis Hunt 79eb5469e0 Implement C++0x user-defined string literals.
The extra data stored on user-defined literal Tokens is stored in extra
allocated memory, which is managed by the PreprocessorLexer because there isn't
a better place to put it that makes sure it gets deallocated, but only after
it's used up. My testing has shown no significant slowdown as a result, but
independent testing would be appreciated.

llvm-svn: 112458
2010-08-29 21:26:48 +00:00
Douglas Gregor 115837041e Introduce a preprocessor code-completion hook for contexts where we
expect "natural" language and should not provide any completions,
e.g., comments, string literals, #error.

llvm-svn: 112054
2010-08-25 17:04:25 +00:00
Douglas Gregor 3a7ad25eb6 Introduce basic code-completion support for preprocessor directives,
e.g., after a "#" we'll suggest #if, #ifdef, etc.

llvm-svn: 111943
2010-08-24 19:08:16 +00:00
Douglas Gregor 02690ba643 Don't emit end-of-file diagnostics like "unterminated conditional" or
"unterminated string" when we're performing code completion.

llvm-svn: 110933
2010-08-12 17:04:55 +00:00
Benjamin Kramer e8394df11b Random temporary string cleanup.
llvm-svn: 110807
2010-08-11 14:47:12 +00:00
Douglas Gregor 028d3e4d0f Use precompiled preambles for in-process code completion.
llvm-svn: 110596
2010-08-09 20:45:32 +00:00
Douglas Gregor 3f4bea0646 Introduce basic support for loading a precompiled preamble while
reparsing an ASTUnit. When saving a preamble, create a buffer larger
than the actual file we're working with but fill everything from the
end of the preamble to the end of the file with spaces (so the lexer
will quickly skip them). When we load the file, create a buffer of the
same size, filling it with the file and then spaces. Then, instruct
the lexer to start lexing after the preamble, therefore continuing the
parse from the spot where the preamble left off.

It's now possible to perform a simple preamble build + parse (+
reparse) with ASTUnit. However, one has to disable a bunch of checking
in the PCH reader to do so. That part isn't committed; it will likely
be handled with some other kind of flag (e.g., -fno-validate-pch).

As part of this, fix some issues with null termination of the memory
buffers created for the preamble; we were trying to explicitly
NULL-terminate them, even though they were also getting implicitly
NULL terminated, leading to excess warnings about NULL characters in
source files.

llvm-svn: 109445
2010-07-26 21:36:20 +00:00
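The padding step can be sketched as building a buffer of a fixed size from a prefix followed by spaces. makePaddedBuffer is a hypothetical helper, and std::string stands in for clang's memory buffers. The comment notes how an explicit trailing NUL would duplicate the terminator std::string already provides, analogous to the double termination described in this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Build a buffer of exactly `size` bytes: `prefix` (the preamble when saving,
// the whole file when reloading) followed by spaces the lexer skips quickly.
std::string makePaddedBuffer(std::string_view prefix, std::size_t size) {
  assert(prefix.size() <= size);
  std::string buffer(prefix);
  buffer.resize(size, ' ');
  // std::string already keeps a NUL after size() (see c_str()); appending
  // '\0' by hand would add a second, embedded NUL to the contents.
  return buffer;
}
```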
Douglas Gregor cd8bdd025f Improve performance during cursor traversal when a region of interest
is present.

Previously we used clang_getCursorExtent(), which requires
us to lex the token at the ending position to determine its
length. We would then compare [a, b) source ranges that cover the
characters in the range, rather than Clang's normal source ranges,
which cover the tokens in the range. However, relexing
causes us to read the source file (which may come from a precompiled
header), which is unfortunate and hurts performance.

In the new scheme, we only use Clang-style source ranges that cover
the tokens in the range. At the entry points where this matters
(clang_annotateTokens, clang_getCursor), we make sure to move source
locations to the start of the token.

Addresses most of <rdar://problem/8049381>.

llvm-svn: 109134
2010-07-22 20:22:31 +00:00
Douglas Gregor af82e3510b Introduce a new lexer function to compute the "preamble" of a file,
which is the part of the file that contains all of the initial
comments, includes, and preprocessor directives that occur before any
of the actual code. Added a new -print-preamble cc1 action that is
only used for testing.

llvm-svn: 108913
2010-07-20 20:18:03 +00:00
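A rough, hypothetical sketch of computing such a preamble boundary: skip whitespace, comments, and whole directive lines, and stop at the first other character. The real function uses clang's lexer. This character-level scanner is an assumption for illustration and does not handle, for example, a block comment that spans lines inside a directive.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Offset of the first character that is not part of the preamble
// (whitespace, comments, and # directive lines), or s.size() if none.
std::size_t computePreambleEnd(std::string_view s) {
  constexpr auto npos = std::string_view::npos;
  std::size_t i = 0;
  while (i < s.size()) {
    char c = s[i];
    if (c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f' || c == '\v') {
      ++i;
    } else if (s.compare(i, 2, "//") == 0) {
      std::size_t nl = s.find('\n', i);  // line comment: skip to end of line
      i = nl == npos ? s.size() : nl;
    } else if (s.compare(i, 2, "/*") == 0) {
      std::size_t end = s.find("*/", i + 2);  // block comment
      i = end == npos ? s.size() : end + 2;
    } else if (c == '#') {
      // Directive: skip to end of line, honoring backslash-newline continuations.
      while (i < s.size() && s[i] != '\n')
        i += (s[i] == '\\' && i + 1 < s.size() && s[i + 1] == '\n') ? 2 : 1;
    } else {
      return i;  // first token of actual code
    }
  }
  return s.size();
}
```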
Chris Lattner 86851b8a7a fix PR4499, patch by Kyle Dean!
llvm-svn: 107836
2010-07-07 23:24:27 +00:00
Chris Lattner 52d96ac930 simpler fix for rdar://8044135 - escaped newlines have already
been processed, so they don't have to be tip-toed around.

llvm-svn: 105182
2010-05-30 23:27:38 +00:00
Douglas Gregor fe4a4107d8 Improve our handling of NULL after an escaping '\' in a string
literal. Fixes <rdar://problem/8044135>.

llvm-svn: 105181
2010-05-30 22:59:50 +00:00
Douglas Gregor 6da3db4af3 Improve code completion in failure cases in two ways:
1) Suppress diagnostics as soon as we form the code-completion
   token, so we don't get any error/warning spew from the early
   end-of-file.
2) If we consume a code-completion token when we weren't expecting
   one, go into a code-completion recovery path that produces the best
   results it can based on the context that the parser is in.

llvm-svn: 104585
2010-05-25 05:58:43 +00:00
Chris Lattner 467f6bcfe5 robustify the conflict marker stuff. Don't add 7 twice, which would
make it miss (invalid) things like:
<<<<<<<
>>>>>>>

and crash if

<<<<<<< 

was at the end of the line.  When we find a >>>>>>> that is not at the
end of the line, make sure to reset Pos so we don't crash on something
like:
<<<<<<< >>>>>>>

This isn't worth making testcases for, since each would require a new file.

rdar://7987078 - signal 11 compiling "<<<<<<<<<<"

llvm-svn: 103968
2010-05-17 20:27:25 +00:00
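The checks described in this commit can be sketched as follows: the opening marker must start a line, the search for the closing marker begins just after the seven opening characters (added once), a closing marker found mid-line is skipped rather than accepted, and reaching the end of the buffer simply means no marker. isConflictMarker is a hypothetical helper, not clang's code.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Does buf[pos] start a "<<<<<<<" marker at the beginning of a line that is
// closed by ">>>>>>>" at the start of a later line?
bool isConflictMarker(std::string_view buf, std::size_t pos) {
  constexpr std::string_view kOpen = "<<<<<<<";
  constexpr std::string_view kClose = ">>>>>>>";
  assert(pos <= buf.size());
  if (pos != 0 && buf[pos - 1] != '\n' && buf[pos - 1] != '\r')
    return false;  // must be at the start of a line
  if (buf.compare(pos, kOpen.size(), kOpen) != 0)
    return false;
  // Skip the opening marker exactly once, then look for a closing marker.
  std::size_t search = pos + kOpen.size();
  while ((search = buf.find(kClose, search)) != std::string_view::npos) {
    char before = buf[search - 1];
    if (before == '\n' || before == '\r')
      return true;
    ++search;  // found mid-line: keep searching past this occurrence
  }
  return false;
}
```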
Chris Lattner 561aabd943 when code completing inside a C-style block comment, don't emit errors about
a missing */ since we truncated the file.

This fixes rdar://7948776

llvm-svn: 103913
2010-05-16 19:54:05 +00:00
Chris Lattner 1a9e873bf9 fix a minor bug I noticed while working with Jordy's patch for PR6101,
in an input file like this:

# 42
int x;

we were emitting:

# <something>
 int x;

(with a space before the int) because we weren't clearing the leading
whitespace flag properly after the \n from the directive was handled.

llvm-svn: 101084
2010-04-12 23:04:41 +00:00
Douglas Gregor a771f46c82 Reinstate my CodeModificationHint -> FixItHint renaming patch, without
the C-only "optimization".

llvm-svn: 100022
2010-03-31 17:46:05 +00:00
Douglas Gregor 30e631862f Revert r100008, which inexplicably breaks the clang-i686-darwin10 builder
llvm-svn: 100018
2010-03-31 17:25:35 +00:00
Douglas Gregor 3baad0d4f7 Rename CodeModificationHint to FixItHint, since we've been using the
term "fix-it" everywhere and even *I* get tired of long names
sometimes. No functionality change.

llvm-svn: 100008
2010-03-31 15:31:50 +00:00
Douglas Gregor 1668355e06 Remove unused variable
llvm-svn: 98691
2010-03-16 22:54:32 +00:00
Douglas Gregor dc970f0866 Audit all Preprocessor::getSpelling() callers, improving failure
recovery for those that need it.

llvm-svn: 98689
2010-03-16 22:30:13 +00:00
Douglas Gregor 42fe858cd6 Audit all callers of SourceManager::getCharacterData(); update some of
them to recover more gracefully on failure.

llvm-svn: 98672
2010-03-16 20:46:42 +00:00
Benjamin Kramer eb92dc0b09 Let SourceManager::getBufferData return StringRef instead of a pair of two const char*.
llvm-svn: 98630
2010-03-16 14:14:31 +00:00
Douglas Gregor e0fbb83b8b Give SourceManager a Diagnostic object with which to report errors,
and start simplifying the interfaces in SourceManager that can fail.

llvm-svn: 98594
2010-03-16 00:06:06 +00:00
Douglas Gregor 802b77601e Introduce a new BufferResult class to act as the return type of
SourceManager's getBuffer() (and similar) operations. This abstraction
can be used to force callers to cope with errors in getBuffer(), such
as missing files and changed files. Fix a bunch of callers to use the
new interface.

Add some very basic checks for file consistency (file size,
modification time) into ContentCache::getBuffer(), although these
checks don't help much until we've updated the main callers (e.g.,
SourceManager::getSpelling()).

llvm-svn: 98585
2010-03-15 22:54:52 +00:00
Chris Lattner 93ddf80eb7 don't inform comment handlers about comments in #if 0 blocks,
doing so invalidates the file guard optimization and is not
in the spirit of "#if 0" because it is supposed to completely
skip everything, even if it isn't lexically valid. Patch by
Abramo Bagnara!

llvm-svn: 95253
2010-02-03 21:06:21 +00:00