286 Commits

Author SHA1 Message Date
Argyrios Kyrtzidis 065d720c31 [Lexer] Improve Lexer::getSourceText() when the given range deals with function macro arguments.
This is a modified version of a patch by Manuel Klimek.

llvm-svn: 182055
2013-05-16 21:37:39 +00:00
Richard Smith 0f7f6f1abc Typo and misc comment fix.
llvm-svn: 181583
2013-05-10 02:36:35 +00:00
Argyrios Kyrtzidis 0903f8dac5 [libclang] Make sure the preamble does not truncate comments.
rdar://13647445

llvm-svn: 179907
2013-04-19 23:24:25 +00:00
Richard Smith 06d274fdb7 Add -Wc99-compat warning for C11 unicode string and character literals.
llvm-svn: 176817
2013-03-11 18:01:42 +00:00
Richard Smith 9b36209e31 When lexing in C11 mode, accept unicode character and string literals, per C11
6.4.4.4/1 and 6.4.5/1.

llvm-svn: 176780
2013-03-09 23:56:02 +00:00
Jordan Rose 864b810739 Preprocessor: don't consider // to be a line comment in -E -std=c89 mode.
It's beneficial when compiling to treat // as the start of a line
comment even in -std=c89 mode, since it's not valid C code (with a few
rare exceptions) and is usually intended as such. We emit a pedantic
warning and then continue on as if line comments were enabled.
This has been our behavior for quite some time.

However, people use the preprocessor for things besides C source files.
In today's prompting example, the input contains (unquoted) URLs, which
contain // but should still be preserved.

This change instructs the lexer to treat // as a plain token if Clang is
in C90 mode and generating preprocessed output rather than actually compiling.

<rdar://problem/13338743>

llvm-svn: 176526
2013-03-05 22:51:04 +00:00
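The decision described in this commit message can be sketched as a small predicate. This is only an illustration: the struct `LexModeSketch`, its fields, and the function `slashSlashStartsComment` are hypothetical names, not Clang's actual API.

```cpp
#include <cassert>

// Hypothetical summary of the two facts the commit message relies on.
struct LexModeSketch {
  bool standardHasLineComments; // false in C89/C90 mode, true otherwise
  bool preprocessedOutputOnly;  // true when only generating -E output
};

// Returns true if "//" should be lexed as the start of a line comment.
inline bool slashSlashStartsComment(const LexModeSketch &mode) {
  // Languages that define line comments always treat "//" as a comment.
  if (mode.standardHasLineComments)
    return true;
  // C89/C90: when actually compiling, keep treating "//" as a comment
  // (with a pedantic warning); under -E, leave it as plain tokens so
  // that inputs such as unquoted URLs survive.
  return !mode.preprocessedOutputOnly;
}
```

Under these assumptions, only the combination of C89 mode and preprocessed-output mode changes behavior; every other combination keeps the long-standing treatment.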
Jordan Rose cb8a1aca35 Preprocessor: preserve whitespace in -traditional-cpp mode.
Note that unlike GNU cpp we currently do not preserve whitespace in macros
(even in -traditional-cpp mode).

<rdar://problem/12897179>

llvm-svn: 175778
2013-02-21 18:53:19 +00:00
Jordan Rose 58c61e006f Properly validate UCNs for C99 and C++03 (both more restrictive than C(++)11).
Add warnings under -Wc++11-compat, -Wc++98-compat, and -Wc99-compat when a
particular UCN is incompatible with a different standard, and -Wunicode when
a UCN refers to a surrogate character in C++03.

llvm-svn: 174788
2013-02-09 01:10:25 +00:00
Jordan Rose a2100d755a Pull Lexer's CharInfo table out for general use throughout Clang.
Rewriting the same predicates over and over again is bad for code size and
code maintenance. Using the functions in <ctype.h> is generally unsafe
unless they are specified to be locale-independent (i.e. only isdigit and
isxdigit).

The next commit will try to clean up uses of <ctype.h> functions within Clang.

llvm-svn: 174765
2013-02-08 22:30:22 +00:00
Jordan Rose cc538345be Lexer: Don't warn about Unicode in preprocessor directives.
This allows people to use Unicode in their #pragma mark and in macros
that exist only to be string-ized.

<rdar://problem/13107323&13121362>

llvm-svn: 174081
2013-01-31 19:48:48 +00:00
Jordan Rose f649795f84 Fix r173881 to properly skip invalid UTF-8 characters in raw lexing and -E.
This caused hangs as we processed the same invalid byte over and over.

<rdar://problem/13115651>

llvm-svn: 173959
2013-01-30 19:21:12 +00:00
Dmitri Gribenko 9feeef40f5 Move UTF conversion routines from clang/lib/Basic to llvm/lib/Support
This is required to use them in TableGen.

llvm-svn: 173924
2013-01-30 12:06:08 +00:00
Jordan Rose 17441589c3 Don't warn about Unicode characters in -E mode.
People use the C preprocessor for things other than C files. Some of them
have Unicode characters. We shouldn't warn about Unicode characters
appearing outside of identifiers in this case.

There's not currently a way for the preprocessor to tell if it's in -E mode,
so I added a new flag, derived from the PreprocessorOutputOptions. This is
only used by the Unicode warnings for now, but could conceivably be used by
other warnings or even behavioral differences later.

<rdar://problem/13107323>

llvm-svn: 173881
2013-01-30 01:52:57 +00:00
Jordan Rose cccbdbf0db PR15067 (again): Don't warn about UCNs in C90 if we're raw-lexing.
Fixes a crash. Thanks, Richard.

llvm-svn: 173701
2013-01-28 17:49:02 +00:00
Jordan Rose c0cba27230 PR15067: Don't assert when a UCN appears in a C90 file.
Unfortunately, we can't accept the UCN as an extension because we're
required to treat it as two tokens for preprocessing purposes.

llvm-svn: 173622
2013-01-27 20:12:04 +00:00
NAKAMURA Takumi e8f83dbbd8 Lexer.cpp: Fix a warning with ptrdiff_t on i686. [-Wsign-compare]
llvm-svn: 173447
2013-01-25 14:57:21 +00:00
Jordan Rose 8b4af2ae88 Clarify comment: "diagnose" is better than "warn" when emitting an error.
Thanks, Dmitri.

llvm-svn: 173400
2013-01-25 00:20:28 +00:00
Jordan Rose 62db5066e9 Add a fixit for \U1234 -> \u1234.
llvm-svn: 173371
2013-01-24 20:50:52 +00:00
Jordan Rose 4246ae0089 As an extension, treat Unicode whitespace characters as whitespace.
llvm-svn: 173370
2013-01-24 20:50:50 +00:00
Jordan Rose 7f43dddae0 Handle universal character names and Unicode characters outside of literals.
This is a missing piece for C99 conformance.

This patch handles UCNs by adding a '\\' case to LexTokenInternal and
LexIdentifier -- if we see a backslash, we tentatively try to read in a UCN.
If the UCN is not syntactically well-formed, we fall back to the old
treatment: a backslash followed by an identifier beginning with 'u' (or 'U').

Because the spelling of an identifier with UCNs still has the UCN in it, we
need to convert that to UTF-8 in Preprocessor::LookUpIdentifierInfo.

Of course, valid code that does *not* use UCNs will see only a very minimal
performance hit (checks after each identifier for non-ASCII characters,
checks when converting raw_identifiers to identifiers that they do not
contain UCNs, and checks when getting the spelling of an identifier that it
does not contain a UCN).

This patch also adds basic support for actual UTF-8 in the source. This is
treated almost exactly the same as UCNs except that we consider stray
Unicode characters to be mistakes and offer a fixit to remove them.

llvm-svn: 173369
2013-01-24 20:50:46 +00:00
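The "tentatively try to read in a UCN" step from the commit above might look roughly like the following. This is a minimal sketch, not the code in LexTokenInternal: the helper names `hexDigitValue` and `tryReadUCN` are made up for illustration, and the sketch checks only the syntax (\u plus four hex digits, or \U plus eight), not whether the code point is permitted by a given language standard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Returns the value of a hexadecimal digit, or -1 if c is not one.
inline int hexDigitValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Tentatively reads a UCN starting at text[pos]. On success, stores the
// code point and the length of its spelling and returns true. On failure,
// returns false and leaves the outputs untouched, so the caller can fall
// back to lexing a lone backslash followed by an identifier.
inline bool tryReadUCN(const std::string &text, std::size_t pos,
                       std::uint32_t &codePoint, std::size_t &length) {
  if (pos + 1 >= text.size() || text[pos] != '\\')
    return false;

  std::size_t numDigits;
  if (text[pos + 1] == 'u')
    numDigits = 4;
  else if (text[pos + 1] == 'U')
    numDigits = 8;
  else
    return false;

  // Not enough characters left for a complete UCN.
  if (pos + 2 + numDigits > text.size())
    return false;

  std::uint32_t value = 0;
  for (std::size_t i = 0; i < numDigits; ++i) {
    int digit = hexDigitValue(text[pos + 2 + i]);
    if (digit < 0)
      return false; // Not syntactically well-formed.
    value = (value << 4) | static_cast<std::uint32_t>(digit);
  }

  codePoint = value;
  length = 2 + numDigits;
  return true;
}
```

Because nothing is consumed on failure, the fallback path sees exactly the same input it would have seen before the change.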
Dmitri Gribenko f857950d39 Remove useless 'llvm::' qualifier from names like StringRef and others that are
brought into 'clang' namespace by clang/Basic/LLVM.h

llvm-svn: 172323
2013-01-12 19:30:44 +00:00
Argyrios Kyrtzidis 86f1a935dc Pull the bulk of Lexer::MeasureTokenLength() out into a new function,
Lexer::getRawToken().

No functionality change.

llvm-svn: 171771
2013-01-07 19:16:18 +00:00
Richard Smith 2bf7fdb723 s/CPlusPlus0x/CPlusPlus11/g
llvm-svn: 171367
2013-01-02 11:42:31 +00:00
Chandler Carruth 3a02247dc9 Sort all of Clang's files under 'lib', and fix up the broken headers
uncovered.

This required manually correcting all of the incorrect main-module
headers I could find, and running the new llvm/utils/sort_includes.py
script over the files.

I also manually added quite a few missing headers that were uncovered by
shuffling the order or moving headers up to be main-module-headers.

llvm-svn: 169237
2012-12-04 09:13:33 +00:00
Richard Smith 9a67f47882 Teach Lexer::getSpelling about raw string literals. Specifically, if a raw
string literal needs cleaning (because it contains line-splicing in the
encoding prefix or in the ud-suffix), do not clean the section between the
double-quotes -- that's the "raw" bit!

llvm-svn: 168776
2012-11-28 07:29:00 +00:00
Nico Weber 4e270380c1 Fix crash on end-of-file after \ in a char literal, fixes PR14369.
This makes LexCharConstant() look more like LexStringLiteral(), which doesn't
have this bug. Add tests for eof after \ for several other cases.

llvm-svn: 168269
2012-11-17 20:25:54 +00:00
Eli Friedman b699e619fe Fix an assertion failure printing the unused-label fixit in files using CRLF line endings. <rdar://problem/12639047>.
llvm-svn: 167900
2012-11-14 01:28:38 +00:00
Daniel Dunbar cf3f2c49ea Revert r167801, "[preprocessor] When #including something that contributes no
tokens at all,". This change broke External/Nurbs in LLVM test-suite.

llvm-svn: 167858
2012-11-13 19:12:37 +00:00
Nico Weber 7cc28804e2 UCNs in char literals are done (in LiteralSupport), remove FIXME. Expand UCN FIXME in LexNumericConstant.
llvm-svn: 167818
2012-11-13 06:25:15 +00:00
Argyrios Kyrtzidis 4f10a3e9f0 [preprocessor] When #including something that contributes no tokens at all,
don't recursively continue lexing.

This avoids a stack overflow with a sequence of many empty #includes.
rdar://11988695

llvm-svn: 167801
2012-11-13 01:03:15 +00:00
Argyrios Kyrtzidis 36675b75fb In Lexer::LexTokenInternal, avoid code duplication; no functionality change.
llvm-svn: 167800
2012-11-13 01:02:40 +00:00
Nico Weber 158a31abe2 s/BCPLComment/LineComment/
llvm-svn: 167690
2012-11-11 07:02:14 +00:00
Argyrios Kyrtzidis d53d0daab9 Take into account that there may be a BOM at the beginning of the file,
when computing the size of the precompiled preamble.

llvm-svn: 166659
2012-10-25 01:51:45 +00:00
Dmitri Gribenko b8e9e7507e StringRef'ize Preprocessor::CreateString().
llvm-svn: 164555
2012-09-24 21:07:17 +00:00
Roman Divacky e637711ae0 Don't cast away const needlessly. Found by gcc48 -Wcast-qual.
llvm-svn: 163325
2012-09-06 15:59:27 +00:00
Eli Friedman 324adad966 Make a bunch of methods on Lexer private.
llvm-svn: 162970
2012-08-31 02:29:37 +00:00
Dmitri Gribenko 4aa05c571e Lexer: remove dead stores. Found by Clang static analyzer!
llvm-svn: 160973
2012-07-30 17:59:40 +00:00
Richard Smith 608c0b65d7 Add warning flag -Winvalid-pp-token for preprocessing-tokens which have
undefined behaviour, and move the diagnostic for '' from an Error into
an ExtWarn in this group. This is important for some users of the preprocessor,
and is necessary for gcc compatibility.

llvm-svn: 159335
2012-06-28 07:51:56 +00:00
James Dennett f442d2455b Documentation cleanup:
* Removed docs for Lexer::makeFileCharRange from Lexer.cpp, as they're in
  the header file;
* Reworked the documentation for SkipBlockComment so that it doesn't confuse
  Doxygen's comment parsing;
* Added another summary with \brief markup.

llvm-svn: 158618
2012-06-17 03:40:43 +00:00
Jordan Rose 127f6eef7e [-E] Emit a rewritten _Pragma on its own line.
1. Teach Lexer that pragma lexers are like macro expansions at EOF.
2. Treat pragmas like #define/#undef when printing.
3. If we just printed a directive, add a newline before any more tokens.
(4. Miscellaneous cleanup in PrintPreprocessedOutput.cpp)

PR10594 and <rdar://problem/11562490> (two separate related problems)

llvm-svn: 158571
2012-06-15 23:33:51 +00:00
James Dennett ff3c995624 Documentation cleanup: escape backslashes in Doxygen comments.
llvm-svn: 158552
2012-06-15 21:36:54 +00:00
Richard Smith e6799ddae8 PR12717: Clang supports hexadecimal floating-point literals in all language
modes. For languages other than C99/C11, this isn't quite a conforming
extension, and for C++11, it breaks some reasonable code containing
user-defined literals.

In languages which don't officially have hexfloats, pare back this extension
to only apply in cases where the token starts 0x and does not contain an
underscore. The extension is still not quite conforming, but it's a lot closer
now.

llvm-svn: 158487
2012-06-15 05:07:49 +00:00
David Blaikie 2af2b3071d Fix PR13065.
This condition (added in r158093) was overly conservative.

llvm-svn: 158483
2012-06-15 00:47:13 +00:00
Dmitri Gribenko 702b732d6f Correct method name in comment: from LexRawToken to LexFromRawLexer, according
to a change done long ago in r57393.

llvm-svn: 158243
2012-06-08 23:19:37 +00:00
Jordan Rose 288c421b3d Insert a space if necessary when suggesting CFBridgingRetain/Release.
This was a problem for people who write 'return(result);'

Also fix ARCMT's corresponding code, though there's no test case for this
because implicit casts like this are rejected by the migrator for being
ambiguous, and explicit casts have no problem.

<rdar://problem/11577346>

llvm-svn: 158130
2012-06-07 01:10:31 +00:00
David Blaikie d5321247c4 Add a -rewrite-includes option, which is similar to -rewrite-macros, but only expands #include directives.
Patch contributed by Lubos Lunak (l.lunax@suse.cz).
Review by Matt Beaumont-Gay (matthewbg@google.com).

llvm-svn: 158093
2012-06-06 18:52:13 +00:00
David Blaikie 987bcf9462 Escape \n and \r in doxycomment.
llvm-svn: 158091
2012-06-06 18:43:20 +00:00
Benjamin Kramer e5fbc6c85d Lexer::ReadToEndOfLine: Only build the string if it's actually used and do so in a less malloc-intensive way.
llvm-svn: 157064
2012-05-18 19:32:16 +00:00
Seth Cantrell e83c731cad Support -Wc++98-compat-pedantic as requested:
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20120409/056126.html

llvm-svn: 154655
2012-04-13 03:43:23 +00:00
Seth Cantrell 10ac7205ce C++11 no longer requires files to end with a newline
llvm-svn: 154643
2012-04-13 01:00:34 +00:00
Francois Pichet 7ebc4c1910 ext_reserved_user_defined_literal must not default to Error in MicrosoftMode. Hence create ext_ms_reserved_user_defined_literal that doesn't default to Error; otherwise MSVC headers won't parse.
Fixes PR12383.

llvm-svn: 154273
2012-04-07 23:09:23 +00:00
David Blaikie bbafb8a745 Unify naming of LangOptions variable/get function across the Clang stack (Lex to AST).
The member variable is always "LangOpts" and the member function is always "getLangOpts".

Reviewed by Chris Lattner

llvm-svn: 152536
2012-03-11 07:00:24 +00:00
Richard Smith 0df56f4a90 Implement C++11 [lex.ext]p10 for string and character literals: a ud-suffix not
starting with an underscore is ill-formed.

Since this rule rejects programs that were using <inttypes.h>'s macros, recover
from this error by treating the ud-suffix as a separate preprocessing-token,
with a DefaultError ExtWarn. The approach of treating such cases as two tokens
is under discussion for standardization, but is in any case a conforming
extension and allows existing codebases to keep building while the committee
makes up its mind.

Reword the warning on the definition of literal operators not starting with
underscores (which are, strangely, legal) to more explicitly state that such
operators can't be called by literals. Remove the special-case diagnostic for
hexfloats, since it was both triggering in the wrong cases and incorrect.

llvm-svn: 152287
2012-03-08 02:39:21 +00:00
Richard Smith 3e4a60a2cd Add -Wc++11-compat warning for string and character literals followed by
identifiers, in cases where those identifiers would be treated as
user-defined literal suffixes in C++11.

llvm-svn: 152198
2012-03-07 03:13:00 +00:00
Richard Smith d67aea28f6 User-defined literals: reject string and character UDLs in all places where the
grammar requires a string-literal and not a user-defined-string-literal. The
two constructs are still represented by the same TokenKind, in order to prevent
a combinatorial explosion of different kinds of token. A flag on Token tracks
whether a ud-suffix is present, in order to prevent clients from needing to look
at the token's spelling.

llvm-svn: 152098
2012-03-06 03:21:47 +00:00
Richard Smith e18f0faff2 Lexing support for user-defined literals. Currently these lex as the same token
kinds as the underlying string literals, and we silently drop the ud-suffix;
those issues will be fixed by subsequent patches.

llvm-svn: 152012
2012-03-05 04:02:15 +00:00
Argyrios Kyrtzidis 0d9e24b1db Change Lexer::makeFileCharRange() to have it accept a CharSourceRange
instead of a SourceRange, and handle the case where the range is
a char (not token) range.

llvm-svn: 149677
2012-02-03 05:58:29 +00:00
Argyrios Kyrtzidis abff5f1271 Improve Lexer::getImmediateMacroName to take into account inner macros
of macro arguments.

For "MAC1( MAC2(foo) )" and location of 'foo' token it would return
"MAC1" instead of "MAC2".

llvm-svn: 148704
2012-01-23 16:58:33 +00:00
Argyrios Kyrtzidis 85e7671b71 Enhance Lexer::makeFileCharRange to check for ranges inside a macro argument
expansion, in which case it returns a file range in the location where the
argument was spelled.

llvm-svn: 148551
2012-01-20 16:52:43 +00:00
Argyrios Kyrtzidis 7838a2bffb Introduce Lexer::getSourceText() that returns a string for the source
that the given source range encompasses.

llvm-svn: 148481
2012-01-19 15:59:19 +00:00
Argyrios Kyrtzidis a99e02d019 Introduce Lexer::makeFileCharRange() that accepts a token source range
and returns a character range with file locations.

llvm-svn: 148480
2012-01-19 15:59:14 +00:00
Argyrios Kyrtzidis 1b07c344b4 For Lexer's isAt[Start/End]OfMacroExpansion add an out parameter for the macro
start/end location.

It is commonly needed after calling the function; with this way we avoid
recalculating it.

llvm-svn: 148479
2012-01-19 15:59:08 +00:00
Anna Zaks 1bea4bf590 Refactor: Pull getImmediateMacroName() out of DiagnosticRenderer and
into Lexer and Preprocessor; making it widely available.

llvm-svn: 148410
2012-01-18 20:17:16 +00:00
Chandler Carruth 5b15a9be6a Two variables had been added for an assert, but their values were
re-computed rather than the variables being re-used just after the assert.
Just use the variables since we have them already. Fixes an unused
variable warning.

Also fix an 80-column violation.

llvm-svn: 148212
2012-01-15 09:03:45 +00:00
Argyrios Kyrtzidis 8a26c4de64 In Lexer::getCharAndSizeSlow[NoWarn] if we come up against
\<newline><newline>

don't consume the second newline.

Thanks to David Blaikie for pointing out the crash!

llvm-svn: 147138
2011-12-22 04:38:07 +00:00
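The rule from the two neighboring getCharAndSizeSlow fixes (stay inside the buffer, and do not swallow a second newline after an escaped one) can be illustrated with a simplified size computation. This is a sketch under assumptions: the function name `escapedNewlineSize` is hypothetical, the input is a `std::string` rather than a raw buffer, and whitespace between the backslash and the newline is not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Given the position just after a backslash, returns how many characters
// form the escaped newline there, or 0 if there is none.
inline std::size_t escapedNewlineSize(const std::string &text,
                                      std::size_t pos) {
  // The buffer may end immediately after the backslash.
  if (pos >= text.size())
    return 0;

  char first = text[pos];
  if (first != '\n' && first != '\r')
    return 0;

  // "\r\n" and "\n\r" are a single line ending, so consume both halves.
  // Two identical characters ("\n\n") are two separate newlines, and the
  // second one must be left for the caller.
  if (pos + 1 < text.size()) {
    char second = text[pos + 1];
    if ((second == '\n' || second == '\r') && second != first)
      return 2;
  }
  return 1;
}
```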
Argyrios Kyrtzidis e5cdd080ba In Lexer::getCharAndSizeSlow[NoWarn] make sure we don't go over the end of the buffer
when the end of the buffer is immediately after an escaped newline.

Fixes http://llvm.org/PR10153.

llvm-svn: 147091
2011-12-21 20:19:55 +00:00
David Blaikie 68e081d606 Unweaken vtables as per http://llvm.org/docs/CodingStandards.html#ll_virtual_anch
llvm-svn: 146959
2011-12-20 02:48:34 +00:00
Benjamin Kramer 900f1defdd Remove assert from hot code path and add a clarifying comment.
The assert wasn't adding much value but slowed down Release+Asserts builds.

llvm-svn: 145082
2011-11-22 20:39:31 +00:00
Benjamin Kramer 3885737a1b Lexer: Don't throw away the hard work SSE did to find a slash.
We can reuse the information and avoid looping over all the bytes again.

llvm-svn: 145070
2011-11-22 18:56:46 +00:00
Ted Kremenek a08713ce86 Move about 20 random diagnostics under -W flags. Patch by Ahmed Charles!
llvm-svn: 142284
2011-10-17 21:47:53 +00:00
Richard Smith acd4d3d52a -Wc++98-compat warnings for the lexer.
This also adds a -Wc++98-compat-pedantic for warning on constructs which would
be diagnosed by -std=c++98 -pedantic (that is, it warns even on C++11 features
which we enable by default, with no warning, in C++98 mode).

llvm-svn: 142034
2011-10-15 01:18:56 +00:00
Douglas Gregor 227c352bae We do parse hexfloats in C++11; make it actually work.
llvm-svn: 141798
2011-10-12 18:51:02 +00:00
Richard Smith a9e33d44a6 Handle Perforce-style conflict markers like normal conflict markers. Perforce
swaps over the <<<< and >>>> markers, and uses shorter markers than traditional
tools.

llvm-svn: 141751
2011-10-12 00:37:51 +00:00
Abramo Bagnara e398e60611 Fixed expansion range for # and ##.
llvm-svn: 141012
2011-10-03 18:39:03 +00:00
Argyrios Kyrtzidis e6e67deeed Rename SourceLocation::getFileLocWithOffset -> getLocWithOffset.
It already works (and is useful with) macro locs as well.

llvm-svn: 140057
2011-09-19 20:40:19 +00:00
Francois Pichet 0706d203cf Rename LangOptions::Microsoft to LangOptions::MicrosoftExt to make it clear that this flag must be used only for Microsoft extensions and not emulation; to avoid confusion with the new LangOptions::MicrosoftMode flag.
Much of the code now under LangOptions::MicrosoftExt will eventually be moved under the LangOptions::MicrosoftMode flag.

llvm-svn: 139987
2011-09-17 17:15:52 +00:00
Benjamin Kramer 17ff23c708 Speed up BCPL comment lexing by looking aggressively for newlines and then scanning backwards to see if the newline is escaped.
3% speedup in preprocessing all of clang with -Eonly. Also includes a small testcase for coverage.

llvm-svn: 139116
2011-09-05 07:19:39 +00:00
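The strategy in this commit (jump straight to the next newline, then look backwards to check whether it is escaped) can be sketched as follows. This is an illustration only: `findLineCommentEnd` is a hypothetical name, the input is a `std::string` instead of the lexer's buffer, and a newline counts as escaped only when a backslash sits directly before it (optionally with a '\r' in between).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the index of the newline that ends the line comment starting at
// `start`, or text.size() if the comment runs to the end of the input.
inline std::size_t findLineCommentEnd(const std::string &text,
                                      std::size_t start) {
  assert(start <= text.size() && "comment must start inside the text");

  std::size_t pos = start;
  while (true) {
    // Fast path: skip ahead to the next newline in one search.
    std::size_t newline = text.find('\n', pos);
    if (newline == std::string::npos)
      return text.size();

    // Scan backwards from the newline to see whether it is escaped.
    std::size_t back = newline;
    if (back > start && text[back - 1] == '\r')
      --back; // Treat "\r\n" as one line ending.
    if (back > start && text[back - 1] == '\\') {
      pos = newline + 1; // Escaped: the comment continues on the next line.
      continue;
    }
    return newline;
  }
}
```

The point of the design is that the common case (no backslash before the newline) never inspects the comment body character by character.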
Benjamin Kramer dbfb18a0a9 Use the Lexer's definition of whitespace here.
llvm-svn: 139115
2011-09-05 07:19:35 +00:00
Argyrios Kyrtzidis 5cec2aea3f Support code-completion for C++ inline methods and ObjC buffering methods.
Previously we would cut off the source file buffer at the code-completion
point; this impeded code-completion inside C++ inline methods and,
recently, with buffering ObjC methods.

Have the code-completion inserted into the source buffer so that it can
be buffered along with a method body. When we actually hit the code-completion
point, then cut off lexing or parsing.

Fixes rdar://10056932&8319466

llvm-svn: 139086
2011-09-04 03:32:15 +00:00
Argyrios Kyrtzidis a3deaeeb52 Fix Lexer::ComputePreamble when MaxLines parameter is non-zero.
The function was only counting lines that included tokens and not empty lines,
but MaxLines (mainly initialized to the line where the code-completion point resides)
is a count of overall lines (even empty ones).

llvm-svn: 139085
2011-09-04 03:32:04 +00:00
Douglas Gregor 081425343b Introduce support for a simple module import declaration, which
loads the named module. The syntax itself is intentionally hideous and
will be replaced at some later point with something more
palatable. For now, we're focusing on the semantics:
  - Module imports are handled first by the preprocessor (to get macro
  definitions) and then the same tokens are also handled by the parser
  (to get declarations). If both happen (as in normal compilation),
  the second one is redundant, because we currently have no way to
  hide macros or declarations when loading a module. Chris gets credit
  for this mad-but-workable scheme.
  - The Preprocessor now holds on to a reference to a module loader,
  which is responsible for loading named modules. CompilerInstance is
  the only important module loader: it now knows how to create and
  wire up an AST reader on demand to actually perform the module load.
  - We search for modules in the include path, using the module name
  with the suffix ".pcm" (precompiled module) for the file name. This
  is a temporary hack; we hope to improve the situation in the
  future.

llvm-svn: 138679
2011-08-26 23:56:07 +00:00
Argyrios Kyrtzidis 7aecbc7661 Make Lexer::ComputePreamble accept a LangOptions parameter, otherwise it may be
out of sync with how a file is compiled. Patch by Matthias Kleine!

llvm-svn: 138580
2011-08-25 20:39:19 +00:00
Argyrios Kyrtzidis f6a3b0ca4b In Lexer::isAtEndOfMacroExpansion use SourceManager::isInFileID and avoid
the extra SourceManager::getFileID call.

llvm-svn: 138376
2011-08-23 21:02:30 +00:00
Argyrios Kyrtzidis 161868db4c Make Lexer::GetBeginningOfToken able to handle macro arg expansion locations.
llvm-svn: 137795
2011-08-17 00:31:23 +00:00
Craig Topper 54edccafc5 Add support for C++0x raw string literals.
llvm-svn: 137298
2011-08-11 04:06:15 +00:00
Anna Zaks 59a3c80717 Add a utility function to the Lexer, which makes it easier to find a token after the given location. (It is a generalized version of trans::findLocationAfterSemi from ArcMigrate, which will be changed to use the Lexer utility).
llvm-svn: 136268
2011-07-27 21:43:43 +00:00
Douglas Gregor fb65e592e0 Add support for C++0x unicode string and character literals, from Craig Topper!
llvm-svn: 136210
2011-07-27 05:40:30 +00:00
Chandler Carruth ee4c1d1298 Migrate 'Instantiation' data and API bits of SLocEntry to 'Expansion'
etc. With this I think essentially all of the SourceManager APIs are
converted. Comments and random other bits of cleanup should be all that's
left.

llvm-svn: 136057
2011-07-26 04:56:51 +00:00
Chandler Carruth 73ee5d7fae Convert InstantiationInfo and much of the related code to ExpansionInfo
and various other 'expansion' based terms. I've tried to reformat where
appropriate and catch as many references in comments but I'm going to do
several more passes. Also I've tried to expand parameter names to be
more clear where appropriate.

llvm-svn: 136056
2011-07-26 04:41:47 +00:00
Chandler Carruth 115b077f30 Rename create(MacroArg)InstantiationLoc to create(MacroArg)ExpansionLoc.
llvm-svn: 136054
2011-07-26 03:03:05 +00:00
Chandler Carruth ca757587a3 Rename SourceManager::getImmediateInstantiationRange to
getImmediateExpansionRange.

llvm-svn: 135960
2011-07-25 20:52:21 +00:00
Chandler Carruth 6d28d7f2a3 Rename SourceManager::getInstantiationRange to getExpansionRange.
llvm-svn: 135915
2011-07-25 16:56:02 +00:00
Chandler Carruth 35f5320d8e Mechanically rename SourceManager::getInstantiationLoc and
FullSourceLoc::getInstantiationLoc to ...::getExpansionLoc. This is part
of the API and documentation update from 'instantiation' as the term for
macros to 'expansion'.

llvm-svn: 135914
2011-07-25 16:49:02 +00:00
Chris Lattner 0e62c1cc0b remove unneeded llvm:: namespace qualifiers on some core types now that LLVM.h imports
them into the clang namespace.

llvm-svn: 135852
2011-07-23 10:55:15 +00:00
Joerg Sonnenberger da5d2b761a Spelling
llvm-svn: 135545
2011-07-20 00:14:37 +00:00
Douglas Gregor 925296b4c2 Revamp the SourceManager to separate the representation of parsed
source locations from source locations loaded from an AST/PCH file.

Previously, loading an AST/PCH file involved carefully pre-allocating
space at the beginning of the source manager for the source locations
and FileIDs that correspond to the prefix, and then appending the
source locations/FileIDs used for parsing the remaining translation
unit. This design forced us into loading PCH files early, as a prefix,
which has become a rather significant limitation.

This patch splits the SourceManager space into two parts: for source
location "addresses", the lower values (growing upward) are used to
describe parsed code, while upper values (growing downward) are used
for source locations loaded from AST/PCH files. Similarly, positive
FileIDs are used to describe parsed code while negative FileIDs are
used for file/macro locations loaded from AST/PCH files. As a result,
we can load PCH/AST files even during parsing, making various
improvements in the future possible, e.g., teaching #include <foo.h> to
look for and load <foo.h.gch> if it happens to be already available.

This patch was originally written by Sebastian Redl, then brought
forward to the modern age by Jonathan Turner, and finally
polished/finished by me to be committed.

llvm-svn: 135484
2011-07-19 16:10:42 +00:00
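The split address space described above, with parsed code growing upward from the bottom and loaded AST/PCH locations growing downward from the top, can be sketched with a small two-ended allocator. The class name `TwoEndedOffsetSpace`, its member names, and the 2^31 upper bound are assumptions made for illustration; this is not the SourceManager implementation.

```cpp
#include <cassert>
#include <cstdint>

// Two-ended offset space: local (parsed) entries are handed out from the
// bottom, loaded (AST/PCH) entries from the top, so either side can keep
// growing at any time until the two meet.
class TwoEndedOffsetSpace {
public:
  // Reserves `size` offsets for parsed code and returns the base offset.
  std::uint32_t allocateLocal(std::uint32_t size) {
    assert(size <= available() && "offset space exhausted");
    std::uint32_t base = nextLocalOffset;
    nextLocalOffset += size;
    return base;
  }

  // Reserves `size` offsets for loaded code and returns the base offset.
  std::uint32_t allocateLoaded(std::uint32_t size) {
    assert(size <= available() && "offset space exhausted");
    currentLoadedOffset -= size;
    return currentLoadedOffset;
  }

  // An offset belongs to the loaded half if it lies at or above the
  // lowest loaded offset handed out so far.
  bool isLoaded(std::uint32_t offset) const {
    return offset >= currentLoadedOffset;
  }

  // Number of offsets still free between the two growing ends.
  std::uint32_t available() const {
    return currentLoadedOffset - nextLocalOffset;
  }

private:
  std::uint32_t nextLocalOffset = 0;            // grows upward
  std::uint32_t currentLoadedOffset = 1u << 31; // grows downward (assumed cap)
};
```

Because neither side needs to be sized in advance, a loaded chunk can be added in the middle of parsing, which is the property the commit message highlights.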
Chandler Carruth e2c09ebcaa Convert terminology in the Lexer from 'instantiate' and variants to
'expand'. Also update the public API it provides to the new term, and
propagate that update to the various clients.

No functionality changed.

llvm-svn: 135138
2011-07-14 08:20:40 +00:00
Argyrios Kyrtzidis 61c58f7f43 Move SourceManager::isAt[Start/End]OfMacroInstantiation functions to the Lexer, since they depend on it now.
llvm-svn: 134644
2011-07-07 21:54:45 +00:00
Argyrios Kyrtzidis 41fb2d95a3 Make the Preprocessor more memory efficient and improve macro instantiation diagnostics.
When a macro instantiation occurs, reserve a SLocEntry chunk with length the
full length of the macro definition source. Set the spelling location of this chunk
to point to the start of the macro definition and any tokens that are lexed directly
from the macro definition will get a location from this chunk with the appropriate offset.

For any tokens that come from argument expansion, '##' paste operator, etc. have their
instantiation location point at the appropriate place in the instantiated macro definition
(the argument identifier and the '##' token respectively).
This improves macro instantiation diagnostics:

Before:

t.c:5:9: error: invalid operands to binary expression ('struct S' and 'int')
int y = M(/);
        ^~~~
t.c:5:11: note: instantiated from:
int y = M(/);
          ^

After:

t.c:5:9: error: invalid operands to binary expression ('struct S' and 'int')
int y = M(/);
        ^~~~
t.c:3:20: note: instantiated from:
#define M(op) (foo op 3);
                ~~~ ^  ~
t.c:5:11: note: instantiated from:
int y = M(/);
          ^

The memory savings for a candidate boost library that abuses the preprocessor are:

- 32% less SLocEntries (37M -> 25M)
- 30% reduction in PCH file size (900M -> 635M)
- 50% reduction in memory usage for the SLocEntry table (1.6G -> 800M)

llvm-svn: 134587
2011-07-07 03:40:34 +00:00
Argyrios Kyrtzidis 2cfce18645 Allow Lexer::getLocForEndOfToken to return the location just past the macro instantiation
if the location given points at the last token of the macro instantiation.

Fixes rdar://9045701.

llvm-svn: 133804
2011-06-24 17:58:59 +00:00