llvm-project

Commit Graph

Author	SHA1	Message	Date
Richard Smith	06d274fdb7	Add -Wc99-compat warning for C11 unicode string and character literals. llvm-svn: 176817	2013-03-11 18:01:42 +00:00
Richard Smith	9b36209e31	When lexing in C11 mode, accept unicode character and string literals, per C11 6.4.4.4/1 and 6.4.5/1. llvm-svn: 176780	2013-03-09 23:56:02 +00:00
Jordan Rose	864b810739	Preprocessor: don't consider // to be a line comment in -E -std=c89 mode. It's beneficial when compiling to treat // as the start of a line comment even in -std=c89 mode, since it's not valid C code (with a few rare exceptions) and is usually intended as such. We emit a pedantic warning and then continue on as if line comments were enabled. This has been our behavior for quite some time. However, people use the preprocessor for things besides C source files. In today's prompting example, the input contains (unquoted) URLs, which contain // but should still be preserved. This change instructs the lexer to treat // as a plain token if Clang is in C90 mode and generating preprocessed output rather than actually compiling. <rdar://problem/13338743> llvm-svn: 176526	2013-03-05 22:51:04 +00:00
Jordan Rose	cb8a1aca35	Preprocessor: preserve whitespace in -traditional-cpp mode. Note that unlike GNU cpp we currently do not preserve whitespace in macros (even in -traditional-cpp mode). <rdar://problem/12897179> llvm-svn: 175778	2013-02-21 18:53:19 +00:00
Jordan Rose	58c61e006f	Properly validate UCNs for C99 and C++03 (both more restrictive than C(++)11). Add warnings under -Wc++11-compat, -Wc++98-compat, and -Wc99-compat when a particular UCN is incompatible with a different standard, and -Wunicode when a UCN refers to a surrogate character in C++03. llvm-svn: 174788	2013-02-09 01:10:25 +00:00
Jordan Rose	a2100d755a	Pull Lexer's CharInfo table out for general use throughout Clang. Rewriting the same predicates over and over again is bad for code size and code maintenance. Using the functions in <ctype.h> is generally unsafe unless they are specified to be locale-independent (i.e. only isdigit and isxdigit). The next commit will try to clean up uses of <ctype.h> functions within Clang. llvm-svn: 174765	2013-02-08 22:30:22 +00:00
Jordan Rose	cc538345be	Lexer: Don't warn about Unicode in preprocessor directives. This allows people to use Unicode in their #pragma mark and in macros that exist only to be string-ized. <rdar://problem/13107323&13121362> llvm-svn: 174081	2013-01-31 19:48:48 +00:00
Jordan Rose	f649795f84	Fix r173881 to properly skip invalid UTF-8 characters in raw lexing and -E. This caused hangs as we processed the same invalid byte over and over. <rdar://problem/13115651> llvm-svn: 173959	2013-01-30 19:21:12 +00:00
Dmitri Gribenko	9feeef40f5	Move UTF conversion routines from clang/lib/Basic to llvm/lib/Support. This is required to use them in TableGen. llvm-svn: 173924	2013-01-30 12:06:08 +00:00
Jordan Rose	17441589c3	Don't warn about Unicode characters in -E mode. People use the C preprocessor for things other than C files. Some of them have Unicode characters. We shouldn't warn about Unicode characters appearing outside of identifiers in this case. There's not currently a way for the preprocessor to tell if it's in -E mode, so I added a new flag, derived from the PreprocessorOutputOptions. This is only used by the Unicode warnings for now, but could conceivably be used by other warnings or even behavioral differences later. <rdar://problem/13107323> llvm-svn: 173881	2013-01-30 01:52:57 +00:00
Jordan Rose	cccbdbf0db	PR15067 (again): Don't warn about UCNs in C90 if we're raw-lexing. Fixes a crash. Thanks, Richard. llvm-svn: 173701	2013-01-28 17:49:02 +00:00
Jordan Rose	c0cba27230	PR15067: Don't assert when a UCN appears in a C90 file. Unfortunately, we can't accept the UCN as an extension because we're required to treat it as two tokens for preprocessing purposes. llvm-svn: 173622	2013-01-27 20:12:04 +00:00
NAKAMURA Takumi	e8f83dbbd8	Lexer.cpp: Fix a warning with ptrdiff_t on i686. [-Wsign-compare] llvm-svn: 173447	2013-01-25 14:57:21 +00:00
Jordan Rose	8b4af2ae88	Clarify comment: "diagnose" is better than "warn" when emitting an error. Thanks, Dmitri. llvm-svn: 173400	2013-01-25 00:20:28 +00:00
Jordan Rose	62db5066e9	Add a fixit for \U1234 -> \u1234. llvm-svn: 173371	2013-01-24 20:50:52 +00:00
Jordan Rose	4246ae0089	As an extension, treat Unicode whitespace characters as whitespace. llvm-svn: 173370	2013-01-24 20:50:50 +00:00
Jordan Rose	7f43dddae0	Handle universal character names and Unicode characters outside of literals. This is a missing piece for C99 conformance. This patch handles UCNs by adding a '\\' case to LexTokenInternal and LexIdentifier -- if we see a backslash, we tentatively try to read in a UCN. If the UCN is not syntactically well-formed, we fall back to the old treatment: a backslash followed by an identifier beginning with 'u' (or 'U'). Because the spelling of an identifier with UCNs still has the UCN in it, we need to convert that to UTF-8 in Preprocessor::LookUpIdentifierInfo. Of course, valid code that does not use UCNs will see only a very minimal performance hit (checks after each identifier for non-ASCII characters, checks when converting raw_identifiers to identifiers that they do not contain UCNs, and checks when getting the spelling of an identifier that it does not contain a UCN). This patch also adds basic support for actual UTF-8 in the source. This is treated almost exactly the same as UCNs except that we consider stray Unicode characters to be mistakes and offer a fixit to remove them. llvm-svn: 173369	2013-01-24 20:50:46 +00:00
Dmitri Gribenko	f857950d39	Remove useless 'llvm::' qualifier from names like StringRef and others that are brought into 'clang' namespace by clang/Basic/LLVM.h llvm-svn: 172323	2013-01-12 19:30:44 +00:00
Argyrios Kyrtzidis	86f1a935dc	Pull the bulk of Lexer::MeasureTokenLength() out into a new function, Lexer::getRawToken(). No functionality change. llvm-svn: 171771	2013-01-07 19:16:18 +00:00
Richard Smith	2bf7fdb723	s/CPlusPlus0x/CPlusPlus11/g llvm-svn: 171367	2013-01-02 11:42:31 +00:00
Chandler Carruth	3a02247dc9	Sort all of Clang's files under 'lib', and fix up the broken headers uncovered. This required manually correcting all of the incorrect main-module headers I could find, and running the new llvm/utils/sort_includes.py script over the files. I also manually added quite a few missing headers that were uncovered by shuffling the order or moving headers up to be main-module-headers. llvm-svn: 169237	2012-12-04 09:13:33 +00:00
Richard Smith	9a67f47882	Teach Lexer::getSpelling about raw string literals. Specifically, if a raw string literal needs cleaning (because it contains line-splicing in the encoding prefix or in the ud-suffix), do not clean the section between the double-quotes -- that's the "raw" bit! llvm-svn: 168776	2012-11-28 07:29:00 +00:00
Nico Weber	4e270380c1	Fix crash on end-of-file after \ in a char literal, fixes PR14369. This makes LexCharConstant() look more like LexStringLiteral(), which doesn't have this bug. Add tests for eof after \ for several other cases. llvm-svn: 168269	2012-11-17 20:25:54 +00:00
Eli Friedman	b699e619fe	Fix an assertion failure printing the unused-label fixit in files using CRLF line endings. <rdar://problem/12639047>. llvm-svn: 167900	2012-11-14 01:28:38 +00:00
Daniel Dunbar	cf3f2c49ea	Revert r167801, "[preprocessor] When #including something that contributes no tokens at all,". This change broke External/Nurbs in LLVM test-suite. llvm-svn: 167858	2012-11-13 19:12:37 +00:00
Nico Weber	7cc28804e2	UCNs in char literals are done (in LiteralSupport), remove FIXME. Expand UCN FIXME in LexNumericConstant. llvm-svn: 167818	2012-11-13 06:25:15 +00:00
Argyrios Kyrtzidis	4f10a3e9f0	[preprocessor] When #including something that contributes no tokens at all, don't recursively continue lexing. This avoids a stack overflow with a sequence of many empty #includes. rdar://11988695 llvm-svn: 167801	2012-11-13 01:03:15 +00:00
Argyrios Kyrtzidis	36675b75fb	In Lexer::LexTokenInternal, avoid code duplication; no functionality change. llvm-svn: 167800	2012-11-13 01:02:40 +00:00
Nico Weber	158a31abe2	s/BCPLComment/LineComment/ llvm-svn: 167690	2012-11-11 07:02:14 +00:00
Argyrios Kyrtzidis	d53d0daab9	Take into account that there may be a BOM at the beginning of the file, when computing the size of the precompiled preamble. llvm-svn: 166659	2012-10-25 01:51:45 +00:00
Dmitri Gribenko	b8e9e7507e	StringRef'ize Preprocessor::CreateString(). llvm-svn: 164555	2012-09-24 21:07:17 +00:00
Roman Divacky	e637711ae0	Don't cast away const needlessly. Found by gcc48 -Wcast-qual. llvm-svn: 163325	2012-09-06 15:59:27 +00:00
Eli Friedman	324adad966	Make a bunch of methods on Lexer private. llvm-svn: 162970	2012-08-31 02:29:37 +00:00
Dmitri Gribenko	4aa05c571e	Lexer: remove dead stores. Found by Clang static analyzer! llvm-svn: 160973	2012-07-30 17:59:40 +00:00
Richard Smith	608c0b65d7	Add warning flag -Winvalid-pp-token for preprocessing-tokens which have undefined behaviour, and move the diagnostic for '' from an Error into an ExtWarn in this group. This is important for some users of the preprocessor, and is necessary for gcc compatibility. llvm-svn: 159335	2012-06-28 07:51:56 +00:00
James Dennett	f442d2455b	Documentation cleanup: * Removed docs for Lexer::makeFileCharRange from Lexer.cpp, as they're in the header file; * Reworked the documentation for SkipBlockComment so that it doesn't confuse Doxygen's comment parsing; * Added another summary with \brief markup. llvm-svn: 158618	2012-06-17 03:40:43 +00:00
Jordan Rose	127f6eef7e	[-E] Emit a rewritten _Pragma on its own line. 1. Teach Lexer that pragma lexers are like macro expansions at EOF. 2. Treat pragmas like #define/#undef when printing. 3. If we just printed a directive, add a newline before any more tokens. (4. Miscellaneous cleanup in PrintPreprocessedOutput.cpp) PR10594 and <rdar://problem/11562490> (two separate related problems) llvm-svn: 158571	2012-06-15 23:33:51 +00:00
James Dennett	ff3c995624	Documentation cleanup: escape backslashes in Doxygen comments. llvm-svn: 158552	2012-06-15 21:36:54 +00:00
Richard Smith	e6799ddae8	PR12717: Clang supports hexadecimal floating-point literals in all language modes. For languages other than C99/C11, this isn't quite a conforming extension, and for C++11, it breaks some reasonable code containing user-defined literals. In languages which don't officially have hexfloats, pare back this extension to only apply in cases where the token starts 0x and does not contain an underscore. The extension is still not quite conforming, but it's a lot closer now. llvm-svn: 158487	2012-06-15 05:07:49 +00:00
David Blaikie	2af2b3071d	Fix PR13065. This condition (added in r158093) was overly conservative. llvm-svn: 158483	2012-06-15 00:47:13 +00:00
Dmitri Gribenko	702b732d6f	Correct method name in comment: from LexRawToken to LexFromRawLexer, according to a change done long ago in r57393. llvm-svn: 158243	2012-06-08 23:19:37 +00:00
Jordan Rose	288c421b3d	Insert a space if necessary when suggesting CFBridgingRetain/Release. This was a problem for people who write 'return(result);' Also fix ARCMT's corresponding code, though there's no test case for this because implicit casts like this are rejected by the migrator for being ambiguous, and explicit casts have no problem. <rdar://problem/11577346> llvm-svn: 158130	2012-06-07 01:10:31 +00:00
David Blaikie	d5321247c4	Add a -rewrite-includes option, which is similar to -rewrite-macros, but only expands #include directives. Patch contributed by Lubos Lunak (l.lunax@suse.cz). Review by Matt Beaumont-Gay (matthewbg@google.com). llvm-svn: 158093	2012-06-06 18:52:13 +00:00
David Blaikie	987bcf9462	Escape \n and \r in doxycomment. llvm-svn: 158091	2012-06-06 18:43:20 +00:00
Benjamin Kramer	e5fbc6c85d	Lexer::ReadToEndOfLine: Only build the string if it's actually used and do so in a less malloc-intensive way. llvm-svn: 157064	2012-05-18 19:32:16 +00:00
Seth Cantrell	e83c731cad	Support -Wc++98-compat-pedantic as requested: http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20120409/056126.html llvm-svn: 154655	2012-04-13 03:43:23 +00:00
Seth Cantrell	10ac7205ce	C++11 no longer requires files to end with a newline llvm-svn: 154643	2012-04-13 01:00:34 +00:00
Francois Pichet	7ebc4c1910	ext_reserved_user_defined_literal must not default to Error in MicrosoftMode. Hence create ext_ms_reserved_user_defined_literal that doesn't default to Error; otherwise MSVC headers won't parse. Fixes PR12383. llvm-svn: 154273	2012-04-07 23:09:23 +00:00
David Blaikie	bbafb8a745	Unify naming of LangOptions variable/get function across the Clang stack (Lex to AST). The member variable is always "LangOpts" and the member function is always "getLangOpts". Reviewed by Chris Lattner llvm-svn: 152536	2012-03-11 07:00:24 +00:00
Richard Smith	0df56f4a90	Implement C++11 [lex.ext]p10 for string and character literals: a ud-suffix not starting with an underscore is ill-formed. Since this rule rejects programs that were using <inttypes.h>'s macros, recover from this error by treating the ud-suffix as a separate preprocessing-token, with a DefaultError ExtWarn. The approach of treating such cases as two tokens is under discussion for standardization, but is in any case a conforming extension and allows existing codebases to keep building while the committee makes up its mind. Reword the warning on the definition of literal operators not starting with underscores (which are, strangely, legal) to more explicitly state that such operators can't be called by literals. Remove the special-case diagnostic for hexfloats, since it was both triggering in the wrong cases and incorrect. llvm-svn: 152287	2012-03-08 02:39:21 +00:00

233 Commits