171 Commits

Author SHA1 Message Date
Jordan Rose c0cba27230 PR15067: Don't assert when a UCN appears in a C90 file.
Unfortunately, we can't accept the UCN as an extension because we're
required to treat it as two tokens for preprocessing purposes.

llvm-svn: 173622
2013-01-27 20:12:04 +00:00
Jordan Rose aa89cf1a66 Unify diagnostics for \x, \u, and \U without any following hex digits.
llvm-svn: 173368
2013-01-24 20:50:13 +00:00
Jordan Rose 78ed86a7e5 Adopt llvm::hexDigitValue.
llvm-svn: 172861
2013-01-18 22:33:58 +00:00
Richard Smith 2bf7fdb723 s/CPlusPlus0x/CPlusPlus11/g
llvm-svn: 171367
2013-01-02 11:42:31 +00:00
Chandler Carruth 3a02247dc9 Sort all of Clang's files under 'lib', and fix up the broken headers
uncovered.

This required manually correcting all of the incorrect main-module
headers I could find, and running the new llvm/utils/sort_includes.py
script over the files.

I also manually added quite a few missing headers that were uncovered by
shuffling the order or moving headers up to be main-module-headers.

llvm-svn: 169237
2012-12-04 09:13:33 +00:00
Benjamin Kramer 7d574e269d LiteralSupport: Don't overflow the temporary buffer when decoding invalid string parts.
Instead, just use a dummy buffer; we're not going to use the decoded string anyway.
Fixes PR14292.

llvm-svn: 167594
2012-11-08 19:22:31 +00:00
Benjamin Kramer f23a6e6f80 LiteralSupport: Clean up style violations. No functionality change.
llvm-svn: 167593
2012-11-08 19:22:26 +00:00
David Blaikie a0613170b4 Handle string encoding diagnostics when there are too many invalid ranges.
llvm-svn: 167059
2012-10-30 23:22:22 +00:00
Seth Cantrell 4cfc817a9a improve highlighting of invalid string encodings
limit highlight to exactly the bad encoding, and highlight every
bad encoding in a string.

llvm-svn: 166900
2012-10-28 18:24:46 +00:00
Jordan Rose de584de370 Rename CanFitInto64Bits to alwaysFitsInto64Bits per discussion on IRC.
This makes the behavior clearer concerning literals with the maximum
number of digits. For a 32-bit example, 4,000,000,000 is a valid uint32_t,
but 5,000,000,000 is not, so we'd have to count 10-digit decimal numbers
as "unsafe" (meaning we have to check for overflow when parsing them,
just as we would for numbers with 11 digits or higher). This is the same,
only with 64 bits to play with.

No functionality change.

llvm-svn: 164639
2012-09-25 22:32:51 +00:00
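The digit-count reasoning in the commit above can be sketched as follows. This is a minimal illustration, not Clang's actual code: the names `alwaysFitsSketch`, `parseDecimalSketch`, and `ParseResult` are hypothetical, and only the 64-bit case is shown. A literal whose digit count alone proves it fits skips the per-digit overflow check; a decimal literal with the maximum number of digits (20) still has to be checked.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (a sketch, not Clang's code): true when EVERY literal
// with `numDigits` digits in `radix` is guaranteed to fit in 64 bits.
inline bool alwaysFitsSketch(unsigned radix, std::size_t numDigits) {
  switch (radix) {
  case 2:  return numDigits <= 64;      // one bit per digit
  case 8:  return numDigits <= 64 / 3;  // three bits per digit
  case 10: return numDigits <= 19;      // 10^19 - 1 < 2^64, but 10^20 - 1 is not
  case 16: return numDigits <= 64 / 4;  // four bits per digit
  default: return false;                // unknown radix: always check
  }
}

struct ParseResult {
  bool ok;              // false if the value overflowed 64 bits
  std::uint64_t value;  // meaningful only when ok is true
};

// Parses `n` decimal digits. The overflow test runs only for "unsafe"
// digit counts, i.e. when the count alone does not prove the value fits.
inline ParseResult parseDecimalSketch(const char *digits, std::size_t n) {
  const bool safe = alwaysFitsSketch(10, n);
  std::uint64_t value = 0;
  for (std::size_t i = 0; i < n; ++i) {
    const std::uint64_t d = static_cast<std::uint64_t>(digits[i] - '0');
    if (!safe && value > (UINT64_MAX - d) / 10)
      return ParseResult{false, 0};
    value = value * 10 + d;
  }
  return ParseResult{true, value};
}
```

With this sketch, `18446744073709551615` (20 digits) parses successfully on the checked path, while `18446744073709551616` is rejected as an overflow.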
Dmitri Gribenko 511288b2b5 Optimize NumericLiteralParser::GetIntegerValue().
It does a conservative estimate on the size of numbers that can fit into
uint64_t.  This bound is improved.

llvm-svn: 164624
2012-09-25 19:09:15 +00:00
Dmitri Gribenko 7ba91723e7 Small cleanup of literal semantic analysis: hiding 'char *' pointers behind
StringRef makes code cleaner.  Also, make the temporary buffer smaller:
512 characters is unreasonably large for integer literals.

llvm-svn: 164484
2012-09-24 09:53:54 +00:00
Richard Smith 639b8d05dd When a bad UTF-8 encoding or bogus escape sequence is encountered in a
string literal, produce a diagnostic pointing at the erroneous character
range, not at the start of the literal.

llvm-svn: 163459
2012-09-08 07:16:20 +00:00
Nico Weber 4b18c3ff40 Share ConvertUTF8toWide() between Lex and CodeGen.
llvm-svn: 159634
2012-07-03 02:24:52 +00:00
James Dennett 99c193b3c0 Documentation cleanup: add \verbatim markup for grammar productions
llvm-svn: 158740
2012-06-19 21:04:25 +00:00
James Dennett 1cc2203286 Documentation cleanup: added \verbatim...\verbatim markup to fix the
formatting of Doxygen's output for StringLiteralParser::StringLiteralParser.

llvm-svn: 158616
2012-06-17 03:34:42 +00:00
Richard Smith 0948d93b7f Fix off-by-one error in UTF-16 encoding: don't try to use a surrogate pair for U+FFFF.
llvm-svn: 158391
2012-06-13 05:41:29 +00:00
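For illustration, a boundary of the kind the commit above describes looks like this in a standalone UTF-16 encoder. This is a sketch under assumptions: the log does not show the original comparison, the names `Utf16Units` and `encodeUtf16Sketch` are hypothetical, and the input is assumed to be a valid code point no greater than U+10FFFF.

```cpp
#include <cstddef>
#include <cstdint>

struct Utf16Units {
  std::size_t count;       // number of code units used: 1 or 2
  std::uint16_t units[2];  // units[1] is unused when count == 1
};

// Encodes one code point as UTF-16. Everything up to and including U+FFFF
// fits in a single code unit; surrogate pairs start at U+10000.
inline Utf16Units encodeUtf16Sketch(std::uint32_t cp) {
  Utf16Units r = {0, {0, 0}};
  if (cp <= 0xFFFF) {  // an exclusive bound here would mis-encode U+FFFF
    r.count = 1;
    r.units[0] = static_cast<std::uint16_t>(cp);
    return r;
  }
  const std::uint32_t v = cp - 0x10000;  // 20 bits split into two halves
  r.count = 2;
  r.units[0] = static_cast<std::uint16_t>(0xD800 + (v >> 10));    // high surrogate
  r.units[1] = static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF));  // low surrogate
  return r;
}
```

U+FFFF encodes as the single unit `0xFFFF`, and U+10000 is the first code point that needs a pair (`0xD800 0xDC00`).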
Richard Smith 4060f77462 PR13099: Teach -Wformat about raw string literals, UTF-8 strings and Unicode escape sequences.
llvm-svn: 158390
2012-06-13 05:37:23 +00:00
Argyrios Kyrtzidis 9933e3ac88 In StringLiteralParser::init, make sure we emit an error when
failing to lex the string, as suggested by Eli.

Part of rdar://11305263.

llvm-svn: 156081
2012-05-03 17:50:32 +00:00
Argyrios Kyrtzidis 4e5b5c36f4 In StringLiteralParser::init(), fail gracefully if the string is
not as we expect; it may be due to racing issue of a file coming from PCH
changing after the PCH is loaded.

rdar://11353109

llvm-svn: 156043
2012-05-03 01:01:56 +00:00
David Blaikie bbafb8a745 Unify naming of LangOptions variable/get function across the Clang stack (Lex to AST).
The member variable is always "LangOpts" and the member function is always "getLangOpts".

Reviewed by Chris Lattner

llvm-svn: 152536
2012-03-11 07:00:24 +00:00
Richard Smith 2a70e65436 Improve diagnostics for UCNs referring to control characters and members of the
basic source character set in C++98. Add -Wc++98-compat diagnostics for same in
literals in C++11. Extend such support to cover string literals as well as
character literals, and mark N2170 as done.

This seems too minor to warrant a release note to me. Let me know if you disagree.

llvm-svn: 152444
2012-03-09 22:27:51 +00:00
Richard Smith 812924502b When checking the encoding of an 8-bit string literal, don't just check the
first codepoint! Also, don't reject empty raw string literals for spurious
"encoding" issues. Also, don't rely on undefined behavior in ConvertUTF.c.

llvm-svn: 152344
2012-03-08 21:59:28 +00:00
Richard Smith 39570d0020 Add support for cooked forms of user-defined-integer-literal and
user-defined-floating-literal. Support for raw forms of these literals
to follow.

llvm-svn: 152302
2012-03-08 08:45:32 +00:00
Richard Smith 75b67d6dc5 User-defined literal support for character literals.
llvm-svn: 152277
2012-03-08 01:34:56 +00:00
Richard Smith e18f0faff2 Lexing support for user-defined literals. Currently these lex as the same token
kinds as the underlying string literals, and we silently drop the ud-suffix;
those issues will be fixed by subsequent patches.

llvm-svn: 152012
2012-03-05 04:02:15 +00:00
Eli Friedman 9436352a82 Implement warning for non-wide string literals with an unexpected encoding. Downgrade error for non-wide character literals with an unexpected encoding to a warning for compatibility with gcc and older versions of clang. <rdar://problem/10837678>.
llvm-svn: 150295
2012-02-11 05:08:10 +00:00
Aaron Ballman e1224a5067 Fixing hex floating literal support so that it handles 0x.2p2 properly.
llvm-svn: 150072
2012-02-08 13:36:33 +00:00
Aaron Ballman b97a5addd5 Hex literals without a significand no longer crash the lexer. Fixes bug 7910
Patch by Eitan Adler

llvm-svn: 149984
2012-02-07 13:46:03 +00:00
Dylan Noblesmith 2c1dd2716a Basic: import SmallString<> into clang namespace
(I was going to fix the TODO about DenseMap too, but
that would break self-host right now. See PR11922.)

llvm-svn: 149799
2012-02-05 02:13:05 +00:00
Seth Cantrell 9c2d6f0279 stop claiming unicode escape sequences are too long in strings, because they never are
llvm-svn: 148391
2012-01-18 12:27:08 +00:00
Seth Cantrell 8b2b677f39 Improves support for Unicode in character literals
Updates ProcessUCNEscape() for C++. C++11 allows UCNs in character
and string literals that represent control characters and basic
source characters. Also C++03 allows UCNs that refer to surrogate
codepoints.

UTF-8 sequences in character literals are now handled as single
c-chars.

Added error for multiple characters in Unicode character literals.

Added errors for when the execution charset encoding of a c-char
cannot be represented as a single code unit in the associated
character type. Note that for the purposes of this error the
associated character type for a narrow character literal is char, not
int, even though in C narrow character literals have type int.

llvm-svn: 148389
2012-01-18 12:27:04 +00:00
Nico Weber d60b72f696 Fix a regression in wide character codegen. See PR11369.
llvm-svn: 144521
2011-11-14 05:17:37 +00:00
Eli Friedman 20554708fb Fix one last place where we weren't writing into a string literal consistently.
llvm-svn: 143769
2011-11-05 00:41:04 +00:00
Eli Friedman d1370791c2 Use native endianness for writing out character escapes to the result buffer for string literal parsing. No functional change on little-endian architectures; should fix test failures on PPC.
llvm-svn: 143585
2011-11-02 23:06:23 +00:00
Eli Friedman 703e7153af Perform proper conversion for strings encoded in the source file as UTF-8. (For now, we are assuming the source character set is always UTF-8; this can be easily extended if necessary.)
Tests will be coming up in a subsequent commit.

Patch by Seth Cantrell.

llvm-svn: 143416
2011-11-01 02:14:50 +00:00
Douglas Gregor 227c352bae We do parse hexfloats in C++11; make it actually work.
llvm-svn: 141798
2011-10-12 18:51:02 +00:00
Douglas Gregor 4d68366b2f When parsing a character literal, extract the characters from the
buffer as an 'unsigned char', so that integer promotion doesn't
sign-extend character values > 127 into oblivion. Fixes
<rdar://problem/10188919>.

llvm-svn: 140608
2011-09-27 17:00:18 +00:00
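The sign-extension problem named in the commit above can be reproduced in isolation. The two helpers below are hypothetical and only contrast the two ways of widening a byte; they are not taken from the character literal parser.

```cpp
#include <cstdint>

// Problematic pattern: the byte is widened while still a signed value, so
// any byte above 127 is sign-extended and the upper bits fill with ones.
inline std::uint32_t widenSignExtended(signed char byte) {
  return static_cast<std::uint32_t>(byte);
}

// Fixed pattern: reinterpret the byte as unsigned char first, then widen,
// so the upper bits stay zero regardless of whether plain char is signed.
inline std::uint32_t widenZeroExtended(char byte) {
  return static_cast<unsigned char>(byte);
}
```

For the byte `0xE9`, the first helper yields `0xFFFFFFE9` and the second yields `0xE9`.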
David Blaikie 9c902b5502 Rename Diagnostic to DiagnosticsEngine as per issue 5397
llvm-svn: 140478
2011-09-25 23:23:43 +00:00
David Blaikie 76bd3c80d4 Fix missing includes for llvm_unreachable
llvm-svn: 140368
2011-09-23 05:35:21 +00:00
David Blaikie 83d382b1ca Switch assert(0/false) to llvm_unreachable.
llvm-svn: 140367
2011-09-23 05:06:16 +00:00
Francois Pichet 0706d203cf Rename LangOptions::Microsoft to LangOptions::MicrosoftExt to make it clear that this flag must be used only for Microsoft extensions and not emulation; to avoid confusion with the new LangOptions::MicrosoftMode flag.
Much of the code now under LangOptions::MicrosoftExt will eventually be moved under the LangOptions::MicrosoftMode flag.

llvm-svn: 139987
2011-09-17 17:15:52 +00:00
Douglas Gregor 86325ad2b5 Allow C99 hexfloats in C++0x mode. This change resolves the standards
collision between C99 hexfloats and C++0x user-defined literals by
giving C99 hexfloats precedence. Also, warn about user-defined
literals that conflict with hexfloats and those that have names that
are reserved by the implementation. Fixes <rdar://problem/9940194>.

llvm-svn: 138839
2011-08-30 22:40:35 +00:00
Craig Topper 6eb2058a6a Warn about and truncate UCNs that are too big for their character literal type.
llvm-svn: 138031
2011-08-19 03:20:12 +00:00
NAKAMURA Takumi 9f8a02d34e De-Unicode-ify.
llvm-svn: 137430
2011-08-12 05:49:51 +00:00
Craig Topper 5265bb211d Raw string followup. Pass a couple StringRefs by value.
llvm-svn: 137301
2011-08-11 05:10:55 +00:00
Craig Topper 54edccafc5 Add support for C++0x raw string literals.
llvm-svn: 137298
2011-08-11 04:06:15 +00:00
Craig Topper 61147ed270 Fix comment (test commit)
llvm-svn: 137039
2011-08-08 06:10:39 +00:00
Douglas Gregor fb65e592e0 Add support for C++0x unicode string and character literals, from Craig Topper!
llvm-svn: 136210
2011-07-27 05:40:30 +00:00
Chris Lattner 0e62c1cc0b remove unneeded llvm:: namespace qualifiers on some core types now that LLVM.h imports
them into the clang namespace.

llvm-svn: 135852
2011-07-23 10:55:15 +00:00
Argyrios Kyrtzidis 8b7252a8b3 Fix a nasty bug where inside StringLiteralParser:
1. We would assume that the length of the string literal token was at least 2
2. We would allocate a buffer with size length-2

And when the stars aligned (one of which would be an invalid source location due to stale PCH)
The length would be 0 and we would try to allocate a 4GB buffer.

Add checks for this corner case and a bunch of asserts.
(We really really should have had an assert for 1.).

Note that there's no test case since I couldn't get one (it was major PITA to reproduce),
maybe later.

llvm-svn: 131492
2011-05-17 22:09:56 +00:00
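The arithmetic behind the 4GB allocation in the commit above is plain unsigned wraparound. The sketch below assumes a 32-bit unsigned token length; the names `BodyLength`, `bodyLengthUnchecked`, and `bodyLengthChecked` are hypothetical and do not come from StringLiteralParser.

```cpp
#include <cstdint>

struct BodyLength {
  bool ok;               // false when the token cannot hold both quotes
  std::uint32_t length;  // characters between the quotes when ok is true
};

// Unchecked form: assumes the token is at least 2 characters long. With a
// length of 0 the unsigned subtraction wraps around to 0xFFFFFFFE.
inline std::uint32_t bodyLengthUnchecked(std::uint32_t tokenLength) {
  return tokenLength - 2;
}

// Checked form: rejects tokens that are too short instead of wrapping.
inline BodyLength bodyLengthChecked(std::uint32_t tokenLength) {
  if (tokenLength < 2)
    return BodyLength{false, 0};
  return BodyLength{true, static_cast<std::uint32_t>(tokenLength - 2)};
}
```

`bodyLengthUnchecked(0)` evaluates to 4294967294, which is the near-4GB buffer size the commit message mentions.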
Chris Lattner 57540c5be0 fix a bunch of comment typos found by codespell. Patch by
Luis Felipe Strano Moraes!

llvm-svn: 129559
2011-04-15 05:22:18 +00:00
Francois Pichet 12df1dc8f2 Microsoft integer suffix changes:
i64 is like LL
i32 is like L

Also set isMicrosoftInteger to true only if the suffix is well formed.

llvm-svn: 123230
2011-01-11 11:57:53 +00:00
Ted Kremenek 8c4c74f4fb Fix diagnostic for reporting bad escape sequence.
Patch by Paul Curtis!

llvm-svn: 120759
2010-12-03 00:09:56 +00:00
Chris Lattner 39720111e0 move getSpelling from Preprocessor to Lexer, which it is more conceptually related to.
llvm-svn: 119479
2010-11-17 07:26:20 +00:00
Chris Lattner 6bab435db6 propagate preprocessor out of StringLiteralParser. It is now
possible to create one without a preprocessor.

llvm-svn: 119476
2010-11-17 07:21:13 +00:00
Chris Lattner 2be8aa9611 push the preprocessor out of EncodeUCNEscape
llvm-svn: 119475
2010-11-17 07:12:42 +00:00
Chris Lattner 2a6ee91619 move AdvanceToTokenCharacter and getLocForEndOfToken from
Preprocessor to Lexer where they make more sense.

llvm-svn: 119474
2010-11-17 07:05:50 +00:00
Chris Lattner b1ab2c2d3d add a static version of PP::AdvanceToTokenCharacter.
llvm-svn: 119472
2010-11-17 06:55:10 +00:00
Chris Lattner bde1b81eb8 push use of Preprocessor out farther.
llvm-svn: 119471
2010-11-17 06:46:14 +00:00
Chris Lattner 3a324d3232 push use of Preprocessor out of getOffsetOfStringByte
llvm-svn: 119470
2010-11-17 06:35:43 +00:00
Chris Lattner 30d4c928ac add a static form of the efficient PP::getSpelling method.
llvm-svn: 119469
2010-11-17 06:31:48 +00:00
Chris Lattner 7a02bfdfce refactor the interface to StringLiteralParser::getOffsetOfStringByte,
pushing the dependency on the preprocessor out a bit.

llvm-svn: 119468
2010-11-17 06:26:08 +00:00
Chris Lattner 26f6c227dc allow I128 suffixes in msextensions mode just like i128 suffixes, patch
by Martin Vejnar!

llvm-svn: 116460
2010-10-14 00:24:10 +00:00
Nico Weber a6bde81bc8 Add support for UCNs for character literals
llvm-svn: 116129
2010-10-09 00:27:47 +00:00
Nico Weber 9762e0a234 Add support for 4-byte UCNs like \U12345678. Warn about UCNs in c90 mode.
llvm-svn: 115743
2010-10-06 04:57:26 +00:00
Fariborz Jahanian 39de024e66 Prevent warning when built with assert off.
llvm-svn: 112680
2010-08-31 23:54:38 +00:00
Fariborz Jahanian abaae2b692 Some support for unicode string constants
in wide strings. radar 8360841.

llvm-svn: 112672
2010-08-31 23:34:27 +00:00
Alexis Hunt 3b7918625c Revert my user-defined literal commits - r1124{58,60,67} pending
some issues being sorted out.

llvm-svn: 112493
2010-08-30 17:47:05 +00:00
Alexis Hunt 79eb5469e0 Implement C++0x user-defined string literals.
The extra data stored on user-defined literal Tokens is stored in extra
allocated memory, which is managed by the PreprocessorLexer because there isn't
a better place to put it that makes sure it gets deallocated, but only after
it's used up. My testing has shown no significant slowdown as a result, but
independent testing would be appreciated.

llvm-svn: 112458
2010-08-29 21:26:48 +00:00
Benjamin Kramer e8394df11b Random temporary string cleanup.
llvm-svn: 110807
2010-08-11 14:47:12 +00:00
Douglas Gregor b37b46e488 Complain when string literals are too long for the active language
standard's minimum requirements.

llvm-svn: 108837
2010-07-20 14:33:20 +00:00
Chris Lattner c548be9ab3 Remove a dead argument to ProcessUCNEscape.
Fix string concatenation to treat escapes in concatenated strings that
are wide because of other string chunks to process the escapes as wide
themselves.  Before we would warn about and miscompile the attached testcase.

This fixes rdar://8040728 - miscompile + warning: hex escape sequence out of range

llvm-svn: 106012
2010-06-15 18:06:43 +00:00
Fariborz Jahanian 93bef10131 Fix a miscompile of wchar pascal strings.
(radar 8020384)

llvm-svn: 104996
2010-05-28 19:40:48 +00:00
Douglas Gregor 9af03022ff Tell the string literal parser when it's not permitted to emit
diagnostics. That would be while we're parsing string literals for the
sole purpose of producing a diagnostic about them. Fixes
<rdar://problem/8026030>.

llvm-svn: 104684
2010-05-26 05:35:51 +00:00
Chris Lattner 1cf5bdd03d emit warn_char_constant_too_large at most once per literal, fixing PR6852
llvm-svn: 101580
2010-04-16 23:44:05 +00:00
Douglas Gregor 7bda4b8310 Introduce optional "Invalid" parameters to routines that invoke the
SourceManager's getBuffer() and, therefore, could fail, along with
Preprocessor::getSpelling(). Use the Invalid parameters in the literal
parsers (string, floating point, integral, character) to make them
robust against errors that stem from, e.g., PCH files that are not
consistent with the underlying file system.

I still need to audit every caller of these routines to
determine which ones need specific handling of error conditions.

llvm-svn: 98608
2010-03-16 05:20:39 +00:00
Fariborz Jahanian 8c6c0b6a1f ui64, etc. are valid VS suffixes.
Fixes radar 7562363.

llvm-svn: 94224
2010-01-22 21:36:53 +00:00
Alexis Hunt 91b78382b5 Do not parse hexadecimal floating point literals in C++0x mode because they are
incompatible with user-defined literals, specifically with the following form:

  0x1p+1

The preprocessing-number token extends only as far as the 'p'; the '+' is not
included. Previously we could get away with this extension as p was an invalid
suffix, but now with user-defined literals, 'p' might well be a valid suffix
and we are forced to consider it as such.

This patch also adds a warning in non-0x C++ modes telling the user that
this extension is incompatible with C++0x that is enabled by default
(previously and with other languages, we warn only with a compliance
option such as -pedantic).

llvm-svn: 93135
2010-01-10 23:37:56 +00:00
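The tokenization issue described in the commit above can be illustrated with a simplified preprocessing-number scanner. This is a sketch under assumptions: `ppNumberLengthSketch` is a hypothetical helper, it does not validate the start of the token, and the `hexFloatExponent` flag stands in for "the language mode treats `p` followed by a sign as part of the number".

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns how many leading characters of `s` belong to one pp-number.
// A sign is only absorbed directly after 'e'/'E', or after 'p'/'P' when
// hexFloatExponent is true.
inline std::size_t ppNumberLengthSketch(const std::string &s,
                                        bool hexFloatExponent) {
  std::size_t i = 0;
  while (i < s.size()) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    if ((c == '+' || c == '-') && i > 0) {
      const char prev = s[i - 1];
      const bool afterE = prev == 'e' || prev == 'E';
      const bool afterP = hexFloatExponent && (prev == 'p' || prev == 'P');
      if (!afterE && !afterP)
        break;  // the sign starts a new token
    } else if (!(std::isalnum(c) || c == '_' || c == '.')) {
      break;    // not a pp-number character
    }
    ++i;
  }
  return i;
}
```

With `hexFloatExponent` false, `0x1p+1` stops after `0x1p` (4 characters), leaving `+` and `1` as separate tokens; with it true, all 6 characters form one token.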
John McCall 53b93a091e Diagnose out-of-bounds floating-point constants. Fixes rdar://problem/6974641
llvm-svn: 92127
2009-12-24 09:08:04 +00:00
John McCall 230a5d527e Eliminate a completely unnecessary buffer copy when parsing float literals.
llvm-svn: 91974
2009-12-23 01:37:10 +00:00
Nuno Lopes baa1bc44af cleanup parsing of MS integer suffixes a little. this fixes PR5616
btw, I believe that isMicrosoftInteger can go away; it's not read anywhere

llvm-svn: 90036
2009-11-28 13:37:52 +00:00
Mike Stump c99c022841 This fixes support for complex literals, reworked to avoid a goto, and
to add a flag noting the presence of a Microsoft extension suffix (i8,
i16, i32, i64).  Patch by John Thompson.

llvm-svn: 83591
2009-10-08 22:55:36 +00:00
Mike Stump 11289f4280 Remove tabs, and whitespace cleanups.
llvm-svn: 81346
2009-09-09 15:08:12 +00:00
Erick Tryzelaar b90731117c Update lexer to work with the new APFloat string parsing.
llvm-svn: 79211
2009-08-16 23:36:28 +00:00
Daniel Dunbar a444cc2fa8 CharLiteralParser::IsMultiChar was sometimes uninitialized.
llvm-svn: 77420
2009-07-29 01:46:05 +00:00
Alisdair Meredith ed28f6e433 Fix the build
llvm-svn: 75627
2009-07-14 08:10:06 +00:00
Eli Friedman 28a00aa646 PR4353: Add support for \E as a character escape.
llvm-svn: 73153
2009-06-10 01:32:39 +00:00
Eli Friedman 9ffd4a9b96 Move CharIsSigned from TargetInfo to LangOptions.
llvm-svn: 72928
2009-06-05 07:05:05 +00:00
Eli Friedman d8cec57b9d PR4283: Don't truncate multibyte character constants in the
preprocessor.

llvm-svn: 72686
2009-06-01 05:25:02 +00:00
Chris Lattner 8577f62622 Implement -Wfour-char-constants, which is an extension, not an extwarn,
and apparently not part of -Wall

llvm-svn: 70329
2009-04-28 21:51:46 +00:00
Chris Lattner 74c95e20af implement -Wmultichar
llvm-svn: 70315
2009-04-28 18:52:02 +00:00
Eli Friedman 5d72d41189 Get rid of some useless uses of NoExtensions. The philosophy here is
that if we're going to print an extension warning anyway, 
there's no point to changing behavior based on NoExtensions: it will 
only make error recovery worse.

Note that this doesn't cause any behavior change because NoExtensions 
isn't used by the current front-end.  I'm still considering what to do about
the remaining use of NoExtensions in IdentifierTable.cpp.

llvm-svn: 70273
2009-04-28 00:51:18 +00:00
Sanjiv Gupta f09cb95236 Use an APInt of target int size to detect overflow while parsing multichars.
So 'abc' on i16 platforms will warn but not on i32 platforms.

llvm-svn: 69653
2009-04-21 02:21:29 +00:00
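The i16 versus i32 behavior in the commit above can be sketched without APInt. The helper below is hypothetical: it assumes 8-bit characters and a target int width between 16 and 32 bits, and it flags overflow whenever bits would be shifted out of the top of the target int.

```cpp
#include <cstddef>
#include <cstdint>

struct MultiCharResult {
  std::uint64_t value;  // folded value, truncated to the target int width
  bool overflowed;      // true if any character pushed bits out of the int
};

// Folds the characters of a multichar literal into an integer of
// `intWidthBits` bits (assumed to be in the range 16..32).
inline MultiCharResult foldMultiCharSketch(const char *chars, std::size_t n,
                                           unsigned intWidthBits) {
  const std::uint64_t mask = (1ULL << intWidthBits) - 1;
  MultiCharResult r = {0, false};
  for (std::size_t i = 0; i < n; ++i) {
    // If the top 8 bits are already occupied, shifting by 8 loses data.
    if ((r.value >> (intWidthBits - 8)) != 0)
      r.overflowed = true;
    r.value = ((r.value << 8) | static_cast<unsigned char>(chars[i])) & mask;
  }
  return r;
}
```

Folding `'abc'` with a 16-bit int reports overflow and keeps only `0x6263`, while a 32-bit int holds the full `0x616263` without overflow.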
Chris Lattner 66037791b1 temporarily revert r69046
llvm-svn: 69054
2009-04-14 18:05:08 +00:00
Sanjiv Gupta 69650b099a Literal value calculation isn't likely to overflow on targets whose int is 32 bits or less. Fix the assert, as it otherwise triggers for PIC16, which has i16 as int.
llvm-svn: 69046
2009-04-14 16:46:37 +00:00
Steve Naroff c94adda157 ProcessUCNEscape(): Incorporate some feedback from Chris.
llvm-svn: 68198
2009-04-01 11:09:15 +00:00
Eli Friedman 1c3fb22cad Fix pascal string support; testcase from mailing list message.
llvm-svn: 68181
2009-04-01 03:17:08 +00:00
Steve Naroff f2a880ca22 Incorporate feedback from Eli.
llvm-svn: 68107
2009-03-31 10:29:45 +00:00
Steve Naroff 7b753d21b5 Implement UCN support for C string literals (C99 6.4.3) and add some very basic tests. Chris Goller has graciously offered to write some tests to help validate UCN support.
From a front-end perspective, I believe this code should work for ObjC @-strings. At the moment, I believe we need to tweak the code generation for @-strings (which doesn't appear to handle them). Will be investigating.

llvm-svn: 68076
2009-03-30 23:46:03 +00:00