Commit Graph

99 Commits

Author SHA1 Message Date
Richard Smith 812924502b When checking the encoding of an 8-bit string literal, don't just check the
first codepoint! Also, don't reject empty raw string literals for spurious
"encoding" issues. Also, don't rely on undefined behavior in ConvertUTF.c.

llvm-svn: 152344
2012-03-08 21:59:28 +00:00
Richard Smith 39570d0020 Add support for cooked forms of user-defined-integer-literal and
user-defined-floating-literal. Support for raw forms of these literals
to follow.

llvm-svn: 152302
2012-03-08 08:45:32 +00:00
Richard Smith 75b67d6dc5 User-defined literal support for character literals.
llvm-svn: 152277
2012-03-08 01:34:56 +00:00
Richard Smith e18f0faff2 Lexing support for user-defined literals. Currently these lex as the same token
kinds as the underlying string literals, and we silently drop the ud-suffix;
those issues will be fixed by subsequent patches.

llvm-svn: 152012
2012-03-05 04:02:15 +00:00
Eli Friedman 9436352a82 Implement warning for non-wide string literals with an unexpected encoding. Downgrade error for non-wide character literals with an unexpected encoding to a warning for compatibility with gcc and older versions of clang. <rdar://problem/10837678>.
llvm-svn: 150295
2012-02-11 05:08:10 +00:00
Aaron Ballman e1224a5067 Fixing hex floating literal support so that it handles 0x.2p2 properly.
llvm-svn: 150072
2012-02-08 13:36:33 +00:00
Aaron Ballman b97a5addd5 Hex literals without a significand no longer crash the lexer. Fixes bug 7910
Patch by Eitan Adler

llvm-svn: 149984
2012-02-07 13:46:03 +00:00
Dylan Noblesmith 2c1dd2716a Basic: import SmallString<> into clang namespace
(I was going to fix the TODO about DenseMap too, but
that would break self-host right now. See PR11922.)

llvm-svn: 149799
2012-02-05 02:13:05 +00:00
Seth Cantrell 9c2d6f0279 stop claiming unicode escape sequences are too long in strings, because they never are
llvm-svn: 148391
2012-01-18 12:27:08 +00:00
Seth Cantrell 8b2b677f39 Improves support for Unicode in character literals
Updates ProcessUCNExcape() for C++. C++11 allows UCNs in character
and string literals that represent control characters and basic
source characters. Also C++03 allows UCNs that refer to surrogate
codepoints.

UTF-8 sequences in character literals are now handled as single
c-chars.

Added error for multiple characters in Unicode character literals.

Added errors for when a the execution charset encoding of a c-char
cannot be represented as a single code unit in the associated
character type. Note that for the purposes of this error the asso-
ciated character type for a narrow character literal is char, not
int, even though in C narrow character literals have type int.

llvm-svn: 148389
2012-01-18 12:27:04 +00:00
Nico Weber d60b72f696 Fix a regression in wide character codegen. See PR11369.
llvm-svn: 144521
2011-11-14 05:17:37 +00:00
Eli Friedman 20554708fb Fix one last place where we weren't writing into a string literal consistently.
llvm-svn: 143769
2011-11-05 00:41:04 +00:00
Eli Friedman d1370791c2 Use native endianness for writing out character escapes to the result buffer for string literal parsing. No functional change on little-endian architectures; should fix test failures on PPC.
llvm-svn: 143585
2011-11-02 23:06:23 +00:00
Eli Friedman 703e7153af Perform proper conversion for strings encoded in the source file as UTF-8. (For now, we are assuming the source character set is always UTF-8; this can be easily extended if necessary.)
Tests will be coming up in a subsequent commit.

Patch by Seth Cantrell.

llvm-svn: 143416
2011-11-01 02:14:50 +00:00
Douglas Gregor 227c352bae We do parse hexfloats in C++11; make it actually work.
llvm-svn: 141798
2011-10-12 18:51:02 +00:00
Douglas Gregor 4d68366b2f When parsing a character literal, extract the characters from the
buffer as an 'unsigned char', so that integer promotion doesn't
sign-extend character values > 127 into oblivion. Fixes
<rdar://problem/10188919>.

llvm-svn: 140608
2011-09-27 17:00:18 +00:00
David Blaikie 9c902b5502 Rename Diagnostic to DiagnosticsEngine as per issue 5397
llvm-svn: 140478
2011-09-25 23:23:43 +00:00
David Blaikie 76bd3c80d4 Fix missing includes for llvm_unreachable
llvm-svn: 140368
2011-09-23 05:35:21 +00:00
David Blaikie 83d382b1ca Switch assert(0/false) llvm_unreachable.
llvm-svn: 140367
2011-09-23 05:06:16 +00:00
Francois Pichet 0706d203cf Rename LangOptions::Microsoft to LangOptions::MicrosoftExt to make it clear that this flag must be used only for Microsoft extensions and not emulation; to avoid confusion with the new LangOptions::MicrosoftMode flag.
Many of the code now under LangOptions::MicrosoftExt will eventually be moved under the LangOptions::MicrosoftMode flag.

llvm-svn: 139987
2011-09-17 17:15:52 +00:00
Douglas Gregor 86325ad2b5 Allow C99 hexfloats in C++0x mode. This change resolves the standards
collision between C99 hexfloats and C++0x user-defined literals by
giving C99 hexfloats precedence. Also, warning about user-defined
literals that conflict with hexfloats and those that have names that
are reserved by the implementation. Fixes <rdar://problem/9940194>.

llvm-svn: 138839
2011-08-30 22:40:35 +00:00
Craig Topper 6eb2058a6a Warn about and truncate UCNs that are too big for their character literal type.
llvm-svn: 138031
2011-08-19 03:20:12 +00:00
NAKAMURA Takumi 9f8a02d34e De-Unicode-ify.
llvm-svn: 137430
2011-08-12 05:49:51 +00:00
Craig Topper 5265bb211d Raw string followup. Pass a couple StringRefs by value.
llvm-svn: 137301
2011-08-11 05:10:55 +00:00
Craig Topper 54edccafc5 Add support for C++0x raw string literals.
llvm-svn: 137298
2011-08-11 04:06:15 +00:00
Craig Topper 61147ed270 Fix comment (test commit)
llvm-svn: 137039
2011-08-08 06:10:39 +00:00
Douglas Gregor fb65e592e0 Add support for C++0x unicode string and character literals, from Craig Topper!
llvm-svn: 136210
2011-07-27 05:40:30 +00:00
Chris Lattner 0e62c1cc0b remove unneeded llvm:: namespace qualifiers on some core types now that LLVM.h imports
them into the clang namespace.

llvm-svn: 135852
2011-07-23 10:55:15 +00:00
Argyrios Kyrtzidis 8b7252a8b3 Fix a nasty bug where inside StringLiteralParser:
1. We would assume that the length of the string literal token was at least 2
2. We would allocate a buffer with size length-2

And when the stars aligned (one of which would be an invalid source location due to stale PCH)
The length would be 0 and we would try to allocate a 4GB buffer.

Add checks for this corner case and a bunch of asserts.
(We really really should have had an assert for 1.).

Note that there's no test case since I couldn't get one (it was major PITA to reproduce),
maybe later.

llvm-svn: 131492
2011-05-17 22:09:56 +00:00
Chris Lattner 57540c5be0 fix a bunch of comment typos found by codespell. Patch by
Luis Felipe Strano Moraes!

llvm-svn: 129559
2011-04-15 05:22:18 +00:00
Francois Pichet 12df1dc8f2 Microsoft integer suffix changes:
i64 is like LL
i32 is like L

Also set isMicrosoftInteger to true only if the suffix is well formed.

llvm-svn: 123230
2011-01-11 11:57:53 +00:00
Ted Kremenek 8c4c74f4fb Fix diagnostic for reporting bad escape sequence.
Patch by Paul Curtis!

llvm-svn: 120759
2010-12-03 00:09:56 +00:00
Chris Lattner 39720111e0 move getSpelling from Preprocessor to Lexer, which it is more conceptually related to.
llvm-svn: 119479
2010-11-17 07:26:20 +00:00
Chris Lattner 6bab435db6 propagate preprocessor out of StringLiteralParser. It is now
possible to create one without a preprocessor.

llvm-svn: 119476
2010-11-17 07:21:13 +00:00
Chris Lattner 2be8aa9611 push the preprocessor out of EncodeUCNEscape
llvm-svn: 119475
2010-11-17 07:12:42 +00:00
Chris Lattner 2a6ee91619 move AdvanceToTokenCharacter and getLocForEndOfToken from
Preprocessor to Lexer where they make more sense.

llvm-svn: 119474
2010-11-17 07:05:50 +00:00
Chris Lattner b1ab2c2d3d add a static version of PP::AdvanceToTokenCharacter.
llvm-svn: 119472
2010-11-17 06:55:10 +00:00
Chris Lattner bde1b81eb8 push use of Preprocessor out farther.
llvm-svn: 119471
2010-11-17 06:46:14 +00:00
Chris Lattner 3a324d3232 push use of Preprocessor out of getOffsetOfStringByte
llvm-svn: 119470
2010-11-17 06:35:43 +00:00
Chris Lattner 30d4c928ac add a static form of the efficient PP::getSpelling method.
llvm-svn: 119469
2010-11-17 06:31:48 +00:00
Chris Lattner 7a02bfdfce refactor the interface to StringLiteralParser::getOffsetOfStringByte,
pushing the dependency on the preprocessor out a bit.

llvm-svn: 119468
2010-11-17 06:26:08 +00:00
Chris Lattner 26f6c227dc allow I128 suffixes in msextensions mode just like i128 suffixes, patch
by Martin Vejnar!

llvm-svn: 116460
2010-10-14 00:24:10 +00:00
Nico Weber a6bde81bc8 Add support for UCNs for character literals
llvm-svn: 116129
2010-10-09 00:27:47 +00:00
Nico Weber 9762e0a234 Add support for 4-byte UCNs like \U12345678. Warn about UCNs in c90 mode.
llvm-svn: 115743
2010-10-06 04:57:26 +00:00
Fariborz Jahanian 39de024e66 Prevent warning when built with assert off.
llvm-svn: 112680
2010-08-31 23:54:38 +00:00
Fariborz Jahanian abaae2b692 Some support for unicode string constants
in wide strings. radar 8360841.

llvm-svn: 112672
2010-08-31 23:34:27 +00:00
Alexis Hunt 3b7918625c Revert my user-defined literal commits - r1124{58,60,67} pending
some issues being sorted out.

llvm-svn: 112493
2010-08-30 17:47:05 +00:00
Alexis Hunt 79eb5469e0 Implement C++0x user-defined string literals.
The extra data stored on user-defined literal Tokens is stored in extra
allocated memory, which is managed by the PreprocessorLexer because there isn't
a better place to put it that makes sure it gets deallocated, but only after
it's used up. My testing has shown no significant slowdown as a result, but
independent testing would be appreciated.

llvm-svn: 112458
2010-08-29 21:26:48 +00:00
Benjamin Kramer e8394df11b Random temporary string cleanup.
llvm-svn: 110807
2010-08-11 14:47:12 +00:00
Douglas Gregor b37b46e488 Complain when string literals are too long for the active language
standard's minimum requirements.

llvm-svn: 108837
2010-07-20 14:33:20 +00:00