Commit Graph

6 Commits

Author SHA1 Message Date
Scott Egerton a90ea38698 [Lexer] Allow UCN for dollar symbol '\u0024' in identifiers when using -fdollars-in-identifiers flag.
Summary:
Previously, the -fdollars-in-identifiers flag allows the '$' symbol to be used
in an identifier but the universal character name equivalent '\u0024' is not
allowed.
This patch changes this, so that \u0024 is valid in identifiers.

Reviewers: rsmith, jordan_rose

Reviewed By: rsmith

Subscribers: dexonsmith, simoncook, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D71758
2020-01-15 11:28:57 +00:00
Alp Toker b05e0b53b9 Preprocessor: support defined() with operator names for MS compatibility
Also flesh out missing tests, improve diagnostic QOI and fix a couple of corner
cases found in the process.

Fixes PR10606.

llvm-svn: 209276
2014-05-21 06:13:51 +00:00
Rafael Espindola 925213b0fa Add 'not' to commands that are expected to fail.
This is at least good documentation, but also opens the possibility of
using pipefail.

llvm-svn: 185652
2013-07-04 16:16:58 +00:00
Jordan Rose d4960d3eb1 Test fix-it ranges for Unicode characters.
Also, remove stray -fdiagnostics-parseable-fixits from ucn-pp-identifier.

llvm-svn: 173373
2013-01-24 21:19:51 +00:00
Jordan Rose 62db5066e9 Add a fixit for \U1234 -> \u1234.
llvm-svn: 173371
2013-01-24 20:50:52 +00:00
Jordan Rose 7f43dddae0 Handle universal character names and Unicode characters outside of literals.
This is a missing piece for C99 conformance.

This patch handles UCNs by adding a '\\' case to LexTokenInternal and
LexIdentifier -- if we see a backslash, we tentatively try to read in a UCN.
If the UCN is not syntactically well-formed, we fall back to the old
treatment: a backslash followed by an identifier beginning with 'u' (or 'U').

Because the spelling of an identifier with UCNs still has the UCN in it, we
need to convert that to UTF-8 in Preprocessor::LookUpIdentifierInfo.

Of course, valid code that does *not* use UCNs will see only a very minimal
performance hit (checks after each identifier for non-ASCII characters,
checks when converting raw_identifiers to identifiers that they do not
contain UCNs, and checks when getting the spelling of an identifier that it
does not contain a UCN).

This patch also adds basic support for actual UTF-8 in the source. This is
treated almost exactly the same as UCNs except that we consider stray
Unicode characters to be mistakes and offer a fixit to remove them.

llvm-svn: 173369
2013-01-24 20:50:46 +00:00