Commit Graph

12 Commits

Author SHA1 Message Date
Aaron Ballman 7de7161304 Use functions with prototypes when appropriate; NFC
A significant number of our tests in C accidentally use functions
without prototypes. This patch converts the function signatures to have
a prototype for the situations where the test is not specific to K&R C
declarations. e.g.,

  void func();

becomes

  void func(void);

This is the sixth batch of tests being updated (there are a significant
number of other tests left to be updated).
2022-02-09 17:16:10 -05:00
Corentin Jabot afb6223bc5 Support Unicode 14 identifiers
This update the UAX tables to support new Unicode 14 identifiers.
2021-09-16 13:21:27 -04:00
Aaron Ballman 9f27364377 Use a more general test here.
The interesting bit about that triple isn't the architecture, it's the
fact that ps4 implies C99 as the standard rather than a newer C mode.
Specify the language standard rather than the triple so the test is a
bit more general.
2021-08-18 09:32:05 -04:00
Corentin Jabot 2715c4da50 Do not emit diagnostics for invalid unicode characters in preprocessing mode
This amends 4e80636db7 with a fix for
https://lab.llvm.org/buildbot/#/builders/139/builds/8943
2021-08-18 09:12:36 -04:00
Corentin Jabot 4e80636db7 Implement P1949
This adds the Unicode 13 data for XID_Start and XID_Continue.
The definition of valid identifier is changed in all C++ modes
as P1949 (https://wg21.link/p1949) was accepted by WG21 as a defect
report.
2021-08-18 07:33:14 -04:00
Richard Smith 4e966e8135 Don't emit "will be treated as an identifier character" warning for
UTF-8 characters that aren't identifier characters in the current
language mode.

llvm-svn: 343040
2018-09-25 22:34:45 +00:00
Richard Smith 8ed7776bc4 PR38870: Add warning for zero-width unicode characters appearing in
identifiers.

llvm-svn: 341700
2018-09-07 19:25:39 +00:00
Richard Smith 77091b167f Warn if we find a Unicode homoglyph for a symbol in an identifier.
Specifically, warn if:
 * we find a character that the language standard says we must treat as an
   identifier, and
 * that character is not reasonably an identifier character (it's a punctuation
   character or similar), and 
 * it renders identically to a valid non-identifier character in common
   fixed-width fonts.

Some tools "helpfully" substitute the surprising characters for the expected
characters, and replacing semicolons with Greek question marks is a common
"prank".

llvm-svn: 320697
2017-12-14 13:15:08 +00:00
Richard Smith 664798c034 Add test that we correctly allow some non-letter unicode characters in
identifiers, and extend existing test to also cover C++.

llvm-svn: 248079
2015-09-19 02:14:12 +00:00
Jordan Rose cc538345be Lexer: Don't warn about Unicode in preprocessor directives.
This allows people to use Unicode in their #pragma mark and in macros
that exist only to be string-ized.

<rdar://problem/13107323&13121362>

llvm-svn: 174081
2013-01-31 19:48:48 +00:00
Jordan Rose 17441589c3 Don't warn about Unicode characters in -E mode.
People use the C preprocessor for things other than C files. Some of them
have Unicode characters. We shouldn't warn about Unicode characters
appearing outside of identifiers in this case.

There's not currently a way for the preprocessor to tell if it's in -E mode,
so I added a new flag, derived from the PreprocessorOutputOptions. This is
only used by the Unicode warnings for now, but could conceivably be used by
other warnings or even behavioral differences later.

<rdar://problem/13107323>

llvm-svn: 173881
2013-01-30 01:52:57 +00:00
Jordan Rose 4246ae0089 As an extension, treat Unicode whitespace characters as whitespace.
llvm-svn: 173370
2013-01-24 20:50:50 +00:00