This is a missing piece for C99 conformance.
This patch handles UCNs by adding a '\\' case to LexTokenInternal and
LexIdentifier -- if we see a backslash, we tentatively try to read in a UCN.
If the UCN is not syntactically well-formed, we fall back to the old
treatment: a backslash followed by an identifier beginning with 'u' (or 'U').
Because the spelling of an identifier with UCNs still has the UCN in it, we
need to convert that to UTF-8 in Preprocessor::LookUpIdentifierInfo.
Of course, valid code that does *not* use UCNs will see only a very minimal
performance hit (checks after each identifier for non-ASCII characters,
checks when converting raw_identifiers to identifiers that they do not
contain UCNs, and checks when getting the spelling of an identifier that it
does not contain a UCN).
This patch also adds basic support for actual UTF-8 in the source. This is
treated almost exactly the same as UCNs except that we consider stray
Unicode characters to be mistakes and offer a fixit to remove them.
llvm-svn: 173369
Presumably, if the printf format has the sign explicitly requested, the user
wants to treat the data as signed.
This is a fix-up for r172739, and also includes several test changes that
didn't make it into that commit.
llvm-svn: 172762
Following r168626, in class declaration or definition, there are a combination of syntactic locations
where C++11 attributes could appear, and among those the only valid location permitted by standard is
between class-key and class-name. So for those attributes appear at wrong locations, fixit is used to
move them to expected location and we recover by applying them to the class specifier.
llvm-svn: 171757
to also remove a trailing space if possible.
For example, removing '__bridge' from:
i = (__bridge I*)p;
should result in:
i = (I*)p;
not:
i = ( I*)p;
rdar://11314821
llvm-svn: 170764
For most cases where a conversion specifier doesn't match an argument,
we usually guess that the conversion specifier is wrong. However, if
the argument is an integer type and the specifier is %C, it's likely
the user really did mean to print the integer as a character.
(This is more common than %c because there is no way to specify a unichar
literal -- you have to write an integer literal, such as '0x2603',
and then cast it to unichar.)
This does not change the behavior of %S, since there are fewer cases
where printing a literal Unicode *string* is necessary, but this could
easily be changed in the future.
<rdar://problem/11982013>
llvm-svn: 169400
The type of a character literal is 'int' in C, but if the user writes a
character /as/ a literal, we should assume they meant it to be a
character and not a numeric value, and thus offer %c as a correction
rather than %d.
There's a special case for multi-character literals (like 'MooV'), which
have implementation-defined value and usually cannot be printed with %c.
These still use %d as the suggestion.
In C++, the type of a character literal is 'char', and so this problem
doesn't exist.
<rdar://problem/12282316>
llvm-svn: 169398
We tried to account for 'uint8_t' by saying that /typedefs/ of 'char'
should be corrected as %hhd rather than %c, but the condition was wrong.
llvm-svn: 169397
When suggesting "foo::bar" as a correction for "fob::bar" we mistakenly
replaced only "bar" with "foo::bar" producing "fob::foo::bar" which was broken.
This corrects that replacement in as many places as I could find & provides
test cases for all those cases I could find a test case for. There are a couple
that don't seem to be reachable (one looks entirely dead, the other just
doesn't seem to ever get called with a namespace to namespace change).
Review by Richard Smith ( http://llvm-reviews.chandlerc.com/D57 ).
llvm-svn: 165817
This only applies if the type has a name. (we could potentially do something
crazy with decltype in C++11 to qualify members of unnamed types but that
seems excessive)
It might be nice to also suggest a fixit for "&this->i", "&foo->i",
and "&foo.i" but those expressions produce 'bound' member functions that have
a different AST representation & make error recovery a little trickier. Left
as future work.
llvm-svn: 165763
have PPCallbacks::InclusionDirective pass the character range for the filename quotes or brackets.
rdar://11113134 & http://llvm.org/PR13880
llvm-svn: 164743
start of a statement or the end of a compound-statement, diagnose the comma as
a typo for a semicolon. Patch by Ahmed Bougacha! Additional test cases and
minor refactoring by me.
llvm-svn: 164085
warning to an error. C++ bans it, and both GCC and EDG diagnose it as
an error. Microsoft allows it, so we still warn in Microsoft
mode. Fixes <rdar://problem/11135644>.
llvm-svn: 163831
These types are defined differently on 32-bit and 64-bit platforms, and
trying to offer a fixit for one platform would only mess up the format
string for the other. The Apple-recommended solution is to cast to a type
that is known to be large enough and always use that to print the value.
This should only have an impact on compile time if the format string is
incorrect; in cases where the format string matches the definition on the
current platform, no warning will be emitted.
<rdar://problem/9135072&12164284>
llvm-svn: 163266
accurate by asking the parser whether there was an ambiguity rather than trying
to reverse-engineer it from the DeclSpec. Make the with-parameters case have
better diagnostics by using semantic information to drive the warning,
improving the diagnostics and adding a fixit.
Patch by Nikola Smiljanic. Some minor changes by me to suppress diagnostics for
declarations of the form 'T (*x)(...)', which seem to have a very high false
positive rate, and to reduce indentation in 'warnAboutAmbiguousFunction'.
llvm-svn: 160998
This time, make sure we don't try to print fixits with newline characters,
since they don't have a valid column width, and they don't look good anyway.
PR13417 (and originally <rdar://problem/11877454>)
llvm-svn: 160561
Recovering as if the user had actually called -isEqual: is a bit too far from
the semantics of the program as written, /even though/ it's probably what they
intended.
llvm-svn: 160377
This code is very sensitive to the difference between "columns" as printed
and "bytes" (SourceManager columns). All variables are now named explicitly
and our assumptions are (hopefully) documented as both comment and assertion.
Whether parseable fixits should use byte offsets or Unicode character counts
is pending discussion on the mailing list; currently the implementation uses
bytes (and has no problems on lines containing multibyte characters).
This has been added to the user manual.
<rdar://problem/11877454>
llvm-svn: 160319
If a non-unicode locale is used, the unicode character is escaped and any
byte that is in the escaped representation but not the semicolon will
become whitespace.
llvm-svn: 160113
Chris pointed out that while the comparison is certainly problematic
and does not have well-defined behavior, it isn't any worse than some
of the other abuses that we merely warn about and doesn't need to make
the compilation fail.
Revert the release notes change (r159766) now that this is just a new warning.
llvm-svn: 159939
also deal with '>>>' (in CUDA), '>=', and '>>='. Fix the FixItHints logic to
deal with cases where the token is followed by an adjacent '=', '==', '>=',
'>>=', or '>>>' token, where a naive fix-it would result in a differing token
stream on a re-lex.
llvm-svn: 158652
Objective-C literals conceptually always create new objects, but may be
optimized by the compiler or runtime (constant folding, singletons, etc).
Comparing addresses of these objects is relying on this optimization
behavior, which is really an implementation detail.
In the case of == and !=, offer a fixit to a call to -isEqual:, if the
method is available. This fixit is directly on the error so that it is
automatically applied.
Most of the time, this is really a newbie mistake, hence the fixit.
llvm-svn: 158230