This is a missing piece for C99 conformance.
This patch handles UCNs by adding a '\\' case to LexTokenInternal and
LexIdentifier -- if we see a backslash, we tentatively try to read in a UCN.
If the UCN is not syntactically well-formed, we fall back to the old
treatment: a backslash followed by an identifier beginning with 'u' (or 'U').
Because the spelling of an identifier with UCNs still has the UCN in it, we
need to convert that to UTF-8 in Preprocessor::LookUpIdentifierInfo.
Of course, valid code that does *not* use UCNs will see only a very minimal
performance hit (checks after each identifier for non-ASCII characters,
checks when converting raw_identifiers to identifiers that they do not
contain UCNs, and checks when getting the spelling of an identifier that it
does not contain a UCN).
This patch also adds basic support for actual UTF-8 in the source. This is
treated almost exactly the same as UCNs except that we consider stray
Unicode characters to be mistakes and offer a fixit to remove them.
llvm-svn: 173369
This makes LexCharConstant() look more like LexStringLiteral(), which doesn't
have this bug. Add tests for eof after \ for several other cases.
llvm-svn: 168269
checks to enable. Remove frontend support for -fcatch-undefined-behavior,
-faddress-sanitizer and -fthread-sanitizer now that they don't do anything.
llvm-svn: 167413
This is accomplished by making VerifyDiagnosticsConsumer a CommentHandler,
which then only reads the -verify directives that are actually in live
blocks of code. It also makes it simpler to handle -verify directives that
appear in header files, though we still have to manually reparse some files
depending on how they are generated.
This requires some test changes. In particular, all PCH tests now have their
-verify directives outside the "header" portion of the file, using the @line
syntax added in r159978. Other tests have been modified mostly to make it
clear what is being tested, and to prevent polluting the expected output with
the directives themselves.
Patch by Andy Gibbs! (with slight modifications)
The new Frontend/verify-* tests exercise the functionality of this commit,
as well as r159978, r159979, and r160053 (Andy's other -verify enhancements).
llvm-svn: 160068
1. Teach Lexer that pragma lexers are like macro expansions at EOF.
2. Treat pragmas like #define/#undef when printing.
3. If we just printed a directive, add a newline before any more tokens.
(4. Miscellaneous cleanup in PrintPreprocessedOutput.cpp)
PR10594 and <rdar://problem/11562490> (two separate related problems)
llvm-svn: 158571
modes. For languages other than C99/C11, this isn't quite a conforming
extension, and for C++11, it breaks some reasonable code containing
user-defined literals.
In languages which don't officially have hexfloats, pare back this extension
to only apply in cases where the token starts 0x and does not contain an
underscore. The extension is still not quite conforming, but it's a lot closer
now.
llvm-svn: 158487
Passing -verify to clang without -cc1 or -Xclang silently passes (with a
printed warning, but lit doesn't care about that). This change adds -cc1 or,
as is necessary in one case, -Xclang to fix this so that these tests are
actually verifying as intended.
I'd like to change the driver so this kind of mistake could not be made, but
I'm not entirely sure how. Further, since the driver only warns about unknown
flags in general, we could have similar bugs with a misspellings of arguments
that would be nice to find.
llvm-svn: 154776
first codepoint! Also, don't reject empty raw string literals for spurious
"encoding" issues. Also, don't rely on undefined behavior in ConvertUTF.c.
llvm-svn: 152344
grammar requires a string-literal and not a user-defined-string-literal. The
two constructs are still represented by the same TokenKind, in order to prevent
a combinatorial explosion of different kinds of token. A flag on Token tracks
whether a ud-suffix is present, in order to prevent clients from needing to look
at the token's spelling.
llvm-svn: 152098
- Add atomic-to/from-nonatomic cast types
- Emit atomic operations for arithmetic on atomic types
- Emit non-atomic stores for initialisation of atomic types, but atomic stores and loads for every other store / load
- Add a __atomic_init() intrinsic which does a non-atomic store to an _Atomic() type. This is needed for the corresponding C11 stdatomic.h function.
- Enables the relevant __has_feature() checks. The feature isn't 100% complete yet, but it's done enough that we want people testing it.
Still to do:
- Make the arithmetic operations on atomic types (e.g. Atomic(int) foo = 1; foo++;) use the correct LLVM intrinsic if one exists, not a loop with a cmpxchg.
- Add a signal fence builtin
- Properly set the fenv state in atomic operations on floating point values
- Correctly handle things like _Atomic(_Complex double) which are too large for an atomic cmpxchg on some platforms (this requires working out what 'correctly' means in this context)
- Fix the many remaining corner cases
llvm-svn: 148242
any language variant), and restrict __has_feature(objc_modules) to
mean that we also have the Objective-C @import syntax. I anticipate
__has_feature(cxx_modules) and/or __has_feature(c_modules) for when we
nail down the module syntax for C/C++.
llvm-svn: 147548
diagnostic message are compared. If either is a substring of the other, then
no error is given. This gives rise to an unexpected case:
// expect-error{{candidate function has different number of parameters}}
will match the following error messages from Clang:
candidate function has different number of parameters (expected 1 but has 2)
candidate function has different number of parameters
It will also match these other error messages:
candidate function
function has different number of parameters
number of parameters
This patch will change so that the verification string must be a substring of
the diagnostic message before accepting. Also, all the failing tests from this
change have been corrected. Some stats from this cleanup:
87 - removed extra spaces around verification strings
70 - wording updates to diagnostics
40 - extra leading or trailing characters (typos, unmatched parens or quotes)
35 - diagnostic level was included (error:, warning:, or note:)
18 - flag name put in the warning (-Wprotocol)
llvm-svn: 146619
increasingly prevailing case to the point that new features
like ARC don't even support the fragile ABI anymore.
This required a little bit of reshuffling with exceptions
because a check was assuming that ObjCNonFragileABI was
only being set in ObjC mode, and that's actually a bit
obnoxious to do.
Most, though, it involved a perl script to translate a ton
of test cases.
Mostly no functionality change for driver users, although
there are corner cases with disabling language-specific
exceptions that we should handle more correctly now.
llvm-svn: 140957
buffer as an 'unsigned char', so that integer promotion doesn't
sign-extend character values > 127 into oblivion. Fixes
<rdar://problem/10188919>.
llvm-svn: 140608
structure to hold inferred information, then propagate each invididual
bit down to -cc1. Separate the bits of "supports weak" and "has a native
ARC runtime"; make the latter a CodeGenOption.
The tool chain is still driving this decision, because it's the place that
has the required deployment target information on Darwin, but at least it's
better-factored now.
llvm-svn: 134453
Language-design credit goes to a lot of people, but I particularly want
to single out Blaine Garst and Patrick Beard for their contributions.
Compiler implementation credit goes to Argyrios, Doug, Fariborz, and myself,
in no particular order.
llvm-svn: 133103
minor issues along the way:
- Non-type template parameters of type 'std::nullptr_t' were not
permitted.
- We didn't properly introduce built-in operators for nullptr ==,
!=, <, <=, >=, or > as candidate functions .
To my knowledge, there's only one (minor but annoying) part of nullptr
that hasn't been implemented: catching a thrown 'nullptr' as a pointer
or pointer-to-member, per C++0x [except.handle]p4.
llvm-svn: 131813
__has_extension is a function-like macro which takes the same set
of feature identifiers as __has_feature. It evaluates to 1 if the
feature is supported by Clang in the current language (either as a
language extension or a standard language feature) or 0 if not.
At the same time, add support for the C1X feature identifiers
c_generic_selections (renamed from generic_selections) and
c_static_assert, and document them.
Patch by myself and Jean-Daniel Dupas.
llvm-svn: 131308
Wait, what?
So, we run Clang (and LLVM) tests in an environment where the md5sum of the
input files becomes a component of the path. When testing the preprocessor,
the path becomes part of the output (in line directives). In this test, we
were grepping for the absence of "abc" in the output. When the stars aligned
properly, the md5sum component of the path contained "abc" and the test
failed. Oops.
llvm-svn: 131147
Find out that our C++0x status has only one field for noexcept expression and specification together, and that it was accidentally already marked as fully implemented.
This completes noexcept specification work.
llvm-svn: 127701
- Don't publicize a C++0x feature through __has_feature if we aren't
in C++0x mode (even if the feature is available only with a
warning).
- "auto" is not implemented well enough for its __has_feature to be
turned on.
- Fix the test of C++0x __has_feature to actually test what we're
trying to test. Searching for the substring "foo" when our options
are "foo" and "no_foo" doesn't work :)
llvm-svn: 124291
and turn on __has_feature(cxx_rvalue_references). The core rvalue
references proposal seems to be fully implemented now, pending lots
more testing.
llvm-svn: 124169
Turn on the __has_feature switch for variadic templates, document
their completion, and put the ExtWarn into the c++0x-extensions
warning group.
llvm-svn: 123854
own subcategory, -Wconstant-conversion, which is on by default.
Tweak the constant folder to give better results in the invalid
case of a negative shift amount.
Implements rdar://problem/6792488
llvm-svn: 118636
The failing was due to this:
1. preamble.c contains CR+LF new lines
2. write() is called with a buffer containing the original (CR+LF) to output the result on the console.
3. In text mode(the default), write() convert LF to CR+LF even if LF is preceded by CR, hence we have CR+CR+LF which filecheck interprets as 2 lines.
llvm-svn: 116513
spelled (#pragma, _Pragma, __pragma). In -E mode, use that information
to add appropriate newlines when translating _Pragma and __pragma into
#pragma, like GCC does. Fixes <rdar://problem/8412013>.
llvm-svn: 113553
which is the part of the file that contains all of the initial
comments, includes, and preprocessor directives that occur before any
of the actual code. Added a new -print-preamble cc1 action that is
only used for testing.
llvm-svn: 108913
'-fasm' and explicitly map from that flag to -fgnu-keywords in the driver. Turn
off the driver in the lexer test for this madness and add a test to the driver
that the translation actually works.
llvm-svn: 104428
matching G++'s behavior.
Warn when -pedantic or -Wc++-hex-floats is passed, and
don't warn if -pedantic -Wno-c++-hex-floats are both passed.
llvm-svn: 104295
make it miss (invalid) things like:
<<<<<<<
>>>>>>>
and crash if
<<<<<<<
was at the end of the line. When we find a >>>>>>> that is not at the
end of the line, make sure to reset Pos so we don't crash on something
like:
<<<<<<< >>>>>>>
This isn't worth making testcases for, since each would require a new file.
rdar://7987078 - signal 11 compiling "<<<<<<<<<<"
llvm-svn: 103968
about it instead of producing tons of garbage from the lexer.
It would be even better for sourcemgr to dynamically transcode (e.g.
from UTF16 -> UTF8).
llvm-svn: 101924
disabled with the intent that users can start with them now and not have to change
a thing to have them work when we implement the features.
llvm-svn: 93312
incompatible with user-defined literals, specifically with the following form:
0x1p+1
The preprocessing-number token extends only as far as the 'p'; the '+' is not
included. Previously we could get away with this extension as p was an invalid
suffix, but now with user-defined literals, 'p' might well be a valid suffix
and we are forced to consider it as such.
This patch also adds a warning in non-0x C++ modes telling the user that
this extension is incompatible with C++0x that is enabled by default
(previously and with other languages, we warn only with a compliance
option such as -pedantic).
llvm-svn: 93135
1. Don't make a copy of LangOptions every time a lexer is created.
2. Don't make CharInfo global mutable state.
3. Fix the implementation to properly treat ^Z as EOF instead of as
horizontal whitespace, which matches the semantic implemented by VC++.
llvm-svn: 91586
- This is designed to make it obvious that %clang_cc1 is a "test variable"
which is substituted. It is '%clang_cc1' instead of '%clang -cc1' because it
can be useful to redefine what gets run as 'clang -cc1' (for example, to set
a default target).
llvm-svn: 91446
declarators are parsed primarily within a single function (at least for
these cases). Remove some excess diagnostics arising during parse failures.
llvm-svn: 85924
which tries to do better error recovery when it is "obvious" that an
identifier is a mis-typed typename. In this case, we try to parse
it as a typename instead of as the identifier in a declarator, which
gives us several options for better error recovery and immediately
makes diagnostics more useful. For example, we now produce:
t.c:4:8: error: unknown type name 'foo_t'
static foo_t a = 4;
^
instead of:
t.c:4:14: error: invalid token after top level declarator
static foo_t a = 4;
^
Also, since we now parse "a" correctly, we make a decl for it,
preventing later uses of 'a' from emitting things like:
t.c:12:20: error: use of undeclared identifier 'a'
int bar() { return a + b; }
^
I'd really appreciate any scrutiny possible on this, it
is a tricky area.
llvm-svn: 68911
Eventually, would be nice to be able to run these modifications even
when we don't want the warning or errors for the actual diagnostic.
llvm-svn: 68272
Note that this isn't really a complete fix; I think there are other
potential overrun situations. I don't really know what the best
systematic fix is, though.
llvm-svn: 55622
level code in clang. This is a cleanup, but does implement "-o" for
-emit-llvm. One effect of this is that "clang foo.c -emit-llvm" will now
emit into foo.ll instead of stdout. Use "clang foo.c -emit-llvm -o -" or
"clang < foo.c -emit-llvm" to get the old behavior.
llvm-svn: 46791
using "-parse-ast -verify".
Updated all test cases (using a sed script) that invoked -parse-ast-check to
now use -parse-ast -verify.
Fixed a bug where using "-verify" instead of "-parse-ast-check" would not
correctly create the DiagClient needed to accumulate diagnostics.
llvm-svn: 42365
accurate diagnostics. For test/Lexer/comments.c we now emit:
int x = 000000080; /* expected-error {{invalid digit}} */
^
constants.c:7:4: error: invalid digit '8' in octal constant
00080; /* expected-error {{invalid digit}} */
^
The last line is due to an escaped newline. The full line looks like:
int y = 0000\
00080; /* expected-error {{invalid digit}} */
Previously, we emitted:
constants.c:4:9: error: invalid digit '8' in octal constant
int x = 000000080; /* expected-error {{invalid digit}} */
^
constants.c:6:9: error: invalid digit '8' in octal constant
int y = 0000\
^
which isn't too bad, but the new way is better for the user,
regardless of whether there is an escaped newline or not.
All the other lexer-related diagnostics should switch over
to using AdvanceToTokenCharacter where appropriate. Help
wanted :).
This implements test/Lexer/constants.c.
llvm-svn: 39906