Summary:
Fix PR22407, where the Lexer overflows the buffer when parsing
#include<\
(end of file after slash)
Test Plan:
Added a test that will trigger in asan build.
This case is also covered by the clang-fuzzer bot.
Reviewers: rnk
Reviewed By: rnk
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D9489
llvm-svn: 236466
We would check if the terminator marker is on a newline. However, the
logic would end up out-of-bounds if the terminator marker immediately
follows the start marker.
This fixes PR21820.
llvm-svn: 224210
Changes diagnostic options, language standard options, diagnostic identifiers, diagnostic wording to use c++14 instead of c++1y. It also modifies related test cases to use the updated diagnostic wording.
llvm-svn: 215982
(dropping const from the reference as MemoryBuffer is immutable already,
so const is just redundant - and while I'd personally put const
everywhere, that's not the LLVM Way (see llvm::Type for another example
of an immutable type where "const" is omitted for brevity))
Changing the pointer argument to a reference parameter makes call sites
identical between callers with unique_ptrs or raw pointers, minimizing
the churn in a pending unique_ptr migrations.
llvm-svn: 215391
The compilation pipeline doesn't actually need to know about the high-level
concept of diagnostic mappings, and hiding the final computed level presents
several simplifications and other potential benefits.
The only exceptions are opportunistic checks to see whether expensive code
paths can be avoided for diagnostics that are guaranteed to be ignored at a
certain SourceLocation.
This commit formalizes that invariant by introducing and using
DiagnosticsEngine::isIgnored() in place of individual level checks throughout
lex, parse and sema.
llvm-svn: 211005
Extend the SSE2 comment lexing to AVX2. Only 16byte align when not on AVX2.
This provides some 3% speedup when preprocessing gcc.c as a single file.
The patch is wrong, it always uses SSE2, and when I fix that there's no speedup
at all. I am not sure where the 3% came from previously.
--Thi lie, and those below, will be ignored--
M Lex/Lexer.cpp
llvm-svn: 205548
There's been long-standing confusion over the role of these two options. This
commit makes the necessary changes to differentiate them clearly, following up
from r198936.
MicrosoftExt (aka. fms-extensions):
Enable largely unobjectionable Microsoft language extensions to ease
portability. This mode, also supported by gcc, is used for building software
like FreeBSD and Linux kernel extensions that share code with Windows drivers.
MSVCCompat (aka. -fms-compatibility, formerly MicrosoftMode):
Turn on a special mode supporting 'heinous' extensions for drop-in
compatibility with the Microsoft Visual C++ product. Standards-compilant C and
C++ code isn't guaranteed to work in this mode. Implies MicrosoftExt.
Note that full -fms-compatibility mode is currently enabled by default on the
Windows target, which may need tuning to serve as a reasonable default.
See cfe-commits for the full discourse, thread 'r198497 - Move MS predefined
type_info out of InitializePredefinedMacros'
No change in behaviour.
llvm-svn: 199209
encodes the canonical rules for LLVM's style. I noticed this had drifted
quite a bit when cleaning up LLVM, so wanted to clean up Clang as well.
llvm-svn: 198686
The warning for backslash and newline separated by whitespace was missed in
this code path.
backslash<whitespace><newline> is handled differently from compiler to compiler
so it's important to warn consistently where there's ambiguity.
Matches similar handling of block comments and non-comment lines.
llvm-svn: 197331
The C and C++ standards disallow using universal character names to
refer to some characters, such as basic ascii and control characters,
so we reject these sequences in the lexer. However, when the
preprocessor isn't being used on C or C++, it doesn't make sense to
apply these restrictions.
Notably, accepting these characters avoids issues with unicode escapes
when GHC uses the compiler as a preprocessor on haskell sources.
Fixes rdar://problem/14742289
llvm-svn: 193067
literal operators. Also, for now, allow the proposed C++1y "il", "i", and "if"
suffixes too. (Will revert the latter if LWG decides not to go ahead with that
change after all.)
llvm-svn: 191274
Before this patch, Lex() would recurse whenever the current lexer changed (e.g.
upon entry into a macro). This patch turns the recursion into a loop: the
various lex routines now don't return a token when the current lexer changes,
and at the top level Preprocessor::Lex() now loops until it finds a token.
Normally, the recursion wouldn't end up being very deep, but the recursion depth
can explode in edge cases like a bunch of consecutive macros which expand to
nothing (like in the testcase test/Preprocessor/macro_expand_empty.c in this
patch).
<rdar://problem/14569770>
llvm-svn: 190980
Apparently, gcc's -traditional-cpp behaves slightly differently in C++ mode;
specifically, it discards "//" comments. Match gcc's behavior.
<rdar://problem/14808126>
llvm-svn: 189515
If the user has requested this warning, we should emit it, even if it's not
an extension in the current language mode. However, being an extension is
more important, so prefer the pedantic warning or the pedantic-compatibility
warning if those are enabled.
<rdar://problem/12922063>
llvm-svn: 189110
It's beneficial when compiling to treat // as the start of a line
comment even in -std=c89 mode, since it's not valid C code (with a few
rare exceptions) and is usually intended as such. We emit a pedantic
warning and then continue on as if line comments were enabled.
This has been our behavior for quite some time.
However, people use the preprocessor for things besides C source files.
In today's prompting example, the input contains (unquoted) URLs, which
contain // but should still be preserved.
This change instructs the lexer to treat // as a plain token if Clang is
in C90 mode and generating preprocessed output rather than actually compiling.
<rdar://problem/13338743>
llvm-svn: 176526
Add warnings under -Wc++11-compat, -Wc++98-compat, and -Wc99-compat when a
particular UCN is incompatible with a different standard, and -Wunicode when
a UCN refers to a surrogate character in C++03.
llvm-svn: 174788
Rewriting the same predicates over and over again is bad for code size and
code maintainence. Using the functions in <ctype.h> is generally unsafe
unless they are specified to be locale-independent (i.e. only isdigit and
isxdigit).
The next commit will try to clean up uses of <ctype.h> functions within Clang.
llvm-svn: 174765
This allows people to use Unicode in their #pragma mark and in macros
that exist only to be string-ized.
<rdar://problem/13107323&13121362>
llvm-svn: 174081
People use the C preprocessor for things other than C files. Some of them
have Unicode characters. We shouldn't warn about Unicode characters
appearing outside of identifiers in this case.
There's not currently a way for the preprocessor to tell if it's in -E mode,
so I added a new flag, derived from the PreprocessorOutputOptions. This is
only used by the Unicode warnings for now, but could conceivably be used by
other warnings or even behavioral differences later.
<rdar://problem/13107323>
llvm-svn: 173881