tells us whether Preprocessor::HandleIdentifier needs to be called.
Because this method is only rarely needed, this saves a call and a
bunch of random checks. This drops the time in HandleIdentifier
from 3.52ms to .98ms on cocoa.h on my machine.
llvm-svn: 62675
Changes to IdentifierTable:
- High-level summary: StringMap never owns IdentifierInfos. It just
references them.
- The string map now has StringMapEntry<IdentifierInfo*> instead of
StringMapEntry<IdentifierInfo>. The IdentifierInfo object is
allocated using the same bump pointer allocator as used by the
StringMap.
Changes to IdentifierInfo:
- Added an extra pointer to point to the
StringMapEntry<IdentifierInfo*> in the string map. This pointer
will be null if the IdentifierInfo* is *only* used by the PTHLexer
(that is it isn't in the StringMap).
Algorithmic changes:
- Non-PTH case:
IdentifierInfo::get() will always consult the StringMap first to
see if we have an IdentifierInfo object. If that StringMapEntry
references a null pointer, we allocate a new one from the BumpPtrAllocator
and update the reference in the StringMapEntry.
- PTH case:
We do the same lookup as with the non-PTH case, but if we don't get
a hit in the StringMap we do a secondary lookup in the PTHManager for
the IdentifierInfo. If we don't find an IdentifierInfo we create a
new one as in the non-PTH case. If we do find and IdentifierInfo
in the PTHManager, we update the StringMapEntry to refer to it so
that the IdentifierInfo will be found on the next StringMap lookup.
This way we only do a binary search in the PTH file at most once
for a given IdentifierInfo. This greatly speeds things up for source
files containing a non-trivial amount of code.
Performance impact:
While these changes do add some extra indirection in
IdentifierTable to access an IdentifierInfo*, I saw speedups even
in the non-PTH case as well.
Non-PTH: For -fsyntax-only on Cocoa.h, we see a 6% speedup.
PTH (with Cocoa.h in token cache): 11% speedup.
I also did an experiment where we did -fsyntax-only on a source file
including a large header and Cocoa.h, but the token cache did not
contain the larger header. For this file, we were seeing a performance
*regression* when using PTH of 3% over non-PTH. Now we are seeing
a performance improvement of 9%!
Tests:
The serialization tests are now failing. I looked at this extensively,
and I my belief is that this change is unmasking a bug rather than
introducing a new one. I have disabled the serialization tests for now.
llvm-svn: 62636
the chunk ID not the file ID. This exposes problems in
TextDiagnosticPrinter where it should have been using the canonical
file ID but wasn't. Fix these along the way.
llvm-svn: 62427
"FileID" a concept that is now enforced by the compiler's type checker
instead of yet-another-random-unsigned floating around.
This is an important distinction from the "FileID" currently tracked by
SourceLocation. *That* FileID may refer to the start of a file or to a
chunk within it. The new FileID *only* refers to the file (and its
#include stack and eventually #line data), it cannot refer to a chunk.
FileID is a completely opaque datatype to all clients, only SourceManager
is allowed to poke and prod it.
llvm-svn: 62407
the "physical" location of tokens, refer to the "spelling" location.
This is more concrete and useful, tokens aren't really physical objects!
llvm-svn: 62309
- IdentifierInfo can now (optionally) have its string data not be
co-located with itself. This is for use with PTH. This aspect is a
little gross, as getName() and getLength() now make assumptions
about a possible alternate representation of IdentifierInfo.
Perhaps we should make IdentifierInfo have virtual methods?
IdentifierTable:
- Added class "IdentifierInfoLookup" that can be used by
IdentifierTable to perform "string -> IdentifierInfo" lookups using
an auxilliary data structure. This is used by PTH.
- Perform tests show that IdentifierTable::get() does not slow down
because of the extra check for the IdentiferInfoLookup object (the
regular StringMap lookup does enough work to mitigate the impact of
an extra null pointer check).
- The upshot is that now that some IdentifierInfo objects might be
owned by the IdentiferInfoLookup object. This should be reviewed.
PTH:
- Modified PTHManager::GetIdentifierInfo to *not* insert entries in
IdentifierTable's string map, and instead create IdentifierInfo
objects on the fly when mapping from persistent IDs to
IdentifierInfos. This saves a ton of work with string copies,
hashing, and StringMap lookup and resizing. This change was
motivated because when processing source files in the PTH cache we
don't need to do any string -> IdentifierInfo lookups.
- PTHManager now subclasses IdentifierInfoLookup, allowing clients of
IdentifierTable to transparently use IdentifierInfo objects managed
by the PTH file. PTHManager resolves "string -> IdentifierInfo"
queries by doing a binary search over a sorted table of identifier
strings in the PTH file (the exact algorithm we use can be changed
as needed).
These changes lead to the following performance changes when using PTH on Cocoa.h:
- fsyntax-only: 10% performance improvement
- Eonly: 30% performance improvement
llvm-svn: 62273
- Big Idea:
Source files are now mmaped when ContentCache::getBuffer() is first called.
While this doesn't change the functionality when lexing regular source files,
it can result in source files not being paged in when using PTH.
- Performance change:
- No observable difference (-fsyntax-only/-Eonly) on Cocoa.h when doing
regular source lexing.
- No observable time difference (-fsyntax-only/-Eonly) on Cocoa.h when using
PTH. We do observe, however, a reduction of 279K in memory mapped source
code (3% reduction). The majority of pages from Cocoa.h (and friends) are
still being pulled in, however, because any literal will cause
Preprocessor::getSpelling() to be called (causing the source for the file to
get pulled in). The next possible optimization is to cache literal strings
in the PTH file to avoid the need for the original header sources entirely.
- Right now there is a preprocessor directive to toggle between "lazy" and
"eager" creation of MemBuffers. This is not permanent, and is there in the
short term to just test additional optimizations.
llvm-svn: 61827
- 'Buffer' is now private and must be accessed via 'getBuffer()'.
This paves the way for lazily mapping in source files on demand.
- Added 'getSize()' (which gets the size of the content without
necessarily accessing the MemBuffer) and 'getSizeBytesMapped()'.
- Modifed SourceManager to use these new methods. This reduces the
number of places that actually access the MemBuffer object for a file
to those that actually look at the character data.
These changes result in no performance change for -fsyntax-only on Cocoa.h.
llvm-svn: 61782
specific targets default them to on. Default blocks to on on 10.6 and later.
Add a -fblocks option that allows the user to override the target's default.
Use -fblocks in the various testcases that use blocks.
llvm-svn: 60563
a new NamedDecl::getAsString() method.
Change uses of Selector::getName() to just pass in a Selector
where possible (e.g. to diagnostics) instead of going through
an std::string.
This also adds new formatters for objcinstance and objcclass
as described in the dox.
llvm-svn: 59933
with implicit quotes around them. This has a bunch of follow-on
effects and requires tweaking to a whole lot of code. This causes
a regression in two tests (xfailed) by causing it to emit things like:
Line 10: duplicate interface declaration for category 'MyClass1' ('Category1')
instead of:
Line 10: duplicate interface declaration for category 'MyClass1(Category1)'
I will fix this in a follow-up commit.
As part of this, I had to start switching stuff to use ->getDeclName() instead
of Decl::getName() for consistency. This is good, but I was planning to do this
as an independent patch. There will be several follow-on patches
to clean up some of the mess, but this patch is already too big.
llvm-svn: 59917
diags over to use this. QualTypes implicitly print single quotes around
them for uniformity and future extension.
Doing this requires a little function pointer dance to prevent libbasic
from depending on libast.
llvm-svn: 59907
one for building up the diagnostic that is in flight (DiagnosticBuilder)
and one for pulling structured information out of the diagnostic when
formatting and presenting it.
There is no functionality change with this patch.
llvm-svn: 59849
strings. This allows us to have considerable flexibility in how
these things are displayed and provides extra information that
allows us to merge away diagnostics that are very similar.
Diagnostic modifiers are a string of characters with the regex
[-a-z]+ that occur between the % and digit. They may
optionally have an argument that can parameterize them.
For now, I've added two example modifiers. One is a very useful
tool that allows you to factor commonality across diagnostics
that need single words or phrases combined. Basically you can
use %select{a|b|c}4 with with an integer argument that selects
either a/b/c based on an integer value in the range [0..3).
The second modifier is also an integer modifier, aimed to help
English diagnostics handle plurality. "%s3" prints to 's' if
integer argument #3 is not 1, otherwise it prints to nothing.
I'm fully aware that 's' is an English concept and doesn't
apply to all situations (mouse vs mice). However, this is very
useful and we can add other crazy modifiers once we add support
for polish! ;-)
I converted a couple C++ diagnostics over to use this as an
example, I'd appreciate it if others could merge the other
likely candiates. If you have other modifiers that you want,
lets talk on cfe-dev.
llvm-svn: 59803
const char*'s are now not converted to std::strings when the diagnostic
is formed, we just hold onto their pointer and format as needed.
This commit makes DiagnosticClient::FormatDiagnostic even more of a
mess, I'll fix it in the next commit.
llvm-svn: 59593
operator+, directly, using the same mechanism as all other special
names.
Removed the "special" identifiers for the overloaded operators from
the identifier table and IdentifierInfo data structure. IdentifierInfo
is back to representing only real identifiers.
Added a new Action, ActOnOperatorFunctionIdExpr, that builds an
expression from an parsed operator-function-id (e.g., "operator
+"). ActOnIdentifierExpr used to do this job, but
operator-function-ids are no longer represented by IdentifierInfo's.
Extended Declarator to store overloaded operator names.
Sema::GetNameForDeclarator now knows how to turn the operator
name into a DeclarationName for the overloaded operator.
Except for (perhaps) consolidating the functionality of
ActOnIdentifier, ActOnOperatorFunctionIdExpr, and
ActOnConversionFunctionExpr into a common routine that builds an
appropriate DeclRefExpr by looking up a DeclarationName, all of the
work on normalizing declaration names should be complete with this
commit.
llvm-svn: 59526
are formed. In particular, a diagnostic with all its strings and ranges is now
packaged up and sent to DiagnosticClients as a DiagnosticInfo instead of as a
ton of random stuff. This has the benefit of simplifying the interface, making
it more extensible, and allowing us to do more checking for things like access
past the end of the various arrays passed in.
In addition to introducing DiagnosticInfo, this also substantially changes how
Diagnostic::Report works. Instead of being passed in all of the info required
to issue a diagnostic, Report now takes only the required info (a location and
ID) and returns a fresh DiagnosticInfo *by value*. The caller is then free to
stuff strings and ranges into the DiagnosticInfo with the << operator. When
the dtor runs on the DiagnosticInfo object (which should happen at the end of
the statement), the diagnostic is actually emitted with all of the accumulated
information. This is a somewhat tricky dance, but it means that the
accumulated DiagnosticInfo is allowed to keep pointers to other expression
temporaries without those pointers getting invalidated.
This is just the minimal change to get this stuff working, but this will allow
us to eliminate the zillions of variant "Diag" methods scattered throughout
(e.g.) sema. For example, instead of calling:
Diag(BuiltinLoc, diag::err_overload_no_match, typeNames,
SourceRange(BuiltinLoc, RParenLoc));
We will soon be able to just do:
Diag(BuiltinLoc, diag::err_overload_no_match)
<< typeNames << SourceRange(BuiltinLoc, RParenLoc));
This scales better to support arbitrary types being passed in (not just
strings) in a type-safe way. Go operator overloading?!
llvm-svn: 59502
strings instead of array of strings. This reduces string copying
in some not-very-important cases, but paves the way for future
improvements.
llvm-svn: 59494
destructors, and conversion functions. The placeholders were used to
work around the fact that the parser and some of Sema really wanted
declarators to have simple identifiers; now, the code that deals with
declarators will use DeclarationNames.
llvm-svn: 59469
representing the names of declarations in the C family of
languages. DeclarationName is used in NamedDecl to store the name of
the declaration (naturally), and ObjCMethodDecl is now a NamedDecl.
llvm-svn: 59441
function call created in response to the use of operator syntax that
resolves to an overloaded operator in C++, e.g., "str1 +
str2" that resolves to std::operator+(str1, str2)". We now build a
CXXOperatorCallExpr in C++ when we pick an overloaded operator. (But
only for binary operators, where we actually implement overloading)
I decided *not* to refactor the current CallExpr to make it abstract
(with FunctionCallExpr and CXXOperatorCallExpr as derived
classes). Doing so would allow us to make CXXOperatorCallExpr a little
bit smaller, at the cost of making the argument and callee accessors
virtual. We won't know if this is going to be a win until we can parse
lots of C++ code to determine how much memory we'll save by making
this change vs. the performance penalty due to the extra virtual
calls.
llvm-svn: 59306
conversion functions. Instead, we just use a placeholder identifier
for these (e.g., "<constructor>") and override NamedDecl::getName() to
provide a human-readable name.
This is one potential solution to the problem; another solution would
be to replace the use of IdentifierInfo* in NamedDecl with a different
class that deals with identifiers better. I'm also prototyping that to
see how it compares, but this commit is better than what we had
previously.
llvm-svn: 59193