In instrumentation-based profiling, we need a set of data structures to
represent the counters. Previously, these were built up during static
initialization. Now, they're shoved into a specially-named section so
that they show up as an array.
As a consequence of the reorganizing symbols, instrumentation data
structures for linkonce functions are now correctly coalesced.
This is the first step in a larger project to minimize runtime overhead
and dependencies in instrumentation-based profilng. The larger picture
includes removing all initialization overhead and making the dependency
on libc optional.
<rdar://problem/15943240>
llvm-svn: 204080
r203364: what was use_iterator is now user_iterator, and there is
a use_iterator for directly iterating over the uses.
This also switches to use the range-based APIs where appropriate.
llvm-svn: 203365
In addition, for all functions, use the name from the llvm::Function to
identify the function in the profile data. Compute that "function name",
including the file name for local functions, once when assigning the PGO
counters and store it in the CodeGenPGO class.
Move the code to add InlineHint and Cold attributes out of StartFunction(),
because the "function name" string isn't available at that point.
llvm-svn: 203075
Move the PGO.assignRegionCounters() call out of StartFunction, because that
function is called from many places where it does not make sense to do PGO
instrumentation (e.g., compiler-generated helper functions). Change several
functions to take a StringRef argument for the unique name associated with
a function, so that the name can be set differently for things like Objective-C
methods and block literals.
llvm-svn: 203073
The __forceinline keyword's semantics are now recast as AlwaysInline and
the kw___forceinline token has its language mode set for KEYMS.
This preserves the semantics of the previous implementation but with
less duplication of code.
llvm-svn: 202131
Previously, we made one traversal of the AST prior to codegen to assign
counters to the ASTs and then propagated the count values during codegen. This
patch now adds a separate AST traversal prior to codegen for the
-fprofile-instr-use option to propagate the count values. The counts are then
saved in a map from which they can be retrieved during codegen.
This new approach has several advantages:
1. It gets rid of a lot of extra PGO-related code that had previously been
added to codegen.
2. It fixes a serious bug. My original implementation (which was mailed to the
list but never committed) used 3 counters for every loop. Justin improved it to
move 2 of those counters into the less-frequently executed breaks and continues,
but that turned out to produce wrong count values in some cases. The solution
requires visiting a loop body before the condition so that the count for the
condition properly includes the break and continue counts. Changing codegen to
visit a loop body first would be a fairly invasive change, but with a separate
AST traversal, it is easy to control the order of traversal. I've added a
testcase (provided by Justin) to make sure this works correctly.
3. It improves the instrumentation overhead, reducing the number of counters for
a loop from 3 to 1. We no longer need dedicated counters for breaks and
continues, since we can just use the propagated count values when visiting
breaks and continues.
To make this work, I needed to make a change to the way we count case
statements, going back to my original approach of not including the fall-through
in the counter values. This was necessary because there isn't always an AST node
that can be used to record the fall-through count. Now case statements are
handled the same as default statements, with the fall-through paths branching
over the counter increments. While I was at it, I also went back to using this
approach for do-loops -- omitting the fall-through count into the loop body
simplifies some of the calculations and make them behave the same as other
loops. Whenever we start using this instrumentation for coverage, we'll need
to add the fall-through counts into the counter values.
llvm-svn: 201528
We collect a maximal function count among all functions in the pgo data file.
For functions that are hot, we set its InlineHint attribute. For functions that
are cold, we set its Cold attribute.
We currently treat functions with >= 30% of the maximal function count as hot
and functions with <= 1% of the maximal function count are treated as cold.
These two numbers are from preliminary tuning on SPEC.
This commit should not affect non-PGO builds and should boost performance on
instrumentation based PGO.
llvm-svn: 200874
When a non-trivial parameter is present, clang now gathers up all the
parameters that lack inreg and puts them into a packed struct. MSVC
always aligns each parameter to 4 bytes and no more, so this is a pretty
simple struct to lay out.
On win64, non-trivial records are passed indirectly. Prior to this
change, clang was incorrectly using byval on win64.
I'm able to self-host a working clang with this change and additional
LLVM patches.
Reviewers: rsmith
Differential Revision: http://llvm-reviews.chandlerc.com/D2636
llvm-svn: 200597
A return type is the declared or deduced part of the function type specified in
the declaration.
A result type is the (potentially adjusted) type of the value of an expression
that calls the function.
Rule of thumb:
* Declarations have return types and parameters.
* Expressions have result types and arguments.
llvm-svn: 200082
marked as AlwaysInline or ForceInline.
This moves us to what gcc does with -fno-inline. The attribute approach
was discussed to be better than switching to InlineAlways inliner in presence
of LTO.
llvm-svn: 199324
adjustFallThroughCount isn't a good name, and the documentation was
even worse. This commit attempts to clarify what it's for and when to
use it.
llvm-svn: 199139
from the global address space (6.5.1 of the OpenCL 1.2 specification).
This makes clang construct the image arguments in the global address
space and generate the argument metadata with the correct address space
descriptor.
Patch by Pedro Ferreira!
llvm-svn: 198868
C and C++ don't emit an extra lexical scope for the compound statement
that is the body of an Objective-C method.
rdar://problem/15010825
llvm-svn: 198699
Unlike Itanium's VTTs, the 'most derived' boolean or bitfield is the
last parameter for non-variadic constructors, rather than the second.
For variadic constructors, the 'most derived' parameter comes after the
'this' parameter. This affects constructor calls and constructor decls
in a variety of places.
Reviewers: timurrrr
Differential Revision: http://llvm-reviews.chandlerc.com/D2405
llvm-svn: 197518
Summary:
In general, this type node can be used to represent any type adjustment
that occurs implicitly without losing type sugar. The immediate use of
this is to adjust the calling conventions of member function pointer
types without breaking template instantiation.
Fixes PR17996.
Reviewers: rsmith
Differential Revision: http://llvm-reviews.chandlerc.com/D2332
llvm-svn: 196451
deallocation function (and the corresponding unsized deallocation function has
been declared), emit a weak discardable definition of the function that
forwards to the corresponding unsized deallocation.
This allows a C++ standard library implementation to provide both a sized and
an unsized deallocation function, where the unsized one does not just call the
sized one, for instance by putting both in the same object file within an
archive.
llvm-svn: 194055
CodeGenABITypes is a wrapper built on top of CodeGenModule that exposes
some of the functionality of CodeGenTypes (held by CodeGenModule),
specifically methods that determine the LLVM types appropriate for
function argument and return values.
I addition to CodeGenABITypes.h, CGFunctionInfo.h is introduced, and the
definitions of ABIArgInfo, RequiredArgs, and CGFunctionInfo are moved
into this new header from the private headers ABIInfo.h and CGCall.h.
Exposing this functionality is one part of making it possible for LLDB
to determine the actual ABI locations of function arguments and return
values, making it possible for it to determine this for any supported
target without hard-coding ABI knowledge in the LLDB code.
llvm-svn: 193717
This uses function prefix data to store function type information at the
function pointer.
Differential Revision: http://llvm-reviews.chandlerc.com/D1338
llvm-svn: 193058
Specifically, the following features are not included in this commit:
- any sort of capturing within generic lambdas
- generic lambdas within template functions and nested
within other generic lambdas
- conversion operator for captureless lambdas
- ensuring all visitors are generic lambda aware
(Although I have gotten some useful feedback on my patches of the above and will be incorporating that as I submit those patches for commit)
As an example of what compiles through this commit:
template <class F1, class F2>
struct overload : F1, F2 {
using F1::operator();
using F2::operator();
overload(F1 f1, F2 f2) : F1(f1), F2(f2) { }
};
auto Recursive = [](auto Self, auto h, auto ... rest) {
return 1 + Self(Self, rest...);
};
auto Base = [](auto Self, auto h) {
return 1;
};
overload<decltype(Base), decltype(Recursive)> O(Base, Recursive);
int num_params = O(O, 5, 3, "abc", 3.14, 'a');
Please see attached tests for more examples.
This patch has been reviewed by Doug and Richard. Minor changes (non-functionality affecting) have been made since both of them formally looked at it, but the changes involve removal of supernumerary return type deduction changes (since they are now redundant, with richard having committed a recent patch to address return type deduction for C++11 lambdas using C++14 semantics).
Some implementation notes:
- Add a new Declarator context => LambdaExprParameterContext to
clang::Declarator to allow the use of 'auto' in declaring generic
lambda parameters
- Add various helpers to CXXRecordDecl to facilitate identifying
and querying a closure class
- LambdaScopeInfo (which maintains the current lambda's Sema state)
was augmented to house the current depth of the template being
parsed (id est the Parser calls Sema::RecordParsingTemplateParameterDepth)
so that SemaType.cpp::ConvertDeclSpecToType may use it to immediately
generate a template-parameter-type when 'auto' is parsed in a generic
lambda parameter context. (i.e we do NOT use AutoType deduced to
a template parameter type - Richard seemed ok with this approach).
We encode that this template type was generated from an auto by simply
adding $auto to the name which can be used for better diagnostics if needed.
- SemaLambda.h was added to hold some common lambda utility
functions (this file is likely to grow ...)
- Teach Sema::ActOnStartOfFunctionDef to check whether it
is being called to instantiate a generic lambda's call
operator, and if so, push an appropriately prepared
LambdaScopeInfo object on the stack.
- various tests were added - but much more will be needed.
There is obviously more work to be done, and both Richard (weakly) and Doug (strongly)
have requested that LambdaExpr be removed form the CXXRecordDecl LambdaDefinitionaData
in a future patch which is forthcoming.
A greatful thanks to all reviewers including Eli Friedman, James Dennett,
and especially the two gracious wizards (Richard Smith and Doug Gregor)
who spent hours providing feedback (in person in Chicago and on the mailing lists).
And yet I am certain that I have allowed unidentified bugs to creep in; bugs, that I will do my best to slay, once identified!
Thanks!
llvm-svn: 191453
CodeGenFunction is run on only one function - a new object is made for
each new function. I would add an assertion/flag to this effect, but
there's an exception: ObjC properties involve emitting helper functions
that are all emitted by the same CodeGenFunction object, so such a check
is not possible/correct.
llvm-svn: 189277
Specifically, the following features are not included in this commit:
- any sort of capturing within generic lambdas
- nested lambdas
- conversion operator for captureless lambdas
- ensuring all visitors are generic lambda aware
As an example of what compiles:
template <class F1, class F2>
struct overload : F1, F2 {
using F1::operator();
using F2::operator();
overload(F1 f1, F2 f2) : F1(f1), F2(f2) { }
};
auto Recursive = [](auto Self, auto h, auto ... rest) {
return 1 + Self(Self, rest...);
};
auto Base = [](auto Self, auto h) {
return 1;
};
overload<decltype(Base), decltype(Recursive)> O(Base, Recursive);
int num_params = O(O, 5, 3, "abc", 3.14, 'a');
Please see attached tests for more examples.
Some implementation notes:
- Add a new Declarator context => LambdaExprParameterContext to
clang::Declarator to allow the use of 'auto' in declaring generic
lambda parameters
- Augment AutoType's constructor (similar to how variadic
template-type-parameters ala TemplateTypeParmDecl are implemented) to
accept an IsParameterPack to encode a generic lambda parameter pack.
- Add various helpers to CXXRecordDecl to facilitate identifying
and querying a closure class
- LambdaScopeInfo (which maintains the current lambda's Sema state)
was augmented to house the current depth of the template being
parsed (id est the Parser calls Sema::RecordParsingTemplateParameterDepth)
so that Sema::ActOnLambdaAutoParameter may use it to create the
appropriate list of corresponding TemplateTypeParmDecl for each
auto parameter identified within the generic lambda (also stored
within the current LambdaScopeInfo). Additionally,
a TemplateParameterList data-member was added to hold the invented
TemplateParameterList AST node which will be much more useful
once we teach TreeTransform how to transform generic lambdas.
- SemaLambda.h was added to hold some common lambda utility
functions (this file is likely to grow ...)
- Teach Sema::ActOnStartOfFunctionDef to check whether it
is being called to instantiate a generic lambda's call
operator, and if so, push an appropriately prepared
LambdaScopeInfo object on the stack.
- Teach Sema::ActOnStartOfLambdaDefinition to set the
return type of a lambda without a trailing return type
to 'auto' in C++1y mode, and teach the return type
deduction machinery in SemaStmt.cpp to process either
C++11 and C++14 lambda's correctly depending on the flag.
- various tests were added - but much more will be needed.
A greatful thanks to all reviewers including Eli Friedman,
James Dennett and the ever illuminating Richard Smith. And
yet I am certain that I have allowed unidentified bugs to creep in;
bugs, that I will do my best to slay, once identified!
Thanks!
llvm-svn: 188977
Refactor the underlying code a bit to remove unnecessary calls to
"hasErrorOccurred" & make them consistently at all the entry points to
the IRGen ASTConsumer.
llvm-svn: 188707
only affect functions without a separate return block. This fixes the
linetable for void functions with cleanups and multiple returns.
llvm-svn: 187090
Test coverage for non-dependent pack expansions doesn't demonstrate a
failure prior to this patch (a follow-up commit improving debug info
will cover this commit specifically) but covers a related hole in our
test coverage.
Reviewed by Richard Smith & Eli Friedman.
llvm-svn: 186261
This allows clang to use the backend parameter attribute 'returned' when generating 'this'-returning constructors and destructors in ARM and MSVC C++ ABIs.
llvm-svn: 185291
The goal of this sugar node is to be able to look at an arbitrary
FunctionType and tell if any of the parameters were decayed from an
array or function type. Ultimately this is necessary to implement
Microsoft's C++ name mangling scheme, which mangles decayed arrays
differently from normal pointers.
Reviewers: rsmith
Differential Revision: http://llvm-reviews.chandlerc.com/D1014
llvm-svn: 184763
The backend will now use the generic 'returned' attribute to form tail calls where possible, as well as avoid save-restores of 'this' in some cases (specifically the cases that matter for the ARM C++ ABI).
This patch also reverts a prior front-end only partial implementation of these optimizations, since it's no longer required.
llvm-svn: 184205
were lacking ExprWithCleanups nodes in some cases where the new approach to
lifetime extension needed them).
Original commit message:
Rework IR emission for lifetime-extended temporaries. Instead of trying to walk
into the expression and dig out a single lifetime-extended entity and manually
pull its cleanup outside the expression, instead keep a list of the cleanups
which we'll need to emit when we get to the end of the full-expression. Also
emit those cleanups early, as EH-only cleanups, to cover the case that the
full-expression does not terminate normally. This allows IR generation to
properly model temporary lifetime when multiple temporaries are extended by the
same declaration.
We have a pre-existing bug where an exception thrown from a temporary's
destructor does not clean up lifetime-extended temporaries created in the same
expression and extended to automatic storage duration; that is not fixed by
this patch.
llvm-svn: 183859