Sema does have a CUDALaunchBoundsAttr, but CodeGen was doing nothing with it.
This change translates CUDALaunchBoundsAttr to maxntidx and minctasm
metadata, which NVPTX then translates to the correct PTX directives.
Patch by Manjunath Kudlur.
llvm-svn: 206302
This implements clause C.8 of the AAPCS in the front-end, so that Clang
accurately knows when the registers run out and it has to insert padding before
the stack objects begin.
PR19432.
llvm-svn: 206296
CapturedStmt was being ignored by instrumentation based profiling, and
its counters attributed to the containing function. Instead, we need
to treat this as a top level entity, like we do with blocks.
llvm-svn: 206231
for CXXGlobalInit/Dtor helper functions.
This makes _GLOBAL__I_a regain its DW_AT_high/low_pc in the debug info.
Thanks to echristo for catching this!
llvm-svn: 206088
Until now we were generating duplicate counters for lambdas: one set
in the function where the lambda was declared and another for the
lambda itself. Instead, we should skip over the bodies of lambdas in
their containing contexts.
llvm-svn: 206081
Fixes a bug where unsigned numbers are read using strtol and strtoll.
I don't have a testcase because this bug is effectively unobservable
right now. To expose the problem in the hash, we would need a function
with greater than INT64_MAX counters, which we don't handle anyway. To
expose the problem in the function count, we'd need a function with
greater than INT32_MAX counters; this is theoretically observable, but
it isn't a practical testcase to check in.
An upcoming commit changes the hash to be non-trivial, so we'll get some
coverage eventually.
<rdar://problem/16435801>
llvm-svn: 206001
are not associated with any source lines.
Previously, if the Location of a Decl was empty, EmitFunctionStart would
just keep using CurLoc, which would sometimes be correct (e.g., thunks)
but in other cases would just point to a hilariously random location.
This patch fixes this by completely eliminating all uses of CurLoc from
EmitFunctionStart and rather have clients explicitly pass in a
SourceLocation for the function header and the function body.
rdar://problem/14985269
llvm-svn: 205999
Emitting the PGO initialization in EmitGlobalFunctionDefinition is
inefficient, since this only has an effect once per module. We move
this to Release() with the rest of the once-per-module logic.
llvm-svn: 205977
sure that a debugger can find them when stepping through code,
for example from the included testcase:
12 int test_it() {
13 c = 1;
14 d = 2;
-> 15 a = 4;
16 return (c == 1);
17 }
18
(lldb) p a
(int) $0 = 2
(lldb) p c
(int) $1 = 2
(lldb) p d
(int) $2 = 2
and a, c, d are all part of the file static anonymous union:
static union {
int c;
int d;
union {
int a;
};
struct {
int b;
};
};
Fixes PR19221.
llvm-svn: 205952
If we crash, we raise a crash handler dialog, and that's really
annoying. Even though we can't emit correct IR until we have musttail,
don't crash.
llvm-svn: 205948
It is very similar to GCC's __PRETTY_FUNCTION__, except it prints the
calling convention.
Reviewers: majnemer
Differential Revision: http://reviews.llvm.org/D3311
llvm-svn: 205780
We already got the type alias correct (though I've included a test case
here) since Clang represents that like any other typedef - but type
alias templates weren't being handled.
llvm-svn: 205691
While the folding set would deduplicate the nodes themselves and LLVM
would handle not emitting the same global twice, it still meant creating
a long/redundant list of global variables.
llvm-svn: 205668
See the comment for CodeGenFunction::tryEmitAsConstant that describes
how in some contexts (lambdas) we must not emit references to the
variable, but instead use the constant directly - because of this we end
up emitting a constant for the variable, as well as emitting the
variable itself.
Should we just skip putting the variable on the stack at all and omit
the debug info for the constant? It's not clear to me - what if the
address of the local is taken?
llvm-svn: 205651
If all of our weights are zero when calculating branch weights, it
means we haven't profiled the code in question. Avoid creating a
metadata node that says all branches are equally likely in this case.
The test also checks constructs that hit the other createBranchWeights
overload. These were already working.
llvm-svn: 205606
Summary:
MSVC always emits inline functions marked with the extern storage class
specifier. The result is something similar to the opposite of
__attribute__((gnu_inline)).
This extension is also available in C.
This fixes PR19264.
Reviewers: rnk, rsmith
CC: cfe-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D3207
llvm-svn: 205485
This was committed 4 years ago in 108916 with insufficient testing to
explain why the "getTypeAsWritten" case was appropriate. Experience says
that it isn't - the presence or absence of an explicit instantiation
declaration was causing this code to generate either i<int> or i<int,
int>.
That didn't seem to be a useful distinction, and omitting the template
arguments was destructive to debuggers being able to associate the two
types across translation units or across compilers (GCC, reasonably,
never omitted the arguments).
llvm-svn: 205447
We don't want to encourage the code to emit a lexical block for
a function that needs one in order for the line table to change,
we need to grab the line information from the body of the pattern
that we were instantiated from, this code should do that.
Modify the test case to ensure that we're still looking in the
right place for all of the scopes and also that we haven't
created a lexical block where we didn't need one.
llvm-svn: 205368
Clang implements the part of the ARM ABI saying that certain functions
(e.g., constructors and destructors) return "this", but Apple's version of
gcc and llvm-gcc did not. The libstdc++ dylib on iOS 5 was built with
llvm-gcc, which means that clang cannot safely assume that code from the C++
runtime will correctly follow the ABI. It is also possible to run into this
problem when linking with other libraries built with gcc or llvm-gcc. Even
though there is no way to reliably detect that situation, it is most likely
to come up when targeting older versions of iOS. Disabling the optimization
for any code targeting iOS 5 solves the libstdc++ problem and has a reasonably
good chance of fixing the issue for other older libraries as well.
<rdar://problem/16377159>
llvm-svn: 205272
Summary:
The definition of a type later in a translation unit may change it's
type from {}* to (%struct.foo*)*. Earlier function definitions may use
the former while more recent definitions might use the later. This is
fine until they interact with one another (like one calling the other).
In these cases, a bitcast is needed because the inalloca must match the
function call but the store to the lvalue which initializes the argument
slot has to match the rvalue's type.
This technique is along the same lines with what the other,
non-inalloca, codepaths perform.
This fixes PR19287.
Reviewers: rnk
CC: cfe-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D3224
llvm-svn: 205217
This adds support for the various NEON intrinsics used by
aarch64-neon-intrinsics.c (originally written for AArch64) and enables the
test.
My implementations are designed to be semantically correct, the actual code
quality looks like its a wash between the two backends, and is frequently
different (hence the large number of CHECK changes).
llvm-svn: 205210
Some ABIs and C++ libraries may make different trade-offs in how RTTI
is emitted (currently with respect to visibility and so on). This
implements one scheme, as used by ARM64.
llvm-svn: 205101
This adds Clang support for the ARM64 backend. There are definitely
still some rough edges, so please bring up any issues you see with
this patch.
As with the LLVM commit though, we think it'll be more useful for
merging with AArch64 from within the tree.
llvm-svn: 205100
The peculiarities of C99 create scenario where an LLVM IR function
declaration may need to be replaced with a definition baring a different
type because the prototype and definition are not required to agree.
However, we were not properly deferring this when it occurred.
This fixes PR19280.
llvm-svn: 205099
-u behaviour is apparently not portable between linkers (see cfe-commits
discussions for r204379 and r205012). I've moved the logic to IRGen,
where it should have been in the first place.
I don't have a Linux system to test this on, so it's possible this logic
*still* doesn't pull in the instrumented profiling runtime on Linux.
I'm in the process of getting tests going on the compiler-rt side
(llvm-commits "[PATCH] InstrProf: Add initial compiler-rt test"). Once
we have tests for the full flow there, the runtime logic should get a
whole lot less brittle.
<rdar://problem/16458307>
llvm-svn: 205023
This follows the LLVM change to canonicalise the Windows target triple
spellings. Rather than treating each Windows environment as a single entity,
the environments are now modelled properly as an environment. This is a
mechanical change to convert the triple use to reflect that change.
llvm-svn: 204978
This produces valid IR now that llvm rejects aliases to weak aliases and warns
the user that the resolution is not changed if the weak alias is overridden.
llvm-svn: 204935
instead of rolling an inefficient version of the function. This
changes some order of emission of metadata nodes, fix up those
testcases and make them more flexible to some changes.
llvm-svn: 204874
This commit fixes a cast instruction assertion failure
due to the incompatible type cast. This will only happen when
the target requires atomic libcalls.
llvm-svn: 204834
The main difference between __va_start and __builtin_va_start is that
the address of the va_list has already been taken, and the va_list is
always a char*.
__va_end and __va_arg are not needed.
llvm-svn: 204821
Conceptually one of these loops is just a while-loop, but the actual code-gen
is more complicated. We don't instrument all the different control flow edges
to get accurate counts for each conditional branch, nor do I think it makes
sense to do so. Instead, make the simplifying assumption that the loop behaves
like a while-loop. Use the same branch weights for the first check for an
empty collection as would be used for the back-edge of a while loop, and use
that same weighting for the innermost loop, ignoring the possibility that there
may be some extra code to go fetch more elements.
llvm-svn: 204767
COFF doesn't have mergeable sections so LLVM/clang's normal tactics for
string deduplication will not have any effect.
To remedy this we place each string inside it's own section and mark
the section as IMAGE_COMDAT_SELECT_ANY. However, we can only do this if the
string has an external name that we can generate from it's contents.
To be compatible with MSVC, we must use their scheme. Otherwise identical
strings in translation units from clang may not be deduplicated with
translation units in MSVC.
This fixes PR18248.
N.B. We will not attempt to do anything with a string literal which is not of
type 'char' or 'wchar_t' because their compiler does not support unicode
string literals as of this date. Further, we avoid doing this if
either -fwritable-strings or -fsanitize=address are present.
This reverts commit r204596.
llvm-svn: 204675
This commit cleans up a few accidents:
- Do not rely on the order in which StringLiteral lays out bytes.
- Use a more efficient mechanism for handling so-called
"special-mappings" when mangling string literals.
- There is no need to allocate a copy of the mangled name.
- Add the test written for r204562.
Thanks to Richard Smith for pointing these out!
llvm-svn: 204586
COFF doesn't have mergeable sections so LLVM/clang's normal tactics for
string deduplication will not have any effect.
To remedy this we place each string inside it's own section and mark
the section as IMAGE_COMDAT_SELECT_ANY. However, we can only do this if the
string has an external name that we can generate from it's contents.
To be compatible with MSVC, we must use their scheme. Otherwise identical
strings in translation units from clang may not be deduplicated with
translation units in MSVC.
This fixes PR18248.
N.B. We will not attempt to do anything with a string literal which is not of
type 'char' or 'wchar_t' because their compiler does not support unicode
string literals as of this date.
llvm-svn: 204562
location that the next call emitLocation() would default to. Otherwise
setLocation() may wrongly believe that the current source file didn't
change, when in fact it did.
llvm-svn: 204517
Variables with available_externally linkage can be dropped at will.
This causes link errors, since there are still references to the
instrumentation! linkonce_odr is almost equivalent, so use that
instead.
As a drive-by fix (I don't have an Elf system, so I'm not sure how to
write a testcase), use linkonce linkage for the instrumentation of
extern_weak functions.
<rdar://problem/15943240>
llvm-svn: 204408
The variable is used to set the linkage for variables, and will become
different from function linkage in a follow-up commit.
<rdar://problem/15943240>
llvm-svn: 204407
These functions are in the profile runtime. PGO comes later.
Unfortunately, there's only room for 16 characters in a Darwin section,
so use __llvm_prf_ instead of __llvm_profile_ for section names.
<rdar://problem/15943240>
llvm-svn: 204390
Remove the remaining explicit static initialization from translation
units, at least on Darwin. Instead, create a use of __llvm_pgo_runtime,
which will pull in required code from compiler-rt.
After this commit (and its pair in compiler-rt), a user can define their
own __llvm_pgo_runtime to satisfy this undefined symbol and call the
functions in compiler-rt directly.
<rdar://problem/15943240>
llvm-svn: 204379
The hash itself is still the number of counters, which isn't all that
useful, but this separates the API changes from the actual
implementation of the hash and will make it easier to transition to
the ProfileData library once it's implemented.
llvm-svn: 204186
In instrumentation-based profiling, we need a set of data structures to
represent the counters. Previously, these were built up during static
initialization. Now, they're shoved into a specially-named section so
that they show up as an array.
As a consequence of the reorganizing symbols, instrumentation data
structures for linkonce functions are now correctly coalesced.
This is the first step in a larger project to minimize runtime overhead
and dependencies in instrumentation-based profilng. The larger picture
includes removing all initialization overhead and making the dependency
on libc optional.
<rdar://problem/15943240>
llvm-svn: 204080
data members by addition of CXXDefaultInitExpr node to the initializer expression,
it has broken treatment of arc code for such initializations. Reviewed by John McCall.
// rdar://16299964
llvm-svn: 203935
Drive-by fixing some incorrect types where a for loop would be improperly using ObjCInterfaceDecl::protocol_iterator. No functional changes in these cases.
llvm-svn: 203842
This makes Clang take advantage of the recent IR addition of a
"failure" memory ordering requirement. As with the "success" ordering,
we try to emit just a single version if the expression is constant,
but fall back to runtime detection (to allow optimisation across
function-call boundaries).
rdar://problem/15996804
llvm-svn: 203837
This updates CodeGenPGO to use the ProfileDataReader introduced to
llvm in r203703 and the new API for writing out the profile introduced
to compiler-rt in r203710.
llvm-svn: 203711
When a struct has bitfields overlapping with other members
(as required by the AAPCS), clang uses a packed struct to
represent this. If such a struct is large enough for clang to
pass it as a byval pointer (>64 bytes), we need to set the
alignment of the argument to match the original type.
llvm-svn: 203660
PGO counters are 64-bit and branch weights are 32-bit. Scale them down
when necessary, instead of just taking the lower 32 bits.
<rdar://problem/16276448>
llvm-svn: 203592
This is a conservative check, because it's valid for the expression to be
non-constant, and in cases like that we just don't know whether it's valid.
rdar://problem/16242991
llvm-svn: 203561
Previously, we would always emit them with internal linkage,
but with hidden visibility when the function was hidden, which
is an illegal combination, which could lead LLVM to actually
emit them as strong hidden symbols with hilarious results.
rdar://16265084
llvm-svn: 203503
Summary:
'Expected' should only be modified if the operation fails.
This fixes PR18899.
Reviewers: chandlerc, rsmith, rjmccall
CC: cfe-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2922
llvm-svn: 203493
r203364: what was use_iterator is now user_iterator, and there is
a use_iterator for directly iterating over the uses.
This also switches to use the range-based APIs where appropriate.
llvm-svn: 203365
LLVM currently has a hack (shouldEmitUsedDirectiveFor) that causes it to not
print no_dead_strip for symbols starting with 'l' or 'L'. These are exactly the
ones that the clang's objc codegen is producing. The net result, is that it is
equivalent to llvm.compiler.used.
The need for putting the private symbol in llvm.compiler.used should be clear
(the objc runtime uses them). The reason for also putting the weak symbols in
it is for LTO: ld64 will not ask us to preserve the it.
llvm-svn: 203172
In addition, for all functions, use the name from the llvm::Function to
identify the function in the profile data. Compute that "function name",
including the file name for local functions, once when assigning the PGO
counters and store it in the CodeGenPGO class.
Move the code to add InlineHint and Cold attributes out of StartFunction(),
because the "function name" string isn't available at that point.
llvm-svn: 203075
For C++ functions, we will continue to use the mangled name to identify
functions in the PGO profile data, but this name is confusing for things like
Objective-C methods. For functions with local linkage, we're also going to
include the file name to help distinguish those functions, so this changes to
use more generic variable names.
No functional changes.
llvm-svn: 203074
Move the PGO.assignRegionCounters() call out of StartFunction, because that
function is called from many places where it does not make sense to do PGO
instrumentation (e.g., compiler-generated helper functions). Change several
functions to take a StringRef argument for the unique name associated with
a function, so that the name can be set differently for things like Objective-C
methods and block literals.
llvm-svn: 203073
I hit this while debugging another issue where my sources were in an
inconsistent state, so I don't have a testcase. Regardless, this check is
simpler and more direct than checking if the option is enabled.
llvm-svn: 203072
This patch changes the remaining GlobalVariables using "\01L" and
"\01l" prefixes to use private linkage. What is strange about them is
that they currently use WeakAnyLinkage. There is no comment stating
why and that is really odd since the symbols are completely hidden, so
it doesn't make sense for them to be weak.
Clang revisions like r63329, r63408, r63770, r65761 set the linkage to
weak, but don't say why. I suspect they were just copying llvm-gcc.
In llvm-gcc I found r58599 and r56322 that set DECL_WEAK, but they
were just syncing from the apple gcc. I am not exactly sure what that
means, since the last commit to
svn://gcc.gnu.org/svn/gcc/branches/apple was in 2006, 2 years earlier.
In summary, I have no idea why weak linkage was being used :-(
To quote John McCall, "Let’s try without it and see" :-)
llvm-svn: 203059
Summary:
The MSVC ABI appears to mangle the lexical scope into the names of
statics. Specifically, a counter is incremented whenever a scope is
entered where things can be declared in such a way that an ambiguity can
arise. For example, a class scope inside of a class scope doesn't do
anything interesting because the nested class cannot collide with
another nested class.
There are problems with this scheme:
- It is unreliable. The counter is only incremented when a previously
never encountered scope is entered. There are cases where this will
cause ambiguity amongst declarations that have the same name where one
was introduced in a deep scope while the other was introduced right
after in the previous lexical scope.
- It is wasteful. Statements like: {{{{{{{ static int foo = a; }}}}}}}
will make the mangling of "foo" larger than it need be because the
scope counter has been incremented many times.
Because of these problems, and practical implementation concerns. We
choose not to implement this scheme if the local static or local type
isn't visible. The mangling of these declarations will look very
similar but the numbering will make far more sense, this scheme is
lifted from the Itanium ABI implementation.
Reviewers: rsmith, doug.gregor, rnk, eli.friedman, cdavis5x
Reviewed By: rnk
CC: cfe-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2953
llvm-svn: 202951
This fails an "isa<> used with null pointer" assert during a clang-cl
self-host on Windows. This was caused by r202769, and I'm currently
reducing a test case.
Reviewers: dblaikie
Differential Revision: http://llvm-reviews.chandlerc.com/D2944
llvm-svn: 202888
if an ivar offset load is invariant iff inside an instance method
and ivar belongs to instance method's class and one of its super class.
// rdar://16095748
llvm-svn: 202872
We should only have this optimization fire when the explicit
instantiation definition would cause at least one member function to be
emitted, thus ensuring that even a compiler not performing this
optimization would still emit the full type information elsewhere.
But we should also pessimize output still by always emitting the
definition when the explicit instantiation definition appears so that at
some point in the future we can depend on that information even when no
code had to be emitted in that TU. (this shouldn't happen very often,
since people mostly use explicit spec decl/defs to reduce code size -
but perhaps one day they could use it to explicitly reduce debug info
size too)
This was worth about 2% for Clang and LLVM - so not a huge win, but a
win. It looks really great for simple STL programs (include <string> and
just declare a string - 14k -> 1.4k of .dwo)
llvm-svn: 202769
When lowering a bitfield, CGRecordLowering would assign the wrong
storage type to a bitfield in some cases and trigger an assertion. In
these cases the layout was still correct, just the bitfield info was
wrong.
llvm-svn: 202562
A 'remark' is information that is not an error or a warning, but rather some
additional information provided to the user. In contrast to a 'note' a 'remark'
is an independent diagnostic, whereas a 'note' always depends on another
diagnostic.
A typical use case for remark nodes is information provided to the user, e.g.
information provided by the vectorizer about loops that have been vectorized.
This patch provides the initial implementation of 'remarks'. It includes the
actual definiton of the remark nodes, their printing as well as basic parameter
handling. We are reusing the existing diagnostic parameters which means a remark
can be enabled with normal '-Wdiagnostic-name' flags and can be upgraded to
an error using '-Werror=diagnostic-name'. '-Werror' alone does not upgrade
remarks.
This patch is by intention minimal in terms of parameter handling. More
experience and more discussions will most likely lead to further enhancements
in the parameter handling.
llvm-svn: 202475
In r201528, I changed the PGO instrumentation counter for a "do" loop to not
include the fall-through count. That fall-through count is included later, b
it means that this assertion may fail for "do" loops.
llvm-svn: 202437
Summary:
This merges VFPtrInfo and VBTableInfo into VPtrInfo, since they hold
almost the same information. With that change, the vbtable mangling
code can easily be applied to vftable data and we magically get the
correct, unambiguous vftable names.
Fixes PR17748.
Reviewers: timurrrr, majnemer
CC: cfe-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2893
llvm-svn: 202425
In llvm the only semantic difference between internal and private is that llvm
tries to hide private globals my mangling them with a private prefix. Since
the globals changed by this patch already had the magic don't mangle marker,
there should be no change in the generated assembly.
A followup patch should then be able to drop the \01L and \01l prefixes and let
llvm mangle as appropriate.
llvm-svn: 202419
Clang is using llvm::StructType::isOpaque() as a way of signaling if
we've finished record type conversion in
CodeGenTypes::isRecordLayoutComplete(). However, Clang was setting the
body of the type before it finished laying out the type as a base type.
Laying out the %class.C.base LLVM type attempts to convert more types,
eventually recursively attempting to layout 'C' again, at which point we
would say that layout was complete, even though we were still in the
middle of it.
By not setting the body, we correctly signal that layout is not
complete, and things work as expected.
At some point, it might be worth refactoring this to avoid looking at
the LLVM IR types under construction.
llvm-svn: 202320
Before this patch the globals were created with the wrong linkage and patched
afterwards. From the comments it looks like something would complain about
having an internal GV with no initializer. At least in clang the verifier will
only run way after we set the initializer, so that is not a problem.
This patch should be a nop. It just figures out the linkage earlier and
converts the old calls to setLinkage to asserts. The only case where that is
not possible is when we first see a weak import that is then implemented. In
that case we have to change the linkage, but that is the only setLinkage left.
llvm-svn: 202305