- added -fcuda-include-gpubinary option to incorporate results of
device-side compilation into host-side one.
- generate code to register GPU binaries and associated kernels
with CUDA runtime and clean-up on exit.
- added test case for init/deinit code generation.
Differential Revision: http://reviews.llvm.org/D9507
llvm-svn: 236765
The fact that PGO has a say in how these branch weights are determined
isn't interesting to most of CodeGen, so it makes more sense for this
API to be accessible via CodeGenFunction rather than CodeGenPGO.
llvm-svn: 236380
This is just the clang-side of 32-bit SEH. LLVM still needs work, and it
will determinstically fail to compile until it's feature complete.
On x86, all outlined handlers have no parameters, but they do implicitly
take the EBP value passed in and use it to address locals of the parent
frame. We model this with llvm.frameaddress(1).
This works (mostly), but __finally block inlining can break it. For now,
we apply the 'noinline' attribute. If we really want to inline __finally
blocks on 32-bit x86, we should teach the inliner how to untangle
frameescape and framerecover.
Promote the error diagnostic from codegen to sema. It now rejects SEH on
non-Windows platforms. LLVM doesn't implement SEH on non-x86 Windows
platforms, but there's nothing preventing it.
llvm-svn: 236052
The RegionCounter type does a lot of legwork, but most of it is only
meaningful within the implementation of CodeGenPGO. The uses elsewhere
in CodeGen generally just want to increment or read counters, so do
that directly.
llvm-svn: 235664
This reverts commit r234700. It turns out that the lifetime markers
were not the cause of Chromium failing but a bug which was uncovered by
optimizations exposed by the markers.
llvm-svn: 235553
Even though these symbols are in a comdat group, the Microsoft linker
really wants them to have internal linkage.
I'm planning to tweak the mangling in a follow-up change. This is a
straight revert with a 1-line fix.
llvm-svn: 234613
Now that TailRecursionElimination has been fixed with r222354, the
threshold on size for lifetime marker insertion can be removed. This
only affects named temporary though, as the patch for unnamed temporaries
is still in progress.
My previous commit (r222993) was not handling debuginfo correctly, but
this could only be seen with some asan tests. Basically, lifetime markers
are just instrumentation for the compiler's usage and should not affect
debug information; however, the cleanup infrastructure was assuming it
contained only destructors, i.e. actual code to be executed, and was
setting the breakpoint for the end of the function to the closing '}', and
not the return statement, in order to show some destructors have been
called when leaving the function. This is wrong when the cleanups are only
lifetime markers, and this is now fixed.
llvm-svn: 234581
WinEHPrepare was going to have to pattern match the control flow merge
and split that the old lowering used, and that wasn't really feasible.
Now we can teach WinEHPrepare to pattern match this, which is much
simpler:
%fp = call i8* @llvm.frameaddress(i32 0)
call void @func(iN [01], i8* %fp)
This prototype happens to match the prototype used by the Win64 SEH
personality function, so this is really simple.
llvm-svn: 234532
The driver currently accepts but ignores the -freciprocal-math flag.
This patch passes the flag through and enables 'arcp' fast-math-flag
generation in IR.
Note that this change does not actually enable the optimization for
any target. The reassociation optimization that this flag specifies
was implemented by http://reviews.llvm.org/D6334 :
http://llvm.org/viewvc/llvm-project?view=revision&revision=222510
Because the optimization is done in the backend rather than IR,
the backend must be modified to understand instruction-level
fast-math-flags or a new function-level attribute must be created.
Also note that -freciprocal-math is independent of any target-specific
usage of reciprocal estimate hardware instructions. That requires
its own flag ('-mrecip').
https://llvm.org/bugs/show_bug.cgi?id=20912
llvm-svn: 234493
The test should be fixed. It was failing in NDEBUG builds due to a
missing '*' character in a regex. In asserts builds, the pattern matched
a single digit value, which became a double digit value in NDEBUG
builds. Go figure.
This reverts commit r234261.
llvm-svn: 234447
While capturing filters aren't very common, we'd like to outline
__finally blocks in the frontend to simplify -O0 EH preparation and
reduce code size. Finally blocks are usually have captures, and this is
the first step towards that.
Currently we don't support capturing 'this' or VLAs.
Reviewers: majnemer
Differential Revision: http://reviews.llvm.org/D8825
llvm-svn: 234261
There are no widely deployed standard libraries providing sized
deallocation functions, so we have to punt and ask the user if they want
us to use sized deallocation. In the future, when such libraries are
deployed, we can teach the driver to detect them and enable this
feature.
N3536 claimed that a weak thunk from sized to unsized deallocation could
be emitted to avoid breaking backwards compatibility with standard
libraries not providing sized deallocation. However, this approach and
other variations don't work in practice.
With the weak function approach, the thunk has to have default
visibility in order to ensure that it is overridden by other DSOs
providing sized deallocation. Weak, default visibility symbols are
particularly expensive on MachO, so John McCall was considering
disabling this feature by default on Darwin. It also changes behavior
ELF linking behavior, causing certain otherwise unreferenced object
files from an archive to be pulled into the link.
Our second approach was to use an extern_weak function declaration and
do an inline conditional branch at the deletion call site. This doesn't
work because extern_weak only works on MachO if you have some archive
providing the default value of the extern_weak symbol. Arranging to
provide such an archive has the same challenges as providing the symbol
in the standard library. Not to mention that extern_weak doesn't really
work on COFF.
Reviewers: rsmith, rjmccall
Differential Revision: http://reviews.llvm.org/D8467
llvm-svn: 232788
Previously we would simply double-emit the body of the __finally block,
but that doesn't work when it contains any kind of Decl, which we can't
double emit.
This fixes that by emitting the block once and branching into a shared
code region and then branching back out.
llvm-svn: 228222
Now if you break on a dtor and go 'up' in your debugger (or you get an
asan failure in a dtor) during an exception unwind, you'll have more
context. Instead of all dtors appearing to be called from the '}' of the
function, they'll be attributed to the end of the scope of the variable,
the same as the non-exceptional dtor call.
This doesn't /quite/ remove all uses of CurEHLocation (which might be
nice to remove, for a few reasons) - it's still used to choose the
location for some other work in the landing pad. It'd be nice to
attribute that code to the same location as the exception calls within
the block and to remove CurEHLocation.
llvm-svn: 228181
This is half a fix for a GDB test suite failure that expects to start at
'a' in the following code:
void func(int a)
if (a
&&
b)
...
But instead, without this change, the comparison was assigned to '&&'
(well, worse actually - because there was a chained 'a && b && c' and it
was assigned to the second '&&' because of a recursive application of
this bug) and then the load folded into the comparison so breaking on
the function started at '&&' instead of 'a'.
The other part of this needs to be fixed in LLVM where it's ignoring the
location of the icmp and instead using the location of the branch
instruction.
The fix to the conditional operator is actually a no-op currently,
because the conditional operator's location coincides with 'a' (the
start of the conditional expression) but should probably be '?' instead.
See the FIXME in the test case that mentions the ARCMigration tool
failures when I tried to make that change.
llvm-svn: 227356
The lowering looks a lot like normal EH lowering, with the exception
that the exceptions are caught by executing filter expression code
instead of matching typeinfo globals. The filter expressions are
outlined into functions which are used in landingpad clauses where
typeinfo would normally go.
Major aspects that still need work:
- Non-call exceptions in __try bodies won't work yet. The plan is to
outline the __try block in the frontend to keep things simple.
- Filter expressions cannot use local variables until capturing is
implemented.
- __finally blocks will not run after exceptions. Fixing this requires
work in the LLVM SEH preparation pass.
The IR lowering looks like this:
// C code:
bool safe_div(int n, int d, int *r) {
__try {
*r = normal_div(n, d);
} __except(_exception_code() == EXCEPTION_INT_DIVIDE_BY_ZERO) {
return false;
}
return true;
}
; LLVM IR:
define i32 @filter(i8* %e, i8* %fp) {
%ehptrs = bitcast i8* %e to i32**
%ehrec = load i32** %ehptrs
%code = load i32* %ehrec
%matches = icmp eq i32 %code, i32 u0xC0000094
%matches.i32 = zext i1 %matches to i32
ret i32 %matches.i32
}
define i1 zeroext @safe_div(i32 %n, i32 %d, i32* %r) {
%rr = invoke i32 @normal_div(i32 %n, i32 %d)
to label %normal unwind to label %lpad
normal:
store i32 %rr, i32* %r
ret i1 1
lpad:
%ehvals = landingpad {i8*, i32} personality i32 (...)* @__C_specific_handler
catch i8* bitcast (i32 (i8*, i8*)* @filter to i8*)
%ehptr = extractvalue {i8*, i32} %ehvals, i32 0
%sel = extractvalue {i8*, i32} %ehvals, i32 1
%filter_sel = call i32 @llvm.eh.seh.typeid.for(i8* bitcast (i32 (i8*, i8*)* @filter to i8*))
%matches = icmp eq i32 %sel, %filter_sel
br i1 %matches, label %eh.except, label %eh.resume
eh.except:
ret i1 false
eh.resume:
resume
}
Reviewers: rjmccall, rsmith, majnemer
Differential Revision: http://reviews.llvm.org/D5607
llvm-svn: 226760
Several pieces of code were relying on implicit debug location setting
which usually lead to incorrect line information anyway. So I've fixed
those (in r225955 and r225845) separately which should pave the way for
this commit to be cleanly reapplied.
The reason these implicit dependencies resulted in crashes with this
patch is that the debug location would no longer implicitly leak from
one place to another, but be set back to invalid. Once a call with
no/invalid location was emitted, if that call was ever inlined it could
produce invalid debugloc chains and assert during LLVM's codegen.
There may be further cases of such bugs in this patch - they're hard to
flush out with regression testing, so I'll keep an eye out for reports
and investigate/fix them ASAP if they come up.
Original commit message:
Reapply "DebugInfo: Generalize debug info location handling"
Originally committed in r224385 and reverted in r224441 due to concerns
this change might've introduced a crash. Turns out this change fixes the
crash introduced by one of my earlier more specific location handling
changes (those specific fixes are reverted by this patch, in favor of
the more general solution).
Recommitted in r224941 and reverted in r224970 after it caused a crash
when building compiler-rt. Looks to be due to this change zeroing out
the debug location when emitting default arguments (which were meant to
inherit their outer expression's location) thus creating call
instructions without locations - these create problems for inlining and
must not be created. That is fixed and tested in this version of the
change.
Original commit message:
This is a more scalable (fixed in mostly one place, rather than many
places that will need constant improvement/maintenance) solution to
several commits I've made recently to increase source fidelity for
subexpressions.
This resetting had to be done at the DebugLoc level (not the
SourceLocation level) to preserve scoping information (if the resetting
was done with CGDebugInfo::EmitLocation, it would've caused the tail end
of an expression's codegen to end up in a potentially different scope
than the start, even though it was at the same source location). The
drawback to this is that it might leave CGDebugInfo out of sync. Ideally
CGDebugInfo shouldn't have a duplicate sense of the current
SourceLocation, but for now it seems it does... - I don't think I'm
going to tackle removing that just now.
I expect this'll probably cause some more buildbot fallout & I'll
investigate that as it comes up.
Also these sort of improvements might be starting to show a weakness/bug
in LLVM's line table handling: we don't correctly emit is_stmt for
statements, we just put it on every line table entry. This means one
statement split over multiple lines appears as multiple 'statements' and
two statements on one line (without column info) are treated as one
statement.
I don't think we have any IR representation of statements that would
help us distinguish these cases and identify the beginning of each
statement - so that might be something we need to add (possibly to the
lexical scope chain - a scope for each statement). This does cause some
problems for GDB and possibly other DWARF consumers.
llvm-svn: 225956
This reverts commit r225000, r225021, r225083, r225086, r225090.
The root change (r225000) still has several issues where it's caused
calls to be emitted without debug locations. This causes assertion
failures if/when those calls are inlined.
I'll work up some test cases and fixes before recommitting this.
llvm-svn: 225555
arithmetic relaxation flags:
-cl-no-signed-zeros
-cl-unsafe-math-optimizations
-cl-finite-math-only
-cl-fast-relaxed-math
Propagate the info to FP instruction flags as well
as function attributes where they are available.
llvm-svn: 223928
The logic for lowering profiling counters has been moved to an LLVM
pass. Emit the intrinsics rather than duplicating the whole pass in
clang.
llvm-svn: 223683
http://llvm.org/bugs/show_bug.cgi?id=21555
Currently, kernel argument metadata is omitted unless the
"-cl-kernel-arg-info" option is specified. But the SPIR 1.2 spec
requires that all metadata except kernel_arg_name should always be
emitted, and kernel_arg_name is only emitted when
"-cl-kernel-arg-info" is specified.
Patch ported by Ryan Burn from the Khronos SPIR generator.
https://github.com/KhronosGroup/SPIR
llvm-svn: 223340