Summary:
Emit !heapallocsite in the metadata for calls to functions marked with
__declspec(allocator). Eventually this will be emitted as S_HEAPALLOCSITE debug
info in codeview.
Reviewers: rnk
Subscribers: jfb, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D60237
llvm-svn: 357928
Added special processing of the memory management directives/clauses for
NVPTX target. For private locals, omp_default_mem_alloc and
omp_thread_mem_alloc result in allocation in local memory.
omp_const_mem_alloc allocates const memory, omp_teams_mem_alloc
allocates shared memory, and omp_cgroup_mem_alloc and
omp_large_cap_mem_alloc allocate global memory.
llvm-svn: 357923
Requires making the llvm::MemoryBuffer* stored by SourceManager const,
which in turn requires making the accessors for that return const
llvm::MemoryBuffer*s and updating all call sites.
The original motivation for this was to use it and fix the TODO in
CodeGenAction.cpp's ConvertBackendLocation() by using the UnownedTag
version of createFileID, and since llvm::SourceMgr* hands out a const
llvm::MemoryBuffer* this is required. I'm not sure if fixing the TODO
this way actually works, but this seems like a good change on its own
anyways.
No intended behavior change.
Differential Revision: https://reviews.llvm.org/D60247
llvm-svn: 357724
Improved classification of address space cast when qualification
conversion is performed - prevent adding addr space cast for
non-pointer and non-reference types. Take address space correctly
from the pointee.
Also pass correct address space from 'this' object using
AggValueSlot when generating addrspacecast in the constructor
call.
Differential Revision: https://reviews.llvm.org/D59988
llvm-svn: 357682
Create method `optForNone()` testing for the function level equivalent of
`-O0` and refactor appropriately.
Differential revision: https://reviews.llvm.org/D59852
llvm-svn: 357638
Also for CUDA, we need to disable producing these fat binary functions when there is no GPU code.
Reviewers: yaxunl, tra
Differential Revision: https://reviews.llvm.org/D60141
llvm-svn: 357526
Skip producing the fat binary functions for HIP when no device code is present.
Reviewers: yaxunl
Differential Review: https://reviews.llvm.org/D60141
llvm-svn: 357520
This ability was removed in r351487, but it's needed when a lambda appears as an
OpaqueValueExpr subexpression of a PseudoObjectExpr.
rdar://49030379
Differential revision: https://reviews.llvm.org/D60099
llvm-svn: 357515
Allow the optimizer to remove unnecessary EH cleanups surrounding calls
to os_log_helper, to save some code size.
As a follow-up, it might be worthwhile to add a BasicNoexcept exception
spec to os_log_helper, and to then teach CGCall to emit direct calls for
callees which can't throw. This could save some compile-time.
Differential Revision: https://reviews.llvm.org/D60108
llvm-svn: 357501
If the pointer is captured by reference, it must be mapped as
_PTR_AND_OBJ kind of mapping to correctly translate the pointer address
on the device.
llvm-svn: 357488
Before this patch, CGLoop would dump all transformations for a loop into
a single LoopID without encoding any order in which to apply them.
rL348944 added the possibility to encode a transformation order using
followup-attributes.
When a loop has more than one transformation, use the follow-up
attribute define the order in which they are applied. The emitted order
is the defacto order as defined by the current LLVM pass pipeline,
which is:
LoopFullUnrollPass
LoopDistributePass
LoopVectorizePass
LoopUnrollAndJamPass
LoopUnrollPass
MachinePipeliner
This patch should therefore not change the assembly output, assuming
that all explicit transformations can be applied, and no implicit
transformations in-between. In the former case,
WarnMissedTransformationsPass should emit a warning (except for
MachinePipeliner which is not implemented yet). The latter could be
avoided by adding 'llvm.loop.disable_nonforced' attributes.
Because LoopUnrollAndJamPass processes a loop nest, generation of the
MDNode is delayed to after the inner loop metadata have been processed.
A temporary LoopID is therefore used to annotate instructions and
RAUW'ed by the actual LoopID later.
Differential Revision: https://reviews.llvm.org/D57978
llvm-svn: 357415
Summary:
Based on a patch by Dustin Howett, modified to not change the ABI for
ELF platforms.
Use more Windows-like section names.
This also makes things more readable by PE/COFF debug tools that assume
sections fit in the first header.
With these changes in, it is now possible to build a working WinObjC
with clang and the WinObjC version of GNUstep libobjc (upstream GNUstep
libobjc + a work around for incremental linking, which can be removed
once LINK.EXE gains a feature to opt sections out of receiving extra
padding during an incremental link).
Patch by Dustin Howett!
Reviewers: DHowett-MSFT
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D58724
llvm-svn: 357364
Without this change, linking multiple objects containing block
descriptors together on Windows will generate duplicate symbol errors.
Patch by Dustin Howett!
Differential Revision: https://reviews.llvm.org/D58807
llvm-svn: 357363
This change adds hierarchical "time trace" profiling blocks that can be visualized in Chrome, in a "flame chart" style. Each profiling block can have a "detail" string that for example indicates the file being processed, template name being instantiated, function being optimized etc.
This is taken from GitHub PR: https://github.com/aras-p/llvm-project-20170507/pull/2
Patch by Aras Pranckevičius.
Differential Revision: https://reviews.llvm.org/D58675
llvm-svn: 357340
copy/move constructor/assignment operator functions for non-trivial C
structs.
This commit fixes a bug where the offset of struct fields weren't being
taken into account when computing the addresses passed to calls to the
special functions.
For example, the copy constructor for S1 (__copy_constructor_8_8_s0_s8)
would pass the start addresses of the destination and source structs to
the call to S0's copy constructor (_copy_constructor_8_8_s0) without
adding the offset of field f1 to the addresses.
typedef struct {
id f0;
S0 f1;
} S1;
void test(S1 s1) {
S1 t = s1;
}
rdar://problem/49400610
llvm-svn: 357229
Future versions of MSVC make these intrinsics available on x86 & x64,
according to:
http://lists.llvm.org/pipermail/cfe-dev/2019-March/061711.html
The purpose of these builtins is to emit plain, non-atomic, volatile
stores when /volatile:ms (-cc1 -fms-volatile) is enabled.
llvm-svn: 357220
In https://bugs.llvm.org/show_bug.cgi?id=41206 we observe bad codegen
when embedding a non-trivial C struct within a C struct. This is due to
the fact that name mangling for non-trivial structs marks the two
structs as identical. This diff contains a fix for this issue.
Patch by Dan Zimmerman <daniel.zimmerman@me.com>.
Differential Revision: https://reviews.llvm.org/D59873
llvm-svn: 357184
This is the result of discussions on the list about how to deal with intrinsics
which require codegen to disambiguate them via only the integer/fp overloads.
It causes problems for GlobalISel as some of that information is lost during
translation, while with other operations like IR instructions the information is
encoded into the instruction opcode.
This patch changes clang to emit the new faddp intrinsic if the vector operands
to the builtin have FP element types. LLVM IR AutoUpgrade has been taught to
upgrade existing calls to aarch64.neon.addp with fp vector arguments, and
we remove the workarounds introduced for GlobalISel in r355865.
This is a more permanent solution to PR40968.
Differential Revision: https://reviews.llvm.org/D59655
llvm-svn: 356722
with notail on x86-64.
On x86-64, the epilogue code inserted before the tail jump blocks the
autoreleased return optimization.
rdar://problem/38675807
Differential Revision: https://reviews.llvm.org/D59656
llvm-svn: 356705
For the global variables the allocate directive must specify only the
predefined allocator. This allocator must be translated into the correct
form of the address space for the targets that support different address
spaces.
llvm-svn: 356702
Summary:
[OpenCL] Generate 'unroll.enable' metadata for __attribute__((opencl_unroll_hint))
For both !{!"llvm.loop.unroll.enable"} and !{!"llvm.loop.unroll.full"} the unroller
will try to fully unroll a loop unless the trip count is not known at compile time.
In that case for '.full' metadata no unrolling will be processed, while for '.enable'
the loop will be partially unrolled with a heuristically chosen unroll factor.
See: docs/LanguageExtensions.rst
From https://www.khronos.org/registry/OpenCL/sdk/2.0/docs/man/xhtml/attributes-loopUnroll.html
__attribute__((opencl_unroll_hint))
for (int i=0; i<2; i++)
{
...
}
In the example above, the compiler will determine how much to unroll the loop.
Before the patch for __attribute__((opencl_unroll_hint)) was generated metadata
!{!"llvm.loop.unroll.full"}, which limits ability of loop unroller to decide, how
much to unroll the loop.
Reviewers: Anastasia, yaxunl
Reviewed By: Anastasia
Subscribers: zzheng, dmgreen, jdoerfert, cfe-commits, asavonic, AlexeySotkin
Tags: #clang
Differential Revision: https://reviews.llvm.org/D59493
llvm-svn: 356571
The attribute pass_dynamic_object_size(n) behaves exactly like
pass_object_size(n), but instead of evaluating __builtin_object_size on calls,
it evaluates __builtin_dynamic_object_size, which has the potential to produce
runtime code when the object size can't be determined statically.
Differential revision: https://reviews.llvm.org/D58757
llvm-svn: 356515
Added initial codegen for the local variables with the #pragma omp
allocate directive. Instead of allocating the variables on the stack,
__kmpc_alloc|__kmpc_free functions are used for memory (de-)allocation.
llvm-svn: 356472
Summary:
This patch refactors several instances of cast<> used in if
conditionals. Since cast<> asserts on failure, the else branch can
never be taken.
In some cases, the fix is to replace cast<> with dyn_cast<>. While
others required the removal of the conditional and some minor
refactoring.
A discussion can be seen here: http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20190318/265044.html
Differential Revision: https://reviews.llvm.org/D59529
llvm-svn: 356441
As background, when constructing a complete object, virtual bases are
constructed first. If an exception is thrown later in the ctor, those
virtual bases are destroyed, so sema marks the relevant constructors and
destructors of virtual bases as referenced. If necessary, they are
emitted.
However, an abstract class can never be used to construct a complete
object. In the Itanium C++ ABI, this works out nicely, because we never
end up emitting the "complete" constructor variant, only the "base"
constructor variant, which can be called by constructors of derived
classes. Clang's Sema::MarkBaseAndMemberDestructorsReferenced is aware
of this optimization, and it does not mark ctors and dtors of virtual
bases referenced when the constructor of an abstract class is emitted.
In the Microsoft ABI, there are no complete/base variants, so before
this change, the constructor of an abstract class could reference ctors
and dtors of a virtual base without marking them referenced. This could
lead to unresolved symbol errors at link time, as reported in PR41065.
The fix is to implement the same optimization as Sema: If the class is
abstract, don't bother initializing its virtual bases. The "is this
class the most derived class" check in the constructor will never pass,
and the virtual base constructor calls are always dead. Skip them.
I think Richard noticed this missed optimization back in 2016 when he
was implementing inheriting constructors. I wasn't able to find any bugs
or email about it, though.
Fixes PR41065
llvm-svn: 356425
Summary:
Because in wasm we merge all catch clauses into one big catchpad, in
case none of the types in catch handlers matches after we test against
each of them, we should unwind to the next EH enclosing scope. For this,
we should NOT use a call to `__cxa_rethrow` but rather a call to our own
rethrow intrinsic, because what we're trying to do here is just to
transfer the control flow into the next enclosing EH pad (or the
caller). Calls to `__cxa_rethrow` should only be used after a call to
`__cxa_begin_catch`.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D59353
llvm-svn: 356317
If the doacross lop construct is used and the loop counter is declare
outside of the loop, the compiler might crash trying to get the address
of the loop counter. Patch fixes this problem.
llvm-svn: 356198
The constraint "0" in the following asm did not consider the its
relationship with "=y" when try to replace the type of the operands.
asm ("nop" : "=y"(Mu8_1 ) : "0"(Mu8_0 ));
Patch by Xiang Zhang.
Differential Revision: https://reviews.llvm.org/D56990
llvm-svn: 356196