I. Current implementation of images is not conformant to spec in the following points:
1. It makes no distinction with respect to access qualifiers and therefore allows to use images with different access type interchangeably. The following code would compile just fine:
void write_image(write_only image2d_t img);
kernel void foo(read_only image2d_t img) { write_image(img); } // Accepted code
which is disallowed according to s6.13.14.
2. It discards access qualifier on generated code, which leads to generated code for the above example:
call void @write_image(%opencl.image2d_t* %img);
In OpenCL2.0 however we can have different calls into write_image with read_only and wite_only images.
Also generally following compiler steps have no easy way to take different path depending on the image access: linking to the right implementation of image types, performing IR opts and backend codegen differently.
3. Image types are language keywords and can't be redeclared s6.1.9, which can happen currently as they are just typedef names.
4. Default access qualifier read_only is to be added if not provided explicitly.
II. This patch corrects the above points as follows:
1. All images are encapsulated into a separate .def file that is inserted in different points where image handling is required. This avoid a lot of code repetition as all images are handled the same way in the code with no distinction of their exact type.
2. The Cartesian product of image types and image access qualifiers is added to the builtin types. This simplifies a lot handling of access type mismatch as no operations are allowed by default on distinct Builtin types. Also spec intended access qualifier as special type qualifier that are combined with an image type to form a distinct type (see statement above - images can't be created w/o access qualifiers).
3. Improves testing of images in Clang.
Author: Anastasia Stulova
Reviewers: bader, mgrang.
Subscribers: pxli168, pekka.jaaskelainen, yaxunl.
Differential Revision: http://reviews.llvm.org/D17821
llvm-svn: 265783
CodeGen-level implementation. Instead of adding an attribute to clang's
FunctionDecl, add the IR attribute directly. This means a module built with
this flag is now compatible with code built without it and vice versa.
This change also results in the 'noalias' attribute no longer being added to
calls to operator new in the IR; it's now only added to the declaration. It
also fixes a bug where we failed to add the attribute to the 'nothrow' versions
(because we didn't implicitly declare them, there was no good time to inject a
fake attribute).
llvm-svn: 265728
This is a mechanical move of CodeGenOptions from libFrontend to libBasic. This
fixes the layering violation introduced earlier by threading CodeGenOptions into
TargetInfo. It should also fix the modules based self-hosting builds. NFC.
llvm-svn: 265702
isinf (is infinite) and isfinite should be implemented with the same function
except we change the comparison operator.
See PR27145 for more details:
https://llvm.org/bugs/show_bug.cgi?id=27145
Ref: forked off of the discussion in D18513.
Differential Revision: http://reviews.llvm.org/D18648
llvm-svn: 265675
Turns out it was there mostly to prevent Clang asking people to report a bug.
This time we report something to Clang's real diagnostics handler so that it
exits with something approximating a real error and tidies up after itself.
llvm-svn: 265592
Summary: See LLVM change D18775 for details, this change depends on it.
Reviewers: jyknight, reames
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D18776
llvm-svn: 265569
Summary:
Setting this flag causes all functions are annotated with the
"nvvm-f32ftz" = "true" attribute.
In addition, we annotate the module with "nvvm-reflect-ftz" set
to 0 or 1, depending on whether -cuda-flush-denormals-to-zero is set.
This is read by the NVVMReflect pass.
Reviewers: tra, rnk
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D18671
llvm-svn: 265435
Add no-jump-tables flag to disable use of jump tables when lowering
switch statements
Reviewers: echristo, hans
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18407
llvm-svn: 265425
This patch implements the teams directive for the NVPTX backend. It is different from the host code generation path as it:
Does not call kmpc_fork_teams. All necessary teams and threads are started upon touching the target region, when launching a CUDA kernel, and their execution is coordinated through sequential and parallel regions within the target region.
Does not call kmpc_push_num_teams even if a num_teams of thread_limit clause is present. Setting the number of teams and the thread limit is implemented by the nvptx-related runtime.
Please note that I am now passing a Clang Expr * to emitPushNumTeams instead of the originally chosen llvm::Value * type. The reason for that is that I want to avoid emitting expressions for num_teams and thread_limit if they are not needed in the target region.
http://reviews.llvm.org/D17963
llvm-svn: 265304
The objc_runtime_visible attribute deals with an odd corner case where
a particular Objective-C class is known to the Objective-C runtime
(and, therefore, accessible by name) but its symbol has been hidden
for some reason. For such classes, teach CodeGen to use
objc_lookUpClass to retrieve the Class object, rather than referencing
the class symbol directly.
Classes annotated with objc_runtime_visible have two major limitations
that fall out from places where Objective-C metadata needs to refer to
the class (or metaclass) symbol directly:
* One cannot implement a subclass of an objc_runtime_visible class.
* One cannot implement a category on an objc_runtime_visible class.
Implements rdar://problem/25494092.
llvm-svn: 265201
landing pads.
Previously, lifetime.end intrinsics were inserted only on normal control
flows. This prevented StackColoring from merging stack slots for objects
that were destroyed on the exception handling control flow since it
couldn't tell their lifetime ranges were disjoint. This patch fixes
code-gen to emit the intrinsic on both control flows.
rdar://problem/22181976
Differential Revision: http://reviews.llvm.org/D18196
llvm-svn: 265197
Whatever crash it was there to present appears to have been fixed in the
backend now, and it had the nasty side-effect of causing clang to exit(0) and
leave a .o containing goodness knows what even when an error hit.
llvm-svn: 265038
Value profiling should not profile constants and/or constant
expressions when they appear as callees in call instructions.
Constant expressions form when a direct callee has bitcasts or
inttoptr(ptrtint (callee)) nests surrounding it. Value profiling
should avoid instrumenting such cases. Mostly NFC.
llvm-svn: 265037
alignment on Darwin.
Itanium C++ ABI specifies that _Unwind_Exception should be double-word
aligned (16B). To conform to the ABI, libraries implementing exception
handling declare the struct with __attribute__((aligned)), which aligns
the unwindHeader field (and the end of __cxa_exception) to the default
target alignment (which is typically 16-bytes).
struct __cxa_exception {
...
// struct is declared with __attribute__((aligned)).
_Unwind_Exception unwindHeader;
};
Based on the assumption that _Unwind_Exception is declared with
__attribute__((aligned)), ItaniumCXXABI::getAlignmentOfExnObject returns
the target default alignment for __attribute__((aligned)). It turns out
that libc++abi, which is used on Darwin, doesn't declare the struct with
the attribute and therefore doesn't guarantee that unwindHeader is
aligned to the alignment specified by the ABI, which in some cases
causes the program to crash because of unaligned memory accesses.
This commit avoids crashes due to unaligned memory accesses by having
getAlignmentOfExnObject return an 8-byte alignment on Darwin. I've only
fixed the problem for Darwin, but we should also figure out whether other
platforms using libc++abi need similar fixes.
rdar://problem/25314277
Differential revision: http://reviews.llvm.org/D18479
llvm-svn: 264998
...as that is apparently what MSVC does. This is an updated version of r263738,
which had to be reverted in r263740 due to test failures. The original version
had erroneously emitted functions that are defined in class templates, too (see
the updated "Handle friend functions" code in EmitDeferredDecls,
lib/CodeGen/ModuleBuilder.cpp). (The updated tests needed to be split out into
their own dllexport-ms-friend.cpp because of the CHECK-NOTs which would have
interfered with subsequent CHECK-DAGs in dllexport.cpp.)
Differential Revision: http://reviews.llvm.org/D18430
llvm-svn: 264841
For terminator instructions, the value profiling instrumentation
happens in a basic block other than where the value site resides.
This CR moves the instrumentation point prior to the value site.
Mostly NFC.
llvm-svn: 264783
For better support of some specific GNU extensions some extra
transformation of AST nodes were introduced. These transformations are
very hard to handle. The code is improved in handling of these
extensions by using captured expressions construct.
llvm-svn: 264709
Solution unifies interface of RegionCodeGenTy type to allow insert
runtime-specific code before/after main codegen action defined in
CGStmtOpenMP.cpp file. Runtime should not define its own RegionCodeGenTy
for general OpenMP directives, but must be allowed to insert its own
(required) code to support target specific codegen.
llvm-svn: 264700
to function names
Summary:
Hopefully this will make it easier for the next person to figure all
this out...
Reviewers: bogner, davidxl
Subscribers: davidxl, cfe-commits
Differential Revision: http://reviews.llvm.org/D18489
llvm-svn: 264681
Solution unifies interface of RegionCodeGenTy type to allow insert
runtime-specific code before/after main codegen action defined in
CGStmtOpenMP.cpp file. Runtime should not define its own RegionCodeGenTy
for general OpenMP directives, but must be allowed to insert its own
(required) code to support target specific codegen.
llvm-svn: 264576
Solution unifies interface of RegionCodeGenTy type to allow insert
runtime-specific code before/after main codegen action defined in
CGStmtOpenMP.cpp file. Runtime should not define its own RegionCodeGenTy
for general OpenMP directives, but must be allowed to insert its own
(required) code to support target specific codegen.
llvm-svn: 264569
I broke this back in r264529 because I forgot to serialize the UuidAttr
member. Fix this by replacing the UuidAttr with a StringRef which is
properly serialized and deserialized.
llvm-svn: 264562
The _GUID_ descriptors emitted by MSVC have alignment 8 for 64-bit
builds: we should do the same if the linker picks the "wrong" COMDAT.
llvm-svn: 264530
Keep a pointer to the UuidAttr that the CXXUuidofExpr corresponds to.
This makes translating from __uuidof to the underlying constant a lot
more straightforward.
llvm-svn: 264529
Summary:
This patch adds base support for codegen of the target directive on the NVPTX device.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D17877
Reworked test case after buildbot failure on windows.
Updated patch to integrate r263837 and test case nvptx_target_firstprivate_codegen.cpp.
llvm-svn: 264018
These functions cannot be implemented as atomicrmw or cmpxchg
instructions, so they are implemented as a call to the NVVM intrinsics
@llvm.nvvm.atomic.load.inc.32.p0i32 and
@llvm.nvvm.atomic.load.dec.32.p0i32.
Patch by Jason Henline.
Reviewers: jlebar
Differential Revision: http://reviews.llvm.org/D18322
llvm-svn: 264009
This reverts commit r263607.
This change caused more objc_retain/objc_release calls in the IR but those
are then incorrectly optimized by the ARC optimizer. Work is going to have
to be done to ensure the ARC optimizer doesn't optimize user written RR, but
that should land before this change.
This change will also need to be updated to take account for any changes required
to ensure that user written calls to RR are distinct from those inserted by ARC.
llvm-svn: 263984
Implement lambda capture of *this by copy.
For e.g.:
struct A {
int d = 10;
auto foo() { return [*this] (auto a) mutable { d+=a; return d; }; }
};
auto L = A{}.foo(); // A{}'s lifetime is gone.
// Below is still ok, because *this was captured by value.
assert(L(10) == 20);
assert(L(100) == 120);
If the capture was implicit, or [this] (i.e. *this was captured by reference), this code would be otherwise undefined.
Implementation Strategy:
- amend the parser to accept *this in the lambda introducer
- add a new king of capture LCK_StarThis
- teach Sema::CheckCXXThisCapture to handle by copy captures of the
enclosing object (i.e. *this)
- when CheckCXXThisCapture does capture by copy, the corresponding
initializer expression for the closure's data member
direct-initializes it thus making a copy of '*this'.
- in codegen, when assigning to CXXThisValue, if *this was captured by
copy, make sure it points to the corresponding field member, and
not, unlike when captured by reference, what the field member points
to.
- mark feature as implemented in svn
Much gratitude to Richard Smith for his carefully illuminating reviews!
llvm-svn: 263921
This makes sure we don't generate a lot of code to spill/reload
CSRs when calling tls_init from the access functions.
This helps performance when tls_init is not inlined into the access
functions.
llvm-svn: 263854
This patch implements the following aspects:
It extends sema to check that a variable is not reference in both a map clause and firstprivate or private. This is needed to ensure correct functioning at codegen level, apart from being useful for the user.
It implements firstprivate for target in codegen. The implementation applies to both host and nvptx devices.
It adds regression tests for codegen of firstprivate, host and device side when using the host as device, and nvptx side.
Please note that the regression test for nvptx codegen is missing VLAs. This is because VLAs currently require saving and restoring the stack which appears not to be a supported operation by nvptx backend.
It adds a check in sema regression tests for target map, firstprivate, and private clauses.
http://reviews.llvm.org/D18203
llvm-svn: 263837
Summary:
r246764 handled __fp16 arguments and returns for AAPCS, but skipped this
handling for OpenCL. Simlar to OpenCL, RenderScript also handles __fp16
type natively.
This patch adds the -fnative-half-arguments-and-returns command line
flag to allow such languages to skip this coercion of __fp16.
Reviewers: srhines, olista01
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D18138
llvm-svn: 263795
Summary:
Reworked test case after buildbot failure on windows.
This patch adds base support for codegen of the target directive on the NVPTX device.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D17877
llvm-svn: 263783
Summary: ...as that is apparently what MSVC does
Reviewers: rnk
Patch by Stephan Bergmann
Differential Revision: http://reviews.llvm.org/D15267
llvm-svn: 263738
OpenMP 4.0 allows to define custom reduction operations using '#pragma
omp declare reduction' construct. Patch allows to use this custom
defined reduction operations in 'reduction' clauses.
llvm-svn: 263701
This patch adds support for codegen of private clause of target and a regression test for host code generation, when the host is used as target device. I believe that code generation for nvptx backend would not require anything additional or different to what is done for the host.
http://reviews.llvm.org/D18105
llvm-svn: 263654
Till now, preserve_mostcc/preserve_allcc calling convention attributes were only
available at the LLVM IR level. This patch adds attributes for
preserve_mostcc/preserve_allcc calling conventions to the C/C++ front-end.
The code was mostly written by Juergen Ributzka.
I just added support for the AArch64 target and tests.
Differential Revision: http://reviews.llvm.org/D18025
llvm-svn: 263647
It is faster to directly call the ObjC runtime for methods such as retain/release instead of sending a message to those functions.
This patch adds support for converting messages to retain/release/alloc/autorelease to their equivalent runtime calls.
Tests included for the positive case of applying this transformation, negative tests that we ensure we only convert "alloc" to objc_alloc, not "alloc2", and also a driver test to ensure we enable this only for supported runtime versions.
Reviewed by John McCall.
Differential Revision: http://reviews.llvm.org/D14737
llvm-svn: 263607
Summary:
This patch adds base support for codegen of the target directive on the NVPTX device.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D17877
llvm-svn: 263587
In the cross-DSO CFI mode clang emits __cfi_check_fail that handles
errors triggered from other modules with targets in the current
module. With this change, __cfi_check_fail will handle errors for
CFI kinds that are not enabled in the current module as if they
have the trapping behaviour (-fsanitize-trap=...).
This fixes a bug where some combinations of -fsanitize* flags may
result in a link failure due to a missing sanitizer runtime library
for the diagnostic calls in __cfi_check_fail.
llvm-svn: 263578
Summary:
This patch adds base support for codegen of the target directive on the NVPTX device.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D17877
llvm-svn: 263552
This is the companion to an LLVM patch that renamed the function index
data structures and files to use the more general module summary index.
(Recommit after fixing LLVM side to add back missed file)
llvm-svn: 263514
This is the companion to an LLVM patch that renamed the function index
data structures and files to use the more general module summary index.
llvm-svn: 263491
The relative vtable ABI will use a struct rather than an array as the type
of a vtable. LLVM only allows 32-bit integers as struct indices, so we need
to use 32-bit integers to get addresses of address points. In order to keep
the code simple, we might as well do that unconditionally.
It's probably a reasonable implementation limit to support no more than 2
billion virtual functions per class.
This change causes quite a bit of churn in the test suite, so I'm making
it separately.
Differential Revision: http://reviews.llvm.org/D18113
llvm-svn: 263469
This marks virtual function declarations, as well as runtime library functions
__cxa_pure_virtual, __cxa_deleted_virtual and _purecall, as unnamed_addr. This
will allow us to correctly form relative references to them from vtables in
the relative vtable ABI.
Differential Revision: http://reviews.llvm.org/D18071
llvm-svn: 263464
Summary:
This flag is enabled by default in the driver when NDEBUG is set. It
is forwarded on the LLVMContext to discard all value names (but
GlobalValue) for performance purpose.
This an improved version of D18024
Reviewers: echristo, chandlerc
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D18127
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263394
commit 60d9845f6a037122d9be9a6d92d4de617ef45b04
Author: Mehdi Amini <mehdi.amini@apple.com>
Date: Fri Mar 11 18:48:02 2016 +0000
Fix clang crash: when CodeGenAction is initialized without a
context, use the member and not the parameter
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@263273
91177308-0d34-0410-b5e6-96231b3b80d8
commit af7ce3bf04a75ad5124b457b805df26006bd215b
Author: Mehdi Amini <mehdi.amini@apple.com>
Date: Fri Mar 11 17:32:58 2016 +0000
Fix build: use -> with pointers and not .
Silly typo.
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@263267
91177308-0d34-0410-b5e6-96231b3b80d8
commit d0eea119192814954e7368c77d0dc5a9eeec1fbb
Author: Mehdi Amini <mehdi.amini@apple.com>
Date: Fri Mar 11 17:15:44 2016 +0000
Remove compile time PreserveName switch based on NDEBUG
Summary:
Following r263086, we are now relying on a flag on the Context to
discard Value names in release builds.
Reviewers: chandlerc
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D18024
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@263257
91177308-0d34-0410-b5e6-96231b3b80d8
until we can fix the Release builds.
This reverts commits 263257, 263267, 263273
llvm-svn: 263320
Summary:
Following r263086, we are now relying on a flag on the Context to
discard Value names in release builds.
Reviewers: chandlerc
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D18024
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263257
Includes new built-in, conversion of built-in to target-independent intrinsic
and update in the header file. Tests are also updated. There is a second part in
the backend for which I will post a separate code-review. BACKEND PART SHOULD BE
COMMITTED FIRST.
Phabricator: http://reviews.llvm.org/D17816
llvm-svn: 263051
OpenMP 4.5 allows privatization of non-static data members in OpenMP
constructs. Patch adds proper codegen support for data members in
'linear' clause
llvm-svn: 263003
This patch provide basic implementation of codegen for teams directive, excluding all clauses except dist_schedule. It also fixes parts of AST reader/writer to enable correct pre-compiled header handling.
http://reviews.llvm.org/D17170
llvm-svn: 262832
This patch provide basic implementation of codegen for teams directive, excluding all clauses except dist_schedule. It also fixes parts of AST reader/writer to enable correct pre-compiled header handling.
http://reviews.llvm.org/D17170
llvm-svn: 262741
Use it to calculate UserLabelPrefix, instead of specifying it (often
incorrectly).
Note that the *actual* user label prefix has always come from the
DataLayout, and is handled within LLVM. The main thing clang's
TargetInfo::UserLabelPrefix did was to set the #define value. Having
these be different from each-other is just silly.
Differential Revision: http://reviews.llvm.org/D17183
llvm-svn: 262737
While pushing switch statements onto the region stack we neglected to
specify their start/end locations. This results in a crash (PR26825) if
we end up in nested macro expansions without enough information to
handle the relevant file exits.
I added a test in switchmacro.c and fixed up a bunch of incorrect CHECK
lines that specify strange end locations for switches.
llvm-svn: 262697
For compatibility with GCC, classify __m64 as SSE.
However, clang is a platform compiler for certain targets; retain our
old behavior on those targets: classify __m64 as integer.
This fixes PR26832.
llvm-svn: 262688
Add code generation support for firstprivate and private clauses of teams on the host. Add extensive regression tests including lambda functions and vla testing.
http://reviews.llvm.org/D17582
llvm-svn: 262663
Summary:
This patch implements the launching of a target region in the presence of a nested teams region, i.e calls tgt_target_teams with the required arguments gathered from the enclosed teams directive.
The actual codegen of the region enclosed by the teams construct will be contributed in a separate patch.
Reviewers: hfinkel, arpith-jacob, kkwli0, carlo.bertolli, ABataev
Subscribers: cfe-commits, caomhin, fraggamuffin
Differential Revision: http://reviews.llvm.org/D17019
llvm-svn: 262625
Add parsing, sema analysis and serialization/deserialization for 'declare reduction' construct.
User-defined reductions are defined as
#pragma omp declare reduction( reduction-identifier : typename-list : combiner ) [initializer ( initializer-expr )]
These custom reductions may be used in 'reduction' clauses of OpenMP constructs. The combiner specifies how partial results can be combined into a single value. The
combiner can use the special variable identifiers omp_in and omp_out that are of the type of the variables being reduced with this reduction-identifier. Each of them will
denote one of the values to be combined before executing the combiner. It is assumed that the special omp_out identifier will refer to the storage that holds the resulting
combined value after executing the combiner.
As the initializer-expr value of a user-defined reduction is not known a priori the initializer-clause can be used to specify one. Then the contents of the initializer-clause
will be used as the initializer for private copies of reduction list items where the omp_priv identifier will refer to the storage to be initialized. The special identifier
omp_orig can also appear in the initializer-clause and it will refer to the storage of the original variable to be reduced.
Differential Revision: http://reviews.llvm.org/D11182
llvm-svn: 262582
OpenMP 4.5 allows to privatize data members of current class in member
functions. Patch adds initial support for privatization of data members
in 'linear' clause, no codegen support.
llvm-svn: 262578
This patch changes cc1 option for PGO profile use from
-fprofile-instr-use=<path> to -fprofile-instrument-use-path=<path>.
-fprofile-instr-use=<path> is now a driver only option.
In addition to decouple the cc1 option from the driver level option, this patch
also enables IR level profile use. cc1 option handling now reads the profile
header and sets CodeGenOpt ProfileUse (valid values are {None, Clang, LLVM}
-- this is a common enum for -fprofile-instrument={}, for the profile
instrumentation), and invoke the pipeline to enable the respective PGO use pass.
Reviewers: silvas, davidxl
Differential Revision: http://reviews.llvm.org/D17737
llvm-svn: 262515
This is like r262493, but for pragma detect_mismatch instead of pragma comment.
The two pragmas have similar behavior, so use the same approach for both.
llvm-svn: 262506
... and register them with CUDA runtime.
This is needed for commonly used cudaMemcpy*() APIs that use address of
host-side shadow to access their counterparts on device side.
Fixes PR26340
Differential Revision: http://reviews.llvm.org/D17779
llvm-svn: 262498
`#pragma comment` was handled by Sema calling a function on ASTConsumer, and
CodeGen then implementing this function and writing things to its output.
Instead, introduce a PragmaCommentDecl AST node and hang one off the
TranslationUnitDecl for every `#pragma comment` line, and then use the regular
serialization machinery. (Since PragmaCommentDecl has codegen relevance, it's
eagerly deserialized.)
http://reviews.llvm.org/D17799
llvm-svn: 262493
Sema allows max values up to 2**28, use unsigned instead of unsiged
short to hold values that large.
Differential Revision: http://reviews.llvm.org/D17248
Patch by Don Hinton!
llvm-svn: 262466
OpenMP 4.5 allows to privatize non-static data members of current class
in non-static member functions. Patch supports codegen for non-static
data members in 'reduction' clauses.
llvm-svn: 262460
We'd lose track of the parent CodeGenFunction, leading us to get
confused with regard to which function a nested finally belonged to.
Differential Revision: http://reviews.llvm.org/D17752
llvm-svn: 262379
This patch expands cc1 option -fprofile-instrument= with a new value: -fprofile-instrument=llvm
which enables IR level PGO instrumentation.
Reviewers: davidxl, silvas
Differential Revision: http://reviews.llvm.org/D17622
llvm-svn: 262239
Summary:
OpenCL access qualifiers are now not only used for image types, refine it to avoid misleading,
Add semacheck for OpenCL access qualifier as well as test caees.
Reviewers: pekka.jaaskelainen, Anastasia, aaron.ballman
Subscribers: aaron.ballman, cfe-commits
Differential Revision: http://reviews.llvm.org/D16040
llvm-svn: 261961
OpenMP 4.5 allows to privatize non-static member decls in non-static
member functions. Patch captures such decls by reference in general (for
bitfields, by value) and then operates with this capture. For bitfields,
at the end of codegen for lastprivates original bitfield is updated with the value of captured copy.
llvm-svn: 261824
Summary:
This is important for e.g. the following case:
void sync() { __syncthreads(); }
void foo() {
do_something();
sync();
do_something_else():
}
Without this change, if the optimizer does not inline sync() (which it
won't because __syncthreads is also marked as noduplicate, for now
anyway), it is free to perform optimizations on sync() that it would not
be able to perform on __syncthreads(), because sync() is not marked as
convergent.
Similarly, we need a notion of convergent calls, since in the case when
we can't statically determine a call's target(s), we need to know
whether it's safe to perform optimizations around the call.
This change is conservative; the optimizer will remove these attrs where
it can, see r260318, r260319.
Reviewers: majnemer
Subscribers: cfe-commits, jhen, echristo, tra
Differential Revision: http://reviews.llvm.org/D17056
llvm-svn: 261779
This patch introduces the -fwhole-program-vtables flag, which enables the
whole-program vtable optimization feature (D16795) in Clang.
Differential Revision: http://reviews.llvm.org/D16821
llvm-svn: 261767
This is mainly for extensibility. Note that fragile category metadata,
metadata for classes and protocols all have a size field.
Initial patch was provided by Greg Parker.
rdar://problem/24804226
llvm-svn: 261756
Fixes PR11517 for SPARC.
On most targets, clang lowers va_arg itself, eschewing the use of the
llvm vaarg instruction. This is necessary (at least for now) as the type
argument to the vaarg instruction cannot represent all the ABI
information that is needed to support complex calling conventions.
However, on targets with a simpler varrags ABIs, the LLVM instruction
can work just fine, and clang can simply lower to it. Unfortunately,
even on such targets, vaarg with a struct argument would fail, because
the default lowering to vaarg was naive: it didn't take into account the
ABI attribute computed by classifyArgumentType. In particular, for the
DefaultABIInfo, structs are supposed to be passed indirectly and so
llvm's vaarg instruction should be emitted with a pointer argument.
Now, vaarg instruction emission is able to use computed ABIArgInfo for
the provided argument type, which allows the default ABI support to work
for structs too.
I haven't touched the EmitVAArg implementation for PPC32_SVR4 or XCore,
although I believe both are now redundant, and could be switched over to
use the default implementation as well.
Differential Revision: http://reviews.llvm.org/D16154
llvm-svn: 261717
Remove an unnecessary workaround introduced in r259975. (NFC)
Now that LLVM r259973 allows replacing a temporary type with another
temporary we can rely on the original implementation.
It is possible for enums to be created as part of
their own declcontext. In this case a FwdDecl will be created
twice. This doesn't cause a problem because both FwdDecls are
entered into the ReplaceMap: finalize() will replace the first
FwdDecl with the second and then replace the second with
complete type.
Thanks to echristo for pointing this out.
# Conflicts:
# lib/CodeGen/CGDebugInfo.cpp
llvm-svn: 261673
Now that LLVM r259973 allows replacing a temporary type with another
temporary we can rely on the original implementation.
It is possible for enums to be created as part of
their own declcontext. In this case a FwdDecl will be created
twice. This doesn't cause a problem because both FwdDecls are
entered into the ReplaceMap: finalize() will replace the first
FwdDecl with the second and then replace the second with
complete type.
Thanks to echristo for pointing this out.
llvm-svn: 261657
This uses the general emitVoidPtrVAArg lowering logic for everything, since
this supports all types, and we don't have any special requirements.
llvm-svn: 261557
We gave a VBTable dllimport storage class and external linkage while
also providing an initializer. An initializer is only valid if the
VBTable has available_externally linkage. Fix this by setting the
linkage to available_externally in situ while generating the
initializer.
This fixes PR26686.
llvm-svn: 261535
This modification applies the following Android commit when we have an
Android environment. This is the sole non-renderscript in the Android repo
commit 9212d4fb30a3ca2f4ee966dd2748c35573d9682c
Author: Tim Murray <timmurray@google.com>
Date: Fri Aug 15 16:00:15 2014 -0700
Update vector calling convention for AArch64.
bug 16846318
Change-Id: I3cfd167758b4bd634d8480ee6ba6bb55d61f82a7
Reviewers: srhines, jyknight
Subscribers: mcrosier, aemerson, rengolin, tberghammer, danalbert, srhines
Differential Revision: http://reviews.llvm.org/D17448
llvm-svn: 261533
It can happen that when we only have 1 more register left in the regsave
area we need to store a value bigger than 1 register and therefore we
go to the overflow area. In this case we have to leave the last slot
in the regsave area unused and keep using overflow area. Do this
by storing a limit value to the used register counter in the overflow block.
Issue diagnosed by and solution tested by Mark Millard!
llvm-svn: 261422
Add support for opencl_unroll_hint attribute from OpenCL v2.0 s6.11.5.
Reusing most of metadata generation from CGLoopInfo helper class.
The code is based on Khronos OpenCL compiler:
https://github.com/KhronosGroup/SPIR/tree/spirv-1.0
Patch by Liu Yaxun (Sam)!
Differential Revision: http://reviews.llvm.org/D16686
llvm-svn: 261350
Llvm module object is shared between CodeGenerator and BackendConsumer,
in both classes it is stored as std::unique_ptr, which is not a good
design solution and can cause double deletion error. Usually it does
not occur because in BackendConsumer::HandleTranslationUnit the
ownership of CodeGenerator over the module is taken away. If however
this method is not called, the module is deleted twice and compiler crashes.
As the module owned by BackendConsumer is always the same as CodeGenerator
has, pointer to llvm module can be removed from BackendGenerator.
Differential Revision: http://reviews.llvm.org/D15450
llvm-svn: 261222
Patch fixes bug with codegen for lastprivate loop counters. Also it may
improve performance for lastprivates calculations in some cases.
llvm-svn: 261209
The assert is triggered because isObjCRetainableType() is called on the
canonicalized return type that has been stripped of the typedefs and
attributes attached to it. To fix this assert, this commit gets the
original return type from CurCodeDecl or BlockInfo and uses it instead
of the canoicalized type.
rdar://problem/24470031
Differential Revision: http://reviews.llvm.org/D16914
llvm-svn: 261151
Summary: Use the new pipeline implemented in D17115
Reviewers: tejohnson
Subscribers: joker.eph, cfe-commits
Differential Revision: http://reviews.llvm.org/D17272
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 261045
Expressions inside 'schedule'|'dist_schedule' clause must be captured in
combined directives to avoid possible crash during codegen. Patch
improves handling of such constructs
llvm-svn: 260954
Sync barrier will be emitted after generation of firstprivate variables
only if one of the firstprivate vars is used in lastprivate clause.
llvm-svn: 260877
Summary:
Unlike other outlined regions in OpenMP, offloading entry points have to have be visible (external linkage) for the device side. Using dots in the names of the entries can be therefore problematic for some toolchains, e.g. NVPTX.
Also the patch drops the column information in the unique name of the entry points. The parsing of directives ignore unknown tokens, preventing several target regions to be implemented in the same line. Therefore, the line information is sufficient for the name to be unique. Also, the preprocessor printer does not preserve the column information, causing offloading-entry detection issues if the host uses an integrated preprocessor and the target doesn't (or vice versa).
Reviewers: hfinkel, arpith-jacob, carlo.bertolli, kkwli0, ABataev
Subscribers: cfe-commits, fraggamuffin, caomhin
Differential Revision: http://reviews.llvm.org/D17179
llvm-svn: 260837
This commit changes the root from "Simple C/C++ TBAA" to "Simple C++ TBAA" for
C++.
The problem is that the type name in the TBAA nodes is generated differently
for C vs C++. If we link an IR file for C with an IR file for C++, since they
have the same root and the type names are different, accesses to the two type
nodes will be considered no-alias, even though the two type nodes are from
the same type in a header file.
The fix is to use different roots for C and C++. Types from C will be treated
conservatively in respect to types from C++.
Follow-up commits will change the C root to "Simple C TBAA" plus some mangling
change for C types to make it a little more aggresive.
llvm-svn: 260567
This reverts commit r260449.
We would supress our emission of vftable definitions if we thought
another translation unit would provide the definition because we saw an
explicit instantiation declaration. This is not the case with
dllimport, we want to synthesize a definition of the vftable regardless.
This fixes PR26569.
llvm-svn: 260548
The current macho linker just copies symbols in section datacoal_nt to
section data, so it doesn't really matter whether or not section
"datacoal_nt" is attached to the global variable.
This is a follow-up to r250370, which made changes in llvm to stop
putting functions and data in the *coal* sections.
rdar://problem/24528611
llvm-svn: 260496
OMPCapturedExprDecl allows caopturing not only of fielddecls, but also
other expressions. It also allows to simplify codegen for several
clauses.
llvm-svn: 260492
Summary:
We can't do the right thing, since there's no right thing to do, but at
least we can not crash the compiler.
Reviewers: majnemer, rnk
Subscribers: cfe-commits, jhen, tra
Differential Revision: http://reviews.llvm.org/D17103
llvm-svn: 260479
Referencing a dllimported vtable is impossible in a constexpr
constructor. It would be friendlier to C++ programmers if we
synthesized a copy of the vftable which referenced imported virtual
functions. This would let us initialize the object in a way which
preserves both the intent to import functionality from another DLL while
also making constexpr work.
Differential Revision: http://reviews.llvm.org/D17061
llvm-svn: 260388
being called with the wrong size: convert CGFunctionInfo to use TrailingObjects
and ask TrailingObjects to provide a working 'operator delete' for us.
llvm-svn: 260181
When handling 'if' statements, we crash if the condition and the consequent
branch are spanned by a single macro expansion.
The crash occurs because of a sanity 'reset' in popRegions(): if an expansion
exactly spans an entire region, we set MostRecentLocation to the start of the
expansion (its 'include location'). This ensures we don't handleFileExit()
ourselves out of the expansion before we're done processing all of the regions
within it. This is tested in test/CoverageMapping/macro-expressions.c.
This causes a problem when an expansion spans both the condition and the
consequent branch of an 'if' statement. MostRecentLocation is updated to the
start of the 'if' statement in popRegions(), so the file for the expansion
isn't exited by the time we're done handling the statement. We then crash with
'fatal: File exit not handled before popRegions'.
The fix for this is to detect these kinds of expansions, and conservatively
update MostRecentLocation to the end of expansion region containing the
conditional. I've added tests to make sure we don't have the same problem with
other kinds of statements.
rdar://problem/23630316
Differential Revision: http://reviews.llvm.org/D16934
llvm-svn: 260129
OpenMP 4.5 introduces privatization of non-static data members of current class in non-static member functions.
To correctly handle such kind of privatization a new (pseudo)declaration VarDecl-based node is added. It allows to reuse an existing code for capturing variables in Lambdas/Block/Captured blocks of code for correct privatization and codegen.
llvm-svn: 260077
A dllimport'd vtable always points one past the RTTI data, this means
that the initializer will never end up referencing the data. Our
emission is a harmless waste.
llvm-svn: 260062
Summary:
Different devices may in some cases require different code generation schemes in order to implement OpenMP. This is required not only for performance reasons, but also because it may not be possible to have the current (default) implementation working for these devices. E.g. GPU's cannot implement the same scheme a target such as powerpc or x86b would use, in the sense that it does not have the ability to fork threads, instead all the threads are always executing and need to be managed by the implementation.
This patch proposes a reorganization of the code in the OpenMP code generation to pave the way to have specialized implementation of OpenMP support. More than a "real" patch this is more a request for comments in order to understand if what is proposed is acceptable or if there are better/easier ways to do it.
In this patch part of the common OpenMP codegen infrastructure is moved to a new file under a new namespace (CGOpenMPCommon) so it can be shared between the default implementation and the specialized one. When CGOpenMPRuntime is created, an attempt to select a specialized implementation is done.
In the patch a specialization for nvptx targets is done which currently checks if the target is an OpenMP device and trap if it is not.
Let me know comments suggestions you may have.
Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev
Subscribers: Hahnfeld, cfe-commits, fraggamuffin, caomhin, jholewinski
Differential Revision: http://reviews.llvm.org/D16784
llvm-svn: 259977
It is possible for enums to be created as part of their own
declcontext. We need to cache a placeholder to avoid the type being
created twice before hitting the cache.
<rdar://problem/24493203>
llvm-svn: 259975
Because the Decl is explicitly passed as nullptr further up the call chain, it
is possible to invoke isa on a nullptr, which will assert. Guard against the
nullptr.
Take the opportunity to reuse the helper method rather than re-implementing this
logic.
llvm-svn: 259874
This patch changes cc1 option -fprofile-instr-generate to an enum option
-fprofile-instrument={clang|none}. It also changes cc1 options
-fprofile-instr-generate= to -fprofile-instrument-path=.
The driver level option -fprofile-instr-generate and -fprofile-instr-generate=
remain intact. This change will pave the way to integrate new PGO
instrumentation in IR level.
Review: http://reviews.llvm.org/D16730
llvm-svn: 259811
Codegen for array sections/array subscripts worked only for expressions with arrays as base. Patch fixes codegen for bases with pointer/reference types.
llvm-svn: 259776
Avoid crashing when printing diagnostics for vtable-related CFI
errors. In diagnostic mode, the frontend does an additional check of
the vtable pointer against the set of all known vtable addresses and
lets the runtime handler know if it is safe to inspect the vtable.
http://reviews.llvm.org/D16823
llvm-svn: 259716
Summary:
This patch adds parsing + sema for the target parallel for directive along with testcases.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D16759
llvm-svn: 259654
C++ ABI library for the same set of types for which we expect the C++ ABI
library to provide the RTTI.
Specifically:
1) __int128 and unsigned __int128 are now emitted into the ABI library. We
always expected them to be there but never actually made sure to emit them.
2) Do not expect OpenCL builtin types to have type info in the C++ ABI library.
Neither libc++abi nor libstdc++ puts them there when built with either GCC
or Clang.
This matches GCC's behavior.
llvm-svn: 259616
the details of the bug, but avoiding overloading llvm::cast with another
function template sidesteps it.
See gcc.gnu.org/PR58022 for details of the bug, and llvm.org/PR26362 for more
backgound on how it manifested in Clang. Patch by Igor Sugak!
llvm-svn: 259598
In general CUDA does not allow dynamic initialization of
global device-side variables. One exception is that CUDA allows
records with empty constructors as described in section E2.2.1 of
CUDA 7.5 Programming guide.
This patch applies initializer checks for all device-side variables.
Empty constructors are accepted, but no code is generated for them.
Differential Revision: http://reviews.llvm.org/D15305
llvm-svn: 259592
Re-commit of r258950 after fixing layering violation.
The related LLVM patch adds a backend diagnostic type for reporting
unsupported features, this adds a printer for them to clang.
In the case where debug location information is not available, I've
changed the printer to report the location as the first line of the
function, rather than the closing brace, as the latter does not give the
user any information. This also affects optimisation remarks.
llvm-svn: 259499
The list of class properties is saved in
Old ABI: protocol->ext->class_properties (protocol->ext->size will be updated)
New ABI: protocol->class_properties (protocol->size will be updated)
rdar://23891898
llvm-svn: 259268
The list of class properties is saved in
Old ABI: category->class_properties (category->size will be updated as well)
New ABI: category->class_properties (a flag in objc_image_info to indicate
whether or not the list of class properties is present)
rdar://23891898
llvm-svn: 259267
Summary:
This is necessary to prevent llvm from generating stacksave intrinsics
around this alloca. NVVM doesn't have a stack, and we don't handle said
intrinsics.
Reviewers: rnk, echristo
Subscribers: cfe-commits, jhen, tra
Differential Revision: http://reviews.llvm.org/D16664
llvm-svn: 259122
Frontend can emit errors when releaseing the Builder. If there are errors before
or when releasing the Builder, we reset the module to stop here before invoking
the backend.
Before this commit, clang will continue to invoke the backend and backend can
crash.
Differential Revision: http://reviews.llvm.org/D16564
llvm-svn: 259116