No functional changes intended.
Renamed EmitOMPSimdLoop to EmitOMPInnerLoop, I plan to re-use
it to emit inner loop in the future patches for CodeGen of the
worksharing loop directives (omp for, omp for simd).
llvm-svn: 219195
By leaving these members out of the member list, we avoid them being
emitted into type unit definitions - while still allowing the
definition/declaration to be injected into the compile unit as expected.
llvm-svn: 219101
By leaving these members out of the member list, we avoid them being
emitted into type unit definitions - while still allowing the
definition/declaration to be injected into the compile unit as expected.
llvm-svn: 219100
Summary:
This add support for the C++11 feature, thread_local global variables.
The ABI Clang implements is an improvement of the MSVC ABI. Sadly,
further improvements could be made but not without sacrificing ABI
compatibility.
The feature is implemented as follows:
- All thread_local initialization routines are pointed to from the
.CRT$XDU section.
- All non-weak thread_local variables have their initialization routines
call from a single function instead of getting their own .CRT$XDU
section entry. This is done to open up optimization opportunities to
the compiler.
- All weak thread_local variables have their own .CRT$XDU section entry.
This entry is in a COMDAT with the global variable it is initializing;
this ensures that we will initialize the global exactly once.
- Destructors are registered in the initialization function using
__tlregdtor.
Differential Revision: http://reviews.llvm.org/D5597
llvm-svn: 219074
We already add the align parameter attribute for function parameters that have
the align_value attribute (or those with a typedef type having that attribute),
which is an important special case, but does not handle pointers with value
alignment assumptions that come into scope in any other way. To handle the
general case, emit an @llvm.assume-based alignment assumption whenever we load
the pointer-typed lvalue of an align_value-attributed variable (except for
function parameters, which we already deal with at entry).
I'll also note that this is more general than Intel's described support in:
https://software.intel.com/en-us/articles/data-alignment-to-assist-vectorization
which states that the compiler inserts __assume_aligned directives in response
to align_value-attributed variables only for function parameters and for the
initializers of local variables. I think that we can make the optimizer deal
with this more-general scheme (which could lead to a lot of calls to
@llvm.assume inside of loop bodies, for example), but if not, I'll rework this
to be less aggressive.
llvm-svn: 219052
When the aligned clause of an OpenMP simd pragma is not provided with an
explicit alignment, a target-dependent default must be used. This adds such a
default of PPC targets.
This will become slightly more complicated when BG/Q support is added (because
then it will depend on the type). For now, 16 is a correct value for all
systems, and covers Altivec and VSX vectors.
llvm-svn: 218994
This adds support for the align_value attribute. This attribute is supported by
Intel's compiler (versions 14.0+), and several of my HPC users have requested
support in Clang. It specifies an alignment assumption on the values to which a
pointer points, and is used by numerical libraries to encourage efficient
generation of vector code.
Of course, we already have an aligned attribute that can specify enhanced
alignment for a type, so why is this additional attribute important? The
problem is that if you want to specify that an input array of T is, say,
64-byte aligned, you could try this:
typedef double aligned_double attribute((aligned(64)));
void foo(aligned_double *P) {
double x = P[0]; // This is fine.
double y = P[1]; // What alignment did those doubles have again?
}
the access here to P[1] causes problems. P was specified as a pointer to type
aligned_double, and any object of type aligned_double must be 64-byte aligned.
But if P[0] is 64-byte aligned, then P[1] cannot be, and this access causes
undefined behavior. Getting round this problem requires a lot of awkward
casting and hand-unrolling of loops, all of which is bad.
With the align_value attribute, we can accomplish what we'd like in a well
defined way:
typedef double *aligned_double_ptr attribute((align_value(64)));
void foo(aligned_double_ptr P) {
double x = P[0]; // This is fine.
double y = P[1]; // This is fine too.
}
This attribute does not create a new type (and so it not part of the type
system), and so will only "propagate" through templates, auto, etc. by
optimizer deduction after inlining. This seems consistent with Intel's
implementation (thanks to Alexey for confirming the various Intel-compiler
behaviors).
As a final note, I would have chosen to call this aligned_value, not
align_value, for better naming consistency with the aligned attribute, but I
think it would be more useful to users to adopt Intel's name.
llvm-svn: 218910
Prior to GCC 4.4, __sync_fetch_and_nand was implemented as:
{ tmp = *ptr; *ptr = ~tmp & value; return tmp; }
but this was changed in GCC 4.4 to be:
{ tmp = *ptr; *ptr = ~(tmp & value); return tmp; }
in response to this change, support for sync_fetch_and_nand (and
sync_nand_and_fetch) was removed in r99522 in order to avoid miscompiling code
depending on the old semantics. However, at this point:
1. Many years have passed, and the amount of code relying on the old
semantics is likely smaller.
2. Through the work of many contributors, all LLVM backends have been updated
such that "atomicrmw nand" provides the newer GCC 4.4+ semantics (this process
was complete July of 2014 (added to the release notes in r212635).
3. The lack of this intrinsic is now a needless impediment to porting codes
from GCC to Clang (I've now seen several examples of this).
It is true, however, that we still set GNUC_MINOR to 2 (corresponding to GCC
4.2). To compensate for this, and to address the original concern regarding
code relying on the old semantics, I've added a warning that specifically
details the fact that the semantics have changed and that we provide the newer
semantics.
Fixes PR8842.
llvm-svn: 218905
Summary:
Currently, with struct my_struct { int x; method_ptr y; };
a call to foo(my_struct s) may end up dropping the last 4 bytes
of the method pointer for x86_64 NaCl and x32.
When checking Has64BitPointers, also check if the method pointer
straddles an eightbyte boundary and classify Hi as well as Lo if needed.
Test Plan: test/CodeGenCXX/x86_64-arguments-nacl-x32.cpp
Reviewers: dschuff, pavel.v.chupin
Subscribers: jfb
Differential Revision: http://reviews.llvm.org/D5555
llvm-svn: 218889
Complex address expressions are no longer part of DIVariable, but
rather an extra argument to the debug intrinsics.
http://reviews.llvm.org/D4919
rdar://problem/17994491
llvm-svn: 218788
Complex address expressions are no longer part of DIVariable, but
rather an extra argument to the debug intrinsics.
http://reviews.llvm.org/D4919
rdar://problem/17994491
llvm-svn: 218777
This patch implements collapsing of the loops (in particular, in
presense of clause 'collapse'). It calculates number of iterations N
and expressions nesessary to calculate the nested loops counters
values based on new iteration variable (that goes from 0 to N-1)
in Sema. It also adds Codegen for 'omp simd', which uses
(and tests) this feature.
Differential Revision: http://reviews.llvm.org/D5184
llvm-svn: 218743
When generating coverage regions, we were doing a linear search
through the existing regions in order to try to merge related ones.
Most of the time this would find what it was looking for in a small
number of steps and it wasn't a big deal, but in cases with many
regions and few mergeable ones this leads to an absurd compile time
regression.
This changes the coverage mapping logic to do a single sort and then
merge as we go, which is a bit simpler and about 100 times faster.
I've also added FIXMEs on a couple of behaviours that seem a little
suspect, while keeping them behaving as they were - I'll look into
these soon.
The test changes here are mostly tedious reorganization, because the
ordering of regions we output has become slightly (but not completely)
more consistent from the almost completely arbitrary ordering we got
before.
llvm-svn: 218738
This struct has some members that are accessed directly and others
that need accessors, but it's all just public. This is confusing, so
I've changed it to a class and made more members private.
llvm-svn: 218737
This is the last piece of CGCall code that had implicit assumptions about
the order in which Clang arguments are translated to LLVM ones (positions
of inalloca argument, sret, this, padding arguments etc.) Now all of
this data is encapsulated in ClangToLLVMArgsMapping. If this information
would be required somewhere else, this class can be moved to a separate
header or pulled into CGFunctionInfo.
llvm-svn: 218634
Add a method to calculate the number of arguments given QualType
expnads to. Use this method in ClangToLLVMArgMapping calculation.
This number may be cached in CodeGenTypes for efficiency, if needed.
llvm-svn: 218623
Hoist the logic which determines the way QualType is expanded
into a separate method. Remove a bunch of copy-paste and simplify
getTypesFromArgs() / ExpandTypeFromArgs() / ExpandTypeToArgs() methods.
llvm-svn: 218615
Fixes incorrect codegen when devirtualization is aborted due to covariant return types.
Differential Revision: http://reviews.llvm.org/D5321
llvm-svn: 218602
Clang uses two types to talk about a C++ class, the
NonVirtualBaseLLVMType and the LLVMType. Previously, we would allow one
of these to be packed and the other not.
This is problematic. If both don't agree on a common subset of fields,
then routines like getLLVMFieldNo will point to the wrong field. Solve
this by copying the 'packed'-ness of the complete type to the
non-virtual subobject. For this to work, we need to take into account
the non-virtual subobject's size and alignment when we are computing the
layout of the complete object.
This fixes PR21089.
llvm-svn: 218577
(clang crashed in CodeGen in llvm::Module::getNamedValue on
thread_local std::unique_ptr<int>).
Differential Revision: http://reviews.llvm.org/D5353
llvm-svn: 218503
In addition to __builtin_assume_aligned, GCC also supports an assume_aligned
attribute which specifies the alignment (and optional offset) of a function's
return value. Here we implement support for the assume_aligned attribute by making
use of the @llvm.assume intrinsic.
llvm-svn: 218500
AFAICT the semantics of frem match libm's fmod.
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 218488
Most of the debug info emission is powered essentially from function
definitions - if we emit the definition of a function, we emit the types
of its parameters, the members of those types, and so on and so forth.
For types that aren't referenced even indirectly due to this - because
they only appear in temporary expressions, not in any named variable, we
need to explicitly emit/add them as is done here. This is not the only
case of such code, and we might want to consider handling "void
func(void*); ... func(new T());" (currently debug info for T is not
emitted) at some point, though GCC doesn't. There's a much broader
solution to these issues, but it's a lot of work for possibly marginal
gain (but might help us improve the default -fno-standalone-debug
behavior to be even more aggressive in some places). See the original
review thread for more details.
Patch by jyoti allur (jyoti.yalamanchili@gmail.com)!
Differential Revision: http://reviews.llvm.org/D2498
llvm-svn: 218390
On further investigation, COMDATs should work with .ctors, and the issue
I was hitting probably reproduces with .init_array.
This reverts commit r218287.
llvm-svn: 218313
In particular, pre-.init_array ELF uses the .ctors section mechanism.
MinGW COFF also uses .ctors, now that I think about it. Therefore,
restrict this optimization to the two platforms that are currently known
to work: ELF with .init_array and COFF with .CRT$XCU.
llvm-svn: 218287
Summary:
Vectors are normally 16-byte aligned, however the O32 ABI enforces a
maximum alignment of 8-bytes since the base of the stack is 8-byte aligned.
Previously, this was enforced on the caller side, but not on the callee side.
This fixes the output of OpenCL's printf when given vectors.
Reviewers: atanasyan
Reviewed By: atanasyan
Subscribers: llvm-commits, pekka.jaaskelainen
Differential Revision: http://reviews.llvm.org/D5433
llvm-svn: 218248
This patch adds codegen for constructs:
#pragma omp critical [name]
<body>
It generates global variable ".gomp_critical_user_[name].var" of type int32[8]. Then it generates library call "kmpc_critical(loc, gtid, .gomp_critical_user_[name].var)", code for <body> statement and final call "kmpc_end_critical(loc, gtid, .gomp_critical_user_[name].var)".
Differential Revision: http://reviews.llvm.org/D5202
llvm-svn: 218239
This patch makes sure that the dllexport attribute is transferred to the alias when such alias is created. It only affects the Itanium ABI because for the MSVC ABI a workaround is in place to not generate aliases of dllexport ctors/dtors.
A new CodeGenModule function is provided, CodeGenModule::setAliasAttributes, to factor the code for transferring attributes to aliases.
llvm-svn: 218159
Clang can already handle
-------------------------------------------
struct S {
static const int x;
};
template<typename T> struct U {
static const int k;
};
template<typename T> const int U<T>::k = T::x;
const int S::x = 42;
extern const int *f();
const int *g() { return &U<S>::k; }
int main() {
return *f() + U<S>::k;
}
const int *f() { return &U<S>::k; }
-------------------------------------------
since r217264 which puts the .inint_array section in the same COMDAT
as the variable.
This patch allows the linker to more easily delete some dead code and data by
putting the guard variable and init function in the same COMDAT.
This is a fixed version of r218089.
llvm-svn: 218141
The field is defined as:
If the third field is present, non-null, and points to a global variable or function, the initializer function will only run if the associated data from the current module is not discarded.
And without COMDATs we can't implement that.
llvm-svn: 218097
Clang can already handle
-------------------------------------------
struct S {
static const int x;
};
template<typename T> struct U {
static const int k;
};
template<typename T> const int U<T>::k = T::x;
const int S::x = 42;
extern const int *f();
const int *g() { return &U<S>::k; }
int main() {
return *f() + U<S>::k;
}
const int *f() { return &U<S>::k; }
-------------------------------------------
since r217264 which puts the .inint_array section in the same COMDAT
as the variable.
This patch allows the linker to more easily delete some dead code and data by
putting the guard variable and init function in the same COMDAT.
llvm-svn: 218089
CodeGen would try to come up with an LLVM IR type for a pointer to
member type on the way to forming an LLVM IR type for a pointer to
pointer to member type.
However, if the pointer to member representation has not been locked in yet,
we would not be able to come up with a pointer to member IR type.
In these cases, make the pointer to member type an incomplete type.
This will make the pointer to pointer to member type a pointer to an
incomplete type. If the class eventually obtains an inheritance model,
we will make the pointer to member type represent the actual inheritance
model.
Differential Revision: http://reviews.llvm.org/D5373
llvm-svn: 218084
There are situations when clang knows that the C1 and C2 constructors
or the D1 and D2 destructors are identical. We already optimize some
of these cases, but cannot optimize it when the GlobalValue is
weak_odr.
The problem with weak_odr is that an old TU seeing the same code will
have a C1 and a C2 comdat with the corresponding symbols. We cannot
suddenly start putting the C2 symbol in the C1 comdat as we cannot
guarantee that the linker will not pick a .o with only C1 in it.
The solution implemented by GCC is to expand the ABI to have a comdat
whose name uses a C5/D5 suffix and always has both symbols. That is
what this patch implements.
llvm-svn: 217874
This adds a flag called -fseh-exceptions that uses the native Windows
.pdata and .xdata unwind mechanism to throw exceptions. The other EH
possibilities are DWARF and SJLJ exceptions.
Patch by Martell Malone!
Reviewed By: asl, rnk
Differential Revision: http://reviews.llvm.org/D3419
llvm-svn: 217790
Deleted virtual functions get _purecall inserted into the vftable.
Earlier CTPs would simply stick nullptr in there.
N.B. MSVC can't handle deleted virtual functions which require return
adjusting thunks, they give an error that a deleted function couldn't be
called inside of a compiler generated function. We get this correct by
making the thunk have a __purecall entry as well.
llvm-svn: 217654
We assumed that the incoming this argument would be the last argument.
However, this is not true under the MS ABI.
This fixes PR20897.
llvm-svn: 217642
This prevents initializers for comdat-folded globals from running multiple times.
Differential Revision: http://reviews.llvm.org/D5281
llvm-svn: 217534
Pointer-sized alignment is sufficient as we only ever read single values
from the table. Otherwise we'd bump the alignment to 16 bytes in the
backend if the vtable is larger than 16 bytes. This is great for
structures that are accessed with vector instructions or copied around, but
that's simply not the case for vtables.
Shrinks the data segment of a Release x86_64 clang by 0.3%. The wins are
larger for i386 and code bases that use vtables more often than we do.
This matches the behavior of GCC 5.
llvm-svn: 217495
Summary:
This patch implements a new UBSan check, which verifies
that function arguments declared to be nonnull with __attribute__((nonnull))
are actually nonnull in runtime.
To implement this check, we pass FunctionDecl to CodeGenFunction::EmitCallArgs
(where applicable) and if function declaration has nonnull attribute specified
for a certain formal parameter, we compare the corresponding RValue to null as
soon as it's calculated.
Test Plan: regression test suite
Reviewers: rsmith
Reviewed By: rsmith
Subscribers: cfe-commits, rnk
Differential Revision: http://reviews.llvm.org/D5082
llvm-svn: 217389
There were code paths that are duplicated for constructors and destructors just
because we have both CXXCtorType and CXXDtorsTypes.
This patch introduces an unified enum and reduces code deplication a bit.
llvm-svn: 217383
This makes use of the recently-added @llvm.assume intrinsic to implement a
__builtin_assume(bool) intrinsic (to provide additional information to the
optimizer). This hooks up __assume in MS-compatibility mode to mirror
__builtin_assume (the semantics have been intentionally kept compatible), and
implements GCC's __builtin_assume_aligned as assume((p - o) & mask == 0). LLVM
now contains special logic to deal with assumptions of this form.
llvm-svn: 217349
This patch adds support for the 32bit numeric max/min and directed round-to-integral NEON intrinsics that were added as part of v8, along with unit tests.
Patch by Graham Hunter!
llvm-svn: 217242
For naked functions with parameters, Clang would still emit stores in the prologue
that would clobber the stack, because LLVM doesn't set up a stack frame. (This
shows up in -O0 compiles, because the stores are optimized away otherwise.)
For example:
__attribute__((naked)) int f(int x) {
asm("movl $42, %eax");
asm("retl");
}
Would result in:
_Z1fi:
movl 12(%esp), %eax
movl %eax, (%esp) <--- Oops.
movl $42, %eax
retl
Differential Revision: http://reviews.llvm.org/D5183
llvm-svn: 217198
If control falls off the end of a function after an __asm block, MSVC
assumes that the inline assembly filled the EAX and possibly EDX
registers with an appropriate return value. This functionality is used
in inline functions returning 64-bit integers in system headers, so we
need some amount of compatibility.
This is implemented in Clang by adding extra output constraints to every
inline asm block, and storing the resulting output registers into the
return value slot. If we see an asm block somewhere in the function
body, we emit a normal epilogue instead of marking the end of the
function with a return type unreachable.
Normal returns in functions not using this functionality will overwrite
the return value slot, and in most cases LLVM should be able to
eliminate the dead stores.
Fixes PR17201.
Reviewed By: majnemer
Differential Revision: http://reviews.llvm.org/D5177
llvm-svn: 217187
Summary:
This allows us to easily find them in the backend after the aggregates have
been lowered to other types. This is important on big-endian targets using
the N32/N64 ABI's since these ABI's must shift small structures into the
upper bits of the register.
Reviewers: atanasyan
Reviewed By: atanasyan
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5005
llvm-svn: 217160
Summary:
They are returned indirectly which causes the other arguments to move to
the next argument slot.
With this, utils/ABITest does not discover any failing cases in the first
500 attempts on big/little endian for O32. Previously some of these failed.
Also tested N32/N64 little endian (big endian has other known issues) with
no issues.
Reviewers: atanasyan
Reviewed By: atanasyan
Subscribers: atanasyan, cfe-commits
Differential Revision: http://reviews.llvm.org/D4811
llvm-svn: 217147
Using the intrinsic allows the SelectionDAGBuilder to turn this call
into the FABS Node and also the intrinsic is something the vectorizer knows
how to vectorize.
This patch also sets the readnone attribute on this call, which should
enable additional optmizations.
llvm-svn: 217042
This avoids encoding information about the function prototype into the
thunk at the cost of some function prototype bitcast gymnastics.
Fixes PR20653.
llvm-svn: 216782
In theory, it'd be nice if we could move to a case where all buried
pointers were buried via unique_ptr to demonstrate that the program had
finished with the value (that we could really have cleanly deallocated
it) but instead chose to bury it.
I think the main reason that's not possible right now is the various
IntrusiveRefCntPtrs in the Frontend, sharing ownership for a variety of
compiler bits (see the various similar
"CompilerInstance::releaseAndLeak*" functions). I have yet to figure out
their correct ownership semantics - but perhaps, even if the
intrusiveness can be removed, the shared ownership may yet remain and
that would lead to a non-unique burying as is there today. (though we
could model that a little better - by passing in a shared_ptr, etc -
rather than needing the two step that's currently used in those other
releaseAndLeak* functions)
This might be a bit more robust if BuryPointer took the boolean:
BuryPointer(bool, unique_ptr<T>)
and the choice to bury was made internally - that way, even when
DisableFree was not set, the unique_ptr would still be null in the
caller and there'd be no chance of accidentally having a different
codepath where the value is used after burial in !DisableFree, but it
becomes null only in DisableFree, etc...
llvm-svn: 216742
Previously, EnterStructPointerForCoercedAccess used Alloc size when determining how to convert. This was problematic, because there were situations were the alloc size was larger than the store size. For example, if the first element of a structure were i24 and the destination type were i32, the old code would generate a GEP and a load i24. The code should compare store sizes to ensure the whole object is loaded. I have attached a test case.
This patch modifies the output of arm64-be-bitfield.c test case, but the new IR seems to be equivalent, and after -O3, the compiler generates identical ARM assembly. (asr x0, x0, #54)
Patch by Thomas Jablin!
llvm-svn: 216722
Summary:
We did a great job getting this wrong:
- We messed up which LLVM IR types to use for arguments and return values.
The optimized libcalls use integer types for values.
Clang attempted to use the IR type which corresponds to the value
passed in instead of using an appropriately sized integer type. This
would result in violations of the ABI for, as an example, floating
point types.
- We didn't bother recording the result of the atomic libcall in the
destination memory.
Instead, call the functions with arguments matching the type of the
libcall prototype's parameters.
This fixes PR20780.
Differential Revision: http://reviews.llvm.org/D5098
llvm-svn: 216714
Summary:
The current implementation of asan cookie is incorrect:
we add nosanitize metadata to the cookie load, but the metadata may be lost
and we will instrument the load from poisoned memory.
This change replaces the load with a call to __asan_load_cxx_array_cookie (r216692)
Reviewers: rsmith
Reviewed By: rsmith
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D5111
llvm-svn: 216702
For the following code:
__declspec(dllimport) int f(int x);
int user(int x) {
return f(x);
}
int f(int x) { return 1; }
Clang will drop the dllimport attribute in the AST, but CodeGen would have
already put it on the LLVM::Function, and that would never get updated.
(The same thing happens for global variables.)
This makes Clang check dropped DLL attribute case each time the LLVM object
is referenced.
This isn't perfect, because we will still get it wrong if the function is
never referenced by codegen after the attribute is dropped, but this handles
the common cases and makes us not fail in the verifier.
llvm-svn: 216699
linkage related to generation of OBJC_SELECTOR_REFERENCES symbol
needed in generation of call to 'super' in a class method.
// rdar://18150301
llvm-svn: 216676