operators where one type is a C complex type, and to emit both the
efficient and correct implementation for complex arithmetic according to
C11 Annex G using this extra information.
For both multiply and divide the old code was writing a long-hand
reduced version of the math without any of the special handling of inf
and NaN recommended by the standard here. Instead of putting more
complexity here, this change does what GCC does which is to emit
a libcall for the fully general case.
However, the old code also failed to do the proper minimization of the
set of operations when there was a mixed complex and real operation. In
those cases, C provides a spec for much more minimal operations that are
valid. Clang now emits the exact suggested operations. This change isn't
*just* about performance though; without minimizing these operations, we
again lose the correct handling of infinities and NaNs. It is critical
that this happen in the frontend based on asymmetric type operands to
complex math operations.
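As an illustration (a hypothetical snippet, not taken from this change), a mixed
operation like this one should now lower to just two real multiplies, with no
inf/NaN fixup and no libcall:
#include <complex.h>
double complex scale(double complex z, double r) {
  return z * r;  // complex * real: Annex G allows the reduced form
                 // (creal(z)*r) + (cimag(z)*r)*I, keeping inf/NaN handling correct
}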
The performance implications of this change aren't trivial either. I've
run a set of benchmarks in Eigen, an open source mathematics library
that makes heavy use of complex. While a few have slowed down due to the
libcall being introduced, most sped up and some by a huge amount: up to
100% and 140%.
In order to make all of this work, also match the algorithm in the
constant evaluator to the one in the runtime library. Currently it is
a broken port of the simplifications from C's Annex G to the long-hand
formulation of the algorithm.
Splitting this patch up is very hard because none of this works without
the AST change to preserve non-complex operands. Sorry for the enormous
change.
Follow-up changes will include support for sinking the libcalls onto
cold paths in common cases and fastmath improvements to allow more
aggressive backend folding.
Differential Revision: http://reviews.llvm.org/D5698
llvm-svn: 219557
Make it possible to pass NULL through variadic functions on 64-bit
Windows targets. The Visual C++ headers define NULL to 0, when they
should define it to 0LL on Win64 so that NULL is a pointer-sized
integer.
Fixes PR20949.
Reviewers: thakis, rsmith
Differential Revision: http://reviews.llvm.org/D5480
llvm-svn: 219456
We already add the align parameter attribute for function parameters that have
the align_value attribute (or those with a typedef type having that attribute),
which is an important special case, but does not handle pointers with value
alignment assumptions that come into scope in any other way. To handle the
general case, emit an @llvm.assume-based alignment assumption whenever we load
the pointer-typed lvalue of an align_value-attributed variable (except for
function parameters, which we already deal with at entry).
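As a rough sketch of the general case this now covers (hypothetical names, not
from the tests):
typedef double *aligned_double_ptr __attribute__((align_value(64)));
aligned_double_ptr P;          // value-alignment assumption attached to the variable
double first(void) {
  double *p = P;               // loading P's pointer-typed lvalue now also emits an
                               // @llvm.assume-based alignment assumption on the value
  return p[0];
}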
I'll also note that this is more general than Intel's described support in:
https://software.intel.com/en-us/articles/data-alignment-to-assist-vectorization
which states that the compiler inserts __assume_aligned directives in response
to align_value-attributed variables only for function parameters and for the
initializers of local variables. I think that we can make the optimizer deal
with this more-general scheme (which could lead to a lot of calls to
@llvm.assume inside of loop bodies, for example), but if not, I'll rework this
to be less aggressive.
llvm-svn: 219052
This reverts commit r218917, effectively reapplying r218913. Original
commit message follows.
--
Update debug info testcases for an LLVM metadata schema change to fold
metadata constant operands into a single `MDString`.
Part of PR17891.
llvm-svn: 219011
This test includes stdint.h (via stdatomic.h), which might include system
headers (and that might not work, depending on the system configuration).
Attempting to fix llvm-clang-lld-x86_64-debian-fast.
llvm-svn: 218960
Adds a Clang-specific implementation of C11's stdatomic.h header. On systems,
such as FreeBSD, where a stdatomic.h header is already provided, we defer to
that header instead (using our __has_include_next technology). Otherwise, we
provide an implementation in terms of our __c11_atomic_* intrinsics (that were
created for this purpose).
C11 7.1.4p1 requires function declarations for atomic_thread_fence,
atomic_signal_fence, atomic_flag_test_and_set,
atomic_flag_test_and_set_explicit, and atomic_flag_clear, and requires that
they have external linkage. Accordingly, we provide these declarations, but if
a user elides the shadowing macros and uses them, then they must have a libc
(or similar) that actually provides definitions.
atomic_flag is implemented using _Bool as the underlying type. This is
consistent with the implementation provided by FreeBSD and also GCC 4.9 (at
least when __GCC_ATOMIC_TEST_AND_SET_TRUEVAL == 1).
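A minimal usage sketch of the header (hypothetical example, not one of the new
tests):
#include <stdatomic.h>
static atomic_int counter = ATOMIC_VAR_INIT(0);
static atomic_flag lock = ATOMIC_FLAG_INIT;
void bump(void) {
  while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
    ;  // spins on the underlying _Bool via the __c11_atomic_* intrinsics
  atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
  atomic_flag_clear_explicit(&lock, memory_order_release);
}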
Patch by Richard Smith (rebased and slightly edited by me -- Richard said I
should drive at this point).
llvm-svn: 218957
Update debug info testcases for an LLVM metadata schema change to fold
metadata constant operands into a single `MDString`.
Part of PR17891.
llvm-svn: 218913
This adds support for the align_value attribute. This attribute is supported by
Intel's compiler (versions 14.0+), and several of my HPC users have requested
support in Clang. It specifies an alignment assumption on the values to which a
pointer points, and is used by numerical libraries to encourage efficient
generation of vector code.
Of course, we already have an aligned attribute that can specify enhanced
alignment for a type, so why is this additional attribute important? The
problem is that if you want to specify that an input array of T is, say,
64-byte aligned, you could try this:
typedef double aligned_double __attribute__((aligned(64)));
void foo(aligned_double *P) {
double x = P[0]; // This is fine.
double y = P[1]; // What alignment did those doubles have again?
}
The access here to P[1] causes problems. P was specified as a pointer to type
aligned_double, and any object of type aligned_double must be 64-byte aligned.
But if P[0] is 64-byte aligned, then P[1] cannot be, and this access causes
undefined behavior. Getting round this problem requires a lot of awkward
casting and hand-unrolling of loops, all of which is bad.
With the align_value attribute, we can accomplish what we'd like in a well
defined way:
typedef double *aligned_double_ptr __attribute__((align_value(64)));
void foo(aligned_double_ptr P) {
double x = P[0]; // This is fine.
double y = P[1]; // This is fine too.
}
This attribute does not create a new type (and so is not part of the type
system), and so will only "propagate" through templates, auto, etc. by
optimizer deduction after inlining. This seems consistent with Intel's
implementation (thanks to Alexey for confirming the various Intel-compiler
behaviors).
As a final note, I would have chosen to call this aligned_value, not
align_value, for better naming consistency with the aligned attribute, but I
think it would be more useful to users to adopt Intel's name.
llvm-svn: 218910
Prior to GCC 4.4, __sync_fetch_and_nand was implemented as:
{ tmp = *ptr; *ptr = ~tmp & value; return tmp; }
but this was changed in GCC 4.4 to be:
{ tmp = *ptr; *ptr = ~(tmp & value); return tmp; }
In response to this change, support for __sync_fetch_and_nand (and
__sync_nand_and_fetch) was removed in r99522 in order to avoid miscompiling code
depending on the old semantics. However, at this point:
1. Many years have passed, and the amount of code relying on the old
semantics is likely smaller.
2. Through the work of many contributors, all LLVM backends have been updated
such that "atomicrmw nand" provides the newer GCC 4.4+ semantics (this process
was complete July of 2014 (added to the release notes in r212635).
3. The lack of this intrinsic is now a needless impediment to porting codes
from GCC to Clang (I've now seen several examples of this).
It is true, however, that we still set __GNUC_MINOR__ to 2 (corresponding to GCC
4.2). To compensate for this, and to address the original concern regarding
code relying on the old semantics, I've added a warning that specifically
details the fact that the semantics have changed and that we provide the newer
semantics.
Fixes PR8842.
llvm-svn: 218905
In addition to __builtin_assume_aligned, GCC also supports an assume_aligned
attribute which specifies the alignment (and optional offset) of a function's
return value. Here we implement support for the assume_aligned attribute by making
use of the @llvm.assume intrinsic.
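For example (hypothetical declarations, assuming a 64-byte-aligned allocator):
void *my_alloc(unsigned long size) __attribute__((assume_aligned(64)));
void *my_alloc_off(unsigned long size) __attribute__((assume_aligned(64, 16)));
// Calls to these now produce an @llvm.assume asserting that the returned pointer
// is 64-byte aligned (with a 16-byte offset in the second case).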
llvm-svn: 218500
AFAICT the semantics of frem match libm's fmod.
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 218488
Fixes PR21027. The MIDL compiler produces code that does this.
If we wanted to improve the warning, I think we could do this:
void __stdcall f(); // Don't warn without -Wstrict-prototypes.
void g() {
f(); // Might warn, the user probably meant for f to take no args.
f(1, 2, 3); // Warn, we have no idea what args f takes.
f(1); // Error, this is insane, one of these calls is broken.
}
Reviewers: thakis
Differential Revision: http://reviews.llvm.org/D5481
llvm-svn: 218394
Summary:
Vectors are normally 16-byte aligned, however the O32 ABI enforces a
maximum alignment of 8-bytes since the base of the stack is 8-byte aligned.
Previously, this was enforced on the caller side, but not on the callee side.
This fixes the output of OpenCL's printf when given vectors.
Reviewers: atanasyan
Reviewed By: atanasyan
Subscribers: llvm-commits, pekka.jaaskelainen
Differential Revision: http://reviews.llvm.org/D5433
llvm-svn: 218248
Summary:
This fixes PR20023. In order to implement this scoping rule, we piggy
back on the existing LabelDecl machinery, by creating LabelDecl's that
will carry the "internal" name of the inline assembly label, which we
will rewrite the asm label to.
Reviewers: rnk
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D4589
llvm-svn: 218230
According to lore, we used to verifier-fail on:
void __thiscall f();
int main() { f(1); }
So that's fixed now. System headers use prototype-less __stdcall functions,
so make that a warning that's DefaultError -- then it fires on regular code
but is suppressed in system headers.
Since it's used in system headers, we have codegen tests for this; massage
them slightly so that they still compile.
llvm-svn: 218166
They are added to adxintrin.h but outside the __ADX__ block.
These intrinsics generate adc and sbb respectively, which were available before ADX.
llvm-svn: 218118
Summary:
This patch implements a new UBSan check, which verifies
that function arguments declared to be nonnull with __attribute__((nonnull))
are actually nonnull in runtime.
To implement this check, we pass FunctionDecl to CodeGenFunction::EmitCallArgs
(where applicable) and if function declaration has nonnull attribute specified
for a certain formal parameter, we compare the corresponding RValue to null as
soon as it's calculated.
Test Plan: regression test suite
Reviewers: rsmith
Reviewed By: rsmith
Subscribers: cfe-commits, rnk
Differential Revision: http://reviews.llvm.org/D5082
llvm-svn: 217389
This makes use of the recently-added @llvm.assume intrinsic to implement a
__builtin_assume(bool) intrinsic (to provide additional information to the
optimizer). This hooks up __assume in MS-compatibility mode to mirror
__builtin_assume (the semantics have been intentionally kept compatible), and
implements GCC's __builtin_assume_aligned as assume((p - o) & mask == 0). LLVM
now contains special logic to deal with assumptions of this form.
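A small sketch of how this looks from user code (hypothetical function):
double sum(double *p, int n) {
  __builtin_assume(n % 16 == 0);         // lowered to @llvm.assume
  p = __builtin_assume_aligned(p, 64);   // roughly assume(((uintptr_t)p & 63) == 0)
  double s = 0;
  for (int i = 0; i < n; ++i) s += p[i];
  return s;
}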
llvm-svn: 217349
made the 8-bit masks actually 8-bit arguments to these intrinsics.
These builtins are a mess. Many were missing the I qualifier which
I added where obviously correct. Most aren't tested, but I've updated
the relevant tests. I've tried to catch all the things that should
become 'c' in this round.
It's also frustrating because the set of these is really ad-hoc and
doesn't really map that cleanly to the set supported by either GCC or
LLVM. Oh well...
llvm-svn: 217311
This patch adds support for the 32bit numeric max/min and directed round-to-integral NEON intrinsics that were added as part of v8, along with unit tests.
Patch by Graham Hunter!
llvm-svn: 217242
For naked functions with parameters, Clang would still emit stores in the prologue
that would clobber the stack, because LLVM doesn't set up a stack frame. (This
shows up in -O0 compiles, because the stores are optimized away otherwise.)
For example:
__attribute__((naked)) int f(int x) {
asm("movl $42, %eax");
asm("retl");
}
Would result in:
_Z1fi:
movl 12(%esp), %eax
movl %eax, (%esp) <--- Oops.
movl $42, %eax
retl
Differential Revision: http://reviews.llvm.org/D5183
llvm-svn: 217198
If control falls off the end of a function after an __asm block, MSVC
assumes that the inline assembly filled the EAX and possibly EDX
registers with an appropriate return value. This functionality is used
in inline functions returning 64-bit integers in system headers, so we
need some amount of compatibility.
This is implemented in Clang by adding extra output constraints to every
inline asm block, and storing the resulting output registers into the
return value slot. If we see an asm block somewhere in the function
body, we emit a normal epilogue instead of marking the end of the
function with a return type unreachable.
Normal returns in functions not using this functionality will overwrite
the return value slot, and in most cases LLVM should be able to
eliminate the dead stores.
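For instance (a hypothetical function, in MS inline-asm mode), this now stores
EDX:EAX to the return slot instead of ending in unreachable:
unsigned __int64 read_tsc(void) {
  __asm rdtsc   // leaves the 64-bit result in EDX:EAX
  // control falls off the end; the registers are taken as the return value
}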
Fixes PR17201.
Reviewed By: majnemer
Differential Revision: http://reviews.llvm.org/D5177
llvm-svn: 217187
Summary:
This allows us to easily find them in the backend after the aggregates have
been lowered to other types. This is important on big-endian targets using
the N32/N64 ABIs, since these ABIs must shift small structures into the
upper bits of the register.
Reviewers: atanasyan
Reviewed By: atanasyan
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5005
llvm-svn: 217160
Summary:
They are returned indirectly which causes the other arguments to move to
the next argument slot.
With this, utils/ABITest does not discover any failing cases in the first
500 attempts on big/little endian for O32. Previously some of these failed.
Also tested N32/N64 little endian (big endian has other known issues) with
no issues.
Reviewers: atanasyan
Reviewed By: atanasyan
Subscribers: atanasyan, cfe-commits
Differential Revision: http://reviews.llvm.org/D4811
llvm-svn: 217147
Using the intrinsic allows the SelectionDAGBuilder to turn this call
into the FABS Node and also the intrinsic is something the vectorizer knows
how to vectorize.
This patch also sets the readnone attribute on this call, which should
enable additional optimizations.
llvm-svn: 217042
Previously, EnterStructPointerForCoercedAccess used Alloc size when determining how to convert. This was problematic, because there were situations where the alloc size was larger than the store size. For example, if the first element of a structure were i24 and the destination type were i32, the old code would generate a GEP and a load i24. The code should compare store sizes to ensure the whole object is loaded. I have attached a test case.
This patch modifies the output of arm64-be-bitfield.c test case, but the new IR seems to be equivalent, and after -O3, the compiler generates identical ARM assembly. (asr x0, x0, #54)
Patch by Thomas Jablin!
llvm-svn: 216722
Summary:
We did a great job getting this wrong:
- We messed up which LLVM IR types to use for arguments and return values.
The optimized libcalls use integer types for values.
Clang attempted to use the IR type which corresponds to the value
passed in instead of using an appropriately sized integer type. This
would result in violations of the ABI for, as an example, floating
point types.
- We didn't bother recording the result of the atomic libcall in the
destination memory.
Instead, call the functions with arguments matching the type of the
libcall prototype's parameters.
This fixes PR20780.
Differential Revision: http://reviews.llvm.org/D5098
llvm-svn: 216714
Summary:
The current implementation of asan cookie is incorrect:
we add nosanitize metadata to the cookie load, but the metadata may be lost
and we will instrument the load from poisoned memory.
This change replaces the load with a call to __asan_load_cxx_array_cookie (r216692)
Reviewers: rsmith
Reviewed By: rsmith
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D5111
llvm-svn: 216702
For the following code:
__declspec(dllimport) int f(int x);
int user(int x) {
return f(x);
}
int f(int x) { return 1; }
Clang will drop the dllimport attribute in the AST, but CodeGen would have
already put it on the LLVM::Function, and that would never get updated.
(The same thing happens for global variables.)
This makes Clang check for a dropped DLL attribute each time the LLVM object
is referenced.
This isn't perfect, because we will still get it wrong if the function is
never referenced by codegen after the attribute is dropped, but this handles
the common cases and makes us not fail in the verifier.
llvm-svn: 216699
Summary:
ACLE 2.0 section 9.2 defines the following "miscellaneous data processing intrinsics": `__clz`, `__cls`, `__ror`, `__rev`, `__rev16`, `__revsh` and `__rbit`.
`__clz` has already been implemented in the arm_acle.h header file. The rest are not supported yet. This patch completes ACLE data processing intrinsics.
Reviewers: t.p.northover, rengolin
Reviewed By: rengolin
Subscribers: aemerson, mroth, llvm-commits
Differential Revision: http://reviews.llvm.org/D4983
llvm-svn: 216658
ACLE 2.0 allows __fp16 to be used as a function argument or return
type. This enables this for AArch64.
This also fixes an existing bug that causes clang to not allow
homogeneous floating-point aggregates with a base type of __fp16. This
is valid for AAPCS64, but not for AAPCS-VFP.
llvm-svn: 216558
This tidies up some ARM-specific code added by r208417 to move it out
of the target-independent parts of clang into TargetInfo.cpp. This
also has the advantage that we can now flatten struct arguments to
variadic AAPCS functions.
llvm-svn: 216535
This time though, preserve the extension for bool types since that's compatible
with what MSVC expects.
See http://reviews.llvm.org/D4380
llvm-svn: 216507
Summary:
MSVC doesn't extend integer types smaller than 64bit, so to preserve
binary compatibility, clang shouldn't either.
For example, the following C code built with MSVC:
unsigned test(unsigned v);
unsigned foobar(unsigned short);
int main() { return test(0xffffffff) + foobar(28); }
Produces the following:
0000000000000004: B9 FF FF FF FF mov ecx,0FFFFFFFFh
0000000000000009: E8 00 00 00 00 call test
000000000000000E: 89 44 24 20 mov dword ptr [rsp+20h],eax
0000000000000012: 66 B9 1C 00 mov cx,1Ch
0000000000000016: E8 00 00 00 00 call foobar
And as you can see, when setting up the call to foobar, only cx is overwritten.
If foobar is compiled with clang, then the zero extension added by clang means
the rest of the register, which contains garbage, could be used.
For example if foobar is:
unsigned foobar(unsigned short v) {
return v;
}
Compiled with clang -fomit-frame-pointer -O3 gives the following assembly:
foobar:
0000000000000000: 89 C8 mov eax,ecx
0000000000000002: C3 ret
And that function would return garbage because the 16 most significant bits of
ecx still contain garbage from the first call.
With this change, the code for that function is now:
foobar:
0000000000000000: 0F B7 C1 movzx eax,cx
0000000000000003: C3 ret
Reviewers: chapuni, rnk
Reviewed By: rnk
Subscribers: majnemer, cfe-commits
Differential Revision: http://reviews.llvm.org/D4380
llvm-svn: 216491
lowering of the intrinsics.
Prior to this commit, most of the copy-related intrinsics could be optimized
away. The situation is still not ideal as there are several possibilities to
lower a given intrinsic. Currently, we match LLVM behavior.
llvm-svn: 216474
feature is C11: nested struct declarations must have a
struct-declarator-list. Without this change, code
which was meant for C99 breaks. rdar://18125536
llvm-svn: 216469
Summary:
PR19838
When operator new[] is called and an array cookie is created
we want asan to detect buffer overflow bugs that touch the cookie.
For that we need to
a) poison the shadow for the array cookie (call __asan_poison_cxx_array_cookie).
b) ignore the legal accesses to the cookie generated by clang (add 'nosanitize' metadata)
Reviewers: timurrrr, samsonov, rsmith
Reviewed By: rsmith
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D4774
llvm-svn: 216434
PowerPC uses the special PPC_FP128 type for long double on Linux, which is
composed of two 64-bit doubles. The higher-order double (which contains the
overall sign) comes first, and so the __builtin_signbitl implementation
requires special handling to extract the sign bit.
Fixes PR20691.
llvm-svn: 216341
Moreover, rework some patterns to actually check the emitted instructions
instead of matching unrelated strings!
E.g.,
some of the "// CHECK: vmov" were matching stuff like ".globl
funcname_with_vmov" instead of actual instructions.
llvm-svn: 216275
It fits better with LLVM's memory model to try to do this in the
backend. Specifically, narrowing wide loads in the backends should be
relatively straightforward and is generally valuable, whereas widening
loads tends to be very constrained.
Discussion here:
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20140811/112581.html
This reverts commit r215614.
llvm-svn: 215648
Currently when laying out bitfields that don't need any padding, we
represent them as a wide enough int to contain all of the bits. This
can be hard on the backend since we'll do things like represent stores
to a few bits as loading an i144, masking it with a large constant,
and storing it back.
This turns up in less pathological cases where we load and mask a 64 bit
word on a 32 bit platform when we actually only need to access 32 bits.
This leads to bad code being generated in most of our 32 bit backends.
In practice, there are often natural breaks in bitfields, and it's a
fairly simple and effective heuristic to split these fields into legal
integer sized chunks when it will be equivalent (ie, it won't force us
to add any extra padding).
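As a sketch of the heuristic (hypothetical struct), a run like this has a
natural break after 32 bits and can now be accessed as two legal i32 chunks
instead of one wide integer:
struct S {
  unsigned a : 16;
  unsigned b : 16;   // natural break: the first 32 bits end here
  unsigned c : 20;
  unsigned d : 12;
};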
llvm-svn: 215614
Similar approach to the set1 intrinsics is used: implement in terms of vector
initializers and then ensure with an LLVM test that a broadcast is generated
at the end.
Part of <rdar://problem/17688758>
llvm-svn: 215486
Summary:
This patch adds a runtime check verifying that functions
annotated with "returns_nonnull" attribute do in fact return nonnull pointers.
It is based on suggestion by Jakub Jelinek:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223693.html.
Test Plan: regression test suite
Reviewers: rsmith
Reviewed By: rsmith
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D4849
llvm-svn: 215485
Due to the possible presence of return-by-out parameters, using the LLVM
argument number count when numbering debug info arguments can end up
off-by-one. This could produce two arguments with the same number, which
would in turn cause LLVM to emit only one of those arguments (whichever
it found last) or assert (r215157).
llvm-svn: 215227
Note that similar to palignr, we could further optimize these to emit
shufflevector when the shift count is <=64. This however does not
change the overall design that unlike palignr we would still need the LLVM
intrinsic corresponding to this instruction to handle the >64 cases. (palignr
uses the psrldq intrinsic in this case.)
llvm-svn: 214891
My original LE implementation of the vsldoi instruction, with its
altivec.h interfaces vec_sld and vec_vsldoi, produces incorrect
shufflevector operations in the LLVM IR. Correct code is generated
because the back end handles the incorrect shufflevector in a
consistent manner.
This patch and a companion patch for LLVM correct this problem by
removing the fixup from altivec.h and the corresponding fixup from the
PowerPC back end. Several test cases are also modified to reflect the
now-correct LLVM IR.
The vec_sums and vec_vsumsws interfaces in altivec.h are also fixed,
because they used vec_perm calls intended to be recognized as vsldoi
instructions. These vec_perm calls are now replaced with code that
more clearly shows the intent of the transformation.
llvm-svn: 214801
Instead of creating global variables for source locations and global names,
just create metadata nodes and strings. They will be transformed into actual
globals in the instrumentation pass (if necessary). This approach is more
flexible:
1) we don't have to ensure that our custom globals survive all the optimizations
2) if globals are discarded for some reason, we will simply ignore metadata for them
and won't have to erase corresponding globals
3) metadata for source locations can be reused for other purposes: e.g. we may
attach source location metadata to alloca instructions and provide better descriptions
for stack variables in ASan error reports.
No functionality change.
llvm-svn: 214604
These tests seem like an exception to the rule against assembly emitting
tests in clang. I made an LLVM side change that can only be tested by
setting up the inline assembly machinery that is only implemented by
Clang.
llvm-svn: 214552
It appears that the backend does not handle all cases that were handled by clang.
In particular, it does not handle structs as used in
SingleSource/UnitTests/2003-05-07-VarArgs.
llvm-svn: 214512
Summary:
This patch causes clang to emit va_arg instructions to the backend instead of
expanding them into an implementation itself. The backend already implements
va_arg since this is necessary for NaCl so this patch is removing redundant
code.
Together with the llvm patch (D4556) that accounts for the effect of endianness
on the expansion of va_arg, this fixes PR19612.
Depends on D4556
Reviewers: sstankovic, dsanders
Reviewed By: dsanders
Subscribers: rnk, cfe-commits
Differential Revision: http://reviews.llvm.org/D4742
llvm-svn: 214497
Note that it's not clear whether this is the right behavior, please see
the review for the discussion.
Reviewers: rnk
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D4577
llvm-svn: 214401
or a class derived from T. We already supported this when initializing
_Atomic(T) from T for most (and maybe all) other reasonable values of T.
llvm-svn: 214390
(Dropped the byte and word variants from the patch. Turns out these are not
part of AVX512F but only AVX512BW/VL.)
Part of <rdar://problem/17688758>
llvm-svn: 214314
This broke the following gdb tests:
gdb.base__annota1.exp
gdb.base__consecutive.exp
gdb.python__py-symtab.exp
gdb.reverse__consecutive-precsave.exp
gdb.reverse__consecutive-reverse.exp
I will look into this.
This reverts commit 214162.
llvm-svn: 214163
This allows us to give more precise diagnostics.
Diego kindly tested the impact on debug info size: "The increase on average
debug sizes is 0.1%. The total file size increase is ~0%."
llvm-svn: 214162
While Clang now supports both ELFv1 and ELFv2 ABIs, their use is currently
hard-coded via the target triple: powerpc64-linux is always ELFv1, while
powerpc64le-linux is always ELFv2.
These are of course the most common scenarios, but in principle it is
possible to support the ELFv2 ABI on big-endian or the ELFv1 ABI on
little-endian systems (and GCC does support that), and there are some
special use cases for that (e.g. certain Linux kernel versions could
only be built using ELFv1 on LE).
This patch implements the Clang side of supporting this, based on the
LLVM commit 214072. The command line options -mabi=elfv1 or -mabi=elfv2
select the desired ABI if present. (If not, Clang uses the same default
rules as now.)
Specifically, the patch implements the following changes based on the
presence of the -mabi= option:
In the driver:
- Pass the appropriate -target-abi flag to the back-end
- Select the correct dynamic loader version (/lib64/ld64.so.[12])
In the preprocessor:
- Define _CALL_ELF to the appropriate value (1 or 2)
In the compiler back-end:
- Select the correct ABI in TargetInfo.cpp
- Select the desired ABI for LLVM via feature (elfv1/elfv2)
llvm-svn: 214074
Summary:
This patch extends the __asm parser to make it keep parsing input tokens
as inline assembly if a single-line __asm line is followed by another line
starting with __asm too. It also makes sure that we correctly keep
matching braces in such situations by separating the notions of how many
braces we are matching and whether we are in single-line asm block mode.
Reviewers: rnk
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D4598
llvm-svn: 213916
arm64_be doesn't really exist; it was useful for testing while AArch64 and
ARM64 were separate, but now the only real way to refer to the system is
aarch64_be.
llvm-svn: 213747
In addition to enabling ELFv2 homogeneous aggregate handling,
LLVM support to pass array types directly also enables a performance
enhancement. We can now pass (non-homogeneous) aggregates that fit
fully in registers as direct integer arrays, using an element type
to encode the alignment requirement (that would otherwise go to the
"byval align" field).
This is preferable since "byval" forces the back-end to write the
aggregate out to the stack, even if it could be passed fully in
registers. This is particularly annoying on ELFv2, if there is
no parameter save area available, since we then need to allocate
space on the callee's stack just to hold those aggregates.
Note that to implement this optimization, this patch does not attempt
to fully anticipate register allocation rules as (defined in the
ABI and) implemented in the back-end. Instead, the patch is simply
passing *any* aggregate passed by value using the array mechanism
if its size is up to 64 bytes. This means that some of those will
end up being passed in stack slots anyway, but the generated code
shouldn't be any worse either. (*Large* aggregates remain passed
using "byval" to enable optimized copying via memcpy etc.)
llvm-svn: 213495
This patch implements clang support for the PowerPC ELFv2 ABI.
Together with a series of companion patches in LLVM, this makes
clang/LLVM fully usable on powerpc64le-linux.
Most of the ELFv2 ABI changes are fully implemented on the LLVM side.
On the clang side, we only need to implement some changes in how
aggregate types are passed by value. Specifically, we need to:
- pass (and return) "homogeneous" floating-point or vector aggregates in
FPRs and VRs (this is similar to the ARM homogeneous aggregate ABI)
- return aggregates of up to 16 bytes in one or two GPRs
The second piece is trivial to implement in any case. To implement
the first piece, this patch makes use of infrastructure recently
enabled in the LLVM PowerPC back-end to support passing array types
directly, where the array element type encodes properties needed to
handle homogeneous aggregates correctly.
Specifically, the array element type encodes:
- whether the parameter should be passed in FPRs, VRs, or just
GPRs/stack slots (for float / vector / integer element types,
respectively)
- what the alignment requirements of the parameter are when passed in
GPRs/stack slots (8 for float / 16 for vector / the element type
size for integer element types) -- this corresponds to the
"byval align" field
With this support in place, the clang part simply needs to *detect*
whether an aggregate type implements a float / vector homogeneous
aggregate as defined by the ELFv2 ABI, and if so, pass/return it
as array type using the appropriate float / vector element type.
llvm-svn: 213494
In C99, an array parameter declarator might have the form:
direct-declarator '[' 'static' type-qual-list[opt] assign-expr ']'
where the static keyword indicates that the caller will always provide a
pointer to the beginning of an array with at least the number of elements
specified by the assignment expression. For constant sizes, we can use the
new dereferenceable attribute to pass this information to the optimizer. For
VLAs, we don't know the size, but (for addrspace(0)) do know that the pointer
must be nonnull (and so we can use the nonnull attribute).
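For example (hypothetical signatures):
void sum8(double a[static 8]);          // constant size: the pointer parameter gets
                                        // dereferenceable(64) in the IR
void scale(int n, double a[static n]);  // VLA size: the pointer is only marked nonnull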
llvm-svn: 213444
r211898 introduced a regression where a large struct, which would
normally be passed ByVal, was causing padding to be inserted to
prevent the backend from using some GPRs, in order to follow the
AAPCS. However, the type of the argument was not being set correctly,
so the backend cannot align 8-byte aligned struct types on the stack.
The fix is to not insert the padding arguments when the argument is
being passed ByVal.
llvm-svn: 213359
1. Revert "Add default feature for CPUs on AArch64 target in Clang"
at r210625. Then, all enabled features will be passed explicitly by
-target-feature in the -cc1 option.
2. Get "-mfpu" deprecated.
3. Implement support of "-march". Usage is:
-march=armv8-a+[no]feature
For instance, "-march=armv8-a+neon+crc+nocrypto". Here "armv8-a" is
necessary, and CPU names are not acceptable. Candidate features are
fp, neon, crc and crypto. Where conflicting feature modifiers are
specified, the right-most feature is used.
4. Implement support of "-mtune". Usage is:
-mtune=CPU_NAME
For instance, "-mtune=cortex-a57". This option will ONLY enable
micro-architectural features specific to the target CPU,
like "+zcm" and "+zcz" for cyclone. Any architectural features
WON'T be modified.
5. Change usage of "-mcpu" to "-mcpu=CPU_NAME+[no]feature", which is
an alias to "-march={feature of CPU_NAME}+[no]feature" and
"-mtune=CPU_NAME" together. Where this option is used in conjunction
with -march or -mtune, those options take precedence over the
appropriate part of this option.
llvm-svn: 213353
This is used to mark the instructions emitted by Clang to implement a
variety of UBSan checks. Generally, we don't want to instrument these
instructions with other sanitizers (like ASan).
Reviewed in http://reviews.llvm.org/D4544
llvm-svn: 213291
Summary:
I'm planning on upstreaming some test cases for the inline assembly
usage in the Mozilla code base. A lot of these test cases test the
recent fixes to this code.
Reviewers: rnk
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D4508
llvm-svn: 213255
Memory barrier __builtin_arm_[dmb, dsb, isb] intrinsics are required to
implement their corresponding ACLE and MSVC intrinsics.
This patch ports ARM dmb, dsb, isb intrinsic to AArch64.
Requires LLVM r213247.
Differential Revision: http://reviews.llvm.org/D4521
llvm-svn: 213250
Clang supports __assume, at least at the semantic level, when MS extensions are
enabled. Unfortunately, trying to actually compile code using __assume would
result in this error:
error: cannot compile this builtin function yet
__assume is an optimizer hint, and can be ignored at the IR level. Until LLVM
supports assumptions at the IR level, a noop lowering is valid, and that is
what is done here.
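So, under -fms-extensions, something like this (hypothetical snippet) now
compiles, with the hint simply dropped:
int div100(int x) {
  __assume(x >= 0);   // previously: "cannot compile this builtin function yet"
  return x / 100;
}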
llvm-svn: 213206
An array showing up in an inline assembly input is accepted in ICC and
GCC 4.8
This fixes PR20201.
Differential Revision: http://reviews.llvm.org/D4382
llvm-svn: 212954
This patch implements __builtin_arm_nop intrinsic for AArch32 and AArch64,
which generates hint 0x0, the alias of NOP instruction.
This intrinsic is necessary to implement ACLE __nop intrinsic.
Differential Revision: http://reviews.llvm.org/D4495
llvm-svn: 212947
Currently ASan instrumentation pass creates a string with global name
for each instrumented global (to include global names in the error report). Global
name is already mangled at this point, and we may not be able to demangle it
at runtime (e.g. there is no __cxa_demangle on Android).
Instead, create a string with fully qualified global name in Clang, and pass it
to ASan instrumentation pass in llvm.asan.globals metadata. If there is no metadata
for some global, ASan will use the original algorithm.
This fixes https://code.google.com/p/address-sanitizer/issues/detail?id=264.
llvm-svn: 212872
MSVC accepts __noop without any trailing parens and treats it like a
literal zero. We don't treat __noop as an integer literal, but now at
least we can parse a naked __noop expression.
Reviewers: rsmith
Differential Revision: http://reviews.llvm.org/D4476
llvm-svn: 212860
We now have an LLVM-level nonnull attribute that can be applied to function
parameters, and we emit it for reference types (as of r209723), but did not
emit it when an __attribute__((nonnull)) was provided. Now we will.
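For example (hypothetical declaration):
void copy4(void *dst, const void *src) __attribute__((nonnull(1, 2)));
// Both pointer parameters now carry the LLVM-level nonnull attribute, matching
// what we already emitted for C++ reference parameters.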
llvm-svn: 212835
Teach UBSan vptr checker to ignore technically invalid down-casts on
blacklisted types.
Based on http://reviews.llvm.org/D4407 by Byoungyoung Lee!
llvm-svn: 212770
This patch adds support for respecting the ABI and type alignment
of aggregates passed by value. Currently, all aggregates are aligned
at 8 bytes in the parameter save area. This is incorrect for two
reasons:
- Aggregates that need alignment of 16 bytes or more should be aligned
at 16 bytes in the parameter save area. This is implemented by
using an appropriate "byval align" attribute in the IR.
- Aggregates that need alignment beyond 16 bytes need to be dynamically
realigned by the caller. This is implemented by setting the Realign
flag of the ABIArgInfo::getIndirect call.
In addition, when expanding a va_arg call accessing a type that is
aligned at 16 bytes in the argument save area (either one of the
aggregate types as above, or a vector type which is already aligned
at 16 bytes), code needs to align the va_list pointer accordingly.
Reviewed by Hal Finkel.
llvm-svn: 212743
This patch adds support for passing arguments of non-Altivec vector type
(i.e. defined via attribute ((vector_size (...)))) on powerpc64-linux.
While such types are not mentioned in the formal ABI document, this
patch implements a calling convention compatible with GCC:
- Vectors of size < 16 bytes are passed in a GPR
- Vectors of size > 16 bytes are passed via reference
Note that vector types with a number of elements that is not a power
of 2 are not supported by GCC, so there is no pre-existing ABI to
follow. We choose to pass those (of size < 16) as if widened to the
next power of two, so they might end up in a vector register or
in a GPR. (Sizes > 16 are always passed via reference as well.)
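For instance (hypothetical types), under this convention:
typedef int v2si __attribute__((vector_size(8)));    // 8 bytes: passed in a GPR
typedef int v8si __attribute__((vector_size(32)));   // 32 bytes: passed via reference
v2si add2(v2si a, v2si b) { return a + b; }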
Reviewed by Hal Finkel.
llvm-svn: 212734
Summary:
While debugging another issue, I noticed that Mips currently specifies that the
count leading zero builtins are undefined when the input is zero. The
architecture specifications say that the clz and dclz instructions write 32 or
64 respectively when given zero.
This doesn't fix any bugs that I'm aware of but it may improve optimisation in
some cases.
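In other words (hypothetical snippet), on MIPS targets:
int leading_zeros(unsigned x) {
  return __builtin_clz(x);  // now considered well defined even for x == 0,
                            // where the clz instruction writes 32
}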
Differential Revision: http://reviews.llvm.org/D4431
llvm-svn: 212618
Having some kind of weird kernel-assisted ABI for these when the
native instructions are available appears to be (and should be) the
exception; OSs have been gradually opting in for years and the code
was getting silly.
So let LLVM decide whether it's possible/profitable to inline them by
default.
Patch by Phoebe Buckheister.
llvm-svn: 212598
Get rid of cached CodeGenModule::SanOpts, which was used to turn off
sanitizer codegen options if current LLVM Module is blacklisted, and use
plain LangOpts.Sanitize instead.
1) Some codegen decisions (turning TBAA or writable strings on/off)
shouldn't depend on the contents of blacklist.
2) llvm.asan.globals should *always* be created, even if the module
is blacklisted - soon Clang's CodeGen will be the only place where we read
sanitizer blacklist files, so we should properly report which globals are
blacklisted to the backend.
llvm-svn: 212499
This adds support for simple MSVC compatibility mode intrinsics. These
intrinsics are simple in that they are either directly passed through to the
annotated MSBuiltin intrinsic or they mirror existing GCC builtins.
llvm-svn: 212378
Summary:
Because a global created by GetOrCreateLLVMGlobal() is not finalised until later viz:
extern char a[];
char f(){ return a[5];}
char a[10];
Change MangledDeclNames to use a MapVector rather than a DenseMap so that the
Metadata is output in order of original declaration, so as to make the output
deterministic and improve human readability.
Differential Revision: http://reviews.llvm.org/D4176
llvm-svn: 212263
This corrects SVN r212196's naming change to use the proper prefix of
`__builtin_arm_` instead of `__builtin_`.
Thanks to Yi Kong for pointing out the incorrect naming!
llvm-svn: 212253
This extends the target builtin support to allow language specific annotations
(i.e. LANGBUILTIN). This is to allow MSVC compatibility whilst retaining the
ability to have EABI targets use a __builtin_ prefix. This is merely to allow
uniformity in the EABI case where the unprefixed name is provided as an alias in
the header.
llvm-svn: 212196
See https://code.google.com/p/address-sanitizer/issues/detail?id=299 for the
original feature request.
Introduce llvm.asan.globals metadata, which Clang (or any other frontend)
may use to report extra information about global variables to ASan
instrumentation pass in the backend. This metadata replaces
llvm.asan.dynamically_initialized_globals that was used to detect init-order
bugs. llvm.asan.globals contains the following data for each global:
1) source location (file/line/column info);
2) whether it is dynamically initialized;
3) whether it is blacklisted (shouldn't be instrumented).
Source location data is then emitted in the binary and can be picked up
by ASan runtime in case it needs to print error report involving some global.
For example:
0x... is located 4 bytes to the right of global variable 'C::array' defined in '/path/to/file:17:8' (0x...) of size 40
These source locations are printed even if the binary doesn't have any
debug info.
This is an ABI-breaking change. ASan initialization is renamed to
__asan_init_v4(). Pre-built libraries compiled with older Clang will not work
with the fresh runtime.
llvm-svn: 212188
ARMv8 adds (to both AArch32 and AArch64) acquiring and releasing
variants of the exclusive operations, in line with the C++11 memory
model.
This adds support for two new intrinsics to expose them to C & C++
developers directly: __builtin_arm_ldaex and __builtin_arm_stlex, in
direct analogy with the versions with no implicit barrier.
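A rough usage sketch (hypothetical spinlock; assumes the new builtins mirror the
__builtin_arm_ldrex/strex signatures):
void spin_lock(volatile int *lock) {
  int old;
  do {
    old = __builtin_arm_ldaex(lock);                  // load-acquire exclusive
  } while (old != 0 || __builtin_arm_stlex(1, lock)); // store-release exclusive;
                                                      // nonzero means the store failed
}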
rdar://problem/15885451
llvm-svn: 212175
The backend *can* cope with all of these now, so Clang should give it the
chance. On CPUs without cmpxchg16b (e.g. the original athlon64) LLVM can reform
the libcalls.
rdar://problem/13496295
llvm-svn: 212173
In 32b mode the reference count for block addresses
is not zero. This prevents inlining and constant
folding and causes the test to fail. Changing
the triple allows running the test in 64b mode.
The array in foo2 is now local instead of static until
at lower optimization levels the interprocedural constant
propagator is invoked before the global optimizer.
llvm-svn: 212092
llvm r212077 causes this test to fail. We need to reorder some passes and
possibly make other changes to reenable the optimization being tested here.
llvm-svn: 212091
This patch adds intrinsic __rdpmc to header file 'ia32intrin.h'.
Intrinsic __rdpmc can be used to read performance monitoring counters. It is
implemented as a direct call to __builtin_ia32_rdpmc.
It takes as input a value representing the index of the performance counter to
read. The value of the performance counter is then returned as an unsigned
64-bit quantity.
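Usage sketch (hypothetical):
#include <x86intrin.h>
unsigned long long read_counter0(void) {
  return __rdpmc(0);   // reads performance-monitoring counter 0 via __builtin_ia32_rdpmc
}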
llvm-svn: 212053
These don't actually require any registered backend to run.
This commit tests the water with a handful of fixes for what is a more
widespread problem.
llvm-svn: 212008
This corrects the handling for i686-windows-itanium. This environment is nearly
identical to Windows MSVC, except it uses the itanium ABI for C++.
llvm-svn: 211991
Summary: This patch introduces ACLE header file, implementing extensions that can be directly mapped to existing Clang intrinsics. It implements for both AArch32 and AArch64.
Reviewers: t.p.northover, compnerd, rengolin
Reviewed By: compnerd, rengolin
Subscribers: rnk, echristo, compnerd, aemerson, mroth, cfe-commits
Differential Revision: http://reviews.llvm.org/D4296
llvm-svn: 211962
This is a fix to the code in clang which inserts padding arguments to
ensure that the ARM backend can emit AAPCS-VFP compliant code. This code
needs to track the number of registers which have been allocated in order
to do this. When passing a very large struct (>64 bytes) by value, clang
emits IR which takes a pointer to the struct, but the backend converts this
back to passing the struct in registers and on the stack. The bug was that
this was being considered by clang to only use one register, meaning that
there were situations in which padding arguments were incorrectly emitted
by clang.
llvm-svn: 211898
The NEON intrinsics in arm_neon.h are designed to work on vectors
"as-if" loaded by (V)LDR. We load vectors "as-if" (V)LD1, so the
intrinsics are currently incorrect.
This patch adds big-endian versions of the intrinsics that does the
"obvious but dumb" thing of reversing all vector inputs and all
vector outputs. This will produce extra REVs, but we trust the
optimizer to remove them.
llvm-svn: 211893
[Clang part]
These patches rename the loop unrolling and loop vectorizer metadata
such that they have a common 'llvm.loop.' prefix. Metadata name
changes:
llvm.vectorizer.* => llvm.loop.vectorizer.*
llvm.loopunroll.* => llvm.loop.unroll.*
This was a suggestion from an earlier review
(http://reviews.llvm.org/D4090) which added the loop unrolling
metadata.
Patch by Mark Heffernan.
llvm-svn: 211712
The < 8 instead of <= 8 meant that a bunch of vreinterprets were not available on v8 AArch32. Simplify the guard to just !defined(__aarch64__) while we're at it, and enable some v8 AArch32 testing.
llvm-svn: 211686
The C++ language requires that the address of a function be the same
across all translation units. To make __declspec(dllimport) useful,
this means that a dllimported function must also obey this rule. MSVC
implements this by dynamically querying the import address table located
in the linked executable. This means that the address of such a
function in C++ is not constant (which violates other rules).
However, the C language has no notion of ODR nor does it permit dynamic
initialization whatsoever. This requires implementations to _not_
dynamically query the import address table and instead utilize a wrapper
function that will be synthesized by the linker which will eventually
query the import address table. The effect this has is, to say the
least, perplexing.
Consider the following C program:
__declspec(dllimport) void f(void);
typedef void (*fp)(void);
static const fp var = &f;
const fp fun() { return &f; }
int main() { return fun() == var; }
MSVC will statically initialize "var" with the address of the wrapper
function and "fun" returns the address of the actual imported function.
This means that "main" will return false!
Note that LLVM's optimizers are strong enough to figure out that "main"
should return true. However, this result is dependent on having
optimizations enabled!
N.B. This change also permits the usage of dllimport declarators inside
of template arguments; they are sufficiently constant for such a
purpose. Add tests to make sure we don't regress here.
llvm-svn: 211677
According to the x86-64 ABI, structures with both floating point and
integer members are split between floating-point and general purpose
registers, and consecutive 32-bit floats can be packed into a single
floating point register.
In the case of variadic functions these are stored to memory and the position
recorded in the va_list. This was already correctly implemented in
llvm.va_start.
The problem is that the code in clang for implementing va_arg was reading
floating point registers from the wrong location.
Patch by Thomas Jablin.
Fixes PR20018.
llvm-svn: 211626
When small arguments (structures < 8 bytes or "float") are passed in a
stack slot in the ppc64 SVR4 ABI, they must reside in the least
significant part of that slot. On BE, this means that an offset needs
to be added to the stack address of the parameter, but on LE, the least
significant part of the slot has the same address as the slot itself.
For the most part, this is handled in the LLVM back-end, where I just
fixed the LE case in commit r211368.
However, there is one piece of the clang front-end that is also aware of
these stack-slot offsets: PPC64_SVR4_ABIInfo::EmitVAArg. This patch
updates that routine to take endianness into account.
llvm-svn: 211370
Relax the tests to allow for differences between release and debug builds. This
should fix the buildbots.
Thanks to Benjamin Kramer and Eric Christopher for their invaluable tip that this
was a release-build-specific issue.
llvm-svn: 211227
Add support for _InterlockedCompareExchangePointer, _InterlockedExchangePointer,
_InterlockedExchange. These are available as a compiler intrinsic on ARM and x86.
These are used directly by the Windows SDK headers without use of the intrin
header.
llvm-svn: 211216
In the final phase of the merge, I managed to disable a bunch of Clang
tests accidentally. Fortunately none of them seem to have broken in
the interim.
llvm-svn: 211149
There comes a time in the life of any amateur code generator when dumb string
concatenation just won't cut it any more. For NeonEmitter.cpp, that time has
come.
There were a bunch of magic type codes which meant different things depending on
the context. There were a bunch of special cases that really had no reason to be
there but the whole thing was so creaky that removing them would cause something
weird to fall over. There was a 1000 line switch statement for code generation
involving string concatenation, which actually did lexical scoping to an extent
(!!) with a bunch of semi-repeated cases.
I tried to refactor this three times in three different ways without
success. The only way forward was to rewrite the entire thing. Luckily the
testing coverage on this stuff is absolutely massive, both with regression tests
and the "emperor" random test case generator.
The main change is that previously, in arm_neon.td a bunch of "Operation"s were
defined with special names. NeonEmitter.cpp knew about these Operations and
would emit code based on a huge switch. Actually this doesn't make much sense -
the type information was held as strings, so type checking was impossible. Also
TableGen's DAG type actually suits this sort of code generation very well
(surprising that...)
So now every operation is defined in terms of TableGen DAGs. There are a bunch
of operators to use, including "op" (a generic unary or binary operator), "call"
(to call other intrinsics) and "shuffle" (take a guess...). One of the main
advantages of this apart from making it more obvious what is going on, is that
we have proper type inference. This has two obvious advantages:
1) TableGen can error on bad intrinsic definitions easier, instead of just
generating wrong code.
2) Calls to other intrinsics are typechecked too. So
we no longer need to work out whether the thing we call needs to be the Q-lane
version or the D-lane version - TableGen knows that itself!
Here's an example: before:
case OpAbdl: {
std::string abd = MangleName("vabd", typestr, ClassS) + "(__a, __b)";
if (typestr[0] != 'U') {
// vabd results are always unsigned and must be zero-extended.
std::string utype = "U" + typestr.str();
s += "(" + TypeString(proto[0], typestr) + ")";
abd = "(" + TypeString('d', utype) + ")" + abd;
s += Extend(utype, abd) + ";";
} else {
s += Extend(typestr, abd) + ";";
}
break;
}
after:
def OP_ABDL : Op<(cast "R", (call "vmovl", (cast $p0, "U",
(call "vabd", $p0, $p1))))>;
As an example of what happens if you do something wrong now, here's what happens
if you make $p0 unsigned before the call to "vabd" - that is, $p0 -> (cast "U",
$p0):
arm_neon.td:574:1: error: No compatible intrinsic found - looking up intrinsic 'vabd(uint8x8_t, int8x8_t)'
Available overloads:
- float64x2_t vabdq_v(float64x2_t, float64x2_t)
- float64x1_t vabd_v(float64x1_t, float64x1_t)
- float64_t vabdd_f64(float64_t, float64_t)
- float32_t vabds_f32(float32_t, float32_t)
... snip ...
This makes it seriously easy to work out what you've done wrong in fairly nasty
intrinsics.
As part of this I've massively beefed up the documentation in arm_neon.td too.
Things still to do / on the radar:
- Testcase generation. This was implemented in the previous version and not in
the new one, because
- Autogenerated tests are not being run. The testcase in test/ differs from
the autogenerated version.
- There were a whole slew of special cases in the testcase generation that just
felt (and looked) like hacks.
If someone really feels strongly about this, I can try and reimplement it too.
- Big endian. That's coming soon and should be a very small diff on top of this one.
llvm-svn: 211101
Most builtins date from before the "cmpxchg weak" was a gleam in the
C++ committee's eye, so fortunately not much needs to change. But a
few of them *do* acknowledge that failure is possible.
For these, we'll emit the usual cartesian product of cmpxchg
operations if we can't statically determine weakness. CodeGen can
sort it out later if the function gets inlined.
The only other non-trivial aspect of this is (I think) that we emit
the scalar expression for "IsWeak" once, at the beginning, and
propagate its value through the successive blocks. There's not much in
it, but it's slightly more consistent with the existing handling of
FailureOrder.
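For instance (hypothetical wrapper), when the weak flag is not a compile-time
constant:
int try_set(int *p, int *expected, int desired, int is_weak) {
  // is_weak is evaluated once; both weak and strong cmpxchg are emitted and
  // selected by a branch on that value.
  return __atomic_compare_exchange_n(p, expected, desired, is_weak,
                                     __ATOMIC_SEQ_CST, __ATOMIC_RELAXED);
}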
llvm-svn: 210932
Init-order and use-after-return modes can currently be enabled
by runtime flags. use-after-scope mode is not really working at the
moment.
The only problem I see is that users won't be able to disable extra
instrumentation for init-order and use-after-scope by a top-level Clang flag.
But this instrumentation was implicitly enabled for quite a while and
we didn't hear from users hurt by it.
llvm-svn: 210924
This is a minimal fix for clang. I'll soon add support for generating
weak variants when requested, but that's not really necessary for the
LLVM change in isolation.
llvm-svn: 210907
The vec_sld and vec_vsldoi interfaces perform a left-shift on vector
arguments for both big and little endian. However, because they rely
on the vec_perm interface which is endian-dependent, the permutation
vector needs to be reversed for LE to get the proper shift direction.
I've added some extra testing for these interfaces for LE in the
builtins-ppc-altivec.c.
llvm-svn: 210657
Instructions from __nodebug__ functions don't have file:line
information even when inlined into non-nodebug functions. As a result,
intrinsics (SSE and other) from <*intrin.h> clang headers _never_
have file:line information.
With this change, an instruction without !dbg metadata gets one from
the call instruction when inlined.
Fixes PR19001.
llvm-svn: 210459
The PowerPC vsumsws instruction, accessed via vec_sums, is defined
architecturally with a big-endian bias, in that the second input vector
and the result always reference big-endian element 3 (little-endian
element 0). For ease of porting, the programmer wants element 3 in
both cases.
To provide this semantics, for little endian we generate a permute for
the second input vector prior to the vsumsws instruction, and generate
a permute for the result vector following the vsumsws instruction.
The correctness of this code is tested by the new sums.c test added in
a previous patch, as well as the modifications to
builtins-ppc-altivec.c in the present patch.
llvm-svn: 210449
This uncovered something strange. Diagnostics for InlineAsm have source locations
that don't really map to where they are within the .c source file.
llvm-svn: 210440
The PowerPC vector-unpack-high and vector-unpack-low instructions
are defined architecturally with a big-endian bias, in that the vector
element numbering is assumed to be "left to right" regardless of
whether the processor is in big-endian or little-endian mode. This
effectively reverses the meaning of "high" and "low." Such a
definition is unnatural for little-endian code generation.
To facilitate ease of porting, the vec_unpackh and vec_unpackl
interfaces are designed to use natural element ordering, so that
elements are numbered according to little-endian design principles
when code is generated for a little-endian target. The desired
semantics can be achieved by using the opposite instruction for
little-endian mode. That is, when a call to vec_unpackh appears in
the code, a vector-unpack-low is generated, and when a call to
vec_unpackl appears in the code, a vector-unpack-high is generated.
The correctness of this code is tested by the new unpack.c test
added in a previous patch, as well as the modifications to
builtins-ppc-altivec.c in the present patch.
Note that these interfaces were originally incorrectly implemented
when they take a vector pixel argument. This patch corrects this
implementation for both big- and little-endian code generation.
llvm-svn: 210391
Commit r210384 prematurely included changes to the little-endian
implementation of the vec_sum2s interface. This patch modifies
test/CodeGen/builtins-ppc-altivec.c to test those changes.
llvm-svn: 210389
The Altivec builtin test case test/CodeGen/builtins-ppc-altivec.c has
always been executed only for 32-bit PowerPC. These tests are equally
valid for 64-bit PowerPC. This patch updates the test to be run for
three targets: powerpc-unknown-unknown, powerpc64-unknown-unknown,
and powerpc64le-unknown-unknown. The expected code generation changes
for some of the Altivec builtins for little endian, so this patch adds
new CHECK-LE variants to the test for the powerpc64le target.
These tests satisfy the testing requirements for some previous patches
committed over the last couple of days for lib/Headers/altivec.h:
r210279 for vec_perm, r210337 for vec_mul[eo], and r210340 for
vec_pack.
llvm-svn: 210384
This patch adds support for pointer types in global named registers variables.
It'll be lowered as a pair of read/write_register and inttoptr/ptrtoint calls.
Also adds some early checks on types on SemaDecl to avoid the assert.
Tests changed accordingly. (PR19837)
llvm-svn: 210274
These intrinsics are special because they directly take a memory operand (AVX2
adds the register counterparts). Typically, other non-memop intrinsics take
registers and then it's left to isel to fold memory operands.
In order to LICM intrinsics directly reading memory, we require that no stores
are in the loop (LICM) or that the folded load accesses constant memory
(MachineLICM). When neither is the case we fail to hoist a loop-invariant
broadcast.
We can work around this limitation if we expose the load as a regular load and
then just implement the broadcast using the vector initializer syntax. This
exposes the load to LICM and other optimizations.
At the IR level this is translated into a series of insertelements. The
sequence is already recognized as a broadcast so there is no impact on the
quality of codegen.
_mm256_broadcast_pd and _mm256_broadcast_ps are not updated by this patch
because right now we lack the DAG-combiner smartness to recover the broadcast
instructions. This will be tackled in a follow-on.
There will be completing changes on the LLVM side to remove the LLVM
intrinsics and to auto-upgrade bitcode files.
Fixes <rdar://problem/16494520>
llvm-svn: 209846
Clang knows about the sanitizer blacklist and it makes no sense to
add global to the list of llvm.asan.dynamically_initialized_globals if it
will be blacklisted in the instrumentation pass anyway. Instead, we should
do as much blacklisting as possible (if not all) in the frontend.
llvm-svn: 209789
I opened a discussion on cfe-commits. Ideally we've got a few things
that need to happen. CompilerRT should probably have blacklists tests.
Asan should probably not depend on that specific field.
llvm-svn: 209766
That small change, although it looked harmless, made us emit the LValue
on the PHI node without the proper cast. Reverting it fixes PR19841.
llvm-svn: 209663
A few (mostly CodeGen) parts of Clang were tightly coupled to the
AArch64 backend. Now that it's gone, they will not even compile.
I've also deduplicated RUN lines in many of the AArch64 tests. This
might improve "make check-all" time noticably: some of those NEON
tests were monsters.
llvm-svn: 209578
I forgot to fix this one in r209145. We use these flags on dllimport tests
to make sure we emit code for available_externally functions and don't inline
the IR.
llvm-svn: 209564
Summary:
Previously, you could not specify the original file name when passing a preprocessed file into the compiler
Now you can use 'clang -Xclang -main-file-name -Xclang <original file name> ...'
Or 'clang -cc1 -main-file-name <original file name> ...'
llvm-svn: 209503
This is a testcase for r209227, a change in LLVM that automatically sets
visibility to default when the linkage is changed to local (rather than
asserting).
What this testcase triggers is hard to reproduce otherwise: the
`GlobalValue` is created (with non-local linkage), the visibility is set
to hidden, and then the linkage is set to local.
PR19760
llvm-svn: 209228
This is a GNU attribute that causes calls within the attributed function
to be inlined where possible. It is implemented by giving such calls the
alwaysinline attribute.
Differential Revision: http://reviews.llvm.org/D3816
llvm-svn: 209217
behavior on mismatch. The AutoUpgrader will drop incompatible debug info
anyway and also emit a warning diagnostic for it.
rdar://problem/16926122
llvm-svn: 209182
This is a GNU attribute that allows split stacks to be turned off on a
per-function basis.
Differential Revision: http://reviews.llvm.org/D3817
llvm-svn: 209167
This patch implements global named registers in Clang, lowering to the just
created intrinsics in LLVM (@llvm.read/write_register). A new type of LValue
had to be created (Register), which just adds support to carry the metadata
node containing the name of the register. Two new methods to emit loads and
stores interoperate with another to emit the named metadata node.
No guarantees are being made and only non-allocatable global variable named
registers are being supported. Local named register support is unchanged.
llvm-svn: 209149
When we were padding a struct to avoid splitting it between registers and
the stack, we were throwing away the type which the argument should be coerced
to.
llvm-svn: 209122