Summary:
On targets that do not support FP16 natively LLVM currently legalizes
vectors of FP16 values by scalarizing them and promoting to FP32. This
causes problems for the following code:
void foo(int, ...);
typedef __attribute__((neon_vector_type(4))) __fp16 float16x4_t;
void bar(float16x4_t x) {
foo(42, x);
}
According to the AAPCS (appendix A.2) float16x4_t is a containerized
vector fundamental type, so 'foo' expects that the 4 16-bit FP values
are packed into 2 32-bit registers, but instead bar promotes them to
4 single precision values.
Since we already handle scalar FP16 values in the frontend by
bitcasting them to/from integers, this patch adds similar handling for
vector types and homogeneous FP16 vector aggregates.
One existing test required some adjustments because we now generate
more bitcasts (so the patch changes the test to target a machine with
native FP16 support).
Reviewers: eli.friedman, olista01, SjoerdMeijer, javed.absar, efriedma
Reviewed By: javed.absar, efriedma
Subscribers: efriedma, kristof.beyls, cfe-commits, chrib
Differential Revision: https://reviews.llvm.org/D50507
llvm-svn: 342034
This commit generalizes NRVO to cover C structs (both trivial and
non-trivial structs).
rdar://problem/33599681
Differential Revision: https://reviews.llvm.org/D44968
llvm-svn: 328809
Summary:
Upstream LLVM is changing the the prototypes of the @llvm.memcpy/memmove/memset
intrinsics. This change updates the Clang tests for this change.
The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument
which is required to be a constant integer. It represents the alignment of the
dest (and source), and so must be the minimum of the actual alignment of the
two.
This change removes the alignment argument in favour of placing the alignment
attribute on the source and destination pointers of the memory intrinsic call.
For example, code which used to read:
call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
will now read
call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)
At this time the source and destination alignments must be the same (Step 1).
Step 2 of the change, to be landed shortly, will relax that contraint and allow
the source and destination to have different alignments.
llvm-svn: 322964
problems in testing, see comments in D34161 for some more details.
A fix is in progres in D35011, but a revert seems better now as the fix will
probably take some more time to land.
llvm-svn: 307277
The test being marked 'REQUIRES: long-tests' doesn't make sense. It's not the
first time the test is broken without being noticed by the committer. If the
test is too long, it should be shortened, split in multiple ones or removed
altogether. Keeping it as is is actively harmful.
(BTW, on my machine `ninja check-clang` takes 90-92 seconds with and without
this test. The difference in times is below the spread caused by random
factors.)
llvm-svn: 304302
This reverts commit 304208, since r304201 has been reverted as well.
The test needs to be turned on by default to detect breakages earlier. Will
commit this change separately.
llvm-svn: 304301
r259537 added vfma/vfms to armv7, but the builtin was only lowered
on the AArch64 side. Instead of supporting it on ARM, get rid of it.
The vfms builtin lowered to:
%nb = fsub float -0.0, %b
%r = @llvm.fma.f32(%a, %nb, %c)
Instead, define the operation in terms of vfma, and swap the
multiplicands. It now lowers to:
%na = fsub float -0.0, %a
%r = @llvm.fma.f32(%na, %b, %c)
This matches the instruction more closely, and lets current LLVM
generate the "natural" operand ordering:
fmls.2s v0, v1, v2
instead of the crooked (but equivalent):
fmls.2s v0, v2, v1
Except for theses changes, assembly is identical.
LLVM accepts both commutations, and the LLVM tests in:
test/CodeGen/AArch64/arm64-fmadd.ll
test/CodeGen/AArch64/fp-dp3.ll
test/CodeGen/AArch64/neon-fma.ll
test/CodeGen/ARM/fusedMAC.ll
already check either the new one only, or both.
Also verified against the test-suite unittests.
llvm-svn: 266807
See https://llvm.org/bugs/show_bug.cgi?id=26894 for details. This change
fixes the incorrect flags to Clang and the piping issue. It also disables
the FileCheck portion of the test, which is currently failing.
llvm-svn: 263091
This is mostly a one-time autoconversion of tests that checked assembly after
"-Owhatever" compiles to only run "opt -mem2reg" and check the assembly. This
should make them much more stable to changes in LLVM so they won't break on
unrelated changes.
"opt -mem2reg" is a compromise designed to increase the readability of tests
that check dataflow, while minimizing dependency on LLVM. Hopefully mem2reg is
stable enough that no surpises will come along.
Should address http://llvm.org/PR26815.
llvm-svn: 263048
lowering of the intrinsics.
Prior to this commit, most of the copy-related intrinsics could be optimized
away. The situation is still not ideal as there are several possibilities to
lower a given intrinsic. Currently, we match LLVM behavior.
llvm-svn: 216474
Moreover, rework some patterns to actually check the emitted instructions
instead of matching unrelated string!
E.g.,
some of the "// CHECK: vmov" were matching stuff like ".globl
funcname_with_vmov" instead of actual instructions.
llvm-svn: 216275
The < 8 instead of <= 8 meant that a bunch of vreinterprets were not available on v8 AArch32. Simplify the guard to just !defined(aarch64) while we're at it, and enable some v8 AArch32 testing.
llvm-svn: 211686
This reverts commit r184817. The failure Chandler was seeing was most likely the
bug that Bob Wilson fixed in r184870 (which was a bug caught by these tests).
To be safe, I just checked again on x86-64 mac os x/linux that this test passed
(which it did).
llvm-svn: 185110
This is a large test and thus it will only run if you pass in --param
run_long_tests=trueto LIT. This is intended so that this test can run on
buildbots and not when one runs make check.
llvm-svn: 184787