forked from OSchip/llvm-project
d10f1c04aa
These intrinsics use the __builtin_shuffle() function to extract the low and high half, respectively, of a 128-bit NEON vector. Currently, they're defined to use bitcasts to simplify the emitter, so we get code like: uint16x4_t vget_low_u32(uint16x8_t __a) { return (uint32x2_t) __builtin_shufflevector((int64x2_t) __a, (int64x2_t) __a, 0); } While this works, it results in those bitcasts going all the way through to the IR, resulting in code like: %1 = bitcast <8 x i16> %in to <2 x i64> %2 = shufflevector <2 x i64> %1, <2 x i64> undef, <1 x i32> %zeroinitializer %3 = bitcast <1 x i64> %2 to <4 x i16> We can instead easily perform the operation directly on the input vector like: uint16x4_t vget_low_u16(uint16x8_t __a) { return __builtin_shufflevector(__a, __a, 0, 1, 2, 3); } Not only is that much easier to read on its own, it also results in cleaner IR like: %1 = shufflevector <8 x i16> %in, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3> This is both easier to read and easier for the back end to reason about effectively since the operation is obfuscating the source with bitcasts. rdar://13894163 llvm-svn: 181865 |
||
---|---|---|
clang | ||
clang-tools-extra | ||
compiler-rt | ||
debuginfo-tests | ||
libclc | ||
libcxx | ||
libcxxabi | ||
lld | ||
lldb | ||
llvm | ||
polly |