forked from OSchip/llvm-project
3100480925
This batch of intrinsics covers two sets of immediate shift instructions. What they have in common is that each overwrites only part of its output register, so each needs an extra input giving that register's previous value.

The VSLI and VSRI instructions shift each lane of the input vector left or right just as a normal immediate VSHL/VSHR would, but then overwrite only the output bits that correspond to actual shifted bits of the input. So for a shift count of n, VSLI leaves the low n bits of each output lane unchanged, and VSRI does the same with the top n bits.

The V[Q][R]SHR[U]N family are all narrowing shifts: they take an input vector of 2n-bit integers, shift each lane right by a constant, and then narrow the shifted result to only n bits. So they overwrite only half of the n-bit lanes in the output register, and the B/T suffix indicates whether it's the bottom or top half of each 2n-bit lane.

I've implemented the whole of the latter family using a single IR intrinsic `vshrn`, which takes a lot of i32 parameters indicating which instruction it expands to (by specifying signedness of the input and output types, whether it saturates and/or rounds, etc).

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D72328
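The partial-overwrite behaviour described in the commit message can be illustrated with a scalar model of a single lane. This is only a sketch based on the description above, not code from the patch: the function names `vsli_lane`, `vsri_lane` and `narrow_shift_lane` are hypothetical, lanes are treated as unsigned bit patterns, and the saturating (Q) variants are left out.

```python
def vsli_lane(prev, src, shift, bits):
    """Shift-left-and-insert on one lane of width `bits`.

    The low `shift` bits of the result come from `prev`; the rest are
    the shifted input. Assumes 0 <= shift < bits.
    """
    mask = (1 << bits) - 1
    shifted = (src << shift) & mask
    keep = (1 << shift) - 1              # low bits not covered by shifted input
    return (prev & keep) | shifted


def vsri_lane(prev, src, shift, bits):
    """Shift-right-and-insert on one lane of width `bits`.

    The top `shift` bits of the result come from `prev`; the rest are
    the shifted input. Assumes 1 <= shift <= bits.
    """
    mask = (1 << bits) - 1
    shifted = (src & mask) >> shift
    keep = mask & ~(mask >> shift)       # top bits not covered by shifted input
    return (prev & keep) | shifted


def narrow_shift_lane(prev, src, shift, n, top, rounding=False):
    """Non-saturating narrowing right shift on one 2n-bit lane.

    `src` is a 2n-bit value; the shifted result is truncated to n bits
    and written to the top or bottom half of the 2n-bit output lane.
    The other half is taken unchanged from `prev`. Assumes 1 <= shift <= n.
    """
    wide_mask = (1 << (2 * n)) - 1
    narrow_mask = (1 << n) - 1
    value = src & wide_mask
    if rounding:
        value += 1 << (shift - 1)        # round to nearest before shifting
    result = (value >> shift) & narrow_mask
    if top:
        return (prev & narrow_mask) | (result << n)
    return (prev & (narrow_mask << n)) | result


if __name__ == "__main__":
    print(hex(vsli_lane(0xFF, 0x01, 4, 8)))
    print(hex(vsri_lane(0xFF, 0x80, 4, 8)))
    print(hex(narrow_shift_lane(0xAAAA, 0x1234, 4, 8, top=False)))
    print(hex(narrow_shift_lane(0xAAAA, 0x1234, 4, 8, top=True)))
```

In each function, `prev` plays the role of the extra input that carries the output register's previous value. That is why these operations need one more operand than an ordinary shift.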
||
---|---|---|
ABITest | ||
CIndex | ||
ClangVisualizers | ||
TableGen | ||
TestUtils | ||
VtableTest | ||
analyzer | ||
check_cfc | ||
hmaptool | ||
perf-training | ||
valgrind | ||
CaptureCmd | ||
ClangDataFormat.py | ||
CmpDriver | ||
FindSpecRefs | ||
FuzzTest | ||
bash-autocomplete.sh | ||
builtin-defines.c | ||
clangdiag.py | ||
convert_arm_neon.py | ||
creduce-clang-crash.py | ||
find-unused-diagnostics.sh | ||
make-ast-dump-check.sh | ||
modfuzz.py | ||
token-delta.py |