forked from OSchip/llvm-project
fab5892f8b
This adds new node types for each intrinsic. For instance, for addv, we have AArch64ISD::UADDV, such that: (v4i32 (uaddv ...)) is the same as (v4i32 (scalar_to_vector (i32 (int_aarch64_neon_uaddv ...)))) that is, (v4i32 (INSERT_SUBREG (v4i32 (IMPLICIT_DEF)), (i32 (int_aarch64_neon_uaddv ...)), ssub) In a combine, we transform all such across-vector-lanes intrinsics to: (i32 (extract_vector_elt (uaddv ...), 0)) This has one big advantage: by making the extract_element explicit, we enable the existing patterns for lane-aware instructions to fire. This lets us avoid needlessly going through the GPRs. Consider: uint32x4_t test_mul(uint32x4_t a, uint32x4_t b) { return vmulq_n_u32(a, vaddvq_u32(b)); } We now generate: addv.4s s1, v1 mul.4s v0, v0, v1[0] instead of the previous: addv.4s s1, v1 fmov w8, s1 dup.4s v1, w8 mul.4s v0, v1, v0 rdar://20044838 llvm-svn: 231840 |
||
---|---|---|
.. | ||
Analysis | ||
AsmParser | ||
Bitcode | ||
CodeGen | ||
DebugInfo | ||
ExecutionEngine | ||
Fuzzer | ||
IR | ||
IRReader | ||
LTO | ||
LineEditor | ||
Linker | ||
MC | ||
Object | ||
Option | ||
Passes | ||
ProfileData | ||
Support | ||
TableGen | ||
Target | ||
Transforms | ||
CMakeLists.txt | ||
LLVMBuild.txt | ||
Makefile |