Adhemerval Zanella cadcfed7aa [AArch64] Add custom lowering for v4i8 trunc store
This patch adds a custom trunc store lowering for v4i8 vector types.
Since there is no v.4b register, the v4i8 is promoted to v4i16 (v.4h)
and the default action for v4i8 is to extract each element and issue 4
byte stores.

A better strategy is to extend the promoted v4i16 to v8i16
(with undef elements) and extract and store the word lane which
represents the v4i8 subvector. The construction:

  define void @foo(<4 x i16> %x, i8* nocapture %p) {
    %0 = trunc <4 x i16> %x to <4 x i8>
    %1 = bitcast i8* %p to <4 x i8>*
    store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2
    ret void
  }

Can be optimized from:

  umov    w8, v0.h[3]
  umov    w9, v0.h[2]
  umov    w10, v0.h[1]
  umov    w11, v0.h[0]
  strb    w8, [x0, #3]
  strb    w9, [x0, #2]
  strb    w10, [x0, #1]
  strb    w11, [x0]
  ret

To:

  xtn     v0.8b, v0.8h
  str     s0, [x0]
  ret
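
For readers less familiar with NEON, the effect of the two-instruction
sequence can be modeled in portable C. This is only an illustrative
sketch: `store_trunc_v4i8` is a hypothetical name that does not appear in
the patch, the scalar loop stands in for `xtn`, and the 4-byte `memcpy`
stands in for `str s0`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of the new lowering (not code from the patch).
   lanes[] plays the role of the four 16-bit lanes of v0.4h. */
static void store_trunc_v4i8(const uint16_t lanes[4], uint8_t *p)
{
    uint8_t narrowed[4];

    /* xtn: keep only the low 8 bits of each 16-bit lane. */
    for (int i = 0; i < 4; i++)
        narrowed[i] = (uint8_t)lanes[i];

    /* str s0: write the four narrowed bytes as a single 32-bit store. */
    memcpy(p, narrowed, sizeof narrowed);
}
```

The point of the model is that the four byte stores of the old sequence
collapse into one word-sized store once the lanes have been narrowed.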

The patch also adjusts the memory cost for autovectorization, so the C
code:

  void foo (const int *src, int width, unsigned char *dst)
  {
    for (int i = 0; i < width; i++)
       *dst++ = *src++;
  }

can be vectorized to:

  .LBB0_4:                                // %vector.body
                                          // =>This Inner Loop Header: Depth=1
        ldr     q0, [x0], #16
        subs    x12, x12, #4            // =4
        xtn     v0.4h, v0.4s
        xtn     v0.8b, v0.8h
        st1     { v0.s }[0], [x2], #4
        b.ne    .LBB0_4

Instead of byte operations.
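
The shape of the vectorized loop can be approximated in plain C. This is
a sketch under stated assumptions, not the vectorizer's actual output:
`narrow_copy` is a hypothetical name, the inner loop models the two `xtn`
steps (32 to 16 to 8 bits), and the scalar tail is an assumption about
how leftover elements would be handled.

```c
#include <string.h>

/* Hypothetical model of the vectorized loop (not code from the patch):
   each main-loop iteration handles four ints, mirroring
   ldr q0 / xtn / xtn / st1. */
static void narrow_copy(const int *src, int width, unsigned char *dst)
{
    int i = 0;

    for (; i + 4 <= width; i += 4) {
        unsigned char bytes[4];

        /* Narrow four 32-bit values down to their low 8 bits. */
        for (int lane = 0; lane < 4; lane++)
            bytes[lane] = (unsigned char)src[i + lane];

        /* One 4-byte store instead of four byte stores. */
        memcpy(dst + i, bytes, sizeof bytes);
    }

    /* Scalar tail for widths that are not a multiple of four. */
    for (; i < width; i++)
        dst[i] = (unsigned char)src[i];
}
```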

llvm-svn: 335735
2018-06-27 13:58:46 +00:00
clang [NEON] Support vldNq intrinsics in AArch32 (Clang part) 2018-06-27 13:58:43 +00:00
clang-tools-extra [clangd] Sema ranking tweaks: downrank keywords and injected names. 2018-06-27 11:43:54 +00:00
compiler-rt [CMake] Tidy up the organisation of compiler-rt when configured as a standalone 2018-06-27 12:56:34 +00:00
debuginfo-tests [debuginfo-tests] Always use the system python to invoke llgdb.py. 2018-06-10 19:38:26 +00:00
libclc atom: Use volatile pointers for cl_khr_{global,local}_int32_{base,extended}_atomics 2018-06-21 19:27:39 +00:00
libcxx [CMake] Fix install-cxx target. 2018-06-25 18:01:51 +00:00
libcxxabi [CMake] Convert paths to the right form in standalone builds on Windows 2018-06-20 20:59:18 +00:00
libunwind [CMake] Convert paths to the right form in standalone builds on Windows 2018-06-20 20:53:19 +00:00
lld [PPC64] Add support for R_PPC64_GOT_DTPREL16* relocations 2018-06-27 13:55:41 +00:00
lldb Add missing constness. 2018-06-27 07:01:07 +00:00
llgo Update copyright year to 2018. 2018-06-18 12:22:17 +00:00
llvm [AArch64] Add custom lowering for v4i8 trunc store 2018-06-27 13:58:46 +00:00
openmp [OPENMP, NVPTX] Fixes for NVPTX RTL 2018-06-25 13:43:35 +00:00
parallel-libs Update copyright year to 2018. 2018-06-18 12:22:17 +00:00
polly [ZoneAlgo] Use getDefToTarget in makeValInst. NFC. 2018-06-26 14:29:09 +00:00
README.md

Low Level Virtual Machine (LLVM)

This directory and its subdirectories contain source code for LLVM, a toolkit for the construction of highly optimized compilers, optimizers, and runtime environments.