llvm-project/llvm/lib
Jim Grosbach 7236678687 Legalize: Improve legalization of long vector extends.
When an extend more than doubles the size of the elements (e.g., a zext
from v16i8 to v16i32), the normal legalization method of splitting the
vectors will run into problems as by the time the destination vector is
legal, the source vector is illegal. The end result is the operation
often becoming scalarized, with the typical horrible performance. For
example, on x86_64, the simple input of:
define void @bar(<16 x i8> %a, <16 x i32>* %p) nounwind {
  %tmp = zext <16 x i8> %a to <16 x i32>
  store <16 x i32> %tmp, <16 x i32>*%p
  ret void
}

Generates:
  .section  __TEXT,__text,regular,pure_instructions
  .section  __TEXT,__const
  .align  5
LCPI0_0:
  .long 255                     ## 0xff
  .long 255                     ## 0xff
  .long 255                     ## 0xff
  .long 255                     ## 0xff
  .long 255                     ## 0xff
  .long 255                     ## 0xff
  .long 255                     ## 0xff
  .long 255                     ## 0xff
  .section  __TEXT,__text,regular,pure_instructions
  .globl  _bar
  .align  4, 0x90
_bar:
  vpunpckhbw  %xmm0, %xmm0, %xmm1
  vpunpckhwd  %xmm0, %xmm1, %xmm2
  vpmovzxwd %xmm1, %xmm1
  vinsertf128 $1, %xmm2, %ymm1, %ymm1
  vmovaps LCPI0_0(%rip), %ymm2
  vandps  %ymm2, %ymm1, %ymm1
  vpmovzxbw %xmm0, %xmm3
  vpunpckhwd  %xmm0, %xmm3, %xmm3
  vpmovzxbd %xmm0, %xmm0
  vinsertf128 $1, %xmm3, %ymm0, %ymm0
  vandps  %ymm2, %ymm0, %ymm0
  vmovaps %ymm0, (%rdi)
  vmovaps %ymm1, 32(%rdi)
  vzeroupper
  ret

So instead we can check if there are legal types that enable us to split
more cleverly when the input vector is already legal such that we don't
turn it into an illegal type. If the extend is such that it's more than
doubling the size of the input we check if
  - the number of vector elements is even,
  - the source type is legal,
  - the type of a split source is illegal,
  - the type of an extended (by doubling element size) source is legal, and
  - the type of that extended source when split is legal.
If the conditions are met, instead of just splitting both the
destination and the source types, we create an extend that only goes up
one "step" (doubling the element width), and the continue legalizing the
rest of the operation normally. The result is that this operates as a
new, more effecient, termination condition for the loop of "split the
operation until the destination type is legal."

With this change, the above example now compiles to:
_bar:
  vpxor %xmm1, %xmm1, %xmm1
  vpunpcklbw  %xmm1, %xmm0, %xmm2
  vpunpckhwd  %xmm1, %xmm2, %xmm3
  vpunpcklwd  %xmm1, %xmm2, %xmm2
  vinsertf128 $1, %xmm3, %ymm2, %ymm2
  vpunpckhbw  %xmm1, %xmm0, %xmm0
  vpunpckhwd  %xmm1, %xmm0, %xmm3
  vpunpcklwd  %xmm1, %xmm0, %xmm0
  vinsertf128 $1, %xmm3, %ymm0, %ymm0
  vmovaps %ymm0, 32(%rdi)
  vmovaps %ymm2, (%rdi)
  vzeroupper
  ret

This generalizes a custom lowering that was added a while back to the
ARM backend. That lowering is no longer necessary, and is removed. The
testcases for it, however, provide excellent ARM tests for this change
and so remain.

rdar://14735100

llvm-svn: 193727
2013-10-31 00:20:48 +00:00
..
Analysis SCEV: Make the final add of an inbounds GEP nuw if we know that the index is positive. 2013-10-28 07:30:06 +00:00
AsmParser Revert r193251 : Use address-taken to disambiguate global variable and indirect memops. 2013-10-27 03:08:44 +00:00
Bitcode Revert r193251 : Use address-taken to disambiguate global variable and indirect memops. 2013-10-27 03:08:44 +00:00
CodeGen Legalize: Improve legalization of long vector extends. 2013-10-31 00:20:48 +00:00
DebugInfo DWARF parser: propery handle DW_FORM_ref_sig8 and fix Windows build. 2013-10-29 16:32:19 +00:00
ExecutionEngine The FIXME was indeed fixed in the linker, comment removed. 2013-10-25 12:01:53 +00:00
IR Add calls to doInitialization() and doFinalization() in verifyFunction() 2013-10-30 22:37:51 +00:00
IRReader Add 'const' qualifiers to static const char* variables. 2013-07-16 01:17:10 +00:00
LTO Move getSymbol to TargetLoweringObjectFile. 2013-10-29 17:28:26 +00:00
Linker Add a 'deleteModule' method to the Linker class. 2013-10-16 08:59:57 +00:00
MC Move the STT_FILE symbols out of the normal symbol table processing for 2013-10-29 01:06:17 +00:00
Object Support for microMIPS jump instructions 2013-10-29 16:38:59 +00:00
Option Fix another mistake in r190442. 2013-09-10 23:22:56 +00:00
Support Add {start,end}with_lower methods to StringRef. 2013-10-30 18:32:26 +00:00
TableGen Add an error check for a typo I accidentally made in a td file that caused an assert to fire. 2013-08-20 04:22:09 +00:00
Target Legalize: Improve legalization of long vector extends. 2013-10-31 00:20:48 +00:00
Transforms Teach scalarrepl about address spaces 2013-10-30 22:54:58 +00:00
CMakeLists.txt Move LTO support library to a component, allowing it to be tested 2013-09-24 23:52:22 +00:00
LLVMBuild.txt Move LTO support library to a component, allowing it to be tested 2013-09-24 23:52:22 +00:00
Makefile Reformat Makefile. No other changes. 2013-10-30 04:03:03 +00:00