llvm-project

Jim Grosbach 7236678687 Legalize: Improve legalization of long vector extends.

When an extend more than doubles the size of the elements (e.g., a zext from v16i8 to v16i32), the normal legalization method of splitting the vectors runs into problems: by the time the destination vector is legal, the source vector is illegal. The end result is the operation often becoming scalarized, with the typical horrible performance. For example, on x86_64, the simple input of:

    define void @bar(<16 x i8> %a, <16 x i32>* %p) nounwind {
      %tmp = zext <16 x i8> %a to <16 x i32>
      store <16 x i32> %tmp, <16 x i32>* %p
      ret void
    }

Generates:

      .section __TEXT,__text,regular,pure_instructions
      .section __TEXT,__const
      .align 5
    LCPI0_0:
      .long 255 ## 0xff
      .long 255 ## 0xff
      .long 255 ## 0xff
      .long 255 ## 0xff
      .long 255 ## 0xff
      .long 255 ## 0xff
      .long 255 ## 0xff
      .long 255 ## 0xff
      .section __TEXT,__text,regular,pure_instructions
      .globl _bar
      .align 4, 0x90
    _bar:
      vpunpckhbw %xmm0, %xmm0, %xmm1
      vpunpckhwd %xmm0, %xmm1, %xmm2
      vpmovzxwd %xmm1, %xmm1
      vinsertf128 $1, %xmm2, %ymm1, %ymm1
      vmovaps LCPI0_0(%rip), %ymm2
      vandps %ymm2, %ymm1, %ymm1
      vpmovzxbw %xmm0, %xmm3
      vpunpckhwd %xmm0, %xmm3, %xmm3
      vpmovzxbd %xmm0, %xmm0
      vinsertf128 $1, %xmm3, %ymm0, %ymm0
      vandps %ymm2, %ymm0, %ymm0
      vmovaps %ymm0, (%rdi)
      vmovaps %ymm1, 32(%rdi)
      vzeroupper
      ret

So instead we can check if there are legal types that enable us to split more cleverly when the input vector is already legal, such that we don't turn it into an illegal type. If the extend more than doubles the element size of the input, we check if
- the number of vector elements is even,
- the source type is legal,
- the type of a split source is illegal,
- the type of an extended (by doubling element size) source is legal, and
- the type of that extended source when split is legal.

If the conditions are met, instead of just splitting both the destination and the source types, we create an extend that only goes up one "step" (doubling the element width), and then continue legalizing the rest of the operation normally. The result is that this operates as a new, more efficient, termination condition for the loop of "split the operation until the destination type is legal."

With this change, the above example now compiles to:

    _bar:
      vpxor %xmm1, %xmm1, %xmm1
      vpunpcklbw %xmm1, %xmm0, %xmm2
      vpunpckhwd %xmm1, %xmm2, %xmm3
      vpunpcklwd %xmm1, %xmm2, %xmm2
      vinsertf128 $1, %xmm3, %ymm2, %ymm2
      vpunpckhbw %xmm1, %xmm0, %xmm0
      vpunpckhwd %xmm1, %xmm0, %xmm3
      vpunpcklwd %xmm1, %xmm0, %xmm0
      vinsertf128 $1, %xmm3, %ymm0, %ymm0
      vmovaps %ymm0, 32(%rdi)
      vmovaps %ymm2, (%rdi)
      vzeroupper
      ret

This generalizes a custom lowering that was added a while back to the ARM backend. That lowering is no longer necessary, and is removed. The testcases for it, however, provide excellent ARM tests for this change and so remain.

rdar://14735100

llvm-svn: 193727		2013-10-31 00:20:48 +00:00
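
The conditions listed in the commit message above can be illustrated with a small standalone sketch. This is not the code from the commit and does not use LLVM's type-legalization API: `VecTy`, `isLegal`, `splitInHalf`, `widenOneStep`, and `shouldExtendOneStep` are hypothetical names, and the legality rule (a vector is legal only if it is exactly 128 or 256 bits wide) is an assumption loosely modelled on an AVX-like target.

```cpp
#include <cassert>

// Toy stand-in for a vector value type: NumElts elements of EltBits bits.
// (Hypothetical; the real legalizer works on LLVM's own value types.)
struct VecTy {
  unsigned NumElts;
  unsigned EltBits;
  unsigned totalBits() const { return NumElts * EltBits; }
};

// ASSUMPTION: on this made-up target, only 128-bit and 256-bit vectors
// are legal. A real target answers this through its lowering tables.
bool isLegal(VecTy T) {
  return T.totalBits() == 128 || T.totalBits() == 256;
}

// Same element type, half as many elements (the usual "split" step).
VecTy splitInHalf(VecTy T) {
  assert(T.NumElts % 2 == 0 && "cannot split an odd element count");
  return {T.NumElts / 2, T.EltBits};
}

// Same element count, element width doubled (one extend "step").
VecTy widenOneStep(VecTy T) { return {T.NumElts, T.EltBits * 2}; }

// Returns true when an extend from Src to Dst should first be performed
// as a single doubling step instead of splitting source and destination.
bool shouldExtendOneStep(VecTy Src, VecTy Dst) {
  // Only applies when the extend MORE than doubles the element size.
  if (Dst.EltBits <= 2 * Src.EltBits)
    return false;
  // The number of vector elements must be even.
  if (Src.NumElts % 2 != 0)
    return false;
  VecTy Ext = widenOneStep(Src);
  return isLegal(Src) &&               // the source type is legal
         !isLegal(splitInHalf(Src)) && // a split source is illegal
         isLegal(Ext) &&               // the one-step extended source is legal
         isLegal(splitInHalf(Ext));    // ...and is still legal when split
}
```

Under these assumptions, the v16i8 to v16i32 example from the commit message satisfies every condition (v16i8 is 128 bits, v8i8 is 64 bits, v16i16 is 256 bits, v8i16 is 128 bits), so `shouldExtendOneStep({16, 8}, {16, 32})` returns true, while a plain doubling such as v16i8 to v16i16 is left to the normal splitting path.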
..
AArch64	[AArch64] Add support for NEON scalar floating-point compare instructions.	2013-10-30 15:19:37 +00:00
ARM	Convert another llc -filetype=obj test.	2013-10-28 21:12:15 +00:00
CPP	[tests] Cleanup initialization of test suffixes.	2013-08-16 00:37:11 +00:00
Generic	Change objectsize intrinsic to accept different address spaces.	2013-10-07 18:06:48 +00:00
Hexagon	TBAA: remove !tbaa from testing cases when they are not needed.	2013-09-30 18:17:35 +00:00
Inputs	Debug Info: add an identifier field to DICompositeType.	2013-08-26 22:39:55 +00:00
MSP430	Make sure SP is always aligned on a 2 byte boundary	2013-10-24 09:32:31 +00:00
Mips	[mips][msa] Correct definition of bins[lr] and CHECK-DAG-ize related tests	2013-10-30 15:45:42 +00:00
NVPTX	[NVPTX] Switch from StrongPHIElimination to PHIElimination in NVPTXTargetMachine, and add some missing optimization passes to addOptimizedRegAlloc	2013-10-11 12:39:39 +00:00
PowerPC	Convert another llc -filetype=obj test.	2013-10-28 22:17:19 +00:00
R600	Fix CodeGen for unaligned loads with address spaces	2013-10-30 23:30:05 +00:00
SPARC	[Sparc] Disable tail call optimization for sparc64.	2013-10-09 12:50:39 +00:00
SystemZ	[SystemZ] Set usaAA to true	2013-10-28 13:53:37 +00:00
Thumb	17309 ARM backend incorrectly lowers COPY_STRUCT_BYVAL_I32 for thumb1 targets	2013-10-17 19:52:05 +00:00
Thumb2	MachineSink: Fix and tweak critical-edge breaking heuristic.	2013-10-14 16:57:17 +00:00
X86	Legalize: Improve legalization of long vector extends.	2013-10-31 00:20:48 +00:00
XCore	XCore target fix bug in emitArrayBound() causing segmentation fault	2013-10-11 10:27:13 +00:00