llvm-project

Sanjay Patel 833550fc74	2019-02-10 15:22:06 +00:00

[x86] narrow 256-bit horizontal ops via demanded elements

256-bit horizontal math ops are an x86 monstrosity (and thankfully have not been extended to 512-bit AFAIK).

The two 128-bit halves operate on separate halves of the inputs. So if we don't demand anything in the upper half of the result, we can extract the low halves of the inputs, do the math, and then insert that result into a 256-bit output.

All of the extract/insert is free (ymm<-->xmm), so we're left with a narrower (cheaper) version of the original op.

In the affected tests based on:
https://bugs.llvm.org/show_bug.cgi?id=33758
https://bugs.llvm.org/show_bug.cgi?id=38971
...we see that the h-op narrowing can result in further narrowing of other math via existing generic transforms.

I originally drafted this patch as an exact pattern match starting from extract_vector_elt, but I thought we might see diffs starting from extract_subvector too, so I changed it to a more general demanded elements solution. There are no extra existing regression test improvements from that switch though, so we could go back.

Differential Revision: https://reviews.llvm.org/D57841

llvm-svn: 353641
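The narrowing argument in the commit message can be checked with a small lane-level model. The sketch below is not code from the patch: it assumes 32-bit lanes laid out as in the x86 horizontal-add instructions, and the helper names (hadd128, hadd256, hadd256_narrowed) are hypothetical, chosen only for illustration. It shows that the low four result lanes of the 256-bit op depend only on the low halves of the inputs, so a 128-bit op on those halves yields the same demanded lanes.

```python
# Lane-level model of x86 horizontal add (illustrative only; helper names are hypothetical).

def hadd128(a, b):
    """128-bit horizontal add over two 4-lane vectors."""
    assert len(a) == 4 and len(b) == 4
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]

def hadd256(a, b):
    """256-bit horizontal add over two 8-lane vectors.

    Written out lane by lane: each 128-bit half of the result reads only
    the matching 128-bit half of each input.
    """
    assert len(a) == 8 and len(b) == 8
    return [
        a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3],  # low half
        a[4] + a[5], a[6] + a[7], b[4] + b[5], b[6] + b[7],  # high half
    ]

def hadd256_narrowed(a, b):
    """Narrowed form, valid only if the upper four result lanes are not demanded.

    Extract the low halves, do the 128-bit op, and insert the result into a
    256-bit value whose upper lanes are left undefined (modelled as None).
    """
    low = hadd128(a[:4], b[:4])
    return low + [None] * 4

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]

wide = hadd256(a, b)
narrow = hadd256_narrowed(a, b)

# The demanded (low) lanes agree; the upper lanes of the narrowed form are undefined.
assert wide[:4] == narrow[:4]
```

Integers are used here so the comparison is exact; the same lane layout applies to the floating-point variants.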
AArch64	Recommit "[GlobalISel] Introduce a generic floating point floor opcode, G_FFLOOR""	2019-02-09 00:37:31 +00:00
AMDGPU	[AMDGPU] Split idot4/8 signed and unsigned tests. NFC.	2019-02-09 01:02:28 +00:00
ARC	…
ARM	[DAGCombine] Optimize pow(X, 0.75) to sqrt(X) * sqrt(sqrt(X))	2019-02-08 19:50:58 +00:00
AVR	[AVR] Insert unconditional branch when inserting MBBs between blocks with fallthrough	2019-01-21 04:32:02 +00:00
BPF	[BPF] [BTF] Process FileName with absolute path correctly	2019-02-02 05:54:59 +00:00
Generic	[AVR] Remove unneeded XFAILs from the Generic CodeGen tests	2019-01-20 11:16:58 +00:00
Hexagon	[PatternMatch] add special-case uaddo matching for increment-by-one (2nd try)	2019-02-03 16:16:48 +00:00
Inputs	…
Lanai	…
MIR	[X86] Add FPCW as an implicit use on floating point load instructions.	2019-02-08 20:50:09 +00:00
MSP430	Enable integrated assembler on MSP430 by default.	2019-02-05 18:01:45 +00:00
Mips	[MIPS GlobalISel] Select any extending load and truncating store	2019-02-08 14:27:23 +00:00
NVPTX	[NVPTX] Some nvvm.read.ptx.sreg intrinsics should have IntrInaccessibleMemOnly attribute.	2019-01-26 00:28:32 +00:00
PowerPC	[DAGCombine] Optimize pow(X, 0.75) to sqrt(X) * sqrt(sqrt(X))	2019-02-08 19:50:58 +00:00
RISCV	[RISCV] Implement RV64D codegen	2019-02-01 03:53:30 +00:00
SPARC	Replace "no-frame-pointer-*" function attributes with "frame-pointer"	2019-01-14 10:55:55 +00:00
SystemZ	[SystemZ] Improved handling of the @llvm.ctlz intrinsic.	2019-02-06 19:23:31 +00:00
Thumb	[ARM] Mark 255 and 65535 as cheap for Thumb1 "And"	2019-02-04 11:58:48 +00:00
Thumb2	Revert r351938 "[ARM] Alter the register allocation order for minsize on Thumb2"	2019-01-23 21:10:48 +00:00
WebAssembly	[WebAssembly] Lower memmove to memory.copy	2019-02-05 20:57:40 +00:00
WinCFGuard	…
WinEH	[EH] Rename llvm.x86.seh.recoverfp intrinsic to llvm.eh.recoverfp	2019-01-16 00:37:13 +00:00
X86	[x86] narrow 256-bit horizontal ops via demanded elements	2019-02-10 15:22:06 +00:00
XCore	Replace "no-frame-pointer-*" function attributes with "frame-pointer"	2019-01-14 10:55:55 +00:00