Sanjay Patel 5a0cdac174 [InstCombine] canonicalize shifty abs(): ashr+add+xor --> cmp+neg+sel
We want to do this for 2 reasons:
1. Value tracking does not recognize the ashr variant, so it would fail to match for cases like D39766.
2. DAGCombiner does better at producing optimal codegen when we have the cmp+sel pattern.
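
For reference, the shift sequence computes abs() via a two's-complement identity: %signbit = ashr %x, bitwidth-1 produces 0 when x is non-negative and -1 (all ones) when x is negative, so (x + signbit) ^ signbit is x ^ 0 = x in the first case and (x - 1) ^ -1 = ~(x - 1) = -x in the second.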

More detail about what happens in the backend:
1. DAGCombiner has a generic transform for all targets to convert the scalar cmp+sel variant of abs 
   into the shift variant. That is the opposite of this IR canonicalization.
2. DAGCombiner has a generic transform for all targets to convert the vector cmp+sel variant of abs 
   into either an ABS node or the shift variant. That is again the opposite of this IR canonicalization.
3. DAGCombiner has a generic transform for all targets to convert the exact shift variants produced by #1 or #2
   into an ISD::ABS node. Note: It would be an efficiency improvement if we had #1 go directly to an ABS node 
   when that's legal/custom.
4. The pattern matching above is incomplete, so it is possible to escape the intended/optimal codegen in a 
   variety of ways.
   a. For #2, the vector path is missing the case for setlt with a '1' constant (see the example after this list).
   b. For #3, we are missing a match for commuted versions of the shift variants.
5. Therefore, this IR canonicalization can only help us reach the optimal codegen. The cmp+sel form 
   produced by this patch will be recognized in the DAG and converted to an ABS node when possible, or to 
   the shift sequence when not.
6. In the examples below with this patch applied, we may get conditional moves rather than the shift 
   sequence produced by the generic DAGCombiner transforms. Whether a conditional move is chosen is a 
   target-specific decision, and whether it is optimal for a particular subtarget may be up for debate.
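
For example, here is a vector abs spelled with 'slt 1' that illustrates 4a (the function is illustrative and 
not part of the committed tests); it computes the same result because x < 1 is equivalent to x <= 0 for 
integers, but the current DAG matching would not recognize it:

define <4 x i32> @abs_cmpsubsel_vec_slt1(<4 x i32> %x) {
  %cmp = icmp slt <4 x i32> %x, <i32 1, i32 1, i32 1, i32 1>
  %sub = sub <4 x i32> zeroinitializer, %x
  %abs = select <4 x i1> %cmp, <4 x i32> %sub, <4 x i32> %x
  ret <4 x i32> %abs
}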

define i32 @abs_shifty(i32 %x) {
  %signbit = ashr i32 %x, 31 
  %add = add i32 %signbit, %x  
  %abs = xor i32 %signbit, %add 
  ret i32 %abs
}

define i32 @abs_cmpsubsel(i32 %x) {
  %cmp = icmp slt i32 %x, 0
  %sub = sub i32 0, %x
  %abs = select i1 %cmp, i32 %sub, i32 %x
  ret i32 %abs
}

define <4 x i32> @abs_shifty_vec(<4 x i32> %x) {
  %signbit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31> 
  %add = add <4 x i32> %signbit, %x  
  %abs = xor <4 x i32> %signbit, %add 
  ret <4 x i32> %abs
}

define <4 x i32> @abs_cmpsubsel_vec(<4 x i32> %x) {
  %cmp = icmp slt <4 x i32> %x, zeroinitializer
  %sub = sub <4 x i32> zeroinitializer, %x
  %abs = select <4 x i1> %cmp, <4 x i32> %sub, <4 x i32> %x
  ret <4 x i32> %abs
}
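
With this patch, the shifty variants above should canonicalize to the same cmp+neg+sel form as the 
cmpsubsel variants; for the scalar case, something like this (function and value names illustrative):

define i32 @abs_shifty_canonical(i32 %x) {
  %cmp = icmp slt i32 %x, 0
  %neg = sub i32 0, %x
  %abs = select i1 %cmp, i32 %neg, i32 %x
  ret i32 %abs
}

This is why each shifty/cmpsubsel pair below compiles to identical code.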

> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=x86_64 -mattr=avx 
> abs_shifty:
> 	movl	%edi, %eax
> 	negl	%eax
> 	cmovll	%edi, %eax
> 	retq
> 
> abs_cmpsubsel:
> 	movl	%edi, %eax
> 	negl	%eax
> 	cmovll	%edi, %eax
> 	retq
> 
> abs_shifty_vec:
> 	vpabsd	%xmm0, %xmm0
> 	retq
> 
> abs_cmpsubsel_vec:
> 	vpabsd	%xmm0, %xmm0
> 	retq
> 
> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=aarch64
> abs_shifty:
> 	cmp	w0, #0                  // =0
> 	cneg	w0, w0, mi
> 	ret
> 
> abs_cmpsubsel: 
> 	cmp	w0, #0                  // =0
> 	cneg	w0, w0, mi
> 	ret
>                                        
> abs_shifty_vec: 
> 	abs	v0.4s, v0.4s
> 	ret
> 
> abs_cmpsubsel_vec: 
> 	abs	v0.4s, v0.4s
> 	ret
> 
> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=powerpc64le 
> abs_shifty:  
> 	srawi 4, 3, 31
> 	add 3, 3, 4
> 	xor 3, 3, 4
> 	blr
> 
> abs_cmpsubsel:
> 	srawi 4, 3, 31
> 	add 3, 3, 4
> 	xor 3, 3, 4
> 	blr
> 
> abs_shifty_vec:   
> 	vspltisw 3, -16
> 	vspltisw 4, 15
> 	vsubuwm 3, 4, 3
> 	vsraw 3, 2, 3
> 	vadduwm 2, 2, 3
> 	xxlxor 34, 34, 35
> 	blr
> 
> abs_cmpsubsel_vec: 
> 	vspltisw 3, -16
> 	vspltisw 4, 15
> 	vsubuwm 3, 4, 3
> 	vsraw 3, 2, 3
> 	vadduwm 2, 2, 3
> 	xxlxor 34, 34, 35
> 	blr
>

Differential Revision: https://reviews.llvm.org/D40984

llvm-svn: 320921
2017-12-16 16:41:17 +00:00