llvm-project/llvm/test/Transforms
Sanjay Patel 5a0cdac174 [InstCombine] canonicalize shifty abs(): ashr+add+xor --> cmp+neg+sel
We want to do this for 2 reasons:
1. Value tracking does not recognize the ashr variant, so it would fail to match for cases like D39766.
2. DAGCombiner does better at producing optimal codegen when we have the cmp+sel pattern.

More detail about what happens in the backend:
1. DAGCombiner has a generic transform for all targets to convert the scalar cmp+sel variant of abs 
   into the shift variant. That is the opposite of this IR canonicalization.
2. DAGCombiner has a generic transform for all targets to convert the vector cmp+sel variant of abs 
   into either an ABS node or the shift variant. That is again the opposite of this IR canonicalization.
3. DAGCombiner has a generic transform for all targets to convert the exact shift variants produced by #1 or #2
   into an ISD::ABS node. Note: it would be more efficient to have #1 go directly to an ABS node
   when that's legal/custom.
4. The pattern matching above is incomplete, so it is possible to escape the intended/optimal codegen in a 
   variety of ways.
   a. For #2, the vector path is missing the case for setlt with a '1' constant.
   b. For #3, we are missing a match for commuted versions of the shift variants.
5. Therefore, this IR canonicalization can only help get us to the optimal codegen. The version of cmp+sel 
   produced by this patch will be recognized in the DAG and converted to an ABS node when possible or the 
   shift sequence when not.
6. In the following examples with this patch applied, we may get conditional moves rather than the shift 
   produced by the generic DAGCombiner transforms. The conditional move is created using a target-specific 
   decision for any given target. Whether it is optimal or not for a particular subtarget may be up for debate.
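Points 3 and 4b depend on matching the shift variant in every operand order. The sketch below is a toy commutative matcher written against a hypothetical `Expr` node type; it is not DAGCombiner's code and does not use LLVM's SDNode or PatternMatch API. It assumes the sign-bit node is a single shared node, as it would be after CSE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy expression node; NOT LLVM's SDNode, Value, or
   PatternMatch API. Binary ops use lhs/rhs; OP_CONST uses imm. */
typedef enum { OP_VAR, OP_CONST, OP_ASHR, OP_ADD, OP_XOR } Op;

typedef struct Expr {
    Op op;
    int64_t imm;
    const struct Expr *lhs;
    const struct Expr *rhs;
} Expr;

/* Matches ashr(x, bits - 1) and reports x through *x_out. */
static int match_signbit(const Expr *e, unsigned bits, const Expr **x_out) {
    if (e->op != OP_ASHR || e->rhs->op != OP_CONST ||
        e->rhs->imm != (int64_t)bits - 1)
        return 0;
    *x_out = e->lhs;
    return 1;
}

/* Matches xor(s, add(s, x)) where s = ashr(x, bits - 1), accepting every
   commuted order of the xor and add operands. Assumes s is one shared
   node. Returns x on success, NULL otherwise. */
static const Expr *match_shifty_abs(const Expr *e, unsigned bits) {
    assert(e != NULL);
    if (e->op != OP_XOR)
        return NULL;
    const Expr *xor_ops[2] = { e->lhs, e->rhs };
    for (int i = 0; i < 2; ++i) {
        const Expr *s = xor_ops[i];
        const Expr *add = xor_ops[1 - i];
        const Expr *x = NULL;
        if (!match_signbit(s, bits, &x) || add->op != OP_ADD)
            continue;
        /* The add must combine the same sign-bit node with x, in either order. */
        if ((add->lhs == s && add->rhs == x) || (add->rhs == s && add->lhs == x))
            return x;
    }
    return NULL;
}
```

A matcher that only tried `xor(s, add(s, x))` would miss a form such as `xor(add(x, s), s)`, which is the kind of gap point 4b describes.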

define i32 @abs_shifty(i32 %x) {
  %signbit = ashr i32 %x, 31 
  %add = add i32 %signbit, %x  
  %abs = xor i32 %signbit, %add 
  ret i32 %abs
}
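Why `@abs_shifty` computes abs: a short derivation, assuming two's-complement i32 arithmetic that wraps modulo 2^32, where -1 is the all-ones value and bitwise not satisfies not(y) = -y - 1. The xor is commutative, so xor(%signbit, %add) is written below as (x + s) xor s.

```latex
\begin{aligned}
s &= \operatorname{ashr}(x, 31) =
  \begin{cases} 0, & x \ge 0 \\ -1, & x < 0 \end{cases} \\
(x + s) \oplus s &=
  \begin{cases}
    x \oplus 0 = x, & x \ge 0 \\
    (x - 1) \oplus (-1) = \lnot(x - 1) = -(x - 1) - 1 = -x, & x < 0
  \end{cases}
\end{aligned}
```

For x = -2^31 the result wraps back to -2^31, which matches `sub i32 0, %x` in the cmp+sel form.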

define i32 @abs_cmpsubsel(i32 %x) {
  %cmp = icmp slt i32 %x, zeroinitializer
  %sub = sub i32 zeroinitializer, %x
  %abs = select i1 %cmp, i32 %sub, i32 %x
  ret i32 %abs
}

define <4 x i32> @abs_shifty_vec(<4 x i32> %x) {
  %signbit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31> 
  %add = add <4 x i32> %signbit, %x  
  %abs = xor <4 x i32> %signbit, %add 
  ret <4 x i32> %abs
}

define <4 x i32> @abs_cmpsubsel_vec(<4 x i32> %x) {
  %cmp = icmp slt <4 x i32> %x, zeroinitializer
  %sub = sub <4 x i32> zeroinitializer, %x
  %abs = select <4 x i1> %cmp, <4 x i32> %sub, <4 x i32> %x
  ret <4 x i32> %abs
}
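To check that the canonicalization preserves semantics, here is a C sketch that models `i32` as `uint32_t`, so add and sub wrap modulo 2^32 as they do in IR without `nsw`, and transcribes the two scalar functions. The `<4 x i32>` functions apply the same operations independently to each lane, so the scalar comparison covers them lane by lane. The function names mirror the IR; `check_abs_forms` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ashr i32 %x, 31: all-ones if the sign bit is set, else zero. */
static uint32_t ashr31(uint32_t x) {
    return (uint32_t)0 - (x >> 31);
}

/* @abs_shifty: %signbit = ashr; %add = add %signbit, %x; xor %signbit, %add */
static uint32_t abs_shifty(uint32_t x) {
    uint32_t signbit = ashr31(x);
    return signbit ^ (signbit + x);
}

/* @abs_cmpsubsel: icmp slt %x, 0; sub 0, %x; select */
static uint32_t abs_cmpsubsel(uint32_t x) {
    int is_neg = (x >> 31) != 0;      /* icmp slt i32 %x, 0 */
    uint32_t neg = (uint32_t)0 - x;   /* sub i32 0, %x (wraps) */
    return is_neg ? neg : x;
}

/* Spot-check both forms on boundary values, including INT_MIN
   (0x80000000), which both forms map to itself. */
static void check_abs_forms(void) {
    const uint32_t samples[] = {0u, 1u, 5u, (uint32_t)-5, 0x7FFFFFFFu, 0x80000000u};
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        assert(abs_shifty(samples[i]) == abs_cmpsubsel(samples[i]));
}
```

Because both forms agree on INT_MIN, the rewrite needs no special handling for that value.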

> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=x86_64 -mattr=avx 
> abs_shifty:
> 	movl	%edi, %eax
> 	negl	%eax
> 	cmovll	%edi, %eax
> 	retq
> 
> abs_cmpsubsel:
> 	movl	%edi, %eax
> 	negl	%eax
> 	cmovll	%edi, %eax
> 	retq
> 
> abs_shifty_vec:
> 	vpabsd	%xmm0, %xmm0
> 	retq
> 
> abs_cmpsubsel_vec:
> 	vpabsd	%xmm0, %xmm0
> 	retq
> 
> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=aarch64
> abs_shifty:
> 	cmp	w0, #0                  // =0
> 	cneg	w0, w0, mi
> 	ret
> 
> abs_cmpsubsel: 
> 	cmp	w0, #0                  // =0
> 	cneg	w0, w0, mi
> 	ret
>
> abs_shifty_vec: 
> 	abs	v0.4s, v0.4s
> 	ret
> 
> abs_cmpsubsel_vec: 
> 	abs	v0.4s, v0.4s
> 	ret
> 
> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=powerpc64le 
> abs_shifty:  
> 	srawi 4, 3, 31
> 	add 3, 3, 4
> 	xor 3, 3, 4
> 	blr
> 
> abs_cmpsubsel:
> 	srawi 4, 3, 31
> 	add 3, 3, 4
> 	xor 3, 3, 4
> 	blr
> 
> abs_shifty_vec:   
> 	vspltisw 3, -16
> 	vspltisw 4, 15
> 	vsubuwm 3, 4, 3
> 	vsraw 3, 2, 3
> 	vadduwm 2, 2, 3
> 	xxlxor 34, 34, 35
> 	blr
> 
> abs_cmpsubsel_vec: 
> 	vspltisw 3, -16
> 	vspltisw 4, 15
> 	vsubuwm 3, 4, 3
> 	vsraw 3, 2, 3
> 	vadduwm 2, 2, 3
> 	xxlxor 34, 34, 35
> 	blr
>
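The powerpc64le vector output above is the shift variant executed lane by lane. One detail worth spelling out: `vspltisw` only takes a 5-bit signed immediate (-16..15), so the shift amount 31 is built as 15 - (-16) with `vsubuwm`, and `vsraw` only uses the low 5 bits of each shift lane. The C model below is a sketch based on our reading of the Power ISA semantics for these instructions; the helper functions are hypothetical and are named after the mnemonics only for readability.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of a vector register as four 32-bit lanes. */
typedef struct { uint32_t w[4]; } Vec4;

/* vspltisw: splat a sign-extended 5-bit immediate (-16..15) to all lanes. */
static Vec4 vspltisw(int imm5) {
    assert(imm5 >= -16 && imm5 <= 15);
    Vec4 r;
    for (size_t i = 0; i < 4; ++i)
        r.w[i] = (uint32_t)imm5;
    return r;
}

/* vsubuwm / vadduwm: lane-wise subtract / add modulo 2^32. */
static Vec4 vsubuwm(Vec4 a, Vec4 b) {
    for (size_t i = 0; i < 4; ++i) a.w[i] -= b.w[i];
    return a;
}
static Vec4 vadduwm(Vec4 a, Vec4 b) {
    for (size_t i = 0; i < 4; ++i) a.w[i] += b.w[i];
    return a;
}

/* vsraw: lane-wise arithmetic shift right by the low 5 bits of b. */
static Vec4 vsraw(Vec4 a, Vec4 b) {
    for (size_t i = 0; i < 4; ++i) {
        unsigned sh = b.w[i] & 31u;
        /* Fill vacated high bits with copies of the sign bit. */
        uint32_t fill = (sh != 0 && (a.w[i] >> 31)) ? ~(UINT32_MAX >> sh) : 0u;
        a.w[i] = (a.w[i] >> sh) | fill;
    }
    return a;
}

/* xxlxor: bitwise xor of the full 128-bit registers. */
static Vec4 xxlxor(Vec4 a, Vec4 b) {
    for (size_t i = 0; i < 4; ++i) a.w[i] ^= b.w[i];
    return a;
}

/* The abs_shifty_vec sequence; v2 holds %x on entry and the result on exit. */
static Vec4 ppc_abs_shifty_vec(Vec4 v2) {
    Vec4 v3 = vspltisw(-16);   /* vspltisw 3, -16 */
    Vec4 v4 = vspltisw(15);    /* vspltisw 4, 15 */
    v3 = vsubuwm(v4, v3);      /* vsubuwm 3, 4, 3: 15 - (-16) = 31 per lane */
    v3 = vsraw(v2, v3);        /* vsraw 3, 2, 3: per-lane sign mask */
    v2 = vadduwm(v2, v3);      /* vadduwm 2, 2, 3 */
    v2 = xxlxor(v2, v3);       /* xxlxor 34, 34, 35: VSX 34/35 overlay v2/v3 */
    return v2;
}
```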

Differential Revision: https://reviews.llvm.org/D40984

llvm-svn: 320921
2017-12-16 16:41:17 +00:00
ADCE [ADCE][Dominators] Reapply: Teach ADCE to preserve dominators 2017-08-22 16:30:21 +00:00
AddDiscriminators
AlignmentFromAssumptions
ArgumentPromotion [ArgPromotion] Preserve alignment of byval argument in new alloca 2017-08-04 17:09:11 +00:00
AtomicExpand
BDCE [BDCE] Don't check demanded bits on unsized types 2017-08-16 16:09:22 +00:00
BranchFolding
CallSiteSplitting [CallSiteSplitting] Refactor creating callsites. 2017-12-13 03:05:20 +00:00
CalledValuePropagation Add CalledValuePropagation pass 2017-10-25 13:40:08 +00:00
CodeExtractor [InlineFunction] Set debug loc for call to forward varargs. 2017-12-09 14:25:33 +00:00
CodeGenPrepare [BypassSlowDivision] Improve our handling of divisions by constants 2017-12-04 19:21:58 +00:00
ConstProp
ConstantHoisting Fix out-of-order stepping behavior in programs with hoisted constants. 2017-11-09 20:01:31 +00:00
ConstantMerge Canonicalize the representation of empty an expression in DIGlobalVariableExpression 2017-08-30 18:06:51 +00:00
Coroutines [coroutines] Add support for symmetric control transfer (musttail on coro.resumes followed by a suspend) 2017-08-25 02:25:10 +00:00
CorrelatedValuePropagation [CVP] Remove some {s|u}sub.with.overflow checks. 2017-12-05 18:14:24 +00:00
CrossDSOCFI [cfi] Build __cfi_check as Thumb when applicable. 2017-08-29 22:29:15 +00:00
DCE Add an @llvm.sideeffect intrinsic 2017-11-08 21:59:51 +00:00
DeadArgElim Remove the obsolete offset parameter from @llvm.dbg.value 2017-07-28 20:21:02 +00:00
DeadStoreElimination Add an @llvm.sideeffect intrinsic 2017-11-08 21:59:51 +00:00
DivRemPairs [DivRemPairs] split tests per target to account for bots that don't build for all targets 2017-09-09 14:10:59 +00:00
EarlyCSE [EarlyCSE] recognize swapped variants of abs/nabs as equivalent 2017-12-13 22:57:35 +00:00
EliminateAvailableExternally
EntryExitInstrumenter EntryExitInstrumenter: set DebugLocs on the inserted call instructions (PR35412) 2017-11-28 18:44:26 +00:00
ExpandMemCmp/X86 re-land [ExpandMemCmp] Split ExpandMemCmp from CodeGen into its own pass." 2017-11-03 12:12:27 +00:00
Float2Int
ForcedFunctionAttrs
FunctionAttrs Add an @llvm.sideeffect intrinsic 2017-11-08 21:59:51 +00:00
FunctionImport [LTO] Make processing of combined module more consistent 2017-12-16 02:10:00 +00:00
GCOVProfiling Canonicalize the representation of empty an expression in DIGlobalVariableExpression 2017-08-30 18:06:51 +00:00
GVN Hardware-assisted AddressSanitizer (llvm part). 2017-12-09 00:21:41 +00:00
GVNHoist [GVNHoist] Fix: PR35222 gvn-hoist incorrectly erases load 2017-12-13 19:40:07 +00:00
GVNSink Add an @llvm.sideeffect intrinsic 2017-11-08 21:59:51 +00:00
GlobalDCE
GlobalMerge Canonicalize the representation of empty an expression in DIGlobalVariableExpression 2017-08-30 18:06:51 +00:00
GlobalOpt Add an @llvm.sideeffect intrinsic 2017-11-08 21:59:51 +00:00
GlobalSplit
GuardWidening
IPConstantProp
IRCE [IRCE] Smart range intersection 2017-11-20 06:07:57 +00:00
IndVarSimplify [SCEV] Fix the movement of insertion point in expander. PR35406. 2017-12-15 05:24:42 +00:00
InferAddressSpaces InferAddressSpaces: Fix bug about replacing addrspacecast 2017-10-30 21:19:41 +00:00
InferFunctionAttrs
Inline [InlineCost] Find repeated loads in the callee 2017-12-15 14:34:41 +00:00
InstCombine [InstCombine] canonicalize shifty abs(): ashr+add+xor --> cmp+neg+sel 2017-12-16 16:41:17 +00:00
InstMerge
InstNamer
InstSimplify Reintroduce r320049, r320014 and r319894. 2017-12-13 11:21:18 +00:00
InterleavedAccess [X86][LLVM]Expanding Supports lowerInterleaved{store|load}() in X86InterleavedAccess (VF64 stride 3-4) 2017-10-02 07:35:25 +00:00
Internalize
JumpThreading Reverting [JumpThreading] Preservation of DT and LVI across the pass 2017-12-13 22:01:17 +00:00
LCSSA
LICM Re-commit : [LICM] Allow sinking when foldable in loop 2017-12-15 20:33:24 +00:00
LoadStoreVectorizer Add an @llvm.sideeffect intrinsic 2017-11-08 21:59:51 +00:00
LoopDataPrefetch
LoopDeletion [Dominators] Teach LoopDeletion to use the new incremental API 2017-08-02 18:17:52 +00:00
LoopDistribute
LoopIdiom Add an @llvm.sideeffect intrinsic 2017-11-08 21:59:51 +00:00
LoopInterchange [LoopInterchange] Fix phi node ordering miscompile. 2017-10-21 13:58:37 +00:00
LoopLoadElim
LoopPredication [Loop Predication] Teach LP about reverse loops 2017-12-04 15:11:48 +00:00
LoopReroll Remove the obsolete offset parameter from @llvm.dbg.value 2017-07-28 20:21:02 +00:00
LoopRotate Fix llvm/test/Transforms/LoopRotate/pr35210.ll in rL318237, it uses debug options. 2017-11-15 06:46:58 +00:00
LoopSimplify [SCEV] Teach SCEV to find maxBECount when loop endbound is variant 2017-10-13 14:30:43 +00:00
LoopSimplifyCFG
LoopStrengthReduce LSR: Check more intrinsic pointer operands 2017-12-11 21:38:43 +00:00
LoopUnroll loop-unroll: teach remapInstruction to update dbg.value intrinsics. 2017-11-01 23:12:35 +00:00
LoopUnswitch [LoopUnswitch] Fix a simple bug which disables loop unswitch for select statement 2017-08-29 21:45:11 +00:00
LoopVectorize Move Transforms/LoopVectorize/consecutive-ptr-cg-bug.ll into the X86 subdirectory 2017-12-16 05:10:20 +00:00
LoopVersioning
LoopVersioningLICM
LowerAtomic LowerAtomic: Don't skip optnone functions; atomic still need lowering (PR34020) 2017-08-23 15:43:28 +00:00
LowerExpectIntrinsic
LowerGuardIntrinsic
LowerInvoke
LowerSwitch
LowerTypeTests Current implementation of Value::replaceUsesExceptBlockAddr() uses UseList 2017-11-17 00:30:24 +00:00
Mem2Reg [Debugify] Add a pass to test debug info preservation 2017-12-08 21:57:28 +00:00
MemCpyOpt Revert r319482 and r319483 "[memcpyopt] Teach memcpyopt to optimize across basic blocks" 2017-12-06 01:47:55 +00:00
MergeFunc [TailRecursionElimination] Skip debug intrinsics. 2017-11-28 09:32:25 +00:00
MergeICmps Re-land "[MergeICmps] Disable mergeicmps if the target does not want to handle memcmp expansion." 2017-10-10 08:00:45 +00:00
MetaRenamer [MetaRenamer] Leave `@main` alone. 2017-08-01 05:14:45 +00:00
NameAnonGlobals
NaryReassociate
NewGVN Hardware-assisted AddressSanitizer (llvm part). 2017-12-09 00:21:41 +00:00
ObjCARC ObjCARC: do not increment past the end of the BB 2017-10-24 00:09:10 +00:00
PGOProfile [LTO] Make processing of combined module more consistent 2017-12-16 02:10:00 +00:00
PartiallyInlineLibCalls [PartiallyInlineLibCalls][x86] add TTI hook to allow sqrt inlining to depend on arg rather than result 2017-11-27 21:15:43 +00:00
PhaseOrdering [SimplifyCFG] don't sink common insts too soon (PR34603) 2017-12-14 22:05:20 +00:00
PlaceSafepoints All libcalls should be considered to be GC-leaf functions. 2017-07-27 16:49:39 +00:00
PreISelIntrinsicLowering
PruneEH
Reassociate Reassociate: add global reassociation algorithm 2017-12-12 19:18:02 +00:00
Reg2Mem
RewriteStatepointsForGC [PM] port Rewrite Statepoints For GC to the new pass manager. 2017-12-15 09:32:11 +00:00
SCCP [SCCP] Pick the right lattice value for constants. 2017-11-22 03:04:55 +00:00
SLPVectorizer [SLPVectorizer] Don't ignore scalar extraction instructions of aggregate value 2017-12-14 19:35:43 +00:00
SROA Recommit rL319407: [SROA] enable splitting for non-whole-alloca loads and stores 2017-12-01 06:05:05 +00:00
SafeStack Parse and print DIExpressions inline to ease IR and MIR testing 2017-08-23 20:31:27 +00:00
SampleProfile Include already promoted counts when computing SUM for VP. 2017-11-06 19:52:49 +00:00
Scalarizer Remove the obsolete offset parameter from @llvm.dbg.value 2017-07-28 20:21:02 +00:00
SeparateConstOffsetFromGEP
SimpleLoopUnswitch [PM/Unswitch] Teach SimpleLoopUnswitch to do non-trivial unswitching, 2017-11-17 19:58:36 +00:00
SimplifyCFG [SimplifyCFG] don't sink common insts too soon (PR34603) 2017-12-14 22:05:20 +00:00
Sink
SpeculateAroundPHIs Add a new pass to speculate around PHI nodes with constant (integer) operands when profitable. 2017-11-28 11:32:31 +00:00
SpeculativeExecution
StraightLineStrengthReduce
StripDeadPrototypes
StripSymbols Canonicalize the representation of empty an expression in DIGlobalVariableExpression 2017-08-30 18:06:51 +00:00
StructurizeCFG [Dominators] Include infinite loops in PostDominatorTree 2017-08-15 18:14:57 +00:00
TailCallElim Remove this test 2017-11-28 22:39:38 +00:00
ThinLTOBitcodeWriter ThinLTOBitcodeWriter: Try harder to discard unused references to the merged module. 2017-11-30 23:05:52 +00:00
Util [InstCombine] Add a flag to disable LowerDbgDeclare 2017-09-13 01:43:25 +00:00
WholeProgramDevirt [LTO][ThinLTO] Use the linker resolutions to mark global values as dso_local. 2017-11-04 17:04:39 +00:00