llvm-project


Sanjay Patel 5a0cdac174 [InstCombine] canonicalize shifty abs(): ashr+add+xor --> cmp+neg+sel

We want to do this for 2 reasons:
1. Value tracking does not recognize the ashr variant, so it would fail to match for cases like D39766.
2. DAGCombiner does better at producing optimal codegen when we have the cmp+sel pattern.

More detail about what happens in the backend:
1. DAGCombiner has a generic transform for all targets to convert the scalar cmp+sel variant of abs into the shift variant. That is the opposite of this IR canonicalization.
2. DAGCombiner has a generic transform for all targets to convert the vector cmp+sel variant of abs into either an ABS node or the shift variant. That is again the opposite of this IR canonicalization.
3. DAGCombiner has a generic transform for all targets to convert the exact shift variants produced by #1 or #2 into an ISD::ABS node. Note: It would be an efficiency improvement if we had #1 go directly to an ABS node when that's legal/custom.
4. The pattern matching above is incomplete, so it is possible to escape the intended/optimal codegen in a variety of ways.
   a. For #2, the vector path is missing the case for setlt with a '1' constant.
   b. For #3, we are missing a match for commuted versions of the shift variants.
5. Therefore, this IR canonicalization can only help get us to the optimal codegen. The version of cmp+sel produced by this patch will be recognized in the DAG and converted to an ABS node when possible or the shift sequence when not.
6. In the following examples with this patch applied, we may get conditional moves rather than the shift produced by the generic DAGCombiner transforms. The conditional move is created using a target-specific decision for any given target. Whether it is optimal or not for a particular subtarget may be up for debate.

define i32 @abs_shifty(i32 %x) {
  %signbit = ashr i32 %x, 31
  %add = add i32 %signbit, %x
  %abs = xor i32 %signbit, %add
  ret i32 %abs
}

define i32 @abs_cmpsubsel(i32 %x) {
  %cmp = icmp slt i32 %x, zeroinitializer
  %sub = sub i32 zeroinitializer, %x
  %abs = select i1 %cmp, i32 %sub, i32 %x
  ret i32 %abs
}

define <4 x i32> @abs_shifty_vec(<4 x i32> %x) {
  %signbit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
  %add = add <4 x i32> %signbit, %x
  %abs = xor <4 x i32> %signbit, %add
  ret <4 x i32> %abs
}

define <4 x i32> @abs_cmpsubsel_vec(<4 x i32> %x) {
  %cmp = icmp slt <4 x i32> %x, zeroinitializer
  %sub = sub <4 x i32> zeroinitializer, %x
  %abs = select <4 x i1> %cmp, <4 x i32> %sub, <4 x i32> %x
  ret <4 x i32> %abs
}

> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=x86_64 -mattr=avx
> abs_shifty:
>   movl %edi, %eax
>   negl %eax
>   cmovll %edi, %eax
>   retq
>
> abs_cmpsubsel:
>   movl %edi, %eax
>   negl %eax
>   cmovll %edi, %eax
>   retq
>
> abs_shifty_vec:
>   vpabsd %xmm0, %xmm0
>   retq
>
> abs_cmpsubsel_vec:
>   vpabsd %xmm0, %xmm0
>   retq
>
> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=aarch64
> abs_shifty:
>   cmp w0, #0 // =0
>   cneg w0, w0, mi
>   ret
>
> abs_cmpsubsel:
>   cmp w0, #0 // =0
>   cneg w0, w0, mi
>   ret
>
> abs_shifty_vec:
>   abs v0.4s, v0.4s
>   ret
>
> abs_cmpsubsel_vec:
>   abs v0.4s, v0.4s
>   ret
>
> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=powerpc64le
> abs_shifty:
>   srawi 4, 3, 31
>   add 3, 3, 4
>   xor 3, 3, 4
>   blr
>
> abs_cmpsubsel:
>   srawi 4, 3, 31
>   add 3, 3, 4
>   xor 3, 3, 4
>   blr
>
> abs_shifty_vec:
>   vspltisw 3, -16
>   vspltisw 4, 15
>   vsubuwm 3, 4, 3
>   vsraw 3, 2, 3
>   vadduwm 2, 2, 3
>   xxlxor 34, 34, 35
>   blr
>
> abs_cmpsubsel_vec:
>   vspltisw 3, -16
>   vspltisw 4, 15
>   vsubuwm 3, 4, 3
>   vsraw 3, 2, 3
>   vadduwm 2, 2, 3
>   xxlxor 34, 34, 35
>   blr

Differential Revision: https://reviews.llvm.org/D40984
llvm-svn: 320921
2017-12-16 16:41:17 +00:00
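The two scalar forms named in the commit message (ashr+add+xor and cmp+neg+sel) compute the same value for every 32-bit input, which is what makes the canonicalization legal. The following is only a rough sketch of that equivalence in Python, not code from the patch; the helper `to_signed32` and the Python function names are illustrative assumptions that mirror the IR function names.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(v):
    """Wrap an arbitrary Python int to a signed 32-bit value (hypothetical helper)."""
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v

def abs_shifty(x):
    """Model of the ashr+add+xor form; x must be in signed 32-bit range."""
    signbit = x >> 31                    # arithmetic shift: 0 or -1
    add = to_signed32(signbit + x)       # wraps like an i32 add
    return to_signed32(signbit ^ add)

def abs_cmpsubsel(x):
    """Model of the cmp+neg+sel form; x must be in signed 32-bit range."""
    cmp = x < 0                          # icmp slt x, 0
    sub = to_signed32(0 - x)             # wraps like an i32 sub
    return sub if cmp else x             # select

# Spot-check a few values, including the wrapping INT_MIN case.
samples = [0, 1, -1, 7, -5, 2**31 - 1, -(2**31)]
results = [(abs_shifty(x), abs_cmpsubsel(x)) for x in samples]
all_equal = all(a == b for a, b in results)
```

Both models return INT_MIN unchanged for an INT_MIN input, matching the wrapping behavior of the plain `add`/`sub` instructions in the IR shown in the commit message.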
..
ADCE	[ADCE][Dominators] Reapply: Teach ADCE to preserve dominators	2017-08-22 16:30:21 +00:00
AddDiscriminators	…
AlignmentFromAssumptions	…
ArgumentPromotion	[ArgPromotion] Preserve alignment of byval argument in new alloca	2017-08-04 17:09:11 +00:00
AtomicExpand	…
BDCE	[BDCE] Don't check demanded bits on unsized types	2017-08-16 16:09:22 +00:00
BranchFolding	…
CallSiteSplitting	[CallSiteSplitting] Refactor creating callsites.	2017-12-13 03:05:20 +00:00
CalledValuePropagation	Add CalledValuePropagation pass	2017-10-25 13:40:08 +00:00
CodeExtractor	[InlineFunction] Set debug loc for call to forward varargs.	2017-12-09 14:25:33 +00:00
CodeGenPrepare	[BypassSlowDivision] Improve our handling of divisions by constants	2017-12-04 19:21:58 +00:00
ConstProp	…
ConstantHoisting	Fix out-of-order stepping behavior in programs with hoisted constants.	2017-11-09 20:01:31 +00:00
ConstantMerge	Canonicalize the representation of empty an expression in DIGlobalVariableExpression	2017-08-30 18:06:51 +00:00
Coroutines	[coroutines] Add support for symmetric control transfer (musttail on coro.resumes followed by a suspend)	2017-08-25 02:25:10 +00:00
CorrelatedValuePropagation	[CVP] Remove some {s|u}sub.with.overflow checks.	2017-12-05 18:14:24 +00:00
CrossDSOCFI	[cfi] Build __cfi_check as Thumb when applicable.	2017-08-29 22:29:15 +00:00
DCE	Add an @llvm.sideeffect intrinsic	2017-11-08 21:59:51 +00:00
DeadArgElim	Remove the obsolete offset parameter from @llvm.dbg.value	2017-07-28 20:21:02 +00:00
DeadStoreElimination	Add an @llvm.sideeffect intrinsic	2017-11-08 21:59:51 +00:00
DivRemPairs	[DivRemPairs] split tests per target to account for bots that don't build for all targets	2017-09-09 14:10:59 +00:00
EarlyCSE	[EarlyCSE] recognize swapped variants of abs/nabs as equivalent	2017-12-13 22:57:35 +00:00
EliminateAvailableExternally	…
EntryExitInstrumenter	EntryExitInstrumenter: set DebugLocs on the inserted call instructions (PR35412)	2017-11-28 18:44:26 +00:00
ExpandMemCmp/X86	re-land [ExpandMemCmp] Split ExpandMemCmp from CodeGen into its own pass."	2017-11-03 12:12:27 +00:00
Float2Int	…
ForcedFunctionAttrs	…
FunctionAttrs	Add an @llvm.sideeffect intrinsic	2017-11-08 21:59:51 +00:00
FunctionImport	[LTO] Make processing of combined module more consistent	2017-12-16 02:10:00 +00:00
GCOVProfiling	Canonicalize the representation of empty an expression in DIGlobalVariableExpression	2017-08-30 18:06:51 +00:00
GVN	Hardware-assisted AddressSanitizer (llvm part).	2017-12-09 00:21:41 +00:00
GVNHoist	[GVNHoist] Fix: PR35222 gvn-hoist incorrectly erases load	2017-12-13 19:40:07 +00:00
GVNSink	Add an @llvm.sideeffect intrinsic	2017-11-08 21:59:51 +00:00
GlobalDCE	…
GlobalMerge	Canonicalize the representation of empty an expression in DIGlobalVariableExpression	2017-08-30 18:06:51 +00:00
GlobalOpt	Add an @llvm.sideeffect intrinsic	2017-11-08 21:59:51 +00:00
GlobalSplit	…
GuardWidening	…
IPConstantProp	…
IRCE	[IRCE] Smart range intersection	2017-11-20 06:07:57 +00:00
IndVarSimplify	[SCEV] Fix the movement of insertion point in expander. PR35406.	2017-12-15 05:24:42 +00:00
InferAddressSpaces	InferAddressSpaces: Fix bug about replacing addrspacecast	2017-10-30 21:19:41 +00:00
InferFunctionAttrs	…
Inline	[InlineCost] Find repeated loads in the callee	2017-12-15 14:34:41 +00:00
InstCombine	[InstCombine] canonicalize shifty abs(): ashr+add+xor --> cmp+neg+sel	2017-12-16 16:41:17 +00:00
InstMerge	…
InstNamer	…
InstSimplify	Reintroduce r320049, r320014 and r319894.	2017-12-13 11:21:18 +00:00
InterleavedAccess	[X86][LLVM]Expanding Supports lowerInterleaved{store|load}() in X86InterleavedAccess (VF64 stride 3-4)	2017-10-02 07:35:25 +00:00
Internalize	…
JumpThreading	Reverting [JumpThreading] Preservation of DT and LVI across the pass	2017-12-13 22:01:17 +00:00
LCSSA	…
LICM	Re-commit : [LICM] Allow sinking when foldable in loop	2017-12-15 20:33:24 +00:00
LoadStoreVectorizer	Add an @llvm.sideeffect intrinsic	2017-11-08 21:59:51 +00:00
LoopDataPrefetch	…
LoopDeletion	[Dominators] Teach LoopDeletion to use the new incremental API	2017-08-02 18:17:52 +00:00
LoopDistribute	…
LoopIdiom	Add an @llvm.sideeffect intrinsic	2017-11-08 21:59:51 +00:00
LoopInterchange	[LoopInterchange] Fix phi node ordering miscompile.	2017-10-21 13:58:37 +00:00
LoopLoadElim	…
LoopPredication	[Loop Predication] Teach LP about reverse loops	2017-12-04 15:11:48 +00:00
LoopReroll	Remove the obsolete offset parameter from @llvm.dbg.value	2017-07-28 20:21:02 +00:00
LoopRotate	Fix llvm/test/Transforms/LoopRotate/pr35210.ll in rL318237, it uses debug options.	2017-11-15 06:46:58 +00:00
LoopSimplify	[SCEV] Teach SCEV to find maxBECount when loop endbound is variant	2017-10-13 14:30:43 +00:00
LoopSimplifyCFG	…
LoopStrengthReduce	LSR: Check more intrinsic pointer operands	2017-12-11 21:38:43 +00:00
LoopUnroll	loop-unroll: teach remapInstruction to update dbg.value intrinsics.	2017-11-01 23:12:35 +00:00
LoopUnswitch	[LoopUnswitch] Fix a simple bug which disables loop unswitch for select statement	2017-08-29 21:45:11 +00:00
LoopVectorize	Move Transforms/LoopVectorize/consecutive-ptr-cg-bug.ll into the X86 subdirectory	2017-12-16 05:10:20 +00:00
LoopVersioning	…
LoopVersioningLICM	…
LowerAtomic	LowerAtomic: Don't skip optnone functions; atomic still need lowering (PR34020)	2017-08-23 15:43:28 +00:00
LowerExpectIntrinsic	…
LowerGuardIntrinsic	…
LowerInvoke	…
LowerSwitch	…
LowerTypeTests	Current implementation of Value::replaceUsesExceptBlockAddr() uses UseList	2017-11-17 00:30:24 +00:00
Mem2Reg	[Debugify] Add a pass to test debug info preservation	2017-12-08 21:57:28 +00:00
MemCpyOpt	Revert r319482 and r319483 "[memcpyopt] Teach memcpyopt to optimize across basic blocks"	2017-12-06 01:47:55 +00:00
MergeFunc	[TailRecursionElimination] Skip debug intrinsics.	2017-11-28 09:32:25 +00:00
MergeICmps	Re-land "[MergeICmps] Disable mergeicmps if the target does not want to handle memcmp expansion."	2017-10-10 08:00:45 +00:00
MetaRenamer	[MetaRenamer] Leave `@main` alone.	2017-08-01 05:14:45 +00:00
NameAnonGlobals	…
NaryReassociate	…
NewGVN	Hardware-assisted AddressSanitizer (llvm part).	2017-12-09 00:21:41 +00:00
ObjCARC	ObjCARC: do not increment past the end of the BB	2017-10-24 00:09:10 +00:00
PGOProfile	[LTO] Make processing of combined module more consistent	2017-12-16 02:10:00 +00:00
PartiallyInlineLibCalls	[PartiallyInlineLibCalls][x86] add TTI hook to allow sqrt inlining to depend on arg rather than result	2017-11-27 21:15:43 +00:00
PhaseOrdering	[SimplifyCFG] don't sink common insts too soon (PR34603)	2017-12-14 22:05:20 +00:00
PlaceSafepoints	All libcalls should be considered to be GC-leaf functions.	2017-07-27 16:49:39 +00:00
PreISelIntrinsicLowering	…
PruneEH	…
Reassociate	Reassociate: add global reassociation algorithm	2017-12-12 19:18:02 +00:00
Reg2Mem	…
RewriteStatepointsForGC	[PM] port Rewrite Statepoints For GC to the new pass manager.	2017-12-15 09:32:11 +00:00
SCCP	[SCCP] Pick the right lattice value for constants.	2017-11-22 03:04:55 +00:00
SLPVectorizer	[SLPVectorizer] Don't ignore scalar extraction instructions of aggregate value	2017-12-14 19:35:43 +00:00
SROA	Recommit rL319407: [SROA] enable splitting for non-whole-alloca loads and stores	2017-12-01 06:05:05 +00:00
SafeStack	Parse and print DIExpressions inline to ease IR and MIR testing	2017-08-23 20:31:27 +00:00
SampleProfile	Include already promoted counts when computing SUM for VP.	2017-11-06 19:52:49 +00:00
Scalarizer	Remove the obsolete offset parameter from @llvm.dbg.value	2017-07-28 20:21:02 +00:00
SeparateConstOffsetFromGEP	…
SimpleLoopUnswitch	[PM/Unswitch] Teach SimpleLoopUnswitch to do non-trivial unswitching,	2017-11-17 19:58:36 +00:00
SimplifyCFG	[SimplifyCFG] don't sink common insts too soon (PR34603)	2017-12-14 22:05:20 +00:00
Sink	…
SpeculateAroundPHIs	Add a new pass to speculate around PHI nodes with constant (integer) operands when profitable.	2017-11-28 11:32:31 +00:00
SpeculativeExecution	…
StraightLineStrengthReduce	…
StripDeadPrototypes	…
StripSymbols	Canonicalize the representation of empty an expression in DIGlobalVariableExpression	2017-08-30 18:06:51 +00:00
StructurizeCFG	[Dominators] Include infinite loops in PostDominatorTree	2017-08-15 18:14:57 +00:00
TailCallElim	Remove this test	2017-11-28 22:39:38 +00:00
ThinLTOBitcodeWriter	ThinLTOBitcodeWriter: Try harder to discard unused references to the merged module.	2017-11-30 23:05:52 +00:00
Util	[InstCombine] Add a flag to disable LowerDbgDeclare	2017-09-13 01:43:25 +00:00
WholeProgramDevirt	[LTO][ThinLTO] Use the linker resolutions to mark global values as dso_local.	2017-11-04 17:04:39 +00:00