llvm-project

Eli Friedman 73e8a784e6 [SelectionDAG] Improve the legalisation lowering of UMULO.

There is no way in the universe that doing a full-width division in software will be faster than doing the overflowing multiplication in software in the first place, especially given that this same full-width multiplication needs to be done anyway.

This patch replaces the previous implementation with a direct lowering into an overflowing multiplication algorithm based on half-width operations.

Correctness of the algorithm was verified by exhaustively checking the output of this algorithm for overflowing multiplication of 16-bit integers against an obviously correct widening multiplication. Barring any oversights introduced by porting the algorithm to DAG, confidence in the correctness of this algorithm is extremely high.

The following table shows the change in both t = runtime and s = space. The change is expressed as a multiplier of the original, so anything under 1 is “better” and anything above 1 is worse.

+-------+-----------+-----------+-------------+-------------+
| Arch  | u64u64 t  | u64u64 s  | u128u128 t  | u128u128 s  |
+-------+-----------+-----------+-------------+-------------+
| X64   | -         | -         | ~0.5        | ~0.64       |
| i686  | ~0.5      | ~0.6666   | ~0.05       | ~0.9        |
| armv7 | -         | ~0.75     | -           | ~1.4        |
+-------+-----------+-----------+-------------+-------------+

Performance numbers have been collected by running overflowing multiplication in a loop under `perf` on two x86_64 machines (one Intel Haswell, the other AMD Ryzen). Size numbers have been collected by looking at the size of a function containing an overflowing multiply in a loop.

All in all, both performance and size have improved, except on armv7, where code size has regressed for the 128-bit multiply. u128*u128 overflowing multiply on 32-bit platforms seems to benefit from this change a lot, taking only 5% of the time the original algorithm needed to calculate the same thing.
The final benefit of this change is that LLVM is now capable of lowering the overflowing unsigned multiply for integers of any bit-width, as long as the target is capable of lowering regular multiplication for the same bit-width. Previously, 128-bit overflowing multiply was the widest possible.

Patch by Simonas Kazlauskas!

Differential Revision: https://reviews.llvm.org/D50310

llvm-svn: 339922		2018-08-16 18:39:39 +00:00
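A half-width overflowing multiply of the kind this commit message describes can be sketched in Python as follows. This is a hedged illustration, not a transcription of the patch: the split into high/low halves, the function names `umulo_half_width` and `check_exhaustively`, and the use of Python integers masked to emulate fixed-width wraparound are assumptions made for the example. The check mirrors the verification approach in the message (comparison against a widening multiply) but runs at 8 bits, because an exhaustive 16-bit sweep (2^32 pairs) is too slow in pure Python.

```python
# Illustrative sketch (assumption): not LLVM's SelectionDAG code, just one
# half-width scheme for unsigned overflowing multiplication (UMULO).

def umulo_half_width(a: int, b: int, width: int) -> tuple[int, bool]:
    """Return ((a * b) mod 2**width, overflowed) for unsigned `width`-bit a, b,
    using only half-width multiplies and full-width adds."""
    if width <= 0 or width % 2:
        raise ValueError("width must be a positive even number")
    half = width // 2
    half_mask = (1 << half) - 1
    full_mask = (1 << width) - 1
    if not (0 <= a <= full_mask and 0 <= b <= full_mask):
        raise ValueError("operands must fit in `width` bits")

    a_hi, a_lo = a >> half, a & half_mask
    b_hi, b_lo = b >> half, b & half_mask

    # (1) Both high halves non-zero => a*b >= 2**half * 2**half = 2**width.
    overflow = a_hi != 0 and b_hi != 0

    # (2) Cross terms as half-width overflowing multiplies: a cross term that
    # needs more than `half` bits overflows once shifted left by `half`.
    cross1, cross2 = a_hi * b_lo, b_hi * a_lo
    overflow = overflow or cross1 > half_mask or cross2 > half_mask
    # Wrapping half-width sum of the truncated cross terms.
    cross_sum = ((cross1 & half_mask) + (cross2 & half_mask)) & half_mask

    # (3) Low halves: a half x half product always fits in `width` bits.
    low = a_lo * b_lo

    # (4) Full-width add; a carry out of the top bit is the last overflow source.
    total = low + (cross_sum << half)
    overflow = overflow or total > full_mask
    return total & full_mask, overflow


def check_exhaustively(width: int) -> None:
    """Compare against a widening multiply for every pair of `width`-bit values."""
    limit = 1 << width
    for a in range(limit):
        for b in range(limit):
            wide = a * b  # exact product: the "obviously correct" reference
            expected = (wide % limit, wide >= limit)
            assert umulo_half_width(a, b, width) == expected, (a, b)


if __name__ == "__main__":
    check_exhaustively(8)  # all 65,536 pairs of 8-bit operands
    print(umulo_half_width(2**64 - 1, 2, 64))  # (18446744073709551614, True)
```

Because `width` is only a parameter, the same four steps apply to any even width; the only multiplication they need is of half-width values, whose product always fits in the full width.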
2009-07-17-CrossRegClassCopy.ll	…
2009-07-21-ISelBug.ll	[ARM] Generate consistent frame records for Thumb2	2016-08-23 09:19:22 +00:00
2009-07-23-CPIslandBug.ll	…
2009-07-30-PEICrash.ll	…
2009-08-01-WrongLDRBOpc.ll	…
2009-08-02-CoalescerBug.ll	…
2009-08-04-CoalescerAssert.ll	…
2009-08-04-CoalescerBug.ll	…
2009-08-04-ScavengerAssert.ll	…
2009-08-04-SubregLoweringBug.ll	Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)	2018-01-19 17:13:12 +00:00
2009-08-04-SubregLoweringBug2.ll	…
2009-08-04-SubregLoweringBug3.ll	…
2009-08-06-SpDecBug.ll	…
2009-08-07-CoalescerBug.ll	…
2009-08-07-NeonFPBug.ll	…
2009-08-08-ScavengerAssert.ll	…
2009-08-10-ISelBug.ll	…
2009-08-21-PostRAKill4.ll	…
2009-09-01-PostRAProlog.ll	Fix an old memset signature in 2009-09-01-PostRAProlog.ll test causing a buildbot failure	2016-06-23 16:07:10 +00:00
2009-10-15-ITBlockBranch.ll	…
2009-11-01-CopyReg2RegBug.ll	…
2009-11-11-ScavengerAssert.ll	…
2009-11-13-STRDBug.ll	…
2009-12-01-LoopIVUsers.ll	[SCEV] Try to reuse existing value during SCEV expansion	2016-02-04 01:27:38 +00:00
2010-01-06-TailDuplicateLabels.ll	…
2010-01-19-RemovePredicates.ll	…
2010-02-11-phi-cycle.ll	ARM: stop emitting blx instructions for most calls on MachO.	2016-05-10 19:17:47 +00:00
2010-02-24-BigStack.ll	…
2010-03-08-addi12-ccout.ll	…
2010-03-15-AsmCCClobber.ll	ARM: convert ORR instructions to ADD where possible on Thumb.	2018-06-20 12:09:44 +00:00
2010-04-15-DynAllocBug.ll	…
2010-04-26-CopyRegCrash.ll	…
2010-05-24-rsbs.ll	…
2010-06-14-NEONCoalescer.ll	[CodeGen] Don't print "pred:" and "opt:" in -debug output	2018-01-09 17:31:07 +00:00
2010-06-19-ITBlockCrash.ll	…
2010-06-21-TailMergeBug.ll	…
2010-08-10-VarSizedAllocaBug.ll	…
2010-11-22-EpilogueBug.ll	Generalize MergeBlockIntoPredecessor. Replace uses of MergeBasicBlockIntoOnlyPred.	2018-06-20 22:01:04 +00:00
2010-12-03-AddSPNarrowing.ll	…
2011-04-21-FILoweringBug.ll	…
2011-06-07-TwoAddrEarlyClobber.ll	…
2011-12-16-T2SizeReduceAssert.ll	…
2012-01-13-CBNZBug.ll	Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)	2018-01-19 17:13:12 +00:00
2013-02-19-tail-call-register-hint.ll	…
2013-03-02-vduplane-nonconstant-source-index.ll	…
2013-03-06-vector-sext-operand-scalarize.ll	…
aapcs.ll	[ARM] Return true in enableMultipleCopyHints().	2018-02-16 09:51:01 +00:00
aligned-constants.ll	…
aligned-spill.ll	[ARM] Generate consistent frame records for Thumb2	2016-08-23 09:19:22 +00:00
bfi.ll	…
bfx.ll	…
bicbfi.ll	[CodeGen] Always use `printReg` to print registers in both MIR and debug	2017-11-30 16:12:24 +00:00
buildvector-crash.ll	…
carry.ll	…
cbnz.ll	CodeGen: Allow small copyable blocks to "break" the CFG.	2017-01-31 23:48:32 +00:00
cmp-frame.ll	ARM: Don't rewrite add reg, $sp, 0 -> mov reg, $sp if the add defines CPSR.	2018-02-27 19:00:59 +00:00
constant-islands-jump-table.ll	…
constant-islands-new-island-padding.ll	…
constant-islands-new-island.ll	[ARM] Make -mcpu=generic schedule for an in-order core (Cortex-A8).	2017-06-28 07:07:03 +00:00
constant-islands.ll	ARM: Do not use llc -march in tests.	2017-08-01 22:20:49 +00:00
cortex-fp.ll	ARM: Do not use llc -march in tests.	2017-08-01 22:20:49 +00:00
crash.ll	…
cross-rc-coalescing-1.ll	…
cross-rc-coalescing-2.ll	[Thumb2] fix typo in test from r332548	2018-05-17 03:24:25 +00:00
div.ll	…
emit-unwinding.ll	ARM: use r7 as the frame-pointer on all MachO targets.	2016-04-11 22:27:40 +00:00
float-cmp.ll	[ARM] Materialise some boolean values to avoid a branch	2018-02-16 09:23:59 +00:00
float-intrinsics-double.ll	ARM: match GCC's behaviour for builtins	2017-01-13 16:25:33 +00:00
float-intrinsics-float.ll	ARM: match GCC's behaviour for builtins	2017-01-13 16:25:33 +00:00
float-ops.ll	[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently	2016-12-15 09:38:59 +00:00
frame-pointer.ll	Re-land "[Thumb] Save/restore high registers in Thumb1 pro/epilogues"	2016-10-11 21:14:03 +00:00
frameless.ll	…
frameless2.ll	…
ifcvt-compare.ll	CodeGen: Allow small copyable blocks to "break" the CFG.	2017-01-31 23:48:32 +00:00
ifcvt-neon-deprecated.mir	Followup on Proposal to move MIR physical register namespace to '$' sigil.	2018-01-31 22:04:26 +00:00
ifcvt-no-branch-predictor.ll	[CodeGen] Add a new pass for PostRA sink	2018-03-22 20:06:47 +00:00
ifcvt-rescan-bug-2016-08-22.ll	[CodeGen] Unify MBB reference format in both MIR and debug output	2017-12-04 17:18:51 +00:00
ifcvt-rescan-diamonds.ll	IfConversion: Fix bug introduced by rescanning diamonds.	2016-09-02 18:29:26 +00:00
inflate-regs.ll	…
inlineasm.ll	…
intrinsics-cc.ll	[ARM] Honor -mfloat-abi for libcall calling convention	2017-10-26 21:42:32 +00:00
intrinsics-coprocessor.ll	ARM: Do not use llc -march in tests.	2017-08-01 22:20:49 +00:00
large-call.ll	…
large-stack.ll	ARM: Do not use llc -march in tests.	2017-08-01 22:20:49 +00:00
ldr-str-imm12.ll	Elide stores which are overwritten without being observed.	2017-05-16 19:43:56 +00:00
lit.local.cfg	…
longMACt.ll	…
lsr-deficiency.ll	[Thumb] Select (CMPZ X, -C) -> (CMPZ (ADDS X, C), 0)	2016-09-09 12:52:24 +00:00
machine-licm.ll	Make the canonicalisation on shifts benifit to more case.	2016-12-23 02:56:07 +00:00
mul_const.ll	…
pic-load.ll	…
segmented-stacks.ll	ARM: Do not use llc -march in tests.	2017-08-01 22:20:49 +00:00
setjmp_longjmp.ll	…
stack_guard_remat.ll	Add address space mangling to lifetime intrinsics	2017-04-10 20:18:21 +00:00
t2sizereduction.mir	Followup on Proposal to move MIR physical register namespace to '$' sigil.	2018-01-31 22:04:26 +00:00
tail-call-r9.ll	…
tbb-removeadd.mir	Followup on Proposal to move MIR physical register namespace to '$' sigil.	2018-01-31 22:04:26 +00:00
thumb2-adc.ll	…
thumb2-add.ll	…
thumb2-add2.ll	…
thumb2-add3.ll	…
thumb2-add4.ll	…
thumb2-add5.ll	…
thumb2-add6.ll	…
thumb2-and.ll	…
thumb2-and2.ll	…
thumb2-asr.ll	…
thumb2-asr2.ll	…
thumb2-bcc.ll	…
thumb2-bfc.ll	…
thumb2-bic.ll	…
thumb2-branch.ll	…
thumb2-call-tc.ll	…
thumb2-call.ll	ARM: stop emitting blx instructions for most calls on MachO.	2016-05-10 19:17:47 +00:00
thumb2-cbnz.ll	Codegen: Fix broken assumption in Tail Merge.	2016-06-24 18:16:36 +00:00
thumb2-clz.ll	…
thumb2-cmn.ll	[ARM] Materialise some boolean values to avoid a branch	2018-02-16 09:23:59 +00:00
thumb2-cmn2.ll	[ARM] Materialise some boolean values to avoid a branch	2018-02-16 09:23:59 +00:00
thumb2-cmp.ll	[ARM] Treat cmn immediates as legal in isLegalICmpImmediate.	2018-07-10 23:44:37 +00:00
thumb2-cpsr-liveness.ll	Fix PR26655: Bail out if all regs of an inst BUNDLE have the correct kill flag	2016-05-10 17:57:27 +00:00
thumb2-eor.ll	…
thumb2-eor2.ll	…
thumb2-ifcvt1-tc.ll	…
thumb2-ifcvt1.ll	CodeGen: If Convert blocks that would form a diamond when tail-merged.	2016-08-24 21:34:27 +00:00
thumb2-ifcvt2.ll	MachO: trap unreachable instructions	2018-04-13 22:25:20 +00:00
thumb2-ifcvt3.ll	…
thumb2-jtb.ll	Generalize MergeBlockIntoPredecessor. Replace uses of MergeBasicBlockIntoOnlyPred.	2018-06-20 22:01:04 +00:00
thumb2-ldm.ll	[ARM] Generate consistent frame records for Thumb2	2016-08-23 09:19:22 +00:00
thumb2-ldr.ll	…
thumb2-ldr_ext.ll	…
thumb2-ldr_post.ll	…
thumb2-ldr_pre.ll	…
thumb2-ldrb.ll	…
thumb2-ldrd.ll	…
thumb2-ldrh.ll	…
thumb2-lsl.ll	…
thumb2-lsl2.ll	…
thumb2-lsr.ll	…
thumb2-lsr2.ll	…
thumb2-lsr3.ll	…
thumb2-mla.ll	…
thumb2-mls.ll	…
thumb2-mov.ll	…
thumb2-mul.ll	…
thumb2-mulhi.ll	…
thumb2-mvn.ll	…
thumb2-mvn2.ll	…
thumb2-neg.ll	…
thumb2-orn.ll	…
thumb2-orn2.ll	…
thumb2-orr.ll	…
thumb2-orr2.ll	…
thumb2-pack.ll	[ARM] Remove t2xtpk feature from tests	2017-03-09 15:14:32 +00:00
thumb2-rev.ll	[ARM] Remove t2xtpk feature from tests	2017-03-09 15:14:32 +00:00
thumb2-rev16.ll	ARM: Do not use llc -march in tests.	2017-08-01 22:20:49 +00:00
thumb2-ror.ll	…
thumb2-rsb.ll	…
thumb2-rsb2.ll	…
thumb2-sbc.ll	…
thumb2-select.ll	…
thumb2-select_xform.ll	[ARM] Return true in enableMultipleCopyHints().	2018-02-16 09:51:01 +00:00
thumb2-shifter.ll	…
thumb2-smla.ll	[ARM] Remove t2xtpk feature from tests	2017-03-09 15:14:32 +00:00
thumb2-smul.ll	[ARM] Remove t2xtpk feature from tests	2017-03-09 15:14:32 +00:00
thumb2-spill-q.ll	[Thumb] preserve test intent by removing undef	2018-05-16 22:47:42 +00:00
thumb2-str.ll	…
thumb2-str_post.ll	…
thumb2-str_pre.ll	…
thumb2-strb.ll	…
thumb2-strh.ll	…
thumb2-sub.ll	…
thumb2-sub2.ll	…
thumb2-sub3.ll	…
thumb2-sub4.ll	…
thumb2-sub5.ll	…
thumb2-sxt-uxt.ll	[ARM] Replace HasT2ExtractPack with HasDSP	2017-02-17 15:42:44 +00:00
thumb2-sxt_rot.ll	[ARM] Replace HasT2ExtractPack with HasDSP	2017-02-17 15:42:44 +00:00
thumb2-tbb.ll	[Thumb-1] Synthesize TBB/TBH instructions to make use of compressed jump tables	2016-11-01 13:37:41 +00:00
thumb2-tbh.ll	[Thumb-1] Synthesize TBB/TBH instructions to make use of compressed jump tables	2016-11-01 13:37:41 +00:00
thumb2-teq.ll	[ARM] Materialise some boolean values to avoid a branch	2018-02-16 09:23:59 +00:00
thumb2-teq2.ll	[ARM] Materialise some boolean values to avoid a branch	2018-02-16 09:23:59 +00:00
thumb2-tst.ll	[ARM] Materialise some boolean values to avoid a branch	2018-02-16 09:23:59 +00:00
thumb2-tst2.ll	[ARM] Materialise some boolean values to avoid a branch	2018-02-16 09:23:59 +00:00
thumb2-uxt_rot.ll	[ARM] Replace HasT2ExtractPack with HasDSP	2017-02-17 15:42:44 +00:00
thumb2-uxtb.ll	ARM: convert ORR instructions to ADD where possible on Thumb.	2018-06-20 12:09:44 +00:00
tls1.ll	…
tls2.ll	Don't print (PLT) on arm.	2016-06-16 16:09:53 +00:00
tpsoft.ll	…
umulo-64-legalisation-lowering.ll	[SelectionDAG] Improve the legalisation lowering of UMULO.	2018-08-16 18:39:39 +00:00
umulo-128-legalisation-lowering.ll	[SelectionDAG] Improve the legalisation lowering of UMULO.	2018-08-16 18:39:39 +00:00
v8_IT_1.ll	…
v8_IT_2.ll	…
v8_IT_3.ll	[arm] Fix Unnecessary reloads from GOT.	2017-11-13 20:45:38 +00:00
v8_IT_4.ll	CodeGen: Allow small copyable blocks to "break" the CFG.	2017-01-31 23:48:32 +00:00
v8_IT_5.ll	[Dominators] Include infinite loops in PostDominatorTree	2017-08-15 18:14:57 +00:00
v8_IT_6.ll	…