llvm-project

Eli Friedman 73e8a784e6 [SelectionDAG] Improve the legalisation lowering of UMULO.

There is no way in the universe that doing a full-width division in software will be faster than doing overflowing multiplication in software in the first place, especially given that this same full-width multiplication needs to be done anyway.

This patch replaces the previous implementation with a direct lowering into an overflowing multiplication algorithm based on half-width operations.

Correctness of the algorithm was verified by exhaustively checking its output for overflowing multiplication of 16-bit integers against an obviously correct widening multiplication. Barring any oversights introduced by porting the algorithm to DAG, confidence in the correctness of this algorithm is extremely high.

The following table shows the change in both t = runtime and s = space. The change is expressed as a multiplier of the original, so anything under 1 is “better” and anything above 1 is worse.

    +-------+-----------+-----------+-------------+-------------+
    | Arch  | u64*u64 t | u64*u64 s | u128*u128 t | u128*u128 s |
    +-------+-----------+-----------+-------------+-------------+
    | X64   | -         | -         | ~0.5        | ~0.64       |
    | i686  | ~0.5      | ~0.6666   | ~0.05       | ~0.9        |
    | armv7 | -         | ~0.75     | -           | ~1.4        |
    +-------+-----------+-----------+-------------+-------------+

Performance numbers have been collected by running overflowing multiplication in a loop under `perf` on two x86_64-based machines (one Intel Haswell, the other AMD Ryzen). Size numbers have been collected by looking at the size of a function containing an overflowing multiply in a loop.

All in all, both performance and size have improved, except in the case of armv7, where code size has regressed for 128-bit multiply. u128*u128 overflowing multiply on 32-bit platforms seems to benefit from this change a lot, taking only 5% of the time the original algorithm needed to calculate the same thing.

The final benefit of this change is that LLVM is now capable of lowering the overflowing unsigned multiply for integers of any bit-width, as long as the target is capable of lowering regular multiplication for the same bit-width. Previously, 128-bit overflowing multiply was the widest possible.

Patch by Simonas Kazlauskas!

Differential Revision: https://reviews.llvm.org/D50310

llvm-svn: 339922		2018-08-16 18:39:39 +00:00
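The half-width idea described in the commit message can be illustrated outside of LLVM. The following is a minimal Python sketch, not the code from the patch: the function names `umulo_half_width`, `umulo_reference`, and `check_exhaustive` are hypothetical, and the exact sequence of DAG nodes that the real lowering emits may differ. It splits each operand into high and low halves, flags overflow from the half-width pieces, and compares the result against a plain widening multiplication, in the spirit of the exhaustive check the message mentions.

```python
def umulo_half_width(x, y, bits):
    """Return (low `bits` bits of x*y, overflow flag) built from half-width pieces.

    Illustrative sketch only; assumes `bits` is even and 0 <= x, y < 2**bits.
    """
    half = bits // 2
    half_mask = (1 << half) - 1
    full_mask = (1 << bits) - 1

    x_lo, x_hi = x & half_mask, x >> half
    y_lo, y_hi = y & half_mask, y >> half

    # If both high halves are nonzero, the product is at least 2**bits.
    overflow = x_hi != 0 and y_hi != 0

    # Each cross term is a half-width multiply that must itself fit in a half.
    cross1 = x_hi * y_lo
    cross2 = y_hi * x_lo
    overflow = overflow or cross1 > half_mask or cross2 > half_mask

    # Sum of the cross terms, truncated to half width (it is shifted up below).
    cross = ((cross1 & half_mask) + (cross2 & half_mask)) & half_mask

    # The product of the two low halves always fits in `bits` bits.
    low_product = x_lo * y_lo

    # The final full-width addition may carry out of `bits` bits.
    total = (cross << half) + low_product
    overflow = overflow or total > full_mask
    return total & full_mask, overflow


def umulo_reference(x, y, bits):
    """Obviously correct version: multiply at full precision, then inspect."""
    wide = x * y
    return wide & ((1 << bits) - 1), (wide >> bits) != 0


def check_exhaustive(bits):
    """Compare both versions for every pair of `bits`-wide operands."""
    limit = 1 << bits
    return all(
        umulo_half_width(x, y, bits) == umulo_reference(x, y, bits)
        for x in range(limit)
        for y in range(limit)
    )
```

Calling `check_exhaustive(8)` covers all 65,536 operand pairs quickly. A full 16-bit sweep, as described in the commit message, covers about 4.3 billion pairs and is better suited to a compiled language.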
..
2007-01-31-RegInfoAssert.ll	…
2007-02-02-JoinIntervalsCrash.ll	…
2007-05-05-InvalidPushPop.ll	…
2009-06-18-ThumbCommuteMul.ll	…
2009-07-20-TwoAddrBug.ll	…
2009-07-27-PEIAssert.ll	…
2009-08-12-ConstIslandAssert.ll	…
2009-08-12-RegInfoAssert.ll	…
2009-08-20-ISelBug.ll	…
2009-12-17-pre-regalloc-taildup.ll	…
2010-06-18-SibCallCrash.ll	…
2010-07-01-FuncAlign.ll	…
2010-07-15-debugOrdering.ll	…
2011-05-11-DAGLegalizer.ll	Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)	2018-01-19 17:13:12 +00:00
2011-06-16-NoGPRs.ll	…
2011-EpilogueBug.ll	…
2012-04-26-M0ISelBug.ll	…
2014-06-10-thumb1-ldst-opt-bug.ll	…
DbgValueOtherTargets.test	…
PR17309.ll	…
PR35481.ll	[ARM] Fix PR35481	2018-01-08 11:32:37 +00:00
PR36658.mir	[ARM] Fix "Constant pool entry out of range!" in Thumb1 mode	2018-03-23 17:53:27 +00:00
addr-modes.ll	[ARM, Thumb1] Prevent ARMTargetLowering::isLegalAddressingMode from accepting illegal modes	2017-08-24 10:00:25 +00:00
and_neg.ll	…
asmprinter-bug.ll	…
barrier.ll	…
bic_imm.ll	[ARM] Adjust AND immediates to make them cheaper to select.	2018-08-10 21:21:53 +00:00
branchless-cmp.ll	[ARM] Materialise some boolean values to avoid a branch	2018-02-16 09:23:59 +00:00
callee_save.ll	…
cmp-add-fold.ll	…
cmp-fold.ll	…
constants.ll	[ARM] Materialise some boolean values to avoid a branch	2018-02-16 09:23:59 +00:00
copy_thumb.ll	…
cortex-m0-unaligned-access.ll	…
dyn-stackalloc.ll	Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)	2018-01-19 17:13:12 +00:00
fastcc.ll	…
fpconv.ll	…
fpow.ll	…
frame-access.ll	[ARM] Fix access to stack arguments when re-aligning SP in Armv6m	2018-03-02 15:47:14 +00:00
frame_thumb.ll	…
i8-phi-ext.ll	[CodeGen] Emit more precise AssertZext/AssertSext nodes.	2018-07-11 23:26:35 +00:00
iabs.ll	…
inlineasm-imm-thumb.ll	…
inlineasm-thumb.ll	…
ispositive.ll	…
large-stack.ll	[ARM] Dynamic stack alignment for 16-bit Thumb	2017-10-22 11:56:35 +00:00
ldm-merge-call.ll	…
ldm-merge-struct.ll	…
ldm-stm-base-materialization-thumb2.ll	Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)	2018-01-19 17:13:12 +00:00
ldm-stm-base-materialization.ll	Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)	2018-01-19 17:13:12 +00:00
ldm-stm-postinc.ll	…
ldr_ext.ll	…
ldr_frame.ll	…
lit.local.cfg	…
litpoolremat.ll	…
long-setcc.ll	[ARM] Materialise some boolean values to avoid a branch	2018-02-16 09:23:59 +00:00
long.ll	[ARM] Return true in enableMultipleCopyHints().	2018-02-16 09:51:01 +00:00
long_shift.ll	…
machine-cse-physreg.mir	Followup on Proposal to move MIR physical register namespace to '$' sigil.	2018-01-31 22:04:26 +00:00
mature-mc-support.ll	…
mul.ll	…
mvn.ll	[ARM] Fix issue with large xor constants.	2018-02-22 09:38:57 +00:00
optionaldef-scheduling.ll	…
pop.ll	…
pr35836.ll	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"	2018-02-27 16:59:10 +00:00
pr35836_2.ll	[ARM] Allow the scheduler to clone a node with glue to avoid a copy CPSR ↔ GPR.	2018-01-31 09:23:43 +00:00
push.ll	…
remove-unneeded-push-pop.ll	…
rev.ll	…
segmented-stacks-dynamic.ll	…
segmented-stacks.ll	…
select.ll	Revert "[ARM] Lower lower saturate to 0 and lower saturate to -1 using bit-operations"	2018-01-31 22:55:19 +00:00
shift-and.ll	[ARM] Adjust AND immediates to make them cheaper to select.	2018-08-10 21:21:53 +00:00
sjljehprepare-lower-vector.ll	…
stack-access.ll	…
stack-coloring-without-frame-ptr.ll	Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)	2018-01-19 17:13:12 +00:00
stack-frame.ll	…
stack_guard_remat.ll	…
stm-deprecated.ll	…
stm-merge.ll	…
stm-scavenging.ll	[LivePhysRegs] Fix handling of return instructions.	2018-02-06 23:00:17 +00:00
tbb-reuse.mir	Followup on Proposal to move MIR physical register namespace to '$' sigil.	2018-01-31 22:04:26 +00:00
thumb-imm.ll	…
thumb-ldm.ll	…
thumb-shrink-wrapping.ll	[ARM] Allow CMPZ transforms even if the input has multiple uses.	2018-06-08 21:16:56 +00:00
thumb1-cmp.ll	[ARM] Testcase for Thumb1 cmp with constants.	2018-06-19 00:12:13 +00:00
trap.ll	…
triple.ll	…
tst_teq.ll	…
umulo-128-legalisation-lowering.ll	[SelectionDAG] Improve the legalisation lowering of UMULO.	2018-08-16 18:39:39 +00:00
unord.ll	…
vargs.ll	…