llvm-project

Eli Friedman 73e8a784e6 [SelectionDAG] Improve the legalisation lowering of UMULO.

There is no way in the universe that doing a full-width division in software will be faster than doing overflowing multiplication in software in the first place, especially given that this same full-width multiplication needs to be done anyway.

This patch replaces the previous implementation with a direct lowering into an overflowing multiplication algorithm based on half-width operations.

Correctness of the algorithm was verified by exhaustively checking the output of this algorithm for overflowing multiplication of 16-bit integers against an obviously correct widening multiplication. Barring any oversights introduced by porting the algorithm to DAG, confidence in the correctness of this algorithm is extremely high.

The following table shows the change in both t = runtime and s = space. The change is expressed as a multiplier of the original, so anything under 1 is "better" and anything above 1 is worse.

+-------+-----------+-----------+-------------+-------------+
| Arch  | u64*u64 t | u64*u64 s | u128*u128 t | u128*u128 s |
+-------+-----------+-----------+-------------+-------------+
| X64   |     -     |     -     |    ~0.5     |    ~0.64    |
| i686  |   ~0.5    |  ~0.6666  |    ~0.05    |    ~0.9     |
| armv7 |     -     |   ~0.75   |      -      |    ~1.4     |
+-------+-----------+-----------+-------------+-------------+

Performance numbers have been collected by running overflowing multiplication in a loop under `perf` on two x86_64-based machines (one Intel Haswell, the other AMD Ryzen). Size numbers have been collected by looking at the size of a function containing an overflowing multiply in a loop.

All in all, it can be seen that both performance and size have improved, except in the case of armv7, where code size has regressed for the 128-bit multiply. The u128*u128 overflowing multiply on 32-bit platforms seems to benefit from this change a lot, taking only 5% of the time of the original algorithm to calculate the same thing.

The final benefit of this change is that LLVM is now capable of lowering the overflowing unsigned multiply for integers of any bit-width, as long as the target is capable of lowering regular multiplication for the same bit-width. Previously, 128-bit overflowing multiply was the widest possible.

Patch by Simonas Kazlauskas!

Differential Revision: https://reviews.llvm.org/D50310
llvm-svn: 339922

2018-08-16 18:39:39 +00:00
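
The commit message describes an overflowing multiply built from half-width operations and checked against a widening multiply. The following is a minimal Python sketch of one such decomposition, offered only as an illustration: it is not taken from the patch, and the names `umulo_half` and `umulo_reference` and the `bits` parameter are hypothetical. It assumes the operands are already reduced to `bits` bits.

```python
def umulo_half(lhs, rhs, bits=16):
    """Unsigned multiply-with-overflow on `bits`-wide values, built from
    half-width pieces. Returns (truncated product, overflow flag).
    Illustrative sketch only; not the code from the patch."""
    half = bits // 2
    full_mask = (1 << bits) - 1
    half_mask = (1 << half) - 1

    lhs_lo, lhs_hi = lhs & half_mask, lhs >> half
    rhs_lo, rhs_hi = rhs & half_mask, rhs >> half

    # If both high halves are nonzero, hi*hi contributes at least 2**bits.
    both_high = lhs_hi != 0 and rhs_hi != 0

    # Half-width overflowing multiplies of the two cross terms.
    cross1 = lhs_hi * rhs_lo
    cross2 = rhs_hi * lhs_lo
    cross1_ovf = cross1 > half_mask
    cross2_ovf = cross2 > half_mask

    # lo*lo always fits in the full width, so no overflow check is needed.
    low = lhs_lo * rhs_lo

    # Shift the truncated cross terms into the high half (wrapping),
    # then do a full-width overflowing add with the low product.
    high_part = (((cross1 & half_mask) + (cross2 & half_mask)) << half) & full_mask
    total = high_part + low
    add_ovf = total > full_mask

    overflow = both_high or cross1_ovf or cross2_ovf or add_ovf
    return total & full_mask, overflow


def umulo_reference(lhs, rhs, bits=16):
    """Obviously correct version: widen, multiply, inspect the high bits."""
    wide = lhs * rhs
    return wide & ((1 << bits) - 1), (wide >> bits) != 0
```

Comparing the two functions over every pair of 8-bit inputs (`bits=8`) takes well under a second; the 16-bit exhaustive check mentioned in the commit message covers 2^32 pairs and is better suited to a compiled language.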
addc-adde-sube-subc.ll	[RISCV] Implement frame pointer elimination	2018-01-18 11:34:02 +00:00
align.ll	[RISCV] Change function alignment to 4 bytes, and 2 bytes for RVC	2018-04-12 11:30:59 +00:00
alloca.ll	[RISCV] Expand function call to "call" pseudoinstruction	2018-04-25 14:19:12 +00:00
alu32.ll	[RISCV] Expand codegen -> compression sanity checks and move to a single file	2018-04-18 20:17:29 +00:00
analyze-branch.ll	[RISCV] Expand function call to "call" pseudoinstruction	2018-04-25 14:19:12 +00:00
arith-with-overflow.ll	[RISCV] Add tests for overflow intrinsics	2018-06-19 06:45:47 +00:00
atomic-cmpxchg.ll	[RISCV] Codegen support for atomic operations on RV32I	2018-06-13 11:58:46 +00:00
atomic-fence.ll	[RISCV] Codegen support for atomic operations on RV32I	2018-06-13 11:58:46 +00:00
atomic-load-store.ll	[RISCV] Add codegen support for atomic load/stores with RV32A	2018-06-13 12:04:51 +00:00
atomic-rmw.ll	[RISCV] Codegen support for atomic operations on RV32I	2018-06-13 11:58:46 +00:00
bare-select.ll	[RISCV] Codegen support for RV32F floating point comparison operations	2018-03-21 15:11:02 +00:00
blockaddress.ll	[RISCV] Peephole optimisation for load/store of global values or constant addresses	2018-03-19 11:54:28 +00:00
branch-relaxation.ll	[RISCV] Implement frame pointer elimination	2018-01-18 11:34:02 +00:00
branch.ll	[RISCV] Expand codegen -> compression sanity checks and move to a single file	2018-04-18 20:17:29 +00:00
bswap-ctlz-cttz-ctpop.ll	[RISCV] Set CostPerUse for registers	2018-05-23 21:34:30 +00:00
byval.ll	[RISCV] Separate base from offset in lowerGlobalAddress	2018-05-17 18:14:53 +00:00
calling-conv-sext-zext.ll	[RISCV] Expand function call to "call" pseudoinstruction	2018-04-25 14:19:12 +00:00
calling-conv.ll	[RISCV] Set CostPerUse for registers	2018-05-23 21:34:30 +00:00
calls.ll	[RISCV] Expand function call to "call" pseudoinstruction	2018-04-25 14:19:12 +00:00
compress-inline-asm.ll	[RISCV] Tablegen-driven Instruction Compression.	2018-04-06 21:07:05 +00:00
compress.ll	[RISCV] Add test changes missed from rL330293	2018-04-18 20:36:12 +00:00
disable-tail-calls.ll	[RISCV] Lower the tail pseudoinstruction	2018-05-23 22:44:08 +00:00
div.ll	[RISCV] Expand function call to "call" pseudoinstruction	2018-04-25 14:19:12 +00:00
double-arith.ll	[RISCV] Add codegen support for RV32D floating point arithmetic operations	2018-04-12 05:42:42 +00:00
double-br-fcmp.ll	[RISCV] Expand function call to "call" pseudoinstruction	2018-04-25 14:19:12 +00:00
double-calling-conv.ll	[RISCV] Expand function call to "call" pseudoinstruction	2018-04-25 14:19:12 +00:00
double-convert.ll	[RISCV] Codegen support for RV32D floating point conversion operations	2018-04-12 05:47:15 +00:00
double-fcmp.ll	[RISCV] Codegen support for RV32D floating point comparison operations	2018-04-12 05:50:06 +00:00
double-imm.ll	[RISCV] Add tests missed in r329871	2018-04-12 05:36:44 +00:00
double-intrinsics.ll	[RISCV] Expand function call to "call" pseudoinstruction	2018-04-25 14:19:12 +00:00
double-mem.ll	[RISCV] Set CostPerUse for registers	2018-05-23 21:34:30 +00:00
double-previous-failure.ll	[RISCV] Expand function call to "call" pseudoinstruction	2018-04-25 14:19:12 +00:00
double-select-fcmp.ll	[RISCV] Codegen support for RV32D floating point comparison operations	2018-04-12 05:50:06 +00:00
double-stack-spill-restore.ll	[RISCV] Expand function call to "call" pseudoinstruction	2018-04-25 14:19:12 +00:00
fixups-diff.ll	[RISCV][MC] Don't fold symbol differences if requiresDiffExpressionRelocations is true	2018-08-16 11:26:37 +00:00
float-arith.ll	[RISCV] Introduce pattern for materialising immediates with 0 for lower 12 bits	2018-04-18 20:34:23 +00:00
float-br-fcmp.ll	[RISCV] Expand function call to "call" pseudoinstruction	2018-04-25 14:19:12 +00:00
float-convert.ll	[RISCV] Add codegen for RV32F arithmetic and conversion operations	2018-03-20 12:45:35 +00:00
float-fcmp.ll	[RISCV] Codegen support for RV32F floating point comparison operations	2018-03-21 15:11:02 +00:00
float-imm.ll	[RISCV] Add codegen for RV32F floating point load/store	2018-03-20 13:26:12 +00:00
float-mem.ll	[RISCV] Separate base from offset in lowerGlobalAddress	2018-05-17 18:14:53 +00:00
float-select-fcmp.ll	[RISCV] Codegen support for RV32F floating point comparison operations	2018-03-21 15:11:02 +00:00
fp128.ll	[RISCV] Separate base from offset in lowerGlobalAddress	2018-05-17 18:14:53 +00:00
frame.ll	[RISCV] Expand function call to "call" pseudoinstruction	2018-04-25 14:19:12 +00:00
frameaddr-returnaddr.ll	[RISCV] Expand function call to "call" pseudoinstruction	2018-04-25 14:19:12 +00:00
get-setcc-result-type.ll	[RISCV] Define getSetCCResultType for setting vector setCC type	2018-02-02 02:43:18 +00:00
hoist-global-addr-base.ll	[RISCV] Add machine function pass to merge base + offset	2018-06-27 20:51:42 +00:00
i32-icmp.ll	[RISCV] Implement frame pointer elimination	2018-01-18 11:34:02 +00:00
imm-cse.ll	[RISCV] Add imm-cse.ll test case	2018-04-18 20:25:07 +00:00
imm.ll	[RISCV] Introduce pattern for materialising immediates with 0 for lower 12 bits	2018-04-18 20:34:23 +00:00
indirectbr.ll	[RISC-V] Fix a test case to not include label names as those aren't	2018-06-21 05:42:05 +00:00
init-array.ll	[RISCV] Use init_array instead of ctors for RISCV target, by default	2018-03-24 18:37:19 +00:00
inline-asm.ll	[RISCV] Peephole optimisation for load/store of global values or constant addresses	2018-03-19 11:54:28 +00:00
interrupt-attr-args-error.ll	[RISCV] Add support for _interrupt attribute	2018-07-26 17:49:43 +00:00
interrupt-attr-invalid.ll	[RISCV] Add support for _interrupt attribute	2018-07-26 17:49:43 +00:00
interrupt-attr-nocall.ll	[RISCV] Add support for _interrupt attribute	2018-07-26 17:49:43 +00:00
interrupt-attr-ret-error.ll	[RISCV] Add support for _interrupt attribute	2018-07-26 17:49:43 +00:00
interrupt-attr.ll	[RISCV] Add support for _interrupt attribute	2018-07-26 17:49:43 +00:00
jumptable.ll	[RISCV] Implement frame pointer elimination	2018-01-18 11:34:02 +00:00
large-stack.ll	[RISCV] Implement frame pointer elimination	2018-01-18 11:34:02 +00:00
lit.local.cfg	…
lsr-legaladdimm.ll	[RISCV] Implement isLegalAddImmediate	2018-04-26 13:00:37 +00:00
mem.ll	[RISCV] Separate base from offset in lowerGlobalAddress	2018-05-17 18:14:53 +00:00
mul.ll	[RISCV] Expand function call to "call" pseudoinstruction	2018-04-25 14:19:12 +00:00
musttail-call.ll	[RISCV] Lower the tail pseudoinstruction	2018-05-23 22:44:08 +00:00
option-norvc.ll	[RISCV] Support .option rvc and norvc assembler directives	2018-05-11 17:30:28 +00:00
option-rvc.ll	[RISCV] Support .option rvc and norvc assembler directives	2018-05-11 17:30:28 +00:00
rem.ll	[RISCV] Expand function call to "call" pseudoinstruction	2018-04-25 14:19:12 +00:00
remat.ll	[RISCV] Set CostPerUse for registers	2018-05-23 21:34:30 +00:00
rotl-rotr.ll	[RISCV] Implement frame pointer elimination	2018-01-18 11:34:02 +00:00
select-cc.ll	[RISCV] Implement frame pointer elimination	2018-01-18 11:34:02 +00:00
sext-zext-trunc.ll	[RISCV] Implement frame pointer elimination	2018-01-18 11:34:02 +00:00
shifts.ll	[RISCV] Expand function call to "call" pseudoinstruction	2018-04-25 14:19:12 +00:00
tail-calls.ll	[RISCV] Fixed test case failure due to r338047	2018-07-31 00:36:28 +00:00
umulo-128-legalisation-lowering.ll	[SelectionDAG] Improve the legalisation lowering of UMULO.	2018-08-16 18:39:39 +00:00
vararg.ll	[RISCV] Expand function call to "call" pseudoinstruction	2018-04-25 14:19:12 +00:00
wide-mem.ll	[RISCV] Separate base from offset in lowerGlobalAddress	2018-05-17 18:14:53 +00:00
zext-with-load-is-free.ll	[RISCV] Separate base from offset in lowerGlobalAddress	2018-05-17 18:14:53 +00:00