Danila Kutenin
68c011aa08
[builtins] Optimize udivmodti4 for many platforms.
...
Summary:
While benchmarking uint128 division we found out that it has huge latency for small divisors
https://reviews.llvm.org/D83027
```
Benchmark Time(ns) CPU(ns) Iterations
--------------------------------------------------------------------------------------------------
BM_DivideIntrinsic128UniformDivisor<unsigned __int128> 13.0 13.0 55000000
BM_DivideIntrinsic128UniformDivisor<__int128> 14.3 14.3 50000000
BM_RemainderIntrinsic128UniformDivisor<unsigned __int128> 13.5 13.5 52000000
BM_RemainderIntrinsic128UniformDivisor<__int128> 14.1 14.1 50000000
BM_DivideIntrinsic128SmallDivisor<unsigned __int128> 153 153 5000000
BM_DivideIntrinsic128SmallDivisor<__int128> 170 170 3000000
BM_RemainderIntrinsic128SmallDivisor<unsigned __int128> 153 153 5000000
BM_RemainderIntrinsic128SmallDivisor<__int128> 155 155 5000000
```
This patch suggests a more optimized version of the division:
If the divisor is 64 bit, we can proceed with the divq instruction on x86 or constant multiplication mechanisms for other platforms. Once both divisor and dividend are not less than 2**64, we use branch free subtract algorithm, it has at most 64 cycles. After that our benchmarks improved significantly
```
Benchmark Time(ns) CPU(ns) Iterations
--------------------------------------------------------------------------------------------------
BM_DivideIntrinsic128UniformDivisor<unsigned __int128> 11.0 11.0 64000000
BM_DivideIntrinsic128UniformDivisor<__int128> 13.8 13.8 51000000
BM_RemainderIntrinsic128UniformDivisor<unsigned __int128> 11.6 11.6 61000000
BM_RemainderIntrinsic128UniformDivisor<__int128> 13.7 13.7 52000000
BM_DivideIntrinsic128SmallDivisor<unsigned __int128> 27.1 27.1 26000000
BM_DivideIntrinsic128SmallDivisor<__int128> 29.4 29.4 24000000
BM_RemainderIntrinsic128SmallDivisor<unsigned __int128> 27.9 27.8 26000000
BM_RemainderIntrinsic128SmallDivisor<__int128> 29.1 29.1 25000000
```
If not using divq instrinsics, it is still much better
```
Benchmark Time(ns) CPU(ns) Iterations
--------------------------------------------------------------------------------------------------
BM_DivideIntrinsic128UniformDivisor<unsigned __int128> 12.2 12.2 58000000
BM_DivideIntrinsic128UniformDivisor<__int128> 13.5 13.5 52000000
BM_RemainderIntrinsic128UniformDivisor<unsigned __int128> 12.7 12.7 56000000
BM_RemainderIntrinsic128UniformDivisor<__int128> 13.7 13.7 51000000
BM_DivideIntrinsic128SmallDivisor<unsigned __int128> 30.2 30.2 24000000
BM_DivideIntrinsic128SmallDivisor<__int128> 33.2 33.2 22000000
BM_RemainderIntrinsic128SmallDivisor<unsigned __int128> 31.4 31.4 23000000
BM_RemainderIntrinsic128SmallDivisor<__int128> 33.8 33.8 21000000
```
PowerPC benchmarks:
Was
```
BM_DivideIntrinsic128UniformDivisor<unsigned __int128> 22.3 22.3 32000000
BM_DivideIntrinsic128UniformDivisor<__int128> 23.8 23.8 30000000
BM_RemainderIntrinsic128UniformDivisor<unsigned __int128> 22.5 22.5 32000000
BM_RemainderIntrinsic128UniformDivisor<__int128> 24.9 24.9 29000000
BM_DivideIntrinsic128SmallDivisor<unsigned __int128> 394 394 2000000
BM_DivideIntrinsic128SmallDivisor<__int128> 397 397 2000000
BM_RemainderIntrinsic128SmallDivisor<unsigned __int128> 399 399 2000000
BM_RemainderIntrinsic128SmallDivisor<__int128> 397 397 2000000
```
With this patch
```
BM_DivideIntrinsic128UniformDivisor<unsigned __int128> 21.7 21.7 33000000
BM_DivideIntrinsic128UniformDivisor<__int128> 23.0 23.0 31000000
BM_RemainderIntrinsic128UniformDivisor<unsigned __int128> 21.9 21.9 33000000
BM_RemainderIntrinsic128UniformDivisor<__int128> 23.9 23.9 30000000
BM_DivideIntrinsic128SmallDivisor<unsigned __int128> 32.7 32.6 23000000
BM_DivideIntrinsic128SmallDivisor<__int128> 33.4 33.4 21000000
BM_RemainderIntrinsic128SmallDivisor<unsigned __int128> 31.1 31.1 22000000
BM_RemainderIntrinsic128SmallDivisor<__int128> 33.2 33.2 22000000
```
My email: danilak@google.com , I don't have commit rights
Reviewers: howard.hinnant, courbet, MaskRay
Reviewed By: courbet
Subscribers: steven.zhang, #sanitizers
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D81809
2020-07-10 09:59:16 +02:00
Petr Hosek
0ba22f51d1
[builtins] Use single line C++/C99 comment style
...
Use the uniform single line C++/99 style for code comments.
This is part of the cleanup proposed in "[RFC] compiler-rt builtins
cleanup and refactoring".
Differential Revision: https://reviews.llvm.org/D60352
llvm-svn: 359411
2019-04-28 22:47:49 +00:00
Petr Hosek
082b89b25f
[builtins] Reformat builtins with clang-format
...
Update formatting to use the LLVM style.
This is part of the cleanup proposed in "[RFC] compiler-rt builtins
cleanup and refactoring".
Differential Revision: https://reviews.llvm.org/D60351
llvm-svn: 359410
2019-04-28 21:53:32 +00:00
Chandler Carruth
57b08b0944
Update more file headers across all of the LLVM projects in the monorepo
...
to reflect the new license. These used slightly different spellings that
defeated my regular expressions.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351648
2019-01-19 10:56:40 +00:00
Joerg Sonnenberger
4e08a634e1
Don't take short cuts trying to avoid conditionals. This leads to
...
negative shift amounts and/or shifts wider than the type. VAX traps on
the former, X86 and other platforms produce incorrect results on the
latter.
llvm-svn: 204193
2014-03-18 22:10:36 +00:00
Joerg Sonnenberger
6e99daab4c
Consistently use COMPILER_RT_ABI for all public symbols.
...
Move prototypes into headers and fix a few inconsistencies.
llvm-svn: 202591
2014-03-01 15:30:50 +00:00
Joerg Sonnenberger
938b0df72d
Introduce CRT_HAS_128BIT, currently for all __LP64__ platforms.
...
Use it to enable the various functions for TI mode.
llvm-svn: 201909
2014-02-21 23:53:03 +00:00
Alexey Samsonov
a6b264b51d
Move original compiler-rt functions (libgcc replacement) to lib/builtins directory
...
llvm-svn: 201393
2014-02-14 09:20:33 +00:00