Commit Graph

8 Commits

Author SHA1 Message Date
Danila Kutenin 68c011aa08 [builtins] Optimize udivmodti4 for many platforms.
Summary:
While benchmarking uint128 division, we found that it has huge latency for small divisors.

https://reviews.llvm.org/D83027

```
Benchmark                                                   Time(ns)        CPU(ns)     Iterations
--------------------------------------------------------------------------------------------------
BM_DivideIntrinsic128UniformDivisor<unsigned __int128>            13.0           13.0     55000000
BM_DivideIntrinsic128UniformDivisor<__int128>                     14.3           14.3     50000000
BM_RemainderIntrinsic128UniformDivisor<unsigned __int128>         13.5           13.5     52000000
BM_RemainderIntrinsic128UniformDivisor<__int128>                  14.1           14.1     50000000
BM_DivideIntrinsic128SmallDivisor<unsigned __int128>             153            153        5000000
BM_DivideIntrinsic128SmallDivisor<__int128>                      170            170        3000000
BM_RemainderIntrinsic128SmallDivisor<unsigned __int128>          153            153        5000000
BM_RemainderIntrinsic128SmallDivisor<__int128>                   155            155        5000000
```

This patch suggests a more optimized version of the division:

If the divisor fits in 64 bits, we can use the divq instruction on x86 or constant multiplication mechanisms on other platforms. When both the divisor and the dividend are at least 2**64, we use a branch-free subtract algorithm that takes at most 64 iterations. With these changes, our benchmarks improved significantly.

```
Benchmark                                                   Time(ns)        CPU(ns)     Iterations
--------------------------------------------------------------------------------------------------
BM_DivideIntrinsic128UniformDivisor<unsigned __int128>            11.0           11.0     64000000
BM_DivideIntrinsic128UniformDivisor<__int128>                     13.8           13.8     51000000
BM_RemainderIntrinsic128UniformDivisor<unsigned __int128>         11.6           11.6     61000000
BM_RemainderIntrinsic128UniformDivisor<__int128>                  13.7           13.7     52000000
BM_DivideIntrinsic128SmallDivisor<unsigned __int128>              27.1           27.1     26000000
BM_DivideIntrinsic128SmallDivisor<__int128>                       29.4           29.4     24000000
BM_RemainderIntrinsic128SmallDivisor<unsigned __int128>           27.9           27.8     26000000
BM_RemainderIntrinsic128SmallDivisor<__int128>                    29.1           29.1     25000000
```

If the divq instruction is not used, the results are still much better:

```
Benchmark                                                   Time(ns)        CPU(ns)     Iterations
--------------------------------------------------------------------------------------------------
BM_DivideIntrinsic128UniformDivisor<unsigned __int128>            12.2           12.2     58000000
BM_DivideIntrinsic128UniformDivisor<__int128>                     13.5           13.5     52000000
BM_RemainderIntrinsic128UniformDivisor<unsigned __int128>         12.7           12.7     56000000
BM_RemainderIntrinsic128UniformDivisor<__int128>                  13.7           13.7     51000000
BM_DivideIntrinsic128SmallDivisor<unsigned __int128>              30.2           30.2     24000000
BM_DivideIntrinsic128SmallDivisor<__int128>                       33.2           33.2     22000000
BM_RemainderIntrinsic128SmallDivisor<unsigned __int128>           31.4           31.4     23000000
BM_RemainderIntrinsic128SmallDivisor<__int128>                    33.8           33.8     21000000
```
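
To illustrate the large-divisor path described above, here is a minimal sketch of a shift-subtract division loop. It is not the code from this patch: the names `u128`, `clz128`, and `udivmod128_sketch` are hypothetical, and the sketch assumes a GCC- or Clang-compatible compiler that provides `unsigned __int128` and `__builtin_clzll`. The comparison is turned into a mask so that the loop body has no explicit branch in the source; whether the compiler emits branch-free machine code is not guaranteed.

```c
#include <stdint.h>

typedef unsigned __int128 u128; // assumption: compiler supports __int128

// Hypothetical helper: count leading zeros of a nonzero 128-bit value.
static int clz128(u128 x) {
  uint64_t hi = (uint64_t)(x >> 64);
  return hi ? __builtin_clzll(hi) : 64 + __builtin_clzll((uint64_t)x);
}

// Sketch of shift-subtract division. Precondition: b != 0.
// When b >= 2**64 the loop runs at most 64 times, because the leading
// bits of a and b can differ by at most 63 positions.
static u128 udivmod128_sketch(u128 a, u128 b, u128 *rem) {
  if (b > a) { // quotient is 0, remainder is the dividend
    if (rem)
      *rem = a;
    return 0;
  }
  int shift = clz128(b) - clz128(a); // align the leading bits of b and a
  b <<= shift;
  u128 q = 0;
  for (; shift >= 0; --shift) {
    q <<= 1;
    // mask is all ones when a >= b, otherwise zero
    u128 mask = (u128)0 - (u128)(a >= b);
    a -= b & mask; // subtract only when the mask is set
    q |= mask & 1; // record the quotient bit
    b >>= 1;
  }
  if (rem)
    *rem = a;
  return q;
}
```

Calling `udivmod128_sketch(a, b, &r)` with both operands above 2**64 should agree with the compiler's own `a / b` and `a % b`, which makes it easy to cross-check the sketch against the builtin.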

PowerPC benchmarks:

Before this patch:
```
BM_DivideIntrinsic128UniformDivisor<unsigned __int128>            22.3           22.3     32000000
BM_DivideIntrinsic128UniformDivisor<__int128>                     23.8           23.8     30000000
BM_RemainderIntrinsic128UniformDivisor<unsigned __int128>         22.5           22.5     32000000
BM_RemainderIntrinsic128UniformDivisor<__int128>                  24.9           24.9     29000000
BM_DivideIntrinsic128SmallDivisor<unsigned __int128>             394            394        2000000
BM_DivideIntrinsic128SmallDivisor<__int128>                      397            397        2000000
BM_RemainderIntrinsic128SmallDivisor<unsigned __int128>          399            399        2000000
BM_RemainderIntrinsic128SmallDivisor<__int128>                   397            397        2000000
```

With this patch:
```
BM_DivideIntrinsic128UniformDivisor<unsigned __int128>            21.7           21.7     33000000
BM_DivideIntrinsic128UniformDivisor<__int128>                     23.0           23.0     31000000
BM_RemainderIntrinsic128UniformDivisor<unsigned __int128>         21.9           21.9     33000000
BM_RemainderIntrinsic128UniformDivisor<__int128>                  23.9           23.9     30000000
BM_DivideIntrinsic128SmallDivisor<unsigned __int128>              32.7           32.6     23000000
BM_DivideIntrinsic128SmallDivisor<__int128>                       33.4           33.4     21000000
BM_RemainderIntrinsic128SmallDivisor<unsigned __int128>           31.1           31.1     22000000
BM_RemainderIntrinsic128SmallDivisor<__int128>                    33.2           33.2     22000000
```

My email is danilak@google.com; I don't have commit rights.

Reviewers: howard.hinnant, courbet, MaskRay

Reviewed By: courbet

Subscribers: steven.zhang, #sanitizers

Tags: #sanitizers

Differential Revision: https://reviews.llvm.org/D81809
2020-07-10 09:59:16 +02:00
Petr Hosek 0ba22f51d1 [builtins] Use single line C++/C99 comment style
Use the uniform single line C++/C99 style for code comments.

This is part of the cleanup proposed in "[RFC] compiler-rt builtins
cleanup and refactoring".

Differential Revision: https://reviews.llvm.org/D60352

llvm-svn: 359411
2019-04-28 22:47:49 +00:00
Petr Hosek 082b89b25f [builtins] Reformat builtins with clang-format
Update formatting to use the LLVM style.

This is part of the cleanup proposed in "[RFC] compiler-rt builtins
cleanup and refactoring".

Differential Revision: https://reviews.llvm.org/D60351

llvm-svn: 359410
2019-04-28 21:53:32 +00:00
Chandler Carruth 57b08b0944 Update more file headers across all of the LLVM projects in the monorepo
to reflect the new license. These used slightly different spellings that
defeated my regular expressions.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351648
2019-01-19 10:56:40 +00:00
Joerg Sonnenberger 4e08a634e1 Don't take short cuts trying to avoid conditionals. This leads to
negative shift amounts and/or shifts wider than the type. VAX traps on
the former, X86 and other platforms produce incorrect results on the
latter.
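
As an illustration of the hazard described above, the following sketch builds a 128-bit logical right shift from two 64-bit halves and uses explicit conditionals so that no 64-bit shift ever receives a negative amount or an amount of 64 or more (both are undefined behavior in C). The type `u128_parts` and the function `lshr128` are hypothetical names chosen for this example and are not taken from the commit.

```c
#include <stdint.h>

// Hypothetical two-word representation of a 128-bit value.
typedef struct {
  uint64_t lo;
  uint64_t hi;
} u128_parts;

// Logical right shift by n bits. Precondition: n < 128.
static u128_parts lshr128(u128_parts v, unsigned n) {
  u128_parts r;
  if (n == 0) {
    // Handled separately: the general case would compute v.hi << 64.
    r = v;
  } else if (n < 64) {
    // Both shift amounts, n and 64 - n, are in the range 1..63.
    r.lo = (v.lo >> n) | (v.hi << (64 - n));
    r.hi = v.hi >> n;
  } else {
    // n - 64 is in the range 0..63.
    r.lo = v.hi >> (n - 64);
    r.hi = 0;
  }
  return r;
}
```

The extra branches cost a few instructions, but every shift amount stays within the range the language defines.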

llvm-svn: 204193
2014-03-18 22:10:36 +00:00
Joerg Sonnenberger 6e99daab4c Consistently use COMPILER_RT_ABI for all public symbols.
Move prototypes into headers and fix a few inconsistencies.

llvm-svn: 202591
2014-03-01 15:30:50 +00:00
Joerg Sonnenberger 938b0df72d Introduce CRT_HAS_128BIT, currently for all __LP64__ platforms.
Use it to enable the various functions for TI mode.
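
A minimal sketch of how such a feature macro can gate 128-bit code is shown below. The condition mirrors only what this commit message states (`__LP64__`). The typedef uses `unsigned __int128` for simplicity, and `crt_has_128bit` is a hypothetical helper added for this example.

```c
// Assumption: enable 128-bit support on __LP64__ platforms only,
// as stated in the commit message.
#if defined(__LP64__)
#define CRT_HAS_128BIT
#endif

#ifdef CRT_HAS_128BIT
// 128-bit unsigned type, available only when the macro is defined.
typedef unsigned __int128 tu_int;
#endif

// Hypothetical helper: reports whether the 128-bit code paths are compiled in.
static int crt_has_128bit(void) {
#ifdef CRT_HAS_128BIT
  return (int)(sizeof(tu_int) == 16);
#else
  return 0;
#endif
}
```

Functions that operate on 128-bit values would be wrapped in the same `#ifdef CRT_HAS_128BIT` guard so that they are simply omitted on other targets.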

llvm-svn: 201909
2014-02-21 23:53:03 +00:00
Alexey Samsonov a6b264b51d Move original compiler-rt functions (libgcc replacement) to lib/builtins directory
llvm-svn: 201393
2014-02-14 09:20:33 +00:00