Commit Graph

469 Commits

Author SHA1 Message Date
Craig Topper 35f708a3c9 [builtins] Inline __paritysi2 into __paritydi2 and inline __paritydi2 into __parityti2.
No point in making __parityti2 go through 2 calls to get to
__paritysi2.

Reviewed By: MaskRay, efriedma

Differential Revision: https://reviews.llvm.org/D87218
2020-09-07 17:57:39 -07:00
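The inlining described in the commit above can be illustrated with a minimal C sketch. The names `parity32` and `parity64` are hypothetical stand-ins, not the compiler-rt symbols, and the folding sequence is an assumption about how such a routine is commonly written. The point is only that the 64-bit routine folds to 32 bits and finishes the reduction in place instead of calling the 32-bit routine.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit parity helper: returns 1 if an odd
   number of bits are set, 0 otherwise. */
static int parity32(uint32_t x) {
  x ^= x >> 16;
  x ^= x >> 8;
  x ^= x >> 4;
  /* 0x6996 acts as a 16-entry lookup table: bit i holds the parity of i. */
  return (0x6996 >> (x & 0xF)) & 1;
}

/* 64-bit parity with the 32-bit reduction written inline, so no call to
   parity32 is made. */
static int parity64(uint64_t v) {
  uint32_t x = (uint32_t)(v ^ (v >> 32)); /* fold the two halves together */
  x ^= x >> 16;
  x ^= x >> 8;
  x ^= x >> 4;
  return (0x6996 >> (x & 0xF)) & 1;
}
```

A 128-bit variant would presumably fold once more, from 128 to 64 bits, before the same steps.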
Brad Smith 8542dab909 [compiler-rt] Implement __clear_cache() on OpenBSD/arm 2020-09-06 15:54:24 -04:00
Anatoly Trosinenko 93eed63d2f [builtins] Make __div[sdt]f3 handle denormal results
This patch introduces denormal result support to soft-float division
implementation unified by D85031.

Reviewed By: sepavloff

Differential Revision: https://reviews.llvm.org/D85032
2020-09-01 21:52:34 +03:00
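To show what a "denormal result" means at the encoding level, here is a rough sketch for IEEE 754 binary32. The function `pack_f32_trunc` is hypothetical and is not taken from the patch: it truncates where a real soft-float routine would round, and it handles positive values only.

```c
#include <stdint.h>

/* Packs a positive binary32 bit pattern from a biased exponent and a 24-bit
   significand whose implicit leading one sits at bit 23.
   Assumes biased_exp < 255. Truncates instead of rounding. */
static uint32_t pack_f32_trunc(int biased_exp, uint32_t sig24) {
  if (biased_exp >= 1) {
    /* Normal number: drop the implicit bit and store the exponent. */
    return ((uint32_t)biased_exp << 23) | (sig24 & UINT32_C(0x007FFFFF));
  }
  /* Denormal: the exponent field stays zero and the significand is shifted
     right to compensate for the missing exponent range. */
  int shift = 1 - biased_exp;
  if (shift >= 24)
    return 0; /* every significand bit is shifted out: result is zero */
  return sig24 >> shift;
}
```

With a significand of `0x800000`, a biased exponent of 1 gives the smallest normal value, 0 gives half of it (a denormal), and -22 gives the smallest denormal.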
Anatoly Trosinenko 0e90d8d4fe [builtins] Unify the softfloat division implementation
This patch replaces three different pre-existing implementations of
__div[sdt]f3 LibCalls with a generic one - like it is already done for
many other LibCalls.

Reviewed By: sepavloff

Differential Revision: https://reviews.llvm.org/D85031
2020-09-01 19:05:50 +03:00
Anatoly Trosinenko 11cf6346fd [NFC][compiler-rt] Factor out __div[sdt]i3 and __mod[dt]i3 implementations
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D86400
2020-08-30 16:14:08 +03:00
Anatoly Trosinenko fce035eae9 [NFC][compiler-rt] Factor out __mulo[sdt]i4 implementations to .inc file
The existing implementations are almost identical except for width of the
integer type.

Factor them out to int_mulo_impl.inc for better maintainability.

This patch is almost identical to D86277.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D86289
2020-08-27 14:33:48 +03:00
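The idea of writing the body once and instantiating it per integer width can be sketched as follows. The patch uses an `.inc` file; a single self-contained block cannot include one, so this sketch substitutes a macro. `DEFINE_MULO`, `mulo32` and `mulo64` are hypothetical names, and the overflow test shown is a generic division-based check, not necessarily the one in int_mulo_impl.inc.

```c
#include <stdint.h>

/* Hypothetical generator standing in for an .inc file. T is the signed
   type, UT the unsigned type of the same width. */
#define DEFINE_MULO(NAME, T, UT, T_MIN, T_MAX)                             \
  static T NAME(T a, T b, int *overflow) {                                 \
    *overflow = 0;                                                         \
    if (a > 0) {                                                           \
      if (b > 0 ? a > T_MAX / b : b < T_MIN / a)                           \
        *overflow = 1;                                                     \
    } else if (a < 0) {                                                    \
      if (b > 0 ? a < T_MIN / b : (b < 0 && b < T_MAX / a))                \
        *overflow = 1;                                                     \
    }                                                                      \
    /* Multiply as unsigned so that wrapping is not undefined behavior. */ \
    /* Converting back to T relies on the usual two's-complement wrap. */  \
    return (T)((UT)a * (UT)b);                                             \
  }

DEFINE_MULO(mulo32, int32_t, uint32_t, INT32_MIN, INT32_MAX)
DEFINE_MULO(mulo64, int64_t, uint64_t, INT64_MIN, INT64_MAX)
```

Each instantiation differs only in the types and limits passed in, which is the property the commit relies on.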
Anatoly Trosinenko 182d14db07 [NFC][compiler-rt] Factor out __mulv[sdt]i3 implementations to .inc file
The existing implementations are almost identical except for width of the
integer type.

Factor them out to int_mulv_impl.inc for better maintainability.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D86277
2020-08-27 14:33:48 +03:00
David Tenty f8454d60b8 [AIX][compiler-rt][builtins] Don't add ppc builtin implementations that require __int128 on AIX
since __int128 currently isn't supported on AIX.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D85972
2020-08-25 11:35:38 -04:00
Freddy Ye e02d081f2b [X86] Support -march=sapphirerapids
Support -march=sapphirerapids for x86.
Compared with Icelake Server, it adds 14 new features. They are
amxtile, amxint8, amxbf16, avx512bf16, avx512vp2intersect, cldemote,
enqcmd, movdir64b, movdiri, ptwrite, serialize, shstk, tsxldtrk, waitpkg.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D86503
2020-08-25 14:21:21 +08:00
Shoaib Meenai 2c80e2fe51 [runtimes] Use llvm-libtool-darwin for runtimes build
It's full featured now and we can use it for the runtimes build instead
of relying on an external libtool, which means the CMAKE_HOST_APPLE
restriction serves no purpose either now. Restrict llvm-lipo to Darwin
targets while I'm here, since it's only needed there.

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D86367
2020-08-24 13:48:30 -07:00
Luís Marques 57903cf093 [compiler-rt][RISCV] Use muldi3 builtin assembly implementation
D80465 added an assembly implementation of muldi3 for RISC-V but it didn't
add it to the cmake `*_SOURCES` list, so the C implementation was being used
instead. This patch fixes that.

Differential Revision: https://reviews.llvm.org/D86036
2020-08-21 13:06:35 +01:00
Craig Topper df9a9bb7be [X86] Correct the implementation of the testFeature macro in getIntelProcessorTypeAndSubtype to do a proper bit test.
Instead of ANDing with a one-hot mask representing the bit to
be tested, we were ANDing with just the bit number. This tests
multiple bits, none of them the correct one.

This caused skylake-avx512, cascadelake and cooperlake to all
be misdetected. Based on experiments with the Intel SDE, it seems
that all of these CPUs are being detected as being cooperlake.
This is bad since it's the newest CPU of the three.
2020-08-20 23:50:45 -07:00
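The difference between the two tests in the commit above can be shown with a small sketch. Both function names are hypothetical, and the feature set is reduced to a single 32-bit word for brevity. This is not the actual macro from cpu_model.c.

```c
#include <stdint.h>

/* Buggy form as described in the commit: ANDs with the bit *number*. */
static int test_feature_buggy(uint32_t features, unsigned bit) {
  return (features & bit) != 0;
}

/* Corrected form: ANDs with a one-hot mask for that bit.
   Assumes bit < 32. */
static int test_feature(uint32_t features, unsigned bit) {
  return (features & (UINT32_C(1) << bit)) != 0;
}
```

For bit number 5 (binary 101), the buggy form reports whether bit 0 or bit 2 is set, so a word with only bit 5 set is reported as lacking the feature.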
Louis Dionne afa1afd410 [CMake] Bump CMake minimum version to 3.13.4
This upgrade should be friction-less because we've already been ensuring
that CMake >= 3.13.4 is used.

This is part of the effort discussed on llvm-dev here:

  http://lists.llvm.org/pipermail/llvm-dev/2020-April/140578.html

Differential Revision: https://reviews.llvm.org/D78648
2020-07-22 14:25:07 -04:00
Nico Weber 669b070936 cmake list formatting fix 2020-07-16 18:29:48 -04:00
Ryan Prichard 15b37e1cfa [builtins] Omit 80-bit builtins on Android and MSVC
long double is a 64-bit double-precision type on:
 - MSVC (32- and 64-bit x86)
 - Android (32-bit x86)

long double is a 128-bit quad-precision type on x86_64 Android.

The assembly variants of the 80-bit builtins are correct, but some of
the builtins are implemented in C and require that long double be the
80-bit type passed via an x87 register.

Reviewed By: compnerd

Differential Revision: https://reviews.llvm.org/D82153
2020-07-16 15:11:26 -07:00
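The three `long double` formats named in the commit above can be told apart by their significand width. The sketch below assumes the standard `LDBL_MANT_DIG` macro from `<float.h>`. The enum and function names are hypothetical and do not reproduce how compiler-rt decides whether to build the 80-bit builtins.

```c
#include <float.h>

enum long_double_kind {
  LD_DOUBLE_64, /* same format as double: 53-bit significand */
  LD_X87_80,    /* x87 extended precision: 64-bit significand */
  LD_QUAD_128,  /* IEEE quad precision: 113-bit significand */
  LD_OTHER
};

/* Maps a significand width in bits to one of the formats above. */
static enum long_double_kind classify_long_double(int mant_dig) {
  switch (mant_dig) {
  case 53:
    return LD_DOUBLE_64;
  case 64:
    return LD_X87_80;
  case 113:
    return LD_QUAD_128;
  default:
    return LD_OTHER;
  }
}

/* Format of long double for the compiler building this file. */
static enum long_double_kind host_long_double_kind(void) {
  return classify_long_double(LDBL_MANT_DIG);
}
```

Under this classification, only a target reporting `LD_X87_80` would be a candidate for the C implementations of the 80-bit builtins.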
Ryan Prichard 8cbb6ccc7f [builtins] Cleanup generic-file filtering
Split filter_builtin_sources into two functions:
 - filter_builtin_sources that removes generic files when an
   arch-specific file is selected.
 - darwin_filter_builtin_sources that implements the EXCLUDE/INCLUDE
   lists (using the files in lib/builtins/Darwin-excludes).

darwin_filter_builtin_sources delegates to filter_builtin_sources.

Previously, lib/builtins/CMakeLists.txt had a number of calls to
filter_builtin_sources (with a confusing/broken use of the
`excluded_list` parameter), as well as a redundant arch-vs-generic
filtering for the non-Apple code path at the end of the file. Replace
all of this with a single call to filter_builtin_sources.

Remove i686_SOURCES. Previously, this list contained only the
arch-specific files common to 32-bit and 64-bit x86, which is a strange
set. Normally the ${ARCH}_SOURCES list contains everything needed for
the arch. "i686" isn't in ALL_BUILTIN_SUPPORTED_ARCH.

NFCI, but i686_SOURCES won't be defined, and the order of files in
${arch}_SOURCES lists will change.

Differential Revision: https://reviews.llvm.org/D82151
2020-07-13 16:53:07 -07:00
Ryan Prichard f398e0f3d1 [builtins][Android] Define HAS_80_BIT_LONG_DOUBLE to 0
Android 32-bit x86 uses a 64-bit long double.

Android 64-bit x86 uses a 128-bit quad-precision long double.

Differential Revision: https://reviews.llvm.org/D82152
2020-07-13 16:53:07 -07:00
Craig Topper b92c2bb6a2 [X86] Add CPU name strings to getIntelProcessorTypeAndSubtype and getAMDProcessorTypeAndSubtype in compiler-rt.
These aren't used in compiler-rt, but I plan to make a similar
change to the equivalent code in Host.cpp where the mapping from
type/subtype is an unnecessary complication. Having the CPU strings
here will help keep the code somewhat synchronized.
2020-07-12 12:59:25 -07:00
Danila Kutenin 68c011aa08 [builtins] Optimize udivmodti4 for many platforms.
Summary:
While benchmarking uint128 division, we found that it has very high latency for small divisors.

https://reviews.llvm.org/D83027

```
Benchmark                                                   Time(ns)        CPU(ns)     Iterations
--------------------------------------------------------------------------------------------------
BM_DivideIntrinsic128UniformDivisor<unsigned __int128>            13.0           13.0     55000000
BM_DivideIntrinsic128UniformDivisor<__int128>                     14.3           14.3     50000000
BM_RemainderIntrinsic128UniformDivisor<unsigned __int128>         13.5           13.5     52000000
BM_RemainderIntrinsic128UniformDivisor<__int128>                  14.1           14.1     50000000
BM_DivideIntrinsic128SmallDivisor<unsigned __int128>             153            153        5000000
BM_DivideIntrinsic128SmallDivisor<__int128>                      170            170        3000000
BM_RemainderIntrinsic128SmallDivisor<unsigned __int128>          153            153        5000000
BM_RemainderIntrinsic128SmallDivisor<__int128>                   155            155        5000000
```

This patch suggests a more optimized version of the division:

If the divisor fits in 64 bits, we can proceed with the divq instruction on x86 or with constant-multiplication mechanisms on other platforms. Once both divisor and dividend are not less than 2**64, we use a branch-free subtract algorithm, which takes at most 64 iterations. With these changes our benchmarks improved significantly:

```
Benchmark                                                   Time(ns)        CPU(ns)     Iterations
--------------------------------------------------------------------------------------------------
BM_DivideIntrinsic128UniformDivisor<unsigned __int128>            11.0           11.0     64000000
BM_DivideIntrinsic128UniformDivisor<__int128>                     13.8           13.8     51000000
BM_RemainderIntrinsic128UniformDivisor<unsigned __int128>         11.6           11.6     61000000
BM_RemainderIntrinsic128UniformDivisor<__int128>                  13.7           13.7     52000000
BM_DivideIntrinsic128SmallDivisor<unsigned __int128>              27.1           27.1     26000000
BM_DivideIntrinsic128SmallDivisor<__int128>                       29.4           29.4     24000000
BM_RemainderIntrinsic128SmallDivisor<unsigned __int128>           27.9           27.8     26000000
BM_RemainderIntrinsic128SmallDivisor<__int128>                    29.1           29.1     25000000
```

Without the divq intrinsics, it is still much better:

```
Benchmark                                                   Time(ns)        CPU(ns)     Iterations
--------------------------------------------------------------------------------------------------
BM_DivideIntrinsic128UniformDivisor<unsigned __int128>            12.2           12.2     58000000
BM_DivideIntrinsic128UniformDivisor<__int128>                     13.5           13.5     52000000
BM_RemainderIntrinsic128UniformDivisor<unsigned __int128>         12.7           12.7     56000000
BM_RemainderIntrinsic128UniformDivisor<__int128>                  13.7           13.7     51000000
BM_DivideIntrinsic128SmallDivisor<unsigned __int128>              30.2           30.2     24000000
BM_DivideIntrinsic128SmallDivisor<__int128>                       33.2           33.2     22000000
BM_RemainderIntrinsic128SmallDivisor<unsigned __int128>           31.4           31.4     23000000
BM_RemainderIntrinsic128SmallDivisor<__int128>                    33.8           33.8     21000000
```

PowerPC benchmarks:

Was
```
BM_DivideIntrinsic128UniformDivisor<unsigned __int128>            22.3           22.3     32000000
BM_DivideIntrinsic128UniformDivisor<__int128>                     23.8           23.8     30000000
BM_RemainderIntrinsic128UniformDivisor<unsigned __int128>         22.5           22.5     32000000
BM_RemainderIntrinsic128UniformDivisor<__int128>                  24.9           24.9     29000000
BM_DivideIntrinsic128SmallDivisor<unsigned __int128>             394            394        2000000
BM_DivideIntrinsic128SmallDivisor<__int128>                      397            397        2000000
BM_RemainderIntrinsic128SmallDivisor<unsigned __int128>          399            399        2000000
BM_RemainderIntrinsic128SmallDivisor<__int128>                   397            397        2000000
```

With this patch
```
BM_DivideIntrinsic128UniformDivisor<unsigned __int128>            21.7           21.7     33000000
BM_DivideIntrinsic128UniformDivisor<__int128>                     23.0           23.0     31000000
BM_RemainderIntrinsic128UniformDivisor<unsigned __int128>         21.9           21.9     33000000
BM_RemainderIntrinsic128UniformDivisor<__int128>                  23.9           23.9     30000000
BM_DivideIntrinsic128SmallDivisor<unsigned __int128>              32.7           32.6     23000000
BM_DivideIntrinsic128SmallDivisor<__int128>                       33.4           33.4     21000000
BM_RemainderIntrinsic128SmallDivisor<unsigned __int128>           31.1           31.1     22000000
BM_RemainderIntrinsic128SmallDivisor<__int128>                    33.2           33.2     22000000
```

My email: danilak@google.com, I don't have commit rights

Reviewers: howard.hinnant, courbet, MaskRay

Reviewed By: courbet

Subscribers: steven.zhang, #sanitizers

Tags: #sanitizers

Differential Revision: https://reviews.llvm.org/D81809
2020-07-10 09:59:16 +02:00
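The branch-free subtract step mentioned in the commit above can be sketched at a smaller scale. This version divides 64-bit values, not 128-bit ones, and it derives the mask from a comparison, which may differ from the trick used in the patch. `udivmod64_shift_subtract` and the result struct are hypothetical names. `__builtin_clzll` is a GCC/Clang builtin.

```c
#include <stdint.h>

struct udivmod64 {
  uint64_t quot;
  uint64_t rem;
};

/* Shift-subtract division scaled down to 64 bits. Requires d != 0. */
static struct udivmod64 udivmod64_shift_subtract(uint64_t n, uint64_t d) {
  struct udivmod64 out = {0, n};
  if (d > n)
    return out; /* quotient 0, remainder n */
  /* Align the divisor's highest set bit with the dividend's. Both values
     are non-zero here, so the builtin is well defined. */
  int shift = __builtin_clzll(d) - __builtin_clzll(n);
  d <<= shift;
  /* At most 64 iterations, one per quotient bit. */
  for (; shift >= 0; --shift) {
    /* All ones when the remainder is at least d, all zeros otherwise, so
       the subtraction needs no if statement. */
    uint64_t mask = -(uint64_t)(out.rem >= d);
    out.quot = (out.quot << 1) | (mask & 1);
    out.rem -= d & mask;
    d >>= 1;
  }
  return out;
}
```

For example, `udivmod64_shift_subtract(100, 7)` yields quotient 14 and remainder 2 after five loop iterations.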
Sid Manning baca8f977e [compiler-rt][Hexagon] Remove fma/fmin/max code
This code should reside in the c-library.

Differential Revision: https://reviews.llvm.org/D82263
2020-07-07 19:50:04 -05:00
Anatoly Trosinenko 0ee439b705 [builtins] Change si_int to int in some helper declarations
This patch changes types of some integer function arguments or return values from `si_int` to the default `int` type to make it more compatible with `libgcc`.

The compiler-rt/lib/builtins/README.txt has a link to the [libgcc specification](http://gcc.gnu.org/onlinedocs/gccint/Libgcc.html#Libgcc). This specification has an explicit note on `int`, `float` and other such types being just illustrations in some cases while the actual types are expressed with machine modes.

Such usage of an always-32-bit-wide integer type may lead to issues on 16-bit platforms such as MSP430. Provided that [libgcc2.h](https://gcc.gnu.org/git/?p=gcc.git;a=blob_plain;f=libgcc/libgcc2.h;hb=HEAD) can be used as a reference for all targets supported by libgcc, this patch fixes some existing differences in helper declarations.

This patch is expected to not change behavior at all for targets with 32-bit `int` type.

Differential Revision: https://reviews.llvm.org/D81285
2020-06-30 11:07:02 +03:00
Anatoly Trosinenko a4e8f7fe3f [builtins] Improve compatibility with 16 bit targets
Some parts of the existing codebase assume the default `int` type to be (at least) 32 bits wide. On 16-bit targets such as MSP430, this may cause undefined behavior or results that are defined but incorrect.

Differential Revision: https://reviews.llvm.org/D81408
2020-06-26 15:31:11 +03:00
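A typical instance of the assumption described above is a shift written on a plain `int` literal. The sketch below is illustrative only. The `_demo` typedefs and function names are hypothetical, chosen to echo compiler-rt's `si_int` without reproducing its headers.

```c
#include <limits.h>
#include <stdint.h>

/* Fixed-width types in the spirit of si_int / su_int. */
typedef int32_t si_int_demo;
typedef uint32_t su_int_demo;

/* Returns a 32-bit value with only bit n set. Requires n < 32.
   Writing `1 << n` instead would shift an int, which is undefined for
   n >= 15 when int is 16 bits wide. Widening first is safe on any target. */
static su_int_demo bit32(unsigned n) {
  return (su_int_demo)1 << n;
}

/* Width of the fixed-width type in bits, independent of sizeof(int). */
static unsigned si_int_bits(void) {
  return (unsigned)(sizeof(si_int_demo) * CHAR_BIT);
}
```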
Anatoly Trosinenko a931ec7ca0 [builtins] Move more float128-related helpers to GENERIC_TF_SOURCES list
There are two different _generic_ lists of source files in compiler-rt/lib/builtins/CMakeLists.txt. Currently there is no simple way to avoid using the tf-variants of the helpers altogether.

Since there exists a separate `GENERIC_TF_SOURCES` list, it seems quite natural to move all float128-related helpers there. If it is not possible for some reason, it would be useful to have an explanation of that reason somewhere near the `GENERIC_TF_SOURCES` definition.

Differential Revision: https://reviews.llvm.org/D81282
2020-06-25 22:32:49 +03:00
Craig Topper 23654d9e7a Recommit "[X86] Calculate the needed size of the feature arrays in _cpu_indicator_init and getHostCPUName using the size of the feature enum."
Hopefully this version will fix the previous buildbot failure.
2020-06-22 13:32:03 -07:00
Craig Topper bebea4221d Revert "[X86] Calculate the needed size of the feature arrays in _cpu_indicator_init and getHostCPUName using the size of the feature enum."
Seems to be breaking the build.

This reverts commit 5ac144fe64.
2020-06-22 12:20:40 -07:00
Craig Topper 5ac144fe64 [X86] Calculate the needed size of the feature arrays in _cpu_indicator_init and getHostCPUName using the size of the feature enum.
Move 0 initialization up to the caller so we don't need to know
the size.
2020-06-22 11:46:20 -07:00
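Sizing a feature bit array from the enum, with zero initialization left to the caller, might look like the sketch below. The enum values and all `demo_`/`DEMO_` names are hypothetical. The real feature list in cpu_model.c is longer and is not reproduced here.

```c
#include <stdint.h>

/* Hypothetical feature enum; the last entry is one past the final feature. */
enum demo_feature {
  DEMO_FEATURE_CMOV = 0,
  DEMO_FEATURE_AVX2 = 10,
  DEMO_FEATURE_WIDE = 36,
  DEMO_FEATURE_MAX
};

/* Number of 32-bit words needed to hold one bit per feature. */
enum { DEMO_FEATURE_WORDS = (DEMO_FEATURE_MAX + 31) / 32 };

/* Sets the bit for feature f in an array of 32-bit words. */
static void demo_set_feature(uint32_t *features, unsigned f) {
  features[f / 32] |= UINT32_C(1) << (f % 32);
}

/* Returns 1 if the bit for feature f is set, 0 otherwise. */
static int demo_has_feature(const uint32_t *features, unsigned f) {
  return (int)((features[f / 32] >> (f % 32)) & 1);
}

/* The callee only sets bits. It never clears the array, so it does not
   need to know how many words the caller allocated. */
static void demo_detect_features(uint32_t *features) {
  demo_set_feature(features, DEMO_FEATURE_CMOV);
  demo_set_feature(features, DEMO_FEATURE_WIDE);
}
```

A caller would declare `uint32_t features[DEMO_FEATURE_WORDS] = {0};` and pass it in. Adding an enum entry past bit 63 then grows the array without touching the callee.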
Craig Topper 90406d62e5 [X86] Add cooperlake and tigerlake to the enum in cpu_model.c
I forgot to do this when I added them to _cpu_indicator_init.
2020-06-21 16:20:26 -07:00
Craig Topper 0e6c9316d4 [X86] Add cooperlake detection to _cpu_indicator_init.
libgcc has had this enum encoding defined for a while, but their
detection code is missing. I've raised a bug with them so that
should get fixed soon.
2020-06-21 13:02:33 -07:00
Craig Topper 35f7d58328 [X86] Set the cpu_vendor in __cpu_indicator_init to VENDOR_OTHER if cpuid isn't supported on the CPU.
We need to set the cpu_vendor to a non-zero value to indicate
that we already called __cpu_indicator_init once.

This should only happen on a 386 or 486 CPU.
2020-06-20 15:36:04 -07:00
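The "non-zero vendor means already initialized" convention from the commit above can be sketched as follows. All names are hypothetical. CPUID support is passed in as a parameter purely for illustration, and the return values are assumptions of this sketch, not a statement about the real function.

```c
/* Hypothetical vendor codes; zero means "not initialized yet". */
enum demo_vendor {
  DEMO_VENDOR_UNSET = 0,
  DEMO_VENDOR_INTEL,
  DEMO_VENDOR_AMD,
  DEMO_VENDOR_OTHER
};

static enum demo_vendor demo_cpu_vendor; /* zero-initialized */
static int demo_probe_count;             /* counts real detection attempts */

/* Returns -1 if detection is impossible, 0 otherwise. */
static int demo_cpu_indicator_init(int cpuid_supported) {
  if (demo_cpu_vendor != DEMO_VENDOR_UNSET)
    return 0; /* already ran once, nothing to do */
  ++demo_probe_count;
  if (!cpuid_supported) {
    /* Record a non-zero vendor so later calls take the early return. */
    demo_cpu_vendor = DEMO_VENDOR_OTHER;
    return -1;
  }
  demo_cpu_vendor = DEMO_VENDOR_INTEL; /* placeholder for real detection */
  return 0;
}
```

Without the assignment in the unsupported branch, every call on such a CPU would repeat the probe.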
Ryan Prichard 8627190f31 [builtins] Fix typos in comments
Differential Revision: https://reviews.llvm.org/D82146
2020-06-19 16:08:04 -07:00
David Tenty 8aef01eed4 [AIX][compiler-rt] Pick the right form of COMPILER_RT_ALIAS for AIX
Summary: we use the alias attribute, similar to what is done for ELF.

Reviewers: ZarkoCA, jasonliu, hubert.reinterpretcast, sfertile

Reviewed By: jasonliu

Subscribers: dberris, aheejin, mstorsjo, #sanitizers

Tags: #sanitizers

Differential Revision: https://reviews.llvm.org/D81120
2020-06-16 14:10:40 -04:00
Craig Topper 033bf61cc5 [X86] Remove brand_id check from cpu_indicator_init.
Brand index was a feature of some Pentium III and Pentium 4 CPUs.
It provided an index into a software lookup table to provide a
brand name for the CPU. This is separate from the family/model.

It's unclear to me why this index being non-zero was used to
block checking family/model. None of the CPUs that had a non-zero
brand index are supported by __builtin_cpu_is or target
multi-versioning so this should have no real effect.
2020-06-12 20:35:48 -07:00
Craig Topper 94ccb2acbf [X86] Combine the two feature variables in __cpu_indicator_init into an array and pass them around as a pointer we can treat as an array.
This simplifies the indexing code to set and test bits.
2020-06-12 18:30:41 -07:00
Craig Topper e424a3526a [X86] Explicitly initialize __cpu_features2 global in compiler-rt to 0.
Seems like this may be needed in order for the linker to find the
symbol. At least on my Mac.
2020-06-12 18:30:34 -07:00
kamlesh kumar e31ccee1b0 [RISCV-V] Provide muldi3 builtin assembly implementation
Provides an assembly implementation of muldi3 for RISC-V, to solve bug 43388.
Since the implementation is the same as for mulsi3, that code was moved to
`riscv/int_mul_impl.inc` and is now reused by both `mulsi3.S` and `muldi3.S`.

Differential Revision: https://reviews.llvm.org/D80465
2020-06-02 21:04:55 +01:00
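The commit message above does not describe the algorithm in the assembly. As an illustration of how one multiply loop can serve any operand width, here is a C sketch that assumes a shift-and-add approach, a common choice when no hardware multiplier is available. `mul_shift_add` is a hypothetical name.

```c
#include <stdint.h>

/* Multiplies a by b modulo 2^64 using only shifts and additions. */
static uint64_t mul_shift_add(uint64_t a, uint64_t b) {
  uint64_t result = 0;
  while (b != 0) {
    if (b & 1)
      result += a; /* add the current multiple when this bit of b is set */
    a <<= 1;       /* next power-of-two multiple of a */
    b >>= 1;       /* move on to the next bit of b */
  }
  return result;
}
```

The loop body does not depend on the operand width, which is consistent with the commit sharing one implementation between `mulsi3.S` and `muldi3.S`.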
Kazushi (Jam) Marukawa dedaf3a2ac [VE] Dynamic stack allocation
Summary:
This patch implements dynamic stack allocation for the VE target. Changes:
* compiler-rt: `__ve_grow_stack` to request stack allocation on the VE.
* VE: base pointer support, dynamic stack allocation.

Differential Revision: https://reviews.llvm.org/D79084
2020-05-27 10:11:06 +02:00
Craig Topper 2bb822bc90 [X86] Add family/model for Intel Comet Lake CPUs for -march=native and function multiversioning
This adds the family/model returned by CPUID for some Intel
Comet Lake CPUs. Instruction set and tuning wise these are
the same as "skylake".

These are not in the Intel SDM yet, but these should be correct.
2020-05-24 00:29:25 -07:00
Craig Topper 95bc21f32f [X86] Add avx512vp2intersect feature to compiler-rt's feature detection to match libgcc. 2020-05-21 21:54:54 -07:00
Kamil Rytarowski f61f6ffe11 [compiler-rt] [builtin] Switch the return type of __atomic_compare_exchange_##n to bool
Summary:
Synchronize the function definition with the LLVM documentation.

https://llvm.org/docs/Atomics.html#libcalls-atomic

GCC also returns bool for the same atomic builtin.

Reviewers: theraven

Reviewed By: theraven

Subscribers: theraven, dberris, jfb, #sanitizers

Tags: #sanitizers

Differential Revision: https://reviews.llvm.org/D79845
2020-05-13 14:09:02 +02:00
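The bool-returning shape can be seen with the GCC/Clang builtin `__atomic_compare_exchange_n`, which returns true on success and writes the observed value into `*expected` on failure. The wrapper below is a hypothetical illustration and is not the compiler-rt implementation of the libcall.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 4-byte compare-exchange wrapper with a bool result.
   Uses a strong exchange with sequentially consistent ordering. */
static bool demo_compare_exchange_4(uint32_t *ptr, uint32_t *expected,
                                    uint32_t desired) {
  return __atomic_compare_exchange_n(ptr, expected, desired,
                                     false, /* strong, not weak */
                                     __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```

A caller can then write `if (demo_compare_exchange_4(&value, &expected, 2)) { ... }` without comparing the result against an integer.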
Ayke van Laethem 4d41df6482 [builtins] Support architectures with 16-bit int
This is the first patch in a series to add support for the AVR target.
This patch includes changes to make compiler-rt more target independent
by not relying on the width of an int or long.

Differential Revision: https://reviews.llvm.org/D78662
2020-04-26 01:22:10 +02:00
Fangrui Song 17772995d4 [builtins] Add missing header in D77912 and make __builtin_clzll more robust 2020-04-17 08:29:58 -07:00
Ayke van Laethem d9e5691843 [builtins] Fix unprototyped function declarations
The following declarations were missing a prototype:

    FE_ROUND_MODE __fe_getround();
    int __fe_raise_inexact();

Discovered while fixing a bug in Clang related to unprototyped function
calls (see the previous commit).

Differential Revision: https://reviews.llvm.org/D78205
2020-04-15 23:44:51 +02:00
Fangrui Song b541196eb4 [builtins] Make __umodsi3/__udivdi3/__umoddi3 standalone (shift and subtract)
@kamleshbhalui reported that when the Standard Extension M
(Multiplication and Division) is disabled for RISC-V,
`__udivdi3` calls __udivmodti4, which in turn calls `__udivdi3`.

This patch moves __udivsi3 (shift and subtract) to int_div_impl.inc as
`__udivXi3`, optimizes it a bit, adds a `__umodXi3`, and uses `__udivXi3` and
`__umodXi3` to define `__udivsi3`, `__umodsi3`, `__udivdi3` and `__umoddi3`.

Reviewed By: kamleshbhalui

Differential Revision: https://reviews.llvm.org/D77912
2020-04-14 10:38:37 -07:00
Shoaib Meenai f481256bfe [builtins] Build for arm64e for Darwin
https://github.com/apple/swift/pull/30112/ makes the Swift standard
library for iOS build for arm64e. If you're building Swift against your
own LLVM, this in turn requires having the builtins built for arm64e,
otherwise you won't be able to use the builtins (which will in turn lead
to an undefined symbol for `__isOSVersionAtLeast`). Make the builtins
build for arm64e to fix this.

Differential Revision: https://reviews.llvm.org/D76041
2020-03-11 22:01:44 -07:00
Luís Marques 99a8cc2b7d [compiler-rt][builtins][RISCV] Port __clear_cache to RISC-V Linux
Implements `__clear_cache` for RISC-V Linux. We can't just use `fence.i` on
Linux, because the Linux thread might be scheduled on another hart, and the
`fence.i` instruction only flushes the icache of the current hart.
2020-03-05 16:44:47 +00:00
Steven Wu 387c3f74fd [compiler-rt] Build all alias in builtin as private external on Darwin
Summary:
The compiler-rt builtins library is built with hidden visibility by default
to avoid the client exporting symbols from the libclang static library. The
compiler option -fvisibility=hidden doesn't work on the aliases in C files
because they are created with inline assembly. On the Darwin platform,
those aliases are exported by default if they are referenced by the client.

Fix the issue by adding ".private_extern" to all the aliases if the
library is built with visibility hidden.

rdar://problem/58960296

Reviewers: dexonsmith, arphaman, delcypher, kledzik

Reviewed By: delcypher

Subscribers: dberris, jkorous, ributzka, #sanitizers, llvm-commits

Tags: #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D73577
2020-02-26 09:29:11 -08:00
Sid Manning d37cbda5f9 [Hexagon] Define __ELF__ by default.
Differential Revision: https://reviews.llvm.org/D74972
2020-02-21 16:10:31 -06:00
Sam Clegg 2f172d8d3c [compiler-rt] Compile __powitf2 under wasm
See https://github.com/emscripten-core/emscripten/issues/10374
See https://reviews.llvm.org/D74274

Differential Revision: https://reviews.llvm.org/D74275
2020-02-11 17:35:07 -08:00
Petr Hosek c96eeebca8 [CMake] compiler-rt: Add COMPILER_RT_BUILTINS_ENABLE_PIC
The configuration for -fPIC in the builtins library when built standalone
is unconditional, stating that the flags would "normally be added... by
the llvm cmake step"

This is untrue, as the llvm cmake step checks LLVM_ENABLE_PIC, which allows
a client to turn off -fPIC.

I've added an option when compiler-rt builtins are configured standalone, such
as when built as part of the LLVM runtimes system, to guard the application of
-fPIC for users that want it.

Patch By: JamesNagurne

Differential Revision: https://reviews.llvm.org/D72950
2020-01-31 15:57:18 -08:00
Yi Kong acc79aa0e7 Revert "Revert 1689ad27af "[builtins] Implement rounding mode support for i386/x86_64""
Don't build the specialised fp_mode.c on MSVC since it does not support
inline ASM on x86_64.

This reverts commit a19f0eec94.
2019-11-27 17:29:20 -08:00