Sanjay Patel e2e589288f Use rcpss/rcpps (X86) to speed up reciprocal calcs (PR21385).
This is a first step for generating SSE rcp instructions for reciprocal
calcs when fast-math allows it. This is very similar to the rsqrt optimization
enabled in D5658 ( http://reviews.llvm.org/rL220570 ).

For now, be conservative and only enable this for AMD btver2 where performance
improves significantly both in terms of latency and throughput.

We may never enable this codegen for Intel Core* chips because the divider circuits
are just too fast. On SandyBridge, divss can take as few as 10 cycles, versus the 21-cycle
critical path of the rcp + mul + sub + mul + add estimate.

Follow-on patches may allow configuration of the number of Newton-Raphson refinement
steps, add AVX512 support, and enable the optimization for more chips.
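
For illustration only, the rcp + mul + sub + mul + add sequence named above can be sketched in portable C as one Newton-Raphson step, x1 = x0 + x0 * (1 - a * x0). This is not the code from the patch. The helper names (rough_rcp, refine_rcp, fast_recip) are hypothetical, and rough_rcp only imitates a low-precision hardware estimate by truncating the mantissa of an exact reciprocal; it does not reproduce the actual rcpss result.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for rcpss: take the exact reciprocal and clear
 * the low 12 mantissa bits, leaving roughly 11 bits of precision.
 * Assumption: this only mimics the accuracy of a hardware estimate. */
static float rough_rcp(float a) {
    float r = 1.0f / a;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);   /* reinterpret float as integer */
    bits &= 0xFFFFF000u;              /* drop the low 12 mantissa bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* One Newton-Raphson refinement step for 1/a, given an estimate x0. */
static float refine_rcp(float a, float x0) {
    float t = a * x0;      /* mul */
    t = 1.0f - t;          /* sub */
    t = t * x0;            /* mul */
    return x0 + t;         /* add */
}

/* Estimate followed by a single refinement step. */
static float fast_recip(float a) {
    return refine_rcp(a, rough_rcp(a));
}
```

Each step roughly doubles the number of correct bits, so a single step takes an estimate with about 11 good bits close to full single precision. Every operation depends on the previous one, which is why the sequence forms a serial critical path.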

More background here: http://llvm.org/bugs/show_bug.cgi?id=21385

Differential Revision: http://reviews.llvm.org/D6175

llvm-svn: 221706
2014-11-11 20:51:00 +00:00
clang PR16091 continued: Debug Info for member functions with undeduced return types. 2014-11-11 20:44:45 +00:00
clang-tools-extra [clang-tidy] google-readability-function: skip std::nullptr_t 2014-11-05 11:08:39 +00:00
compiler-rt [ASan] Fix use of -asan-instrument-assembly in tests 2014-11-11 13:44:08 +00:00
debuginfo-tests New round of fixes for "Always compile debuginfo-tests for the host triple" 2014-10-18 23:47:59 +00:00
libclc Prune CRLF. 2014-10-27 12:37:26 +00:00
libcxx Fix typo in allocator_traits::construct. This fixes PR14175, which shows up if an allocator has a no-args construct method 2014-11-11 19:22:33 +00:00
libcxxabi Make sure only NEON enabled devices save/restore D16+ registers 2014-11-07 16:33:58 +00:00
lld [mach-o] Fix lazy binding offsets 2014-11-11 01:31:18 +00:00
lldb Move a bunch of summary formatters to oneliner mode. This makes more cases eligible for oneline printing, and fixes rdar://18120906 2014-11-11 19:52:12 +00:00
llvm Use rcpss/rcpps (X86) to speed up reciprocal calcs (PR21385). 2014-11-11 20:51:00 +00:00
openmp I apologise in advance for the size of this check-in. At Intel we do 2014-10-07 16:25:50 +00:00
polly Safely generate new loop metadata node 2014-11-07 21:44:18 +00:00