This one is enabled only under -ffast-math. There are cases where the
difference between the value computed and the correct value is huge
even for ffast-math, e.g. as Steven pointed out:
x = -1, y = -4
log(pow(-1), 4) = 0
4*log(-1) = NaN
I checked what GCC does and apparently they do the same optimization
(which result in the dramatic difference). Future work might try to
make this (slightly) less worse.
Differential Revision: http://reviews.llvm.org/D14400
llvm-svn: 254263
This fixes buildbots in systems that std::to_string is not present. It
also tidies the output of the diagnostic to render doubles a bit better
(thanks Ben Kramer for help with string streams and format).
llvm-svn: 254261
We could already recognise shuffle(FSUB, FADD) -> ADDSUB, this allow us to recognise shuffle(FADD, FSUB) -> ADDSUB by commuting the shuffle mask prior to matching.
llvm-svn: 254259
Add/Subtract.
Add missing tests that accidentally were not committed in rL254250.
Differential Revision: http://reviews.llvm.org/D14982
llvm-svn: 254251
Add/Subtract.
The following instructions are added to AArch32 instruction set:
- VQRDMLAH: Vector Saturating Rounding Doubling Multiply Accumulate
Returning High Half
- VQRDMLSH: Vector Saturating Rounding Doubling Multiply Subtract
Returning High Half
The following instructions are added to AArch64 instruction set:
- SQRDMLAH: Signed Saturating Rounding Doubling Multiply Accumulate
Returning High Half
- SQRDMLSH: Signed Saturating Rounding Doubling Multiply Subtract
Returning High Half
This patch adds intrinsic and ACLE macro support for these instructions,
as well as corresponding tests.
Differential Revision: http://reviews.llvm.org/D14982
llvm-svn: 254250
This is the last step to enable profile runtime to share the same value prof
data format and reader/writer code with llvm host tools. The VP related
data structures are moved to a section in InstrProfData.inc enabled with macro
INSTR_PROF_VALUE_PROF_DATA, and common API implementations are enabled with
INSTR_PROF_COMMON_API_IMPL. There should be no functional change.
llvm-svn: 254235
Added FMADD/FMSUB/FNMADD/FNMSUB tests for all types
Added load folding tests for 512-bit vectors
NOTE: Many of the AVX512 FMA instructions don't yet commute/fold correctly
As discussed on D14909
llvm-svn: 254232
Serial queues need extra happens-before between individual tasks executed in the same queue. This patch adds `Acquire(queue)` before the executed task and `Release(queue)` just after it (for serial queues only). Added a test case.
Differential Revision: http://reviews.llvm.org/D15011
llvm-svn: 254229
This patch ports the assembly file tsan_rtl_amd64.S to OS X, where we need several changes:
* Some assembler directives are not available on OS X (.hidden, .type, .size)
* Symbol names need to start with an underscore (added a ASM_TSAN_SYMBOL macro for that).
* To make the interceptors work, we ween to name the function "_wrap_setjmp" (added ASM_TSAN_SYMBOL_INTERCEPTOR for that).
* Calling the original setjmp is done with a simple "jmp _setjmp".
* __sigsetjmp doesn't exist on OS X.
Differential Revision: http://reviews.llvm.org/D14947
llvm-svn: 254228
This patch implements dynamic realignment of stack objects for targets
with a non-realigned stack pointer. Behaviour in FunctionLoweringInfo
is changed so that for a target that has StackRealignable set to
false, over-aligned static allocas are considered to be variable-sized
objects and are handled with DYNAMIC_STACKALLOC nodes.
It would be good to group aligned allocas into a single big alloca as
an optimization, but this is yet todo.
SystemZ benefits from this, due to its stack frame layout.
New tests SystemZ/alloca-03.ll for aligned allocas, and
SystemZ/alloca-04.ll for "no-realign-stack" attribute on functions.
Review and help from Ulrich Weigand and Hal Finkel.
llvm-svn: 254227
There's a few more lit tests that require features not available on OS X (MAP_32BIT, pthread_setname_np), let's mark them with "UNSUPPORTED: darwin".
Differential Revision: http://reviews.llvm.org/D14923
llvm-svn: 254225
Pthread spinlocks are not available on OS X and this test doesn't really require a spinlock.
Differential Revision: http://reviews.llvm.org/D14949
llvm-svn: 254224
When a race on file descriptors is detected, `FindThreadByUidLocked()` is called to retrieve ThreadContext with a specific unique_id. However, this ThreadContext might not exist in the thread registry anymore (it may have been recycled), in which case `FindThreadByUidLocked` will cause an assertion failure in `GetThreadLocked`. Adding a test case that reproduces this, producing:
FATAL: ThreadSanitizer CHECK failed: sanitizer_common/sanitizer_thread_registry.h:92 "((tid)) < ((n_contexts_))" (0x34, 0x34)
This patch fixes this by replacing the loop with `FindThreadContextLocked`.
Differential Revision: http://reviews.llvm.org/D14984
llvm-svn: 254223