This one is enabled only under -ffast-math. There are cases where the
difference between the value computed and the correct value is huge
even for ffast-math, e.g. as Steven pointed out:
x = -1, y = -4
log(pow(-1), 4) = 0
4*log(-1) = NaN
I checked what GCC does and apparently they do the same optimization
(which result in the dramatic difference). Future work might try to
make this (slightly) less worse.
Differential Revision: http://reviews.llvm.org/D14400
llvm-svn: 254263
This fixes buildbots in systems that std::to_string is not present. It
also tidies the output of the diagnostic to render doubles a bit better
(thanks Ben Kramer for help with string streams and format).
llvm-svn: 254261
We could already recognise shuffle(FSUB, FADD) -> ADDSUB, this allow us to recognise shuffle(FADD, FSUB) -> ADDSUB by commuting the shuffle mask prior to matching.
llvm-svn: 254259
Add/Subtract.
Add missing tests that accidentally were not committed in rL254250.
Differential Revision: http://reviews.llvm.org/D14982
llvm-svn: 254251
Add/Subtract.
The following instructions are added to AArch32 instruction set:
- VQRDMLAH: Vector Saturating Rounding Doubling Multiply Accumulate
Returning High Half
- VQRDMLSH: Vector Saturating Rounding Doubling Multiply Subtract
Returning High Half
The following instructions are added to AArch64 instruction set:
- SQRDMLAH: Signed Saturating Rounding Doubling Multiply Accumulate
Returning High Half
- SQRDMLSH: Signed Saturating Rounding Doubling Multiply Subtract
Returning High Half
This patch adds intrinsic and ACLE macro support for these instructions,
as well as corresponding tests.
Differential Revision: http://reviews.llvm.org/D14982
llvm-svn: 254250
This is the last step to enable profile runtime to share the same value prof
data format and reader/writer code with llvm host tools. The VP related
data structures are moved to a section in InstrProfData.inc enabled with macro
INSTR_PROF_VALUE_PROF_DATA, and common API implementations are enabled with
INSTR_PROF_COMMON_API_IMPL. There should be no functional change.
llvm-svn: 254235
Added FMADD/FMSUB/FNMADD/FNMSUB tests for all types
Added load folding tests for 512-bit vectors
NOTE: Many of the AVX512 FMA instructions don't yet commute/fold correctly
As discussed on D14909
llvm-svn: 254232
Serial queues need extra happens-before between individual tasks executed in the same queue. This patch adds `Acquire(queue)` before the executed task and `Release(queue)` just after it (for serial queues only). Added a test case.
Differential Revision: http://reviews.llvm.org/D15011
llvm-svn: 254229
This patch ports the assembly file tsan_rtl_amd64.S to OS X, where we need several changes:
* Some assembler directives are not available on OS X (.hidden, .type, .size)
* Symbol names need to start with an underscore (added a ASM_TSAN_SYMBOL macro for that).
* To make the interceptors work, we ween to name the function "_wrap_setjmp" (added ASM_TSAN_SYMBOL_INTERCEPTOR for that).
* Calling the original setjmp is done with a simple "jmp _setjmp".
* __sigsetjmp doesn't exist on OS X.
Differential Revision: http://reviews.llvm.org/D14947
llvm-svn: 254228
This patch implements dynamic realignment of stack objects for targets
with a non-realigned stack pointer. Behaviour in FunctionLoweringInfo
is changed so that for a target that has StackRealignable set to
false, over-aligned static allocas are considered to be variable-sized
objects and are handled with DYNAMIC_STACKALLOC nodes.
It would be good to group aligned allocas into a single big alloca as
an optimization, but this is yet todo.
SystemZ benefits from this, due to its stack frame layout.
New tests SystemZ/alloca-03.ll for aligned allocas, and
SystemZ/alloca-04.ll for "no-realign-stack" attribute on functions.
Review and help from Ulrich Weigand and Hal Finkel.
llvm-svn: 254227
There's a few more lit tests that require features not available on OS X (MAP_32BIT, pthread_setname_np), let's mark them with "UNSUPPORTED: darwin".
Differential Revision: http://reviews.llvm.org/D14923
llvm-svn: 254225
Pthread spinlocks are not available on OS X and this test doesn't really require a spinlock.
Differential Revision: http://reviews.llvm.org/D14949
llvm-svn: 254224
When a race on file descriptors is detected, `FindThreadByUidLocked()` is called to retrieve ThreadContext with a specific unique_id. However, this ThreadContext might not exist in the thread registry anymore (it may have been recycled), in which case `FindThreadByUidLocked` will cause an assertion failure in `GetThreadLocked`. Adding a test case that reproduces this, producing:
FATAL: ThreadSanitizer CHECK failed: sanitizer_common/sanitizer_thread_registry.h:92 "((tid)) < ((n_contexts_))" (0x34, 0x34)
This patch fixes this by replacing the loop with `FindThreadContextLocked`.
Differential Revision: http://reviews.llvm.org/D14984
llvm-svn: 254223
Raw profile writer needs to write all data of one kind in one continuous block,
so the buffer needs to be pre-allocated and passed to the writer method in
pieces for function profile data. The change adds the support for raw value data
writing.
llvm-svn: 254219
This is the autoconf analog of r251201. I realize autoconf is
deprecated, but while it's in tree, it should at least be kept working.
Also add the deprecation message to configure.ac such that AutoRegen
actually picks ip up.
llvm-svn: 254215