This patch, together with a matching clang patch (https://reviews.llvm.org/D38683), implements the lowering of X86 broadcastm intrinsics to IR.
Differential Revision: https://reviews.llvm.org/D38684
Change-Id: I709ac0b34641095397e994c8ff7e15d1315b3540
llvm-svn: 317458
Next step is to use them for the legacy FMA scalar intrinsics as well. This will enable the legacy intrinsics to use EVEX encoded opcodes and the extended registers.
llvm-svn: 317453
Use feature names instead of CPU names.
A future commit will add avx512vl command lines to demonstrate missed use of EVEX instructions.
llvm-svn: 317451
The Traced flag is unnecessary because we only need to set the index
to -1 to mark a symbol for tracing.
Differential Revision: https://reviews.llvm.org/D39672
llvm-svn: 317450
Now that DefinedRegular is the only remaining derived class of
Defined, we can merge the two classes.
Differential Revision: https://reviews.llvm.org/D39667
llvm-svn: 317448
Common symbols are now represented with a DefinedRegular that points
to a BssSection, even during symbol resolution.
Differential Revision: https://reviews.llvm.org/D39666
llvm-svn: 317447
I don't remember why I made shared symbols one type of defined symbols.
Shared symbols aren't undefined, so it could be considered defined, but
categorizing three symbols as:
- defined
- really defined
- shared
- undefined
is not as intuitive as
- defined
- shared
- undefined
to me. So, in this patch, I made a change to stop handling shared
symbols as defined symbols.
Surprisingly, I didn't have to update any tests for this change.
Differential Revision: https://reviews.llvm.org/D39394
llvm-svn: 317446
single-iteration loop
This fixes PR34681. Avoid adding the "Stride == 1" predicate when we know that
Stride >= Trip-Count. Such a predicate will effectively optimize a single
or zero iteration loop, as Trip-Count <= Stride == 1.
Differential Revision: https://reviews.llvm.org/D38785
llvm-svn: 317438
The TR6 document is expected to be publically released around November 15.
This patch does not implement OMPT for libomptarget.
Patch by Simon Convent and Joachim Protze
Differential Revision: https://reviews.llvm.org/D39182
llvm-svn: 317436
This is part of the renaming of data types from OpenMP TR4 to TR6
Patch by Simon Convent
Differential Revision: https://reviews.llvm.org/D39326
llvm-svn: 317435
reverted my changes will be committed later after fixing the failure
This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8.
Differential Revision: https://reviews.llvm.org/D39403
llvm-svn: 317433
This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8.
Differential Revision: https://reviews.llvm.org/D39403
llvm-svn: 317432
This is an implementation of PR26223.
Currently optimizeMemoryInst optimization tries to fold address computation
if all possible way to get compute the address are of the form
baseGV + base + scale * Index + offset
where scale and offset are constants and baseGV, base and Index are exactly
the same instructions if defined.
The patch extends this optimization to allow different bases. In this case
it tries to find/build a Phi node merging all possible bases and use this Phi node
as a base for sunk address computation. Also it supports Select instruction on
the way.
The main motivation for this scope extension is GCRelocateInst.
If there is a relocation of derived pointer it will be represented as relocation of base + offset.
Also there will be a Phi node merging address computation for relocated derived pointer
and derived pointer itself. If we have a Phi node merging original base and relocated base
and can fold the address computation of derived pointer then we can potentially reduce
the code size and Phi node for derived pointer. The later can have a positive impact to
register allocator.
Reviewers: efriedma, dberlin, mkazantsev, reames, john.brawn
Reviewed By: john.brawn
Subscribers: javed.absar, john.brawn, dneilson, llvm-commits
Differential Revision: https://reviews.llvm.org/D36073
llvm-svn: 317429
That class is used only by LinkerScript.cpp, so we should move it to
that file. Also, it no longer has to be a "factory" class. It can just
be a non-member function.
llvm-svn: 317427
r317396 changed the way how we handle the -defsym option. The option is
now handled using the infrastructure for the linker script.
We used to handle both -defsym and -wrap using the same set of functions
in the symbol table. Now, we don't need to do that.
This patch rewrites the functions so that they become more straightforward.
The new functions directly handle -wrap rather than abstract it.
llvm-svn: 317426
Basically a regression after r316268.
However the diagnostic is correct, but the test coverage is bad.
So just like rL316500, introduce yet more tests,
and adjust the release notes.
See https://bugs.llvm.org/show_bug.cgi?id=35200
llvm-svn: 317421
Summary:
AVX512 added RCP14 and RSQRT instructions which improve accuracy over the legacy RCP and RSQRT instruction, but not enough accuracy to remove the need for a Newton Raphson refinement.
Currently we use these new instructions for the legacy packed SSE instrinics, but not the scalar instrinsics. And we use it for fast math optimization of division and reciprocal sqrt.
I think switching the legacy instrinsics maybe surprising to the user since it changes the answer based on which processor you're using regardless of any fastmath settings. It's also weird that we did something different between scalar and packed.
As far at the reciprocal estimation, I think it creates unnecessary deltas in our output behavior (and prevents EVEX->VEX). A little playing around with gcc and icc and godbolt suggest they don't change which instructions they use here.
This patch adds new X86ISD nodes for the RCP14/RSQRT14 and uses those for the new intrinsics. Leaving the old intrinsics to use the old instructions.
Going forward I think our focus should be on
-Supporting 512-bit vectors, which will have to use the RCP14/RSQRT14.
-Using RSQRT28/RCP28 to remove the Newton Raphson step on processors with AVX512ER
-Supporting double precision.
Reviewers: zvi, DavidKreitzer, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39583
llvm-svn: 317413
Summary:
Posix core files sometime don't contain enough information to correctly
detect the OS. If that is the case we should use the OS from the target
instead as it will contain usable information in more cases and if the
target and the core contain different OS-es then we are already in a
pretty bad state so moving from an unknown OS to a known (but possibly
incorrect) OS will do no harm.
We already had similar code in place for MIPS. This change tries to make
it more generic by using ArchSpec::MergeFrom and extends it to all
architectures but some MIPS specific issue prevent us from getting rid
of special casing MIPS.
Reviewers: clayborg, nitesh.jain
Subscribers: aemerson, sdardis, arichardson, kristof.beyls, lldb-commits
Differential Revision: https://reviews.llvm.org/D36046
llvm-svn: 317411