AND/BIC instructions do accept SP/PC, so the register class should be
more generic (rGPR -> GPR) to cope with that case. Adding more tests.
llvm-svn: 255034
Summary:
We don't check the size operand on ext/dext*/ins/dins* yet because the
permitted range depends on the pos argument and we can't check that using
this mechanism.
The bug was that dextu/dinsu accepted 0..31 in the pos operand instead of 32..63.
Reviewers: vkalintiris
Subscribers: llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D15190
llvm-svn: 255015
ARMv8.2-A adds 16-bit floating point versions of all existing SIMD
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.
Note that VFP without SIMD is not a valid combination for any version of
ARMv8-A, but I have ensured that these instructions all depend on both
FeatureNEON and FeatureFullFP16 for consistency.
The ".2h" vector type specifier is now legal (for the scalar pairwise
reduction instructions), so some unrelated tests have been modified as
different error messages are emitted. This is not a problem as the
invalid operands are still caught.
llvm-svn: 255010
Reduces the scope over which the struct is visible, making its usages
obvious. I did not move structs in cases where this wasn't a clear
win (the struct is too large, or is grouped in some other interesting
way).
llvm-svn: 255003
The StringRef constructor is unnecessary (since we're converting to
std::string anyway), and having it requires an explicit call to
StringRef's or std::string's constructor.
llvm-svn: 255000
Currently, vectors of halfs end up as ConstantVectors, but there isn't
a good reason they can't be ConstantDataVectors. This should save some
memory.
llvm-svn: 254991
It's strange to duplicate the logic for emitting FP values into
emitGlobalConstantDataSequential, and it's even stranger that we end
up printing the verbose assembly comments differently between the two
paths. Just call into emitGlobalConstantFP rather than crudely
duplicating its logic.
llvm-svn: 254988
Summary:
Also add a stricter post-condition for IndVarSimplify.
Fixes PR25578. Test case by Michael Zolotukhin.
Reviewers: hfinkel, atrick, mzolotukhin
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15059
llvm-svn: 254977
Summary:
(Note: the problematic invocation of hoistIVInc that caused PR24804 came
from IndVarSimplify, not from SCEVExpander itself)
Fixes PR24804. Test case by David Majnemer.
Reviewers: hfinkel, majnemer, atrick, mzolotukhin
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15058
llvm-svn: 254976
We were using unneccessarily large initial sizes for these SmallVectors. This was wasting around 50kb of memory for the O3 pipeline, even after the uniquing changes. We're still using around 20kb which is a bit much, but it's definitely better. This is about a 6% improvement in total O3 memory usage.
Note: The raw data on structure size which were used to pick these thresholds can be found in the review thread.
Differential Revision: http://reviews.llvm.org/D15244
llvm-svn: 254974
A manipulation (in this case, mkdir) can make slack between creating and touching %t.older/evenlen.
I would make this rewrote with python if this were still unstable.
llvm-svn: 254965
254950 ended up being not NFC. The previous code was overriding the flags for whether an instruction read or wrote memory using the target specific flags returned via TTI. I'd missed this in my refactoring. Since I mistakenly built only x86 and didn't notice the number of unsupported tests, I didn't catch that before the original checkin.
This raises an interesting issue though. Given we have function attributes (i.e. readonly, readnone, argmemonly) which describe the aliasing of intrinsics, why does TTI have this information overriding the instruction definition at all? I see no reason for this, but decided to preserve existing behavior for the moment. The root issue might be that we don't have a "writeonly" attribute.
Original commit message:
[EarlyCSE] Simplify and invert ParseMemoryInst [NFCI]
Restructure ParseMemoryInst - which was introduced to abstract over target specific load and stores instructions - to just query the underlying instructions. In theory, this could be slightly slower than caching the results, but in practice, it's very unlikely to be measurable.
The simple query scheme makes it far easier to understand, and much easier to extend with new queries. Given I'm about to need to add new query types, doing the cleanup first seemed worthwhile.
Do we still believe the target specific intrinsic handling is worthwhile in EarlyCSE? It adds quite a bit of complexity and makes the code harder to read. Being able to delete the abstraction entirely would be wonderful.
llvm-svn: 254957
This is supposed to force-link the Interpreter, by inserting a dead
call to LLVMLinkInInterpreter().
Since it is actually an empty function, there is no reason for the
call to be dead.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 254956
Restructure ParseMemoryInst - which was introduced to abstract over target specific load and stores instructions - to just query the underlying instructions. In theory, this could be slightly slower than caching the results, but in practice, it's very unlikely to be measurable.
The simple query scheme makes it far easier to understand, and much easier to extend with new queries. Given I'm about to need to add new query types, doing the cleanup first seemed worthwhile.
Do we still believe the target specific intrinsic handling is worthwhile in EarlyCSE? It adds quite a bit of complexity and makes the code harder to read. Being able to delete the abstraction entirely would be wonderful.
llvm-svn: 254950
When considering foo->bar inlining, if there is an indirect call in foo which gets resolved to a direct call (say baz), then we try to inline baz into bar with a threshold T and subtract max(T - Cost(bar->baz), 0) from Cost(foo->bar). This patch uses max(Threshold(bar->baz) - Cost(bar->baz)) instead, where Thresheld(bar->baz) could be different from T due to bonuses or subtractions. Threshold(bar->baz) - Cost(bar->baz) better represents the desirability of inlining baz into bar.
Differential Revision: http://reviews.llvm.org/D14309
llvm-svn: 254945
Summary:
Add a field on the PassManagerBuilder that clang or gold can use to pass
down a pointer to the function index in memory to use for importing when
the ThinLTO backend is triggered. Add support to supply this to the
function import pass.
Reviewers: joker.eph, dexonsmith
Subscribers: davidxl, llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D15024
llvm-svn: 254926
The 2-element vector case shows a surprising bug: we failed to
eliminate ops on undefs, so there are 4 fmax calls even though
there can only be 2 valid elements in the inputs.
llvm-svn: 254920
This is needed to support linking of module-level metadata as a
postpass after function importing, where we will be leaving temporary
metadata on imported instructions until the postpass metadata import.
Also added unittest. Split from D14838.
llvm-svn: 254914
FP logic instructions are supported in DQ extension on AVX-512 target.
I use integer operations instead.
Added tests.
I also enabled FABS in this patch in order to check ANDPS.
The operations are FOR, FXOR, FAND, FANDN.
The instructions, that supported for 512-bit vector under DQ are:
VORPS/PD, VXORPS/PD, VANDPS/PD, FANDNPS/PD.
Differential Revision: http://reviews.llvm.org/D15110
llvm-svn: 254913
Summary: This reverts r254234, and adds a simple fix for the annoying case of use-after-free.
Reviewers: rengolin
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D15236
llvm-svn: 254912
Summary:
valid-xfail.s is for instructions that should be valid in the given ISA but
incorrectly fail. DSP/DSPr2 instructions are correct to fail since DSP/DSPr2 is
not enabled.
Reviewers: vkalintiris
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D15072
llvm-svn: 254911
Patterns were missing for KNL target for <8 x i32>, <8 x float> masked load/store.
This intrinsic comes with all legal types:
<8 x float> @llvm.masked.load.v8f32(<8 x float>* %addr, i32 align, <8 x i1> %mask, <8 x float> %passThru),
but still requires lowering, because VMASKMOVPS, VMASKMOVDQU32 work with 512-bit vectors only.
All data operands should be widened to 512-bit vector.
The mask operand should be widened to v16i1 with zeroes.
Differential Revision: http://reviews.llvm.org/D15265
llvm-svn: 254909
Now instead of changing it to the new format and then linking, it just
handles the old format while copying it over.
The main differences are:
* There is no rauw in the source module.
* An old format input is always upgraded.
The first item helps with having a sane API that passes in a GV list to
the linker.
The second one is a small step in deprecating the old format.
llvm-svn: 254907
Summary:
We are inserting both Scope and SP into the Seen map and check whether
it was already there in which case we skip the validation (the idea
being that we already checked this Subprogram before). However,
if (Scope == SP) as MDNodes, then inserting the Scope, will trigger
the Seen check causing us to incorrectly not validate this !dbg
attachment. Fix this by not performing the SP Seen check if Scope == SP
Reviewers: pcc, dexonsmith, dblaikie
Subscribers: dblaikie, llvm-commits
Differential Revision: http://reviews.llvm.org/D14697
llvm-svn: 254887
Also, switch to x86-64 because once we can lower these to something
more reasonable, there will be less noise in the checks. And add
AVX runs because those will be different than SSE.
llvm-svn: 254879
According to x86 spec, loopz and loopnz should be supported for Intel syntax, where loopz is equivalent to loope and loopnz is equivalent to loopne.
Differential Revision: http://reviews.llvm.org/D15148
llvm-svn: 254877
This removes the code path that generate "synchronous" (only correct at call site) CFA.
We will probably want to re-introduce it once we are capable of emitting different
.eh_frame and .debug_frame sections.
Differential Revision: http://reviews.llvm.org/D14948
llvm-svn: 254874
Summary:
There are `SelectPatternFlavor`s that don't represent min or max idioms,
and we should not be passing those to `getCmpPredicateForMinMax`.
Fixes PR25745.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15249
llvm-svn: 254869