There's a special case in EmitLoweredSelect() that produces an improved
lowering for cmov(cmov) patterns. However, this special lowering is
currently broken if the inner cmov has multiple users, so this patch
stops using it in that case.
If you wonder why this wasn't fixed by continuing to use the special
lowering and inserting a second PHI for the inner cmov: I believe this
would incur additional copies/register pressure, so the special lowering
would no longer improve upon the normal one in this case.
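For illustration, a rough IR-level sketch of the affected shape (not a
test from the patch): the two compares share one flag-setting
instruction, producing back-to-back cmovs, and the inner cmov has a
second user:

define i32 @cmov_cmov(i32 %x, i32 %y, i32 %a, i32 %b, i32 %c) {
  %cmp1 = icmp slt i32 %x, %y
  %cmp2 = icmp sgt i32 %x, %y
  %inner = select i1 %cmp1, i32 %a, i32 %b      ; inner cmov
  %outer = select i1 %cmp2, i32 %inner, i32 %c  ; outer cmov
  %use2 = add i32 %outer, %inner                ; second user of the inner cmov
  ret i32 %use2
}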
This fixes http://llvm.org/PR26256 (= rdar://24329747)
llvm-svn: 258729
Instructions can be DCE'd after the RegStackify pass. If the instruction
that would consume (pop) the value that would be pushed is removed,
don't use a push.
llvm-svn: 258694
This patch was originally committed as r257885, but was reverted due to Windows
failures. The cause of these failures has been fixed under r258677, hence
re-committing the original patch.
llvm-svn: 258683
This patch was originally committed as r257884, but was reverted due to Windows
failures. The cause of these failures has been fixed under r258677, hence
re-committing the original patch.
llvm-svn: 258682
This patch was originally committed as r257883, but was reverted due to Windows
failures. The cause of these failures has been fixed under r258677, hence
re-committing the original patch.
llvm-svn: 258681
VPMADD52LUQ - Packed Multiply of Unsigned 52-bit Integers and Add the Low 52-bit Products to Qword Accumulators
VPMADD52HUQ - Packed Multiply of Unsigned 52-bit Integers and Add the High 52-bit Products to 64-bit Accumulators
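For illustration only - the intrinsic name below is an assumption for
this sketch, not taken from the patch:

declare <8 x i64> @llvm.x86.avx512.vpmadd52l.uq.512(<8 x i64>, <8 x i64>, <8 x i64>)

define <8 x i64> @madd52lo(<8 x i64> %acc, <8 x i64> %b, <8 x i64> %c) {
  ; per qword lane: acc + (low 52 bits of (b[51:0] * c[51:0]))
  %r = call <8 x i64> @llvm.x86.avx512.vpmadd52l.uq.512(<8 x i64> %acc, <8 x i64> %b, <8 x i64> %c)
  ret <8 x i64> %r
}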
Differential Revision: http://reviews.llvm.org/D16407
llvm-svn: 258680
This was originally committed as r255762, but reverted as it broke Windows
bots. Re-committing the exact same patch, as the underlying cause was fixed by
r258677.
ARMv8.2-A adds 16-bit floating point versions of all existing VFP
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.
The assembly for these instructions uses S registers (AArch32 does not
have H registers), but the instructions have ".f16" type specifiers
rather than ".f32" or ".f64". The top 16 bits of each source register
are ignored, and the top 16 bits of the destination register are set to
zero.
These instructions are mostly the same as the 32- and 64-bit versions,
but they use coprocessor 9 rather than 10 and 11.
Two new instructions, VMOVX and VINS, have been added to allow packing
and extracting two 16-bit floats stored in the top and bottom halves of
an S register.
New fixup kinds have been added for the PC-relative load and store
instructions, but no ELF relocations have been added, as the
instructions only have a range of 512 bytes.
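As a rough illustration, a half-precision operation covered by the new
instructions; with FeatureFullFP16 this can select vadd.f16, while
without it the operands are promoted to f32:

define half @fadd_f16(half %a, half %b) {
  %r = fadd half %a, %b
  ret half %r
}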
Differential Revision: http://reviews.llvm.org/D15038
llvm-svn: 258678
This is a fix for https://llvm.org/bugs/show_bug.cgi?id=22796.
The previous implementation of ClassInfo::operator< allowed cycles of classes
such that x < y < z < x, meaning that a list of them cannot be correctly
sorted, and the sort order could differ with different standard libraries.
The original implementation sorted classes by ValueName if they were otherwise
equal. This isn't strictly necessary, but some backends seem to accidentally
rely on it. If I reverse this comparison I get 8 test failures spread across
the AArch64, Mips and X86 backends, so I have left it in until those backends
can be fixed.
There was one case in the X86 backend where the observable behaviour of the
assembler is changed by this patch. This was because some of the memory asm
operands were not marked as children of X86MemAsmOperand.
Differential Revision: http://reviews.llvm.org/D16141
llvm-svn: 258677
Use AVX1 FP instructions (vmaskmovps/pd) in place of the AVX2 int instructions (vpmaskmovd/q).
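A sketch of the kind of operation affected (the masked-load intrinsic
mangling shown is the current one and may differ between LLVM versions):

declare <8 x i32> @llvm.masked.load.v8i32.p0(ptr, i32, <8 x i1>, <8 x i32>)

define <8 x i32> @mload(ptr %p, <8 x i1> %m) {
  ; a masked load of i32 lanes: with only AVX1 available, this can now be
  ; lowered with the FP vmaskmovps rather than requiring AVX2's vpmaskmovd
  %v = call <8 x i32> @llvm.masked.load.v8i32.p0(ptr %p, i32 4, <8 x i1> %m, <8 x i32> zeroinitializer)
  ret <8 x i32> %v
}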
Differential Revision: http://reviews.llvm.org/D16528
llvm-svn: 258675
Changes in X86.td:
- Set the features of Intel processors in incremental form: IVB = SNB + X, HSW = IVB + X, ...
- Added the Skylake client processor and defined its features.
- Added FeatureADX, which was missing, to KNL.
- Added some new features (SMAP, IFMA, PREFETCHWT1, VMFUNC and others) to the appropriate processors.
Differential Revision: http://reviews.llvm.org/D16357
llvm-svn: 258659
Generalised mask generation / subvector extraction to use the input/output types directly, instead of an if/else chain through all the currently accepted types.
llvm-svn: 258645
Summary:
Previously, we would just output "foo = bar" in the assembly, and then
ptxas would choke. Now we die before emitting any invalid code.
Reviewers: echristo
Subscribers: jholewinski, llvm-commits, jhen, tra
Differential Revision: http://reviews.llvm.org/D16490
llvm-svn: 258638
If the INSERTPS zeroes out all the referenced elements from either of the 2 input vectors (and the input is not already UNDEF), then set that input to UNDEF to reduce dependencies.
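A sketch using the SSE4.1 intrinsic (immediate chosen for illustration):

declare <4 x float> @llvm.x86.sse41.insertps(<4 x float>, <4 x float>, i8)

define <4 x float> @zeroed_input(<4 x float> %a, <4 x float> %b) {
  ; imm 0x41: take element 1 of %b, insert it into lane 0, then the zero
  ; mask (bit 0) zeroes lane 0 - nothing from %b survives, so %b can be
  ; replaced with UNDEF to break the dependency
  %r = call <4 x float> @llvm.x86.sse41.insertps(<4 x float> %a, <4 x float> %b, i8 65)
  ret <4 x float> %r
}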
llvm-svn: 258622
Some of the conditions necessary to produce ccmp sequences were only
checked in recursive calls to emitConjunctionDisjunctionTree() after
some of the earlier expressions were already built. Move all checks over
to isConjunctionDisjunctionTree() so they are all checked before we
start emitting instructions.
Also renamed some variables to better reflect their usage.
llvm-svn: 258605
Previously it failed to add NumArgRegs to the offset and so clobbered an
already-used register. Now just start the numbering after the arg regs
and don't duplicate the add. Test coverage for this is coming shortly
with the implementation of byval.
llvm-svn: 258597
The intrinsic target prefix should match the target name
as it appears in the triple.
This is not yet complete, but gets most of the important ones.
llvm.AMDGPU.* intrinsics used by mesa and libclc are still handled
for compatibility for now.
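For illustration (assuming rsq is among the renamed intrinsics; the
exact set is in the patch), the prefix now matches the "amdgcn" target
name in the triple, e.g. llvm.AMDGPU.rsq becoming:

declare float @llvm.amdgcn.rsq.f32(float)

define float @use_rsq(float %x) {
  %r = call float @llvm.amdgcn.rsq.f32(float %x)
  ret float %r
}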
llvm-svn: 258557
The promote alloca pass didn't handle these intrinsics and crashed.
These intrinsics should accept any address space, but for now just
erase them to avoid breaking things.
llvm-svn: 258537
The current behavior is incorrect, as the two CCs returned by
changeFPCCToAArch64CC, intended to be OR'ed, are instead used
in an AND ccmp chain.
Consider:
define i32 @t(float %a, float %b, float %c, float %d, i32 %e, i32 %f) {
%cc1 = fcmp one float %a, %b
%cc2 = fcmp olt float %c, %d
%and = and i1 %cc1, %cc2
%r = select i1 %and, i32 %e, i32 %f
ret i32 %r
}
Assuming (%a < %b) and (%c < %d), we used to do:
fcmp s0, s1 # nzcv <- 1000
orr w8, wzr, #0x1 # w8 <- 1
csel w9, w8, wzr, mi # w9 <- 1
csel w8, w8, w9, gt # w8 <- 1
fcmp s2, s3 # nzcv <- 1000
cset w9, mi # w9 <- 1
tst w8, w9 # (w8 & w9) == 1, so: nzcv <- 0000
csel w0, w0, w1, ne # w0 <- w0
We now do:
fcmp s2, s3 # nzcv <- 1000
fccmp s0, s1, #0, mi # mi, so: nzcv <- 1000
fccmp s0, s1, #8, le # !le, so: nzcv <- 1000
csel w0, w0, w1, pl # !pl, so: w0 <- w1
In other words, we transformed:
(c < d) && ((a < b) || (a > b))
into:
(c < d) && (a u>= b) && (a u<= b)
whereas, per De Morgan's, we wanted:
(c < d) && !((a u>= b) && (a u<= b))
Note that this problem doesn't occur in the test-suite.
changeFPCCToAArch64CC produces disjunct CCs; here, one -> mi/gt.
We can't represent that in the fccmp chain; it can't express
arbitrary OR sequences, as one comment explains:
In general we can create code for arbitrary "... (and (and A B) C)"
sequences. We can also implement some "or" expressions, because
"(or A B)" is equivalent to "not (and (not A) (not B))" and we can
implement some negation operations. [...] However there is no way
to negate the result of a partial sequence.
Instead, introduce changeFPCCToANDAArch64CC, which produces the
conjunct cond codes:
- (a one b)
== ((a olt b) || (a ogt b))
== ((a ord b) && (a une b))
- (a ueq b)
== ((a uno b) || (a oeq b))
== ((a ule b) && (a uge b))
Note that, at first, one might think that, when PushNegate is true,
we should use the disjunct CCs, in effect doing:
(a || b)
= !(!a && !(b))
= !(!a && !(b1 || b2)) <- changeFPCCToAArch64CC(b, b1, b2)
= !(!a && !b1 && !b2)
However, we can take advantage of the fact that the CC is already
negated, which lets us avoid special-casing PushNegate and do the
simpler-to-reason-about:
(a || b)
= !(!a && (!b))
= !(!a && (b1 && b2)) <- changeFPCCToANDAArch64CC(!b, b1, b2)
= !(!a && b1 && b2)
This makes both emitConditionalCompare cases behave identically,
and produces correct ccmp sequences for the 2-CC fcmps.
llvm-svn: 258533
We verify that the op tree is eligible for CCMP emission in
isConjunctionDisjunctionTree, but it's also possible that
emitConjunctionDisjunctionTree fails later.
The initial check is useful, as it avoids building nodes
that will get discarded.
Still, add an assert to make sure that such inconsistencies don't
happen.
llvm-svn: 258532
Summary:
SETCC with f16 vectors has OperationAction set to Expand but still gets
lowered to FCM* intrinsics based on its result type. This patch skips
lowering of VSETCC if the operand is an f16 vector.
v4 and v8 tests included.
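For reference, a sketch of the kind of compare involved:

define <4 x i1> @setcc_v4f16(<4 x half> %a, <4 x half> %b) {
  ; previously this could still be lowered to FCM* based on the result
  ; type despite the Expand action; it is now left to be expanded
  %c = fcmp olt <4 x half> %a, %b
  ret <4 x i1> %c
}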
Reviewers: ab, jmolloy
Subscribers: srhines, llvm-commits
Differential Revision: http://reviews.llvm.org/D15361
llvm-svn: 258471
Better handling of the annoying pshuflw/pshufhw ops which only shuffle lower/upper halves of a vector.
Added vXi16 unary shuffle support for cases where i16 elements (from the same half of the source) are being splatted to the whole of one of the halves. This avoids the general lowering case, which must shuffle the 32-bit elements first - meaning that we used to end up with unnecessary duplicate pshuflw/pshufhw shuffles.
Note this has the side effect of a lot of SSSE3 test cases no longer needing to use PSHUFB, as it falls below the 3 op combine threshold for when PSHUFB is typically worth it. I've raised PR26183 to discuss if the threshold should be changed and whether we need to make it more specific to the target CPU.
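As a rough example (the shuffle mask is chosen for illustration, not
taken from the patch's tests):

define <8 x i16> @splat_to_high_half(<8 x i16> %a) {
  ; element 0 (from the low half) is splatted across the whole high half;
  ; this previously went through the general path that shuffles the
  ; 32-bit elements first
  %r = shufflevector <8 x i16> %a, <8 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 0, i32 0, i32 0>
  ret <8 x i16> %r
}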
Differential Revision: http://reviews.llvm.org/D14901
llvm-svn: 258440
Summary:
And use it in PPCLoopDataPrefetch.cpp.
@hfinkel, please let me know if your preference would be to preserve the
ppc-loop-prefetch-cache-line option in order to be able to override the
value of TTI::getCacheLineSize for PPC.
Reviewers: hfinkel
Subscribers: hulx2000, mcrosier, mssimpso, hfinkel, llvm-commits
Differential Revision: http://reviews.llvm.org/D16306
llvm-svn: 258419
Summary:
This is now the same as the behaviour of the GNU assembler. This was
done because it is required in order to build the Linux kernel with the
integrated assembler enabled.
Reviewers: dsanders, vkalintiris
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D13594
llvm-svn: 258400
There's an overloading of the "movsd" and "cmpsd" instructions, e.g. movsd can be either "Move Data from String to String" or "Move or Merge Scalar Double-Precision Floating-Point Value".
The former should produce warnings when parsing a memory operand that is not ESI/EDI, but the latter should not.
Fixed the code to produce warnings only after making sure we're dealing with the first case.
Expanded the tests of the produced warnings, and fixed the RUN line of the test so that it checks both stdout and stderr.
Differential Revision: http://reviews.llvm.org/D16359
llvm-svn: 258393
Summary:
Currently the SI scheduler can be selected via a command line option,
but it turned out it would be better if it was selectable via a target attribute.
This patch adds a "si-scheduler" attribute to the backend.
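A sketch of per-function use, assuming the attribute is exposed as a
subtarget feature string:

define void @kernel() #0 {
  ret void
}

attributes #0 = { "target-features"="+si-scheduler" }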
Reviewers: tstellarAMD, echristo
Subscribers: echristo, arsenm
Differential Revision: http://reviews.llvm.org/D16192
llvm-svn: 258386
Summary:
While working on uniform branching, I've hit a few cases where we emit
i1 SETCC operations.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D16233
llvm-svn: 258352
Previously, AArch64 big-endian data fixup entries were only written in
big-endian order in the .eh_frame section. This is changed to write all
such fixup entries in big-endian order, with no restriction on the
section. This is similar to the existing support for ARM fixup entries.
A test is added to check the length field in the .debug_line section as
this is an example of where such a fixup occurs.
Differential Revision: http://reviews.llvm.org/D16064
llvm-svn: 258320
There was a bug in my rL258132, because there's an overloading of the "movsd" and "cmpsd" instructions, e.g. movsd can be either "Move Data from String to String" (the case I wanted to handle) or "Move or Merge Scalar Double-Precision Floating-Point Value" (the case that causes the asserts).
Added code for bailing out of the unfamiliar scenarios and falling back to the old behaviour.
Also changed the asserts to llvm_unreachable.
llvm-svn: 258312
The AArch64 .inst directive was implemented using EmitIntValue, which resulted
in both $x and $d (code and data) mapping symbols being emitted at the same
address. This fixes it to only emit the $x mapping symbol.
EmitIntValue also emits the value in big-endian order when targeting big-endian
systems, but instructions are always emitted in little-endian order for
AArch64.
Differential Revision: http://reviews.llvm.org/D16349
llvm-svn: 258308
Don't run shrink-wrapping on functions with a split-stack attribute or
the HiPE calling convention.
The implementation of the related callbacks in the x86 backend for such
functions is not ready to deal with a prologue block that is not the
entry block of the function.
This fixes PR26107, but the longer term solution would be to fix those callbacks.
llvm-svn: 258221
As vector shuffles can only reference two inputs, many (V)INSERTPS patterns end up being split over two target shuffles.
This patch adds combines to attempt to combine (V)INSERTPS nodes with input/output nodes that are just zeroing out these additional vector elements.
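A sketch of the pattern being combined (immediates chosen for
illustration):

declare <4 x float> @llvm.x86.sse41.insertps(<4 x float>, <4 x float>, i8)

define <4 x float> @insertps_then_zero(<4 x float> %a, <4 x float> %b) {
  ; imm 0x20: insert element 0 of %b into lane 2 of %a
  %ins = call <4 x float> @llvm.x86.sse41.insertps(<4 x float> %a, <4 x float> %b, i8 32)
  ; a second shuffle that only zeroes lane 0; this can fold into the
  ; insertps zero mask (imm 0x21)
  %r = shufflevector <4 x float> %ins, <4 x float> zeroinitializer, <4 x i32> <i32 4, i32 1, i32 2, i32 3>
  ret <4 x float> %r
}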
Differential Revision: http://reviews.llvm.org/D16072
llvm-svn: 258205
Summary:
GEPOperator: provide getResultElementType alongside getSourceElementType.
This is made possible by adding a result element type field to GetElementPtrConstantExpr, which GetElementPtrInst already has.
GEP: replace get(Pointer)ElementType uses with get{Source,Result}ElementType.
Reviewers: mjacob, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16275
llvm-svn: 258145
Teach the register stackifier to rematerialize constants that have multiple
uses instead of leaving them in registers. In the WebAssembly encoding, it's
the same code size to materialize most constants as it is to read a value
from a register.
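A minimal sketch of the situation:

define i32 @two_uses(i32 %x, i32 %y) {
  ; the constant 10 has two uses; rematerializing an i32.const at each
  ; use site costs about the same code size as reading it back from a local
  %a = add i32 %x, 10
  %b = add i32 %y, 10
  %r = mul i32 %a, %b
  ret i32 %r
}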
llvm-svn: 258142
According to the x86 spec, "xlat m8" is a legal instruction and is equivalent to "xlatb".
Differential Revision: http://reviews.llvm.org/D15150
llvm-svn: 258135
The following are legal according to the x86 spec:
ins mem, DX
outs DX, mem
lods mem
stos mem
scas mem
cmps mem, mem
movs mem, mem
Differential Revision: http://reviews.llvm.org/D14827
llvm-svn: 258132
This breaks the tests that were meant for testing
64-bit inline immediates, so move those to shl where
they won't be broken up.
This should be repeated for the other related bit ops.
llvm-svn: 258095