This is the last step needed to fix PR33325:
https://bugs.llvm.org/show_bug.cgi?id=33325
We're trading branch and compares for loads and logic ops.
This makes the code smaller and hopefully faster in most cases.
The 24-byte test shows an interesting construct: we load the trailing scalar
elements into vector registers and generate the same pcmpeq+movmsk code that
we expected for a pair of full vector elements (see the 32- and 64-byte tests).
Differential Revision: https://reviews.llvm.org/D41714
llvm-svn: 321934
Without this we allow "vmovd %rax, %xmm0", but not "vmovd %rax, %xmm16"
This exists due to continue a silly bug where really old versions of the GNU assembler required movd instead of movq on these instructions. This compatibility hack then crept forward to avx version too, but we didn't propagate it to avx512.
llvm-svn: 321903
This behavior existed to work with an old version of the gnu assembler on MacOS that only accepted this form. Newer versions of GNU assembler and the current LLVM derived version of the assembler on MacOS support movq as well.
llvm-svn: 321898
Currently the assembler would accept, e.g. `ldr r0, [s0, #12]` and similar.
This patch add checks that only general-purpose registers are used in address
operands, shifted registers, and shift amounts.
Differential revision: https://reviews.llvm.org/D39910
llvm-svn: 321866
Instead of using, for example, `dup v0.4s, wzr`, which transfers between
register files, use the more efficient `movi v0.4s, #0` instead.
Differential revision: https://reviews.llvm.org/D41515
llvm-svn: 321824
Wide Thumb2 instructions should be emitted into the object file as pairs of
16-bit words of the appropriate endianness, not one 32-bit word.
Differential revision: https://reviews.llvm.org/D41185
llvm-svn: 321799
Select G_PHI to PHI and manually constrain the result register. This is
very similar to how COPY is handled, so extract and reuse some of that
code.
llvm-svn: 321797
We used to handle G_CONSTANT with pointer type by forcing the type of
the result register to s32 and then letting TableGen handle it.
Unfortunately, setting the type only works for generic virtual
registers, that haven't yet been constrained to a register class (e.g.
those used only by a COPY later on). If the result register has already
been constrained as a use of a previously selected instruction, then
setting the type will assert.
It would be nice to be able to teach TableGen to select pointer
constants the same as integer constants, but since it's such an edge
case (at the moment the only pointer constant that we're generally
interested in is 0, and that is mostly used for comparisons and selects,
which are also not supported by TableGen) it's probably not worth the
effort right now. Instead, handle pointer constants with some trivial
handwritten code.
llvm-svn: 321793
This custom inserter was added in r124272 at which time it added about bunch of Defs for Win64. In r150708, those defs were removed leaving only the "return BB". So I think this means the custom inserter is a NOP these days.
This patch removes the remaining code and stops tagging the instructions for custom insertion
Differential Revision: https://reviews.llvm.org/D41671
llvm-svn: 321747
Currently we use SIGN_EXTEND in lowerMasksToReg as part of calling convention setup, but we don't require a specific value for the upper bits.
This patch changes it to ANY_EXTEND which will be lowered as SIGN_EXTEND if it ends up sticking around.
llvm-svn: 321746
After D41349, we can now directly access MCSubtargetInfo from
createARM*AsmBackend. This patch makes use of this, avoiding the need to
create a fresh MCSubtargetInfo (which was previously always done with a blank
CPU and feature string). Given the total size of the change remains pretty
tiny and we're removing the old explicit destructor, I changed the STI field
to a reference rather than a pointer.
Differential Revision: https://reviews.llvm.org/D41693
llvm-svn: 321707
Summary:
Add a register class for SVE predicate operands that can only be p0-p7 (as opposed to p0-p15)
Patch [1/3] in a series to add predicated ADD/SUB instructions for SVE.
Reviewers: rengolin, mcrosier, evandro, fhahn, echristo, olista01, SjoerdMeijer, javed.absar
Reviewed By: fhahn
Subscribers: aemerson, javed.absar, tschuett, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D41441
llvm-svn: 321699
Currently it's not possible to access MCSubtargetInfo from a TgtMCAsmBackend.
D20830 threaded an MCSubtargetInfo reference through
MCAsmBackend::relaxInstruction, but this isn't the only function that would
benefit from access. This patch removes the Triple and CPUString arguments
from createMCAsmBackend and replaces them with MCSubtargetInfo.
This patch just changes the interface without making any intentional
functional changes. Once in, several cleanups are possible:
* Get rid of the awkward MCSubtargetInfo handling in ARMAsmBackend
* Support 16-bit instructions when valid in MipsAsmBackend::writeNopData
* Get rid of the CPU string parsing in X86AsmBackend and just use a SubtargetFeature for HasNopl
* Emit 16-bit nops in RISCVAsmBackend::writeNopData if the compressed instruction set extension is enabled (see D41221)
This change initially exposed PR35686, which has since been resolved in r321026.
Differential Revision: https://reviews.llvm.org/D41349
llvm-svn: 321692
This is an extension of D31156 with the goal that we'll allow memcmp() == 0 expansion
for x86 to use 2 pairs of loads per block.
The memcmp expansion pass (formerly part of CGP) will generate this kind of pattern
with oversized integer compares, so we want to transform these into x86-specific vector
nodes before legalization splits things into scalar chunks.
See PR33325 for more details:
https://bugs.llvm.org/show_bug.cgi?id=33325
Differential Revision: https://reviews.llvm.org/D41618
llvm-svn: 321656
Tests updated to explicitly use fast-isel at -O0 instead of implicitly.
This change also allows an explicit -fast-isel option to override an
implicitly enabled global-isel. Otherwise -fast-isel would have no effect at -O0.
Differential Revision: https://reviews.llvm.org/D41362
llvm-svn: 321655
Summary:
isReg() in AArch64AsmParser.cpp is a bit of a misnomer, and would be better named 'isScalarReg()' instead.
Patch [1/3] in a series to add operand constraint checks for SVE's predicated ADD/SUB.
Reviewers: rengolin, mcrosier, evandro, fhahn, echristo
Reviewed By: fhahn
Subscribers: aemerson, javed.absar, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D41445
llvm-svn: 321646
Currently the promotion for these ignores the normal getTypeToPromoteTo and instead just tries to double the element width. This is because the default behavior of getTypeToPromote to just adds 1 to the SimpleVT, which has the affect of increasing the element count while keeping the scalar size the same.
If multiple steps are required to get to a legal operation type, int_to_fp will be promoted multiple times. And fp_to_int will keep trying wider types in a loop until it finds one that works.
getTypeToPromoteTo does have the ability to query a promotion map to get the type and not do the increasing behavior. It seems better to just let the target specify the promotion type in the map explicitly instead of letting the legalizer iterate via widening.
FWIW, it's worth I think for any other vector operations that need to be promoted, we have to specify the type explicitly because the default behavior of getTypeToPromote isn't useful for vectors. The other types of promotion already require either the element count is constant or the total vector width is constant, but neither happens by incrementing the SimpleVT enum.
Differential Revision: https://reviews.llvm.org/D40664
llvm-svn: 321629
We can use zmm move with zero masking for this. We already had patterns for using a masked move, but we didn't check for the zero masking case separately.
llvm-svn: 321612
The CONCAT_VECTORS will be lowered to INSERT_SUBVECTOR later. In the modified cases this seems to be enough to trick a later DAG combine into running in a different order than allows the ANDs to be removed.
I'll admit this is a bit of a hack that happens to work, but using CONCAT_VECTORS is more consistent with other legalization code anyway.
llvm-svn: 321611
Don't combine buildvector(binop(),binop(),binop(),binop()) -> binop(buildvector(), buildvector()) if its a splat - keep the binop scalar and just splat the result to avoid large vector constants.
llvm-svn: 321607
We end up using an i8 load via an isel pattern from v8i1 anyway. This just makes it more explicit. This seems to improve codgen in some cases and I'd like to kill off some of the load patterns.
llvm-svn: 321598
If the callee and caller use different calling convensions, we cannot apply TCO if the callee requires arguments on stack; e.g. C calling convention and Fast CC use the same registers for parameter passing, but the stack offset is not necessarily same.
This patch also recommit r319218 "[PowerPC] Allow tail calls of fastcc functions from C CallingConv functions." by @sfertile since the problem reported in r320106 should be fixed.
Differential Revision: https://reviews.llvm.org/D40893
llvm-svn: 321579
Initially, if the `c` constraint applied to the wrong data type that
causes LLVM to assert. This commit replaces the assert by an error
message.
llvm-svn: 321565
Currently all images are lowered to have a single
image PseudoSourceValue. Image stores happen to have
overly strict mayLoad/mayStore/hasSideEffects flags
set on them, so this happens to work. When these
are fixed to be correct, the scheduler breaks
this because the identical PSVs are assumed to
be the same address. These need to be unique
to the image resource value.
llvm-svn: 321555
As noted in PR34686, we are relying on a PSHUFD+PSHUFLW+PSHUFHW shuffle chain for most general vXi16 unary shuffles.
This patch checks for simpler PSHUFLW+PSHUFD and PSHUFHW+PSHUFD cases beforehand, building on some existing code that just handled splat shuffles.
By doing so we also prevent premature use of PSHUFB shuffles which can be slower and require the creation/loading of constant shuffle masks.
We now have the 'fast-variable-shuffle' option for hardware that prefers combining 2 or more shuffles to VPSHUFB etc.
Differential Revision: https://reviews.llvm.org/D38318
llvm-svn: 321553
Revision 320791 introduced a pass that transforms reg+reg instructions to
reg+imm if they're fed by "load immediate". However, it didn't
handle out-of-range shifts correctly as reported in PR35688.
This patch fixes that and therefore the PR.
Furthermore, there was undefined behaviour in the patch where the RHS of an
initialization expression was 32 bits and constant `1` was shifted left 32
bits. This was fixed by ensuring the RHS is 64 bits just like the LHS.
Differential Revision: https://reviews.llvm.org/D41369
llvm-svn: 321551
Previously we used an extend from v8i1 to v8i32/v8i64. Then extracted to the final width. But if we have VLX we should extract first. This way we don't end up with an overly large extend.
This allows us to use vcmpeq to make all ones for the sign extend when DQI isn't available. Otherwise we get a VPTERNLOG.
If we make v2i1/v4i1 legal like proposed in D41560, we could always do this and rely on the lowering of the extend to widen when necessary.
llvm-svn: 321538
-Use MinAlign instead of std::min.
-Use SelectionDAG::getMemBasePlusOffset.
-Apply offset to the pointer info for the second load/store created.
llvm-svn: 321536
The exception handler thunk needs to reference the LSDA of the parent
function, which won't be emitted if it's available_externally.
Fixes PR35736. ThinLTO ends up producing available_externally functions
that use _CxxFrameHandler3.
llvm-svn: 321532
If there are 17 or more leading zeros to the v4i32 elements, then we can use PMADD for the integer multiply when PMULLD is unavailable or slow.
The 17 bits need to be zero as the PMADDWD performs a v8i16 signed-mul-extend + pairwise-add - the upper 16 so we're adding a zero pair and the 17th bit so we don't incorrectly sign extend.
Differential Revision: https://reviews.llvm.org/D41484
llvm-svn: 321516
My original implementation ran as a DAG combine post type legalization, but it turns out we don't run that DAG combine step if type legalization didn't change anything. Attempts to make the combine run before type legalization as well hit other issues.
So just do it in LowerMUL where we can catch more cases.
llvm-svn: 321496
r319980 added new patterns to the machine combiner for transforming (fsub (fmul
x y) z) into (fmla (fneg z) x y). That is, fsub's where the first source
operand is an fmul are transformed. We previously only matched the case where
the second source operand of an fsub was an fmul, transforming (fsub z (fmul x
y)) into (fmls z x y). Now, if we have an fsub where both source operands are
fmuls, both of the above patterns are applicable.
However, the order in which we add the patterns to the list of candidates
determines the transformation that takes place, since only the first pattern
that matches will be used. This patch changes the order these two patterns are
added to the list of candidates such that we prefer the case where the second
source operand is an fmul (the fmls case), rather than the other one (the
fmla/fneg case). When both source operands are fmuls, this ordering results in
fewer instructions.
Differential Revision: https://reviews.llvm.org/D41587
llvm-svn: 321491
Returning SDValue() means nothing changed, SDValue(N,0) means there was a change but the worklist management was taken care of.
I don't know if this has a real effect other than making sure the combine counter in the DAG combiner gets updated, but it is the correct thing to do.
llvm-svn: 321463
Normally we catch this during lowering, but vXi64 mul is considered legal when we have AVX512DQ.
This DAG combine allows us to avoid PMULLQ with AVX512DQ if we can prove its unnecessary. PMULLQ is 3 uops that take 4 cycles each. While pmuldq/pmuludq is only one 4 cycle uop.
llvm-svn: 321437
Match regular x87 memory fold instructions with load/sideeffects tags, to prevent the schedulers from re-ordering them across the fnstcw/fldcw sequences for truncating stores while they are still pseudo during the stack conversion pass.
llvm-svn: 321424
Previously we extended v2i1 to v2f64 and then tried to use cvtuqq2pd/cvtqq2pd, but that only works with avx512dq. So we ended up scalarizing it. Now we widen to v4i1 first and extend to v4i32.
llvm-svn: 321420
Immediately after it is created we check if its equal to another EVT. Then we inconsistently use one or the other variables in the code below.
Instead do the equality check directly on the getValueType result and remove the variable. Use the origina VT variable throughout the remaining code.
llvm-svn: 321406
getOperand returns an SDValue that contains the node and the result number. There is no guarantee that the result number if 0. By using the -> operator we are calling SDNode::getValueType rather than SDValue::getValueType. This requires supplying a result number and we shouldn't assume it was 0.
I don't have a test case. Just noticed while cleaning up some other code and saw that it occurred in other places.
llvm-svn: 321397
Re-land r321234. It had to be reverted because it broke the shared
library build. The shared library build broke because there was a
missing LLVMBuild dependency from lib/Passes (which calls
TargetMachine::getTargetIRAnalysis) to lib/Target. As far as I can
tell, this problem was always there but was somehow masked
before (perhaps because TargetMachine::getTargetIRAnalysis was a
virtual function).
Original commit message:
This makes the TargetMachine interface a bit simpler. We still need
the std::function in TargetIRAnalysis to avoid having to add a
dependency from Analysis to Target.
See discussion:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html
I avoided adding all of the backend owners to this review since the
change is simple, but let me know if you feel differently about this.
Reviewers: echristo, MatzeB, hfinkel
Reviewed By: hfinkel
Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits
Differential Revision: https://reviews.llvm.org/D41464
llvm-svn: 321375
Despite what the comment said there isn't better codegen for 512-bit vectors. The 128/256/512 bit implementation jus stores to memory and loads an element. There's no advantage to doing that with a larger size. In fact in many cases it causes a stack realignment and generates worse code.
llvm-svn: 321369
Pointer constants are pretty rare, since we usually represent them as
integer constants and then cast to pointer. One notable exception is the
null pointer constant, which is represented directly as a G_CONSTANT 0
with pointer type. Mark it as legal and make sure it is selected like
any other integer constant.
llvm-svn: 321354
Previously prefetch was only considered legal if sse was enabled, but it should be supported with 3dnow as well.
The prfchw flag now imply at least some form of prefetch without the write hint is available, either the sse or 3dnow version. This is true even if 3dnow and sse are explicitly disabled.
Similarly prefetchwt1 feature implies availability of prefetchw and the the prefetcht0/1/2/nta instructions. This way we can support _MM_HINT_ET0 using prefetchw and _MM_HINT_ET1 with prefetchwt1. And its assumed that if we have levels for the write hint we would have levels for the non-write hint, thus why we enable the sse prefetch instructions.
I believe this behavior is consistent with gcc. I've updated the prefetch.ll to test all of these combinations.
llvm-svn: 321335
The penalty is currently getting applied in a bunch of places where it
doesn't make sense, like bitcasts (which are free) and calls (which
were getting the call penalty applied twice). Instead, just apply the
penalty to binary operators and floating-point casts.
While I'm here, also fix getFPOpCost() to do the right thing in more
cases, so we don't have to dig into function attributes.
Differential Revision: https://reviews.llvm.org/D41522
llvm-svn: 321332
This should only affect what we do for v8i16. Previously we went to v8i64, but if we have VLX we only need v8i32. This prevents an unnecessary zmm usage.
llvm-svn: 321303
We should have equally good shuffle options for v8i32 with VLX. This was spotted during my attempts to remove 512-bit vectors from SKX.
We still use 512-bits for v16i1, v32i1, and v64i1. I'm less sure we can handle those well with narrower vectors. i32 and i64 element sizes get the best shuffle support.
llvm-svn: 321291
The build failure was caused by an assertion in pre-legalization DAGCombine:
Combining: t6: ppcf128 = uint_to_fp t5
... into: t20: f32 = PPCISD::FCFIDUS t19
which is clearly wrong since ppcf128 are definitely different type with f32 and
we cannot change the node value type when do DAGCombine. The fix is don't
handle ppc_fp128 or i1 conversions in PPCTargetLowering::combineFPToIntToFP and
leave it to downstream to legalize it and expand it to small legal types.
Differential Revision: https://reviews.llvm.org/D41411
llvm-svn: 321276
Summary:
This makes the TargetMachine interface a bit simpler. We still need
the std::function in TargetIRAnalysis to avoid having to add a
dependency from Analysis to Target.
See discussion:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html
I avoided adding all of the backend owners to this review since the
change is simple, but let me know if you feel differently about this.
Reviewers: echristo, MatzeB, hfinkel
Reviewed By: hfinkel
Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits
Differential Revision: https://reviews.llvm.org/D41464
llvm-svn: 321234
The AArch64 backend contains code to optimize {s,u}{add,sub}.with.overflow during SelectionDAG. This commit ports that code to the ARM backend.
Differential revision: https://reviews.llvm.org/D35635
llvm-svn: 321224
Add support for 'objdump -print-imm-hex' for imm64, operand imm
and branch target. If user programs encode immediate values
as hex numbers, such an option will make it easy to correlate
asm insns with source code. This option also makes it easy
to correlate imm values with insn encoding.
There is one changed behavior in this patch. In old way, we
print the 64bit imm as u64:
O << (uint64_t)Op.getImm();
and the new way is:
O << formatImm(Op.getImm());
The formatImm is defined in llvm/MC/MCInstPrinter.h as
format_object<int64_t> formatImm(int64_t Value)
So the new way to print 64bit imm is i64 type.
If a 64bit value has the highest bit set, the old way
will print the value as a positive value and the
new way will print as a negative value. The new way
is consistent with x86_64.
For the code (see the test program):
...
if (a == 0xABCDABCDabcdabcdULL)
...
x86_64 objdump, with and without -print-imm-hex, looks like:
48 b8 cd ab cd ab cd ab cd ab movabsq $-6067004223159161907, %rax
48 b8 cd ab cd ab cd ab cd ab movabsq $-0x5432543254325433, %rax
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 321215
Multiple Closure objects can be created and stored for a single function. It's not a good idea to devote so many fields of it to storing pointers and references to global data structures of the pass. The closure class should only store the things needed to represent the closure itself.
This patch refactors many of the methods of Closure to belong to the pass object and to pass around a reference to the current Closure. The Closure class gains a few simple methods to add instructions and edges, and to return iterators to edges and instructions
Differential Revision: https://reviews.llvm.org/D41327
llvm-svn: 321213
Gather/scatter can implicitly sign extend from i32->i64 on indices. So if we know the sign bit of the input to a zext is 0 we can use the implicit extension.
llvm-svn: 321209
The function createTailCallBranchInstr assumes that the iterator MBBI is valid.
However, only one use of MBBI is guarded in the function.
Fix this by adding an assert.
Differential Revision: https://reviews.llvm.org/D41358
llvm-svn: 321205
This patch turns shuffles of fadd/fsub with fmul into fmsubadd.
Patch by Dmitry Venikov
Differential Revision: https://reviews.llvm.org/D40335
llvm-svn: 321200
We get an assertion in RegBankSelect for code along the lines of
my_32_bit_int = my_64_bit_int, which tends to translate into a 64-bit
load, followed by a G_TRUNC, followed by a 32-bit store. This appears in
a couple of places in the test-suite.
At the moment, the legalizer doesn't distinguish between integer and
floating point scalars, so a 64-bit load will be marked as legal for
targets with VFP, and so will the rest of the sequence, leading to a
slightly bizarre G_TRUNC reaching RegBankSelect.
Since the current support for 64-bit integers is rather immature, this
patch works around the issue by explicitly handling this case in
RegBankSelect and InstructionSelect. In the future, we may want to
revisit this decision and make sure 64-bit integer loads are narrowed
before reaching RegBankSelect.
llvm-svn: 321165
Summary:
Implement lower of unsigned saturation on an interval [0, k] where k + 1 is a power of two using USAT instruction in a similar way to how [~k, k] is lowered using SSAT on ARM models that supports it.
Patch by Marten Svanfeldt
Reviewers: t.p.northover, pbarrio, eastig, SjoerdMeijer, javed.absar, fhahn
Reviewed By: fhahn
Subscribers: fhahn, aemerson, javed.absar, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D41348
llvm-svn: 321164
This patch resubmits the SVE ZIP1/ZIP2 patch series consisting of
of r320992, r320986, r320973, and r320970 by reverting
https://reviews.llvm.org/rL321024.
The issue that caused r321024 has been addressed in https://reviews.llvm.org/rL321158,
so this patch-series should be safe to resubmit.
llvm-svn: 321163
Implement the 'Current Cache Size' register that has been introduced
as part of the Armv8.3 architecture. I originally missed this, and
(hopefully) should be the final patch for assembler support.
Differential Revision: https://reviews.llvm.org/D41396
llvm-svn: 321155
The gather instruction will implicitly sign extend to the pointer width, we don't need to further extend it. This can prevent unnecessary splitting in some cases.
There's still an issue that lowering on non-VLX can introduce another sign extend that doesn't get combined with shifts from a lowered sign_extend_inreg.
llvm-svn: 321152
Not sure how to test this cause I think the worst that happens is that we don't revisit the node a second time to look for additional combines. We used UpdateNodeOperands so the updating the DAG work was already done.
llvm-svn: 321148
This patch fixes a bug in the redundant compare elimination reported in https://reviews.llvm.org/rL320786 and re-enables the optimization.
The redundant compare elimination assumes that we can replace signed comparison with unsigned comparison for the equality check. But due to the difference in the sign extension behavior we cannot change the opcode if the comparison is against an immediate and the most significant bit of the immediate is one.
Differential Revision: https://reviews.llvm.org/D41385
llvm-svn: 321147
These optimizations depend on the ExplicitLocals pass to lower TEE
instructions, which is disabled in the ELF ABI, so disable them too.
llvm-svn: 321131
We try to prevent shuffle combining to value types that would stop the folding of masked operations, but by just returning early, we were failing to try different shuffle types.
The TODOs are all still relevant here to improve codegen but we're lacking test examples.
llvm-svn: 321085
As mentioned in D38318 and D40865, modern Intel processors prefer to combine multiple shuffles to a variable shuffle mask (PSHUFB/VPERMPS etc.) instead of having multiple stage 'fixed' shuffles which put more pressure on Port 5 (at the expense of extra shuffle mask loads).
This patch provides a FeatureFastVariableShuffle target flag for Haswell+ CPUs that prefers combining 2 or more fixed shuffles to a single variable shuffle (default is 3 shuffles).
The long term aim is to drive more of this from schedule data (probably via the MC) but we're not close to being ready for that yet.
Differential Revision: https://reviews.llvm.org/D41323
llvm-svn: 321074
Extension to D39729 which performed this for vXi16, with the same bit flipping to handle SMAX/SMIN/UMAX cases, vXi8 UMIN horizontal reductions can be performed.
This makes use of the fact that by performing a pair-wise i8 SHUFFLE/UMIN before PHMINPOSUW, we both get the UMIN of each pair but also zero-extend the upper bits ready for v8i16.
Differential Revision: https://reviews.llvm.org/D41294
llvm-svn: 321070
This instruction is encoded as zero, so we have handle that case when checking
for unimplemented opcodes when producing the encoding for an instruction.
llvm-svn: 321066
BWI supports shifting by word amounts. Even if VLX isn't support we can still widen to v32i16 and extract the lower half. For SKX its preferrable to not use 512-bit vector if we can.
llvm-svn: 321059
Previously, we were checking for MVTs with sizes betwen 8 and 64 which only includes i8, i16, i32, and i64 today. But I don't think we should assume that and should list the types that are legal for x86. I also don't think we need i64 since type legalization is guaranteed to split those up.
llvm-svn: 321058
My reading of the SDM says that all bits of the shift amount are used. If the value of the element is larger than the number of bits the result the shift result is zero. So I think we need to zero_extend here to avoid garbage in the upper bits.
In reality we lower any_extend as zero_extend so in most cases it would be hard to hit this.
llvm-svn: 321055
Note:
- X86ISelLowering: setLibcallName(SINCOS) was superfluous as
InitLibcalls() already does it.
- ARMISelLowering: Setting libcallnames for sincos/sincosf seemed
superfluous as in the darwin case it wouldn't be used while for all
other cases InitLibcalls already does it.
llvm-svn: 321036
Empty string should be equivalent to "generic" which doesn't allow NOPL. Force tests to use specificy 'pentiumpro' to guarantee NOPL.
Fixes PR35686
llvm-svn: 321026
This reverts changes r320992, r320986, r320973, and r320970.
r320970 by itself breaks the test case, and the rest depend on it.
Test case will land soon.
llvm-svn: 321024
LR was undefined entering outlined functions that contain calls. This made the
machine verifier unhappy when expensive checks were enabled. This fixes that.
llvm-svn: 321014
Summary: Patch [4/4] in a series to add parsing of predicates and properly parse SVE ZIP1/ZIP2 instructions. This patch further improves diagnostic messages for when the SVE feature is not specified.
Reviewers: rengolin, fhahn, olista01, echristo, efriedma
Reviewed By: fhahn
Subscribers: sdardis, aemerson, javed.absar, tschuett, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D40363
llvm-svn: 320992
Fix an off by one error in the bounds checking for 'dinsu' and update
the ranges in the test comments so that they are accurate.
This version has the correct commit message.
Reviewers: atanasyan
Differential Revision: https://reviews.llvm.org/D41183
llvm-svn: 320991
r319524 has made more G_MERGE_VALUES/G_UNMERGE_VALUES pairs legal than
are supported by the rest of the pipeline. Restrict that to only the
cases that we can currently handle: packing 32-bit values into 64-bit
ones, when we have hardware FP.
llvm-svn: 320980
Fix an off by one error in the bounds checking for 'dinsu' and update
the ranges in the test comments so that they are accurate.
Reviewers: atanasyan
https://reviews.llvm.org/D41183
llvm-svn: 320974
For Cylone, the instruction "movi.2d vD, #0" is executed incorrectly in some rare
circumstances. Work around the issue conservatively by avoiding the instruction entirely.
This patch changes CodeGen so that problematic instructions are never
generated, and the AsmParser so that an equivalent instruction is used (with a
warning).
llvm-svn: 320965
The block I moved things that need BWI and 512-bit or VLX is incorrectly qualified with just hasBWI || hasVLX. Here I've qualified it with hasBWI && (hasAVX512 || hasVLX) where the hasAVX512 will be replaced with allowing 512-bit vectors in an upcoming patch.
llvm-svn: 320957
Summary:
We had no tests for this and we couldn't do the optimization because of a bad use count check. We need to know how many non-undef pieces of the build vector were filled in and ensure our use count is equal to that. But on the shuffle combine version we need the use count to be 2.
The missing coverage was noticed during the review of D40335.
Reviewers: RKSimon, zvi, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41133
llvm-svn: 320950
Assuming we can safely adjust the broadcast index for the new type to keep it suitably aligned, then peek through BITCASTs when looking for the broadcast source.
Fixes PR32007
llvm-svn: 320933
In those cases, the pass thru operand of the methods isn't used. The calls to the scalar version were passing a MVT::i1 zero, which is an illegal type at the stage this code runs.
llvm-svn: 320928
Previously we promoted to v8i64, but we don't need to go all the way to 512-bits. If we have VLX we can use the 256-bit instruction. And even if we don't have VLX we can widen v8i32 to v16i32 and drop the upper half.
llvm-svn: 320926
We had a lot of separate 32 and 64 instructions that had the same scheduling data. This merges them into the same regular expression. This is pretty consistent with a lot of other instructions.
llvm-svn: 320924
r307148 added an assembly mnemonic spelling correction support and enabled it
on ARM. This enables that support on PowerPC as well.
Patch by Dmitry Venikov, thanks!
Differential Revision: https://reviews.llvm.org/D40552
llvm-svn: 320911
This is primarily to reduce stack usage, but ordering the use queue
according to the position in the code (earlier instructions visited
before later ones) reduces the number of unnecessary bottoms due to
visiting instructions out of order, e.g.
%reg1 = copy %reg0
%reg2 = copy %reg0
%reg3 = and %reg1, %reg2
Here, reg3 should be known to be same as reg0-2, but if reg3 is
evaluated after reg1 is updated, but before reg2 is updated, the two
inputs to the and will appear different, causing reg3 to become
bottom.
llvm-svn: 320866
This seemed to work due to a quirk in the X86 MC encoder that didn't emit a REX byte that the AND64ri8 implies when in 32-bit mode. This made the encoding the same as AND32ri8. I tried to add an assert to catch the dropped REX prefix that caught this.
llvm-svn: 320864
The target independent nodes will get legalized to the target specific nodes by their own legalization process. Someday I'd like to stop using a target specific for zero extends and truncates of legal types so the less places we reference the target specific opcode the better.
llvm-svn: 320863
When I wrote it I thought we were missing a potential optimization for KNL. But investigating further shows that for KNL we still do the optimal thing by widening to v4f32 and then using special isel patterns to widen again to zmm a register.
llvm-svn: 320862
Summary:
Currently we don't handle v32i1/v64i1 insert_vector_elt correctly as we fail to look at the number of elements closely and assume it can only be v16i1 or v8i1.
We also can't type legalize v64i1 insert_vector_elt correctly on KNL due to the type not being byte addressable as required by the legalizing through memory accesses path requires.
For the first issue, the patch now tries to pick a 512-bit register with the correct number of elements and promotes to that.
For the second issue, we now extend the vector to a byte addressable type, do the stores to memory, load the two halves, and then truncate the halves back to the original type. Technically since we changed the type, we may not need two loads, but actually checking that is more work and for the v64i1 case we do need them.
Reviewers: RKSimon, delena, spatel, zvi
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40942
llvm-svn: 320849
The asm parser wasn't preventing these from being accepted in 32-bit mode. Instructions that use a GR64 register are protected by the parser rejecting the register in 32-bit mode.
llvm-svn: 320846
This instruction doesn't access memory. It juse use a similar looking memory encoding. Don't require Intel syntax to put "qword ptr" in front of it.
llvm-svn: 320845
This has no effect due to a top level "let Predicates =" around the instructions. But its also not required because the GR64 usage in the instruction guarantees it can never match.
llvm-svn: 320843
There was a top level "let Predicates =" in the .td file that was overriding the Requires on each instruction.
I've added an assert to the code emitter to catch more cases like this. I'm sure this isn't the only place where the right predicates aren't being applied. This assert already found that we don't block btq/btsq/btrq in 32-bit mode.
llvm-svn: 320830
Work towards the unification of MIR and debug output by printing
`%stack.0` instead of `<fi#0>`, and `%fixed-stack.0` instead of
`<fi#-4>` (supposing there are 4 fixed stack objects).
Only debug syntax is affected.
Differential Revision: https://reviews.llvm.org/D41027
llvm-svn: 320827
c.slli/c.srli/c.srai allow a 5-bit shift in RV32C and a 6-bit shift in RV64C.
This patch adds uimmlog2xlennonzero to reflect this constraint as well as
tests.
Differential Revision: https://reviews.llvm.org/D41216
Patch by Shiva Chen.
llvm-svn: 320799
This patch switches the default for -riscv-no-aliases to false
and updates all affected MC and CodeGen tests. As recommended in
D41071, MC tests use the canonical instructions and the CodeGen
tests use the aliases.
Additionally, for the f and d instructions with rounding mode,
the tests for the aliased versions are moved and tightened such
that they can actually detect if alias emission is enabled.
(see D40902 for context)
Differential Revision: https://reviews.llvm.org/D41225
Patch by Mario Werner.
llvm-svn: 320797
This patch adds the necessary infrastructure to convert instructions that
take two register operands to those that take a register and immediate if
the necessary operand is produced by a load-immediate. Furthermore, it uses
this infrastructure to perform such conversions twice - first at MachineSSA
and then pre-emit.
There are a number of reasons we may end up with opportunities for this
transformation, including but not limited to:
- X-Form instructions chosen since the exact offset isn't available at ISEL time
- Atomic instructions with constant operands (we will add patterns for this
in the future)
- Tail duplication may duplicate code where one block contains this redundancy
- When emitting compare-free code in PPCDAGToDAGISel, we don't handle constant
comparands specially
Furthermore, this patch moves the initialization of PPCMIPeepholePass so that
it can be used for MIR tests.
llvm-svn: 320791
A couple places didn't use the same SDValue variables to connect everything all the way through.
I don't have a test case for a bug in insert into the lower bits of a non-zero, non-undef vector. Not sure the best way to create that. We don't create the case when lowering concat_vectors which is the main way to get insert_subvectors.
llvm-svn: 320790
The compare elimination peephole introduced in https://reviews.llvm.org/rL312514
causes a miscompile in AMDGPUInstrInfo.cpp which in turn causes some AMDGPU
test case failures in stage2 bootstrap testing. This miscompile didn't cause any
test case failures until https://reviews.llvm.org/rL320614, so it appeared as if
that patch caused these failures.
Disabling this transformation for now to bring the build bots back to green and
the author of the patch will investigate the miscompile.
llvm-svn: 320786
We have several instructions that were introduced in AVX512F that are only available in 512-bit form on KNL. We still make use of them for 128/256 by artificially widening and extracting during isel.
This commit separates these operations from the true 512-bit operations. This way we can qualify the normal 512-bit operations with needing 512-bit register support. And these special operations will get qualified with needing 512-bit registers OR VLX.
The 512-bit register qualification will be introduced in a future patch this just gets everything grouped to minimize deltas on that patch.
llvm-svn: 320782
Previously they were sort of interleaved in with XMM/YMM/ZMM action related code.
Trying to separate things so its easier to split 512-bit vectors later.
llvm-svn: 320781
Move it into the separate hasVLX block later in the constructor.
I'm trying to separate 128/256 and 512-bit related code so we can eventually qualify the hasAVX512 block with support for 512-bit vectors required by the prefer-vector-width feature support being talked about in D41096.
llvm-svn: 320779