Commit Graph

45623 Commits

Author SHA1 Message Date
Craig Topper f78b75fb59 [X86] Use CONCAT_VECTORS instead of INSERT_SUBVECTOR for padding v4i1/v2i1 vector to v8i1 pre-legalize.
The CONCAT_VECTORS will be lowered to INSERT_SUBVECTOR later. In the modified cases this seems to be enough to trick a later DAG combine into running in a different order than allows the ANDs to be removed.

I'll admit this is a bit of a hack that happens to work, but using CONCAT_VECTORS is more consistent with other legalization code anyway.

llvm-svn: 321611
2017-12-31 19:17:52 +00:00
Simon Pilgrim b000675374 [X86][AVX2] Combine extract(broadcast(scalar_value)) --> scalar_value
As it has a scalar source we don't treat it as a target shuffle so needs special handling.

llvm-svn: 321610
2017-12-31 18:59:30 +00:00
Simon Pilgrim f205ec716b [X86][SSE] Don't vectorize splat buildvector of binops (PR30780)
Don't combine buildvector(binop(),binop(),binop(),binop()) -> binop(buildvector(), buildvector()) if its a splat - keep the binop scalar and just splat the result to avoid large vector constants.

llvm-svn: 321607
2017-12-31 17:07:47 +00:00
Craig Topper f0f6eefb49 [X86] Add a DAG combine to widen (i4 (bitcast (v4i1))) before type legalization sees the i4 and changes to load/store.
Same for v2i1 and i2.

llvm-svn: 321602
2017-12-31 09:50:38 +00:00
Craig Topper 7f39623533 [X86] Add a DAG combine to fix (v4i1 (bitcast (i4))) before type legalization sees the i4 and changes to load/store.
Same for i2 and v2i1.

llvm-svn: 321601
2017-12-31 08:25:50 +00:00
Craig Topper 876ec0b558 [X86] Prevent combining (v8i1 (bitconvert (i8 load)))->(v8i1 load) if we don't have DQI.
We end up using an i8 load via an isel pattern from v8i1 anyway. This just makes it more explicit. This seems to improve codgen in some cases and I'd like to kill off some of the load patterns.

llvm-svn: 321598
2017-12-31 07:38:41 +00:00
Craig Topper 6159f5ebd8 [X86] Remove patterns for load/store of vXi with bitcasts to/from integer.
This is better handled by a DAG combine if its not already being done. No lit tests fail from the removal of these patterns.

llvm-svn: 321597
2017-12-31 07:38:36 +00:00
Craig Topper a362dee774 [X86] Remove AND32ri8 from pattern for v1i1 load.
I don't think anything would actually expect the other bits to be zero.

llvm-svn: 321596
2017-12-31 07:38:33 +00:00
Craig Topper 7ba1b76854 [X86] Fix a crash when returning a <1 x i1> value>
llvm-svn: 321595
2017-12-31 07:38:30 +00:00
Craig Topper 1d0e2e82bc [X86] Cleanup store splitting in LowerTruncatingStore
Use getMemBasePlusOffset and calculate proper pointer info and alignment for the second store.

llvm-svn: 321594
2017-12-31 07:38:26 +00:00
Benjamin Kramer c7fc81e659 Use phi ranges to simplify code. No functionality change intended.
llvm-svn: 321585
2017-12-30 15:27:33 +00:00
Hiroshi Inoue ca3cdd7f27 [PowerPC] fix a bug in TCO eligibility check
If the callee and caller use different calling convensions, we cannot apply TCO if the callee requires arguments on stack; e.g. C calling convention and Fast CC use the same registers for parameter passing, but the stack offset is not necessarily same.

This patch also recommit r319218 "[PowerPC] Allow tail calls of fastcc functions from C CallingConv functions." by @sfertile since the problem reported in r320106 should be fixed.

Differential Revision: https://reviews.llvm.org/D40893

llvm-svn: 321579
2017-12-30 08:09:04 +00:00
Craig Topper 97cc7b0377 [X86] Remove isel patterns for kshifts with types that don't support kshift natively.
We should only be creating natively supported kshifts now.

llvm-svn: 321577
2017-12-30 06:45:46 +00:00
Craig Topper c5fd31a802 [X86] Custom legalize vXi1 extract_subvector with KSHIFTR.
This allows us to remove some isel patterns.

This is mostly NFC, but we now use KSHIFTB instead of KSHIFTW with DQI.

llvm-svn: 321576
2017-12-30 06:45:43 +00:00
Simon Atanasyan d41feef40f [mips] Provide correct descriptions of asm constraints in the comments. NFC
llvm-svn: 321566
2017-12-29 19:18:30 +00:00
Simon Atanasyan 970f686faa [mips] Replace assert by an error message
Initially, if the `c` constraint applied to the wrong data type that
causes LLVM to assert. This commit replaces the assert by an error
message.

llvm-svn: 321565
2017-12-29 19:18:24 +00:00
Matt Arsenault e19bc2ee0f AMDGPU: Use unique PSVs for buffer resources
Also fixes using the wrong memory type for some
intrinsics when custom lowering them.

llvm-svn: 321557
2017-12-29 17:18:21 +00:00
Matt Arsenault d94b63d765 AMDGPU: Remove mayLoad/hasSideEffects from MIMG stores
Atomics still have hasSideEffects set on them because
of the mess that is the memory properties.

llvm-svn: 321556
2017-12-29 17:18:18 +00:00
Matt Arsenault 905f3518ba AMDGPU: Implement getTgtMemIntrinsic for images
Currently all images are lowered to have a single
image PseudoSourceValue. Image stores happen to have
overly strict mayLoad/mayStore/hasSideEffects flags
set on them, so this happens to work. When these
are fixed to be correct, the scheduler breaks
this because the identical PSVs are assumed to
be the same address. These need to be unique
to the image resource value.

llvm-svn: 321555
2017-12-29 17:18:14 +00:00
Simon Pilgrim c701596e86 [X86][SSE] Match PSHUFLW/PSHUFHW + PSHUFD vXi16 shuffle patterns (PR34686)
As noted in PR34686, we are relying on a PSHUFD+PSHUFLW+PSHUFHW shuffle chain for most general vXi16 unary shuffles.

This patch checks for simpler PSHUFLW+PSHUFD and PSHUFHW+PSHUFD cases beforehand, building on some existing code that just handled splat shuffles.

By doing so we also prevent premature use of PSHUFB shuffles which can be slower and require the creation/loading of constant shuffle masks.

We now have the 'fast-variable-shuffle' option for hardware that prefers combining 2 or more shuffles to VPSHUFB etc.

Differential Revision: https://reviews.llvm.org/D38318

llvm-svn: 321553
2017-12-29 14:41:50 +00:00
Dmitry Preobrazhensky 414e05383f [AMDGPU][MC] Incorrect parsing of flat/global atomic modifiers
See bug 35730: https://bugs.llvm.org/show_bug.cgi?id=35730

Differential Revision: https://reviews.llvm.org/D41598

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 321552
2017-12-29 13:55:11 +00:00
Nemanja Ivanovic 4e1f5e0734 [PowerPC] Fix for PR35688 - handle out-of-range values for r+r to r+i conversion
Revision 320791 introduced a pass that transforms reg+reg instructions to
reg+imm if they're fed by "load immediate". However, it didn't
handle out-of-range shifts correctly as reported in PR35688.
This patch fixes that and therefore the PR.

Furthermore, there was undefined behaviour in the patch where the RHS of an
initialization expression was 32 bits and constant `1` was shifted left 32
bits. This was fixed by ensuring the RHS is 64 bits just like the LHS.

Differential Revision: https://reviews.llvm.org/D41369

llvm-svn: 321551
2017-12-29 12:22:27 +00:00
Andrew V. Tischenko 03ddad853d Fix incorrect operand sizes for some MMX instructions: punpcklwd, punpcklbw and punpckldq.
Differential Revision: https://reviews.llvm.org/D41595

llvm-svn: 321549
2017-12-29 08:31:01 +00:00
Craig Topper 55cf880900 [X86] When lowering extending loads from v2i1/v4i1, if we have VLX, use a narrower extend.
Previously we used an extend from v8i1 to v8i32/v8i64. Then extracted to the final width. But if we have VLX we should extract first. This way we don't end up with an overly large extend.

This allows us to use vcmpeq to make all ones for the sign extend when DQI isn't available. Otherwise we get a VPTERNLOG.

If we make v2i1/v4i1 legal like proposed in D41560, we could always do this and rely on the lowering of the extend to widen when necessary.

llvm-svn: 321538
2017-12-28 19:46:11 +00:00
Craig Topper c0b6cb1e47 [X86] Use ISD::CONCAT_VECTORS when splitting 256-bit loads in combineLoad.
llvm-svn: 321537
2017-12-28 19:46:06 +00:00
Craig Topper 4b311da3a4 [X86] Fix inconsistencies in different places where we split loads/stores.
-Use MinAlign instead of std::min.
-Use SelectionDAG::getMemBasePlusOffset.
-Apply offset to the pointer info for the second load/store created.

llvm-svn: 321536
2017-12-28 19:46:03 +00:00
Craig Topper 05cf1f338f [X86] Emit ISD::TRUNCATE instead of X86ISD::VTRUNC from LowerZERO_EXTEND_Mask/LowerSIGN_EXTEND_Mask.
The truncate will be lowered X86ISD::VTRUNC later.

llvm-svn: 321534
2017-12-28 19:45:58 +00:00
Craig Topper 88e26a99f8 [X86] Remove unnecessary patterns for sign extending vXi1 without VLX.
The custom lowering already widens the result type to 512-bits if VLX isn't supported.

llvm-svn: 321533
2017-12-28 19:45:55 +00:00
Reid Kleckner a2d119a059 [WinEH] Don't emit state stores or EH thunks for available_externally functions
The exception handler thunk needs to reference the LSDA of the parent
function, which won't be emitted if it's available_externally.

Fixes PR35736. ThinLTO ends up producing available_externally functions
that use _CxxFrameHandler3.

llvm-svn: 321532
2017-12-28 18:41:31 +00:00
Benjamin Kramer 3a13ed60ba Avoid int to string conversion in Twine or raw_ostream contexts.
Some output changes from uppercase hex to lowercase hex, no other functionality change intended.

llvm-svn: 321526
2017-12-28 16:58:54 +00:00
Simon Pilgrim 62411e4d4f [X86][SSE] Use PMADDWD for v4i32 multiplies with 17 or more leading zeros
If there are 17 or more leading zeros to the v4i32 elements, then we can use PMADD for the integer multiply when PMULLD is unavailable or slow.

The 17 bits need to be zero as the PMADDWD performs a v8i16 signed-mul-extend + pairwise-add - the upper 16 so we're adding a zero pair and the 17th bit so we don't incorrectly sign extend.

Differential Revision: https://reviews.llvm.org/D41484

llvm-svn: 321516
2017-12-28 10:05:49 +00:00
Craig Topper 55cfa89f20 [X86] Add CLWB to icelake.
Per Table 1-1 in October 2017 edition of Intel® Architecture Instruction Set Extensions and Future Features

llvm-svn: 321501
2017-12-27 22:04:04 +00:00
Craig Topper 72bbbeb2a7 [X86] Reimplement r321437 using custom lowering instead of as a DAG combine.
My original implementation ran as a DAG combine post type legalization, but it turns out we don't run that DAG combine step if type legalization didn't change anything. Attempts to make the combine run before type legalization as well hit other issues.

So just do it in LowerMUL where we can catch more cases.

llvm-svn: 321496
2017-12-27 19:09:40 +00:00
Matthew Simpson 9439f54902 [AArch64] Change order of candidate FMLS patterns
r319980 added new patterns to the machine combiner for transforming (fsub (fmul
x y) z) into (fmla (fneg z) x y). That is, fsub's where the first source
operand is an fmul are transformed. We previously only matched the case where
the second source operand of an fsub was an fmul, transforming (fsub z (fmul x
y)) into (fmls z x y). Now, if we have an fsub where both source operands are
fmuls, both of the above patterns are applicable.

However, the order in which we add the patterns to the list of candidates
determines the transformation that takes place, since only the first pattern
that matches will be used. This patch changes the order these two patterns are
added to the list of candidates such that we prefer the case where the second
source operand is an fmul (the fmls case), rather than the other one (the
fmla/fneg case). When both source operands are fmuls, this ordering results in
fewer instructions.

Differential Revision: https://reviews.llvm.org/D41587

llvm-svn: 321491
2017-12-27 15:25:01 +00:00
Benjamin Kramer 293f34301e [X86] Fix vmul combine for AVX1 targets.
v8i32 is legal von AVX1, but it doesn't have pmuludq for it.

llvm-svn: 321490
2017-12-27 13:31:50 +00:00
Craig Topper 428d87e559 [X86] Return SDValue(N, 0) instead of an SDValue() after a successful combine.
Returning SDValue() means nothing changed, SDValue(N,0) means there was a change but the worklist management was taken care of.

I don't know if this has a real effect other than making sure the combine counter in the DAG combiner gets updated, but it is the correct thing to do.

llvm-svn: 321463
2017-12-26 22:22:58 +00:00
Andrew V. Tischenko 1dd7856af5 It's a fix for Bug 35741 - can't use comments after x86 prefixes.
Differential Revision: https://reviews.llvm.org/D41579

llvm-svn: 321459
2017-12-26 18:29:52 +00:00
Craig Topper 162439dcdf [X86] Pass itins.rr/itins.rm through properly for some instructions.
llvm-svn: 321452
2017-12-26 05:43:05 +00:00
Craig Topper 9b800c692e [X86] Use SSE_INTMUL_ITINS_P for the AVX-512 MUL instructions to match their SSE/AVX counterparts.
llvm-svn: 321451
2017-12-26 05:43:04 +00:00
Craig Topper e0b9b5ef2b [X86] Fix typo in assert message.
llvm-svn: 321450
2017-12-26 05:43:02 +00:00
Craig Topper 705fef3ef3 [X86] Add a DAG combines to turn vXi64 muls into VPMULDQ/VPMULUDQ if the upper bits are all sign bits or zeros.
Normally we catch this during lowering, but vXi64 mul is considered legal when we have AVX512DQ.

This DAG combine allows us to avoid PMULLQ with AVX512DQ if we can prove its unnecessary. PMULLQ is 3 uops that take 4 cycles each. While pmuldq/pmuludq is only one 4 cycle uop.

llvm-svn: 321437
2017-12-25 06:47:10 +00:00
Craig Topper fabeb27e36 [X86] Make some helper methods static functions instead. NFC
llvm-svn: 321433
2017-12-25 00:54:53 +00:00
Craig Topper b2cd8485dc [X86] Use SelectionDAG::getFPExtendOrRound to simplify some code.
llvm-svn: 321432
2017-12-25 00:54:51 +00:00
Benjamin Kramer 802e6255b2 Make helpers static. No functionality change.
llvm-svn: 321425
2017-12-24 12:46:22 +00:00
Simon Pilgrim e0434fad16 [X86][X87] Mark pseudo memory fold instructions as load/sideeffects (PR21160, PR34080, PR34454).
Match regular x87 memory fold instructions with load/sideeffects tags, to prevent the schedulers from re-ordering them across the fnstcw/fldcw sequences for truncating stores while they are still pseudo during the stack conversion pass.

llvm-svn: 321424
2017-12-24 12:20:21 +00:00
Craig Topper 2d1d9a11c1 [X86] Fix (v2f64 (s/uint_to_fp (v2i1))) to avoid scalarization without AVX512DQ.
Previously we extended v2i1 to v2f64 and then tried to use cvtuqq2pd/cvtqq2pd, but that only works with avx512dq. So we ended up scalarizing it. Now we widen to v4i1 first and extend to v4i32.

llvm-svn: 321420
2017-12-24 06:51:36 +00:00
Craig Topper 2e308a9b2f [X86] Add assembler predicates to BITALG/VBMI2/VNNI features to be consistent with the other AVX512 ISAs.
llvm-svn: 321416
2017-12-24 02:05:17 +00:00
Craig Topper 62fd123731 [X86] Teach WidenMaskArithmetic to handle any constant buildvector on the RHS not just all zeros/ones.
llvm-svn: 321415
2017-12-24 01:03:31 +00:00
Craig Topper 06dad14797 [X86] Remove type restrictions from WidenMaskArithmetic.
This can help AVX-512 code where mask types are legal allowing us to remove extends and truncates to/from mask types.

llvm-svn: 321408
2017-12-23 18:53:05 +00:00
Craig Topper e79a7a4b2e [X86] In WidenMaskArithmetic, make sure we check the input type of a truncate on N1.
Later in the code we explicitly bypass the truncate so we should be checking its type to make sure that it's safe.

llvm-svn: 321407
2017-12-23 18:53:03 +00:00
Craig Topper dbbbb8532c [X86] Remove unneeded EVT variable. NFC
Immediately after it is created we check if its equal to another EVT. Then we inconsistently use one or the other variables in the code below.

Instead do the equality check directly on the getValueType result and remove the variable. Use the origina VT variable throughout the remaining code.

llvm-svn: 321406
2017-12-23 18:53:01 +00:00
Simon Pilgrim fc01bf86d5 [X86][X87] Wrap FpI_ pseudo to use PseudoI. NFCI.
llvm-svn: 321405
2017-12-23 17:25:59 +00:00
Simon Pilgrim 730cbc8f8e [X86] Add default InstrItinClass to PseudoI
This will be used to help tidyup existing pseudos that we've added scheduling info to.

llvm-svn: 321401
2017-12-23 10:47:21 +00:00
Craig Topper b8e7ab8231 [X86] Pass the right VT to the getZeroExtendInReg introduced in r321398
Apparently we don't have tests for this which I didn't realize before. I'll try to fix that but wanted to fix the obvious bug.

llvm-svn: 321399
2017-12-23 06:52:03 +00:00
Craig Topper ed4a87f6a8 [X86] Use SelectionDAG::getZeroExtendInReg instead of implementing it manually.
llvm-svn: 321398
2017-12-23 02:54:52 +00:00
Craig Topper d6a8f2e67d [SelectionDAG][X86] Don't use ->getValueType(0) after a call to getOperand to get the type of the operand.
getOperand returns an SDValue that contains the node and the result number. There is no guarantee that the result number if 0. By using the -> operator we are calling SDNode::getValueType rather than SDValue::getValueType. This requires supplying a result number and we shouldn't assume it was 0.

I don't have a test case. Just noticed while cleaning up some other code and saw that it occurred in other places.

llvm-svn: 321397
2017-12-23 02:54:50 +00:00
Sanjoy Das 26d11ca4b0 (Re-landing) Expose a TargetMachine::getTargetTransformInfo function
Re-land r321234.  It had to be reverted because it broke the shared
library build.  The shared library build broke because there was a
missing LLVMBuild dependency from lib/Passes (which calls
TargetMachine::getTargetIRAnalysis) to lib/Target.  As far as I can
tell, this problem was always there but was somehow masked
before (perhaps because TargetMachine::getTargetIRAnalysis was a
virtual function).

Original commit message:

This makes the TargetMachine interface a bit simpler.  We still need
the std::function in TargetIRAnalysis to avoid having to add a
dependency from Analysis to Target.

See discussion:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html

I avoided adding all of the backend owners to this review since the
change is simple, but let me know if you feel differently about this.

Reviewers: echristo, MatzeB, hfinkel

Reviewed By: hfinkel

Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits

Differential Revision: https://reviews.llvm.org/D41464

llvm-svn: 321375
2017-12-22 18:21:59 +00:00
Dmitry Preobrazhensky 471adf7fdc [AMDGPU][MC] Corrected handling of negative expressions
See bug 35716: https://bugs.llvm.org/show_bug.cgi?id=35716

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D41488

llvm-svn: 321372
2017-12-22 18:03:35 +00:00
Craig Topper 576335f998 [X86] When lowering insert_vector_elt/extract_vector_elt of vXi1 with a non-constant index just use either a 128-bit type or the vXi8 type with the correct number of elements.
Despite what the comment said there isn't better codegen for 512-bit vectors. The 128/256/512 bit implementation jus stores to memory and loads an element. There's no advantage to doing that with a larger size. In fact in many cases it causes a stack realignment and generates worse code.

llvm-svn: 321369
2017-12-22 17:18:11 +00:00
Craig Topper eff84ed204 [X86] Improve the printing of address mode during isel matching.
Fix some inconsistent new line behavior and only print the FrameIndex when the address mode is a FrameIndexBase addressing mode.

llvm-svn: 321368
2017-12-22 17:18:10 +00:00
Dmitry Preobrazhensky c5b0c172f6 [AMDGPU][MC] Corrected parsing of optional operands for ds_swizzle_b32
See bug 35645: https://bugs.llvm.org/show_bug.cgi?id=35645

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D41186

llvm-svn: 321367
2017-12-22 17:13:28 +00:00
Dmitry Preobrazhensky 2713495318 [AMDGPU][MC] Added support of 256- and 512-bit tuples of ttmp registers
See bug 35561: https://bugs.llvm.org/show_bug.cgi?id=35561

This patch also affects implementation of SGPR and VGPR registers though changes are cosmetic.

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D41437

llvm-svn: 321359
2017-12-22 15:18:06 +00:00
Diana Picus 28a6d0e639 [ARM GlobalISel] Support G_INTTOPTR and G_PTRTOINT for s32
Mark conversions between pointers and 32-bit scalars as legal, map them
to the GPR and select to a simple COPY.

llvm-svn: 321356
2017-12-22 13:05:51 +00:00
Diana Picus 68773859c8 [ARM GlobalISel] Support pointer constants
Pointer constants are pretty rare, since we usually represent them as
integer constants and then cast to pointer. One notable exception is the
null pointer constant, which is represented directly as a G_CONSTANT 0
with pointer type. Mark it as legal and make sure it is selected like
any other integer constant.

llvm-svn: 321354
2017-12-22 11:09:18 +00:00
Craig Topper e2873a17d7 [X86] Add missing initialization for the HasPREFETCHWT1 subtarget variable.
llvm-svn: 321340
2017-12-22 03:53:14 +00:00
Craig Topper 67885f5d58 [X86] Enable PRFCHW feature on KNL/KNM and all CPUs inherited from Broadwell.
llvm-svn: 321336
2017-12-22 02:41:12 +00:00
Craig Topper e268598dd3 [X86] Add prefetchwt1 instruction and overhaul priorities and isel enabling for prefetch instructions.
Previously prefetch was only considered legal if sse was enabled, but it should be supported with 3dnow as well.

The prfchw flag now imply at least some form of prefetch without the write hint is available, either the sse or 3dnow version. This is true even if 3dnow and sse are explicitly disabled.

Similarly prefetchwt1 feature implies availability of prefetchw and the the prefetcht0/1/2/nta instructions. This way we can support _MM_HINT_ET0 using prefetchw and _MM_HINT_ET1 with prefetchwt1. And its assumed that if we have levels for the write hint we would have levels for the non-write hint, thus why we enable the sse prefetch instructions.

I believe this behavior is consistent with gcc. I've updated the prefetch.ll to test all of these combinations.

llvm-svn: 321335
2017-12-22 02:30:30 +00:00
Craig Topper 9befe89367 [X86] Use SIGN_EXTEND to implement ANY_EXTEND from vXi1.
llvm-svn: 321334
2017-12-22 02:30:26 +00:00
Eli Friedman 39ed9a602b [Inliner] Restrict soft-float inlining penalty.
The penalty is currently getting applied in a bunch of places where it
doesn't make sense, like bitcasts (which are free) and calls (which
were getting the call penalty applied twice). Instead, just apply the
penalty to binary operators and floating-point casts.

While I'm here, also fix getFPOpCost() to do the right thing in more
cases, so we don't have to dig into function attributes.

Differential Revision: https://reviews.llvm.org/D41522

llvm-svn: 321332
2017-12-22 02:08:08 +00:00
Craig Topper 8772228963 [X86] Use SIGN_EXTEND rather than ZERO_EXTEND for lowering extract_vector_elt from vXi1 with a non-const index.
We have a better range of instructions we can use if we can fill with the value i1 value rather than zeroing.

llvm-svn: 321315
2017-12-21 22:08:23 +00:00
Craig Topper 742ac98d01 [X86] When lowering truncates to vXi1, don't sign extend i16/i8 types to 512-bit if we have VLX.
This should only affect what we do for v8i16. Previously we went to v8i64, but if we have VLX we only need v8i32. This prevents an unnecessary zmm usage.

llvm-svn: 321303
2017-12-21 20:45:13 +00:00
Craig Topper 410a289b79 [X86] Promote v8i1 shuffles to v8i32 instead of v8i64 if we have VLX.
We should have equally good shuffle options for v8i32 with VLX. This was spotted during my attempts to remove 512-bit vectors from SKX.

We still use 512-bits for v16i1, v32i1, and v64i1. I'm less sure we can handle those well with narrower vectors. i32 and i64 element sizes get the best shuffle support.

llvm-svn: 321291
2017-12-21 18:44:06 +00:00
Simon Pilgrim 4de5bb093c [X86][SSE] Split large PAVGB/PAVGW vectors to legal widths
Patch to allow detectAVGPattern handle vectors larger than the legal size (128 SSE2, 256 AVX2, 512 AVX512BW), splitting the vectors accordingly.

Differential Revision: https://reviews.llvm.org/D41440

llvm-svn: 321288
2017-12-21 18:12:31 +00:00
Tony Jiang eba757e45c [PowerPC] Fix parest build failure in SPEC2017.
The build failure was caused by an assertion in pre-legalization DAGCombine:

Combining: t6: ppcf128 = uint_to_fp t5
... into: t20: f32 = PPCISD::FCFIDUS t19

which is clearly wrong since ppcf128 are definitely different type with f32 and
we cannot change the node value type when do DAGCombine. The fix is don't
handle ppc_fp128 or i1 conversions in PPCTargetLowering::combineFPToIntToFP and
leave it to downstream to legalize it and expand it to small legal types.

Differential Revision: https://reviews.llvm.org/D41411

llvm-svn: 321276
2017-12-21 15:42:50 +00:00
Sam Parker 98727bc261 [ARM] Armv8-R DFB instruction
Implement MC support for the Armv8-R 'Data Full Barrier' instruction.

Differential Revision: https://reviews.llvm.org/D41430

llvm-svn: 321256
2017-12-21 11:17:49 +00:00
Craig Topper 72c22f4366 [X86] Use PSHUFB for v32i16 shuffles before falling back to VPERMW/VPERMI2W.
PSHUFB has the ability to implicitly 0 elements which VPERMI2W can't do. So give a chance to use it first.

llvm-svn: 321251
2017-12-21 08:22:51 +00:00
Craig Topper 38af615b4c [X86] Use VPERMI2B for v16i8 shuffles if we have VBMI+VLX and would have otherwise used two PSHUFBs ORed together.
llvm-svn: 321249
2017-12-21 07:31:30 +00:00
Craig Topper 03b2bc4838 [X86] Use VPERMB/VPERMI2B for v32i8 shuffle lowering if VBMI and VLX are supported.
llvm-svn: 321248
2017-12-21 05:58:31 +00:00
Sanjoy Das 747d1114d6 Revert "Expose a TargetMachine::getTargetTransformInfo function"
This reverts commit r321234.  It breaks the -DBUILD_SHARED_LIBS=ON build.

llvm-svn: 321243
2017-12-21 02:34:39 +00:00
Sanjoy Das 0c3de350b4 Expose a TargetMachine::getTargetTransformInfo function
Summary:
This makes the TargetMachine interface a bit simpler.  We still need
the std::function in TargetIRAnalysis to avoid having to add a
dependency from Analysis to Target.

See discussion:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html

I avoided adding all of the backend owners to this review since the
change is simple, but let me know if you feel differently about this.

Reviewers: echristo, MatzeB, hfinkel

Reviewed By: hfinkel

Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits

Differential Revision: https://reviews.llvm.org/D41464

llvm-svn: 321234
2017-12-21 01:06:58 +00:00
Reid Kleckner 82b117f07f Attempt to pacify 4.8.5 with makeArrayRef
llvm-svn: 321233
2017-12-21 00:28:34 +00:00
Joel Galenson 6f4e827e4c [ARM] Optimize {s,u}{add,sub}.with.overflow.
The AArch64 backend contains code to optimize {s,u}{add,sub}.with.overflow during SelectionDAG.  This commit ports that code to the ARM backend.

Differential revision: https://reviews.llvm.org/D35635

llvm-svn: 321224
2017-12-20 22:25:39 +00:00
Krzysztof Parzyszek 3f84c0f5d8 [Hexagon] Use ArrayRef member functions instead of custom ones
llvm-svn: 321221
2017-12-20 20:54:13 +00:00
Krzysztof Parzyszek e4ce92cabf [Hexagon] Allow construction of HVX vector predicates
Handle BUILD_VECTOR of boolean values.

llvm-svn: 321220
2017-12-20 20:49:43 +00:00
Krzysztof Parzyszek fb0fcacb9d [Hexagon] Legalize vector elements to i32 in buildVector32/64
llvm-svn: 321218
2017-12-20 20:33:49 +00:00
Yonghong Song 25bf825961 bpf: add support for objdump -print-imm-hex
Add support for 'objdump -print-imm-hex' for imm64, operand imm
and branch target. If user programs encode immediate values
as hex numbers, such an option will make it easy to correlate
asm insns with source code. This option also makes it easy
to correlate imm values with insn encoding.

There is one changed behavior in this patch. In old way, we
print the 64bit imm as u64:
  O << (uint64_t)Op.getImm();
and the new way is:
  O << formatImm(Op.getImm());

The formatImm is defined in llvm/MC/MCInstPrinter.h as
  format_object<int64_t> formatImm(int64_t Value)

So the new way to print 64bit imm is i64 type.
If a 64bit value has the highest bit set, the old way
will print the value as a positive value and the
new way will print as a negative value. The new way
is consistent with x86_64.
For the code (see the test program):
 ...
 if (a == 0xABCDABCDabcdabcdULL)
 ...
x86_64 objdump, with and without -print-imm-hex, looks like:
 48 b8 cd ab cd ab cd ab cd ab   movabsq $-6067004223159161907, %rax
 48 b8 cd ab cd ab cd ab cd ab   movabsq $-0x5432543254325433, %rax

Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 321215
2017-12-20 19:39:58 +00:00
Craig Topper 8ec5632521 [X86] Refactor DomainReassignment pass to make the Closure class not stores references to the main data structures of the pass itself
Multiple Closure objects can be created and stored for a single function. It's not a good idea to devote so many fields of it to storing pointers and references to global data structures of the pass. The closure class should only store the things needed to represent the closure itself.

This patch refactors many of the methods of Closure to belong to the pass object and to pass around a reference to the current Closure. The Closure class gains a few simple methods to add instructions and edges, and to return iterators to edges and instructions

Differential Revision: https://reviews.llvm.org/D41327

llvm-svn: 321213
2017-12-20 19:36:43 +00:00
Craig Topper 07820f2fe4 [X86] Remove zext from vXi32 to vXi64 on indices of gather/scatter instructions if we can prove the pre-extended value is positive.
Gather/scatter can implicitly sign extend from i32->i64 on indices. So if we know the sign bit of the input to a zext is 0 we can use the implicit extension.

llvm-svn: 321209
2017-12-20 19:25:33 +00:00
Stefan Pintilie 4241821848 [PowerPC] Added an assert to make sure that the MBBI iterator is valid.
The function createTailCallBranchInstr assumes that the iterator MBBI is valid.
However, only one use of MBBI is guarded in the function.
Fix this by adding an assert.

Differential Revision: https://reviews.llvm.org/D41358

llvm-svn: 321205
2017-12-20 19:07:44 +00:00
Matt Arsenault f7f59b5292 [AMDGPU, AsmParser] Enable the mnemonic spell corrector.
Patch by Dmitry Venikov

llvm-svn: 321202
2017-12-20 18:52:57 +00:00
Craig Topper bc92e00f2e [X86] Implement the fusing of MUL+SUBADD to FMSUBADD
This patch turns shuffles of fadd/fsub with fmul into fmsubadd.

Patch by Dmitry Venikov

Differential Revision: https://reviews.llvm.org/D40335

llvm-svn: 321200
2017-12-20 18:05:15 +00:00
Krzysztof Parzyszek 8f6b0c850a [Hexagon] Adjust the value type for BCvt in LowerFormalArguments
llvm-svn: 321177
2017-12-20 14:44:05 +00:00
Sander de Smalen a2f3bed642 Trivial commit to force LLVM to run TableGen for Mips target after
a change to the AsmMatcherEmitter, and should fix the buildbot
failure on llvm-clang-x86_64-expensive-checks-win.

The issue is also described here:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119617.html

llvm-svn: 321170
2017-12-20 12:45:40 +00:00
Diana Picus 75ce852abe [ARM GlobalISel] Fix assertion in RegBankSelect
We get an assertion in RegBankSelect for code along the lines of
my_32_bit_int = my_64_bit_int, which tends to translate into a 64-bit
load, followed by a G_TRUNC, followed by a 32-bit store. This appears in
a couple of places in the test-suite.

At the moment, the legalizer doesn't distinguish between integer and
floating point scalars, so a 64-bit load will be marked as legal for
targets with VFP, and so will the rest of the sequence, leading to a
slightly bizarre G_TRUNC reaching RegBankSelect.

Since the current support for 64-bit integers is rather immature, this
patch works around the issue by explicitly handling this case in
RegBankSelect and InstructionSelect. In the future, we may want to
revisit this decision and make sure 64-bit integer loads are narrowed
before reaching RegBankSelect.

llvm-svn: 321165
2017-12-20 11:27:10 +00:00
Florian Hahn c3aa6d83fd [ARM] Lower unsigned saturation to USAT
Summary:
Implement lower of unsigned saturation on an interval [0, k] where k + 1 is a power of two using USAT instruction in a similar way to how [~k, k] is lowered using SSAT on ARM models that supports it.

Patch by Marten Svanfeldt

Reviewers: t.p.northover, pbarrio, eastig, SjoerdMeijer, javed.absar, fhahn

Reviewed By: fhahn

Subscribers: fhahn, aemerson, javed.absar, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D41348

llvm-svn: 321164
2017-12-20 11:13:57 +00:00
Sander de Smalen cd6be960ce [AArch64][SVE] Re-submit patch series for ZIP1/ZIP2
This patch resubmits the SVE ZIP1/ZIP2 patch series consisting of
of r320992, r320986, r320973, and r320970 by reverting
https://reviews.llvm.org/rL321024.

The issue that caused r321024 has been addressed in https://reviews.llvm.org/rL321158,
so this patch-series should be safe to resubmit.

llvm-svn: 321163
2017-12-20 11:02:42 +00:00
Tim Northover 6db5d027c6 AArch64: fix one more place movi.2d could be created.
Somehow got missed out of r320965.

llvm-svn: 321162
2017-12-20 10:45:39 +00:00
Sander de Smalen c067c30d9e [AArch64] Asm: Fix parsing of register aliases that have a name starting with 'z'
Summary: This fixes an issue as identified by @rnk in https://reviews.llvm.org/rL321029.

Reviewers: rnk, fhahn, rengolin, efriedma, echristo, olista01

Reviewed By: rnk, fhahn

Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits, rnk

Differential Revision: https://reviews.llvm.org/D41382

llvm-svn: 321158
2017-12-20 09:45:45 +00:00
Sam Parker daed9de622 [AArch64] CCSIDR2 system register
Implement the 'Current Cache Size' register that has been introduced
as part of the Armv8.3 architecture. I originally missed this, and
(hopefully) should be the final patch for assembler support.

Differential Revision: https://reviews.llvm.org/D41396

llvm-svn: 321155
2017-12-20 08:56:41 +00:00
Craig Topper abed821c36 [X86] Optimize sign extends on index operand to gather/scatter to not sign extend past i32.
The gather instruction will implicitly sign extend to the pointer width, we don't need to further extend it. This can prevent unnecessary splitting in some cases.

There's still an issue that lowering on non-VLX can introduce another sign extend that doesn't get combined with shifts from a lowered sign_extend_inreg.

llvm-svn: 321152
2017-12-20 07:36:59 +00:00
Martin Storsjo 2778fd0b59 [AArch64] Implement stack probing for windows
Differential Revision: https://reviews.llvm.org/D41131

llvm-svn: 321150
2017-12-20 06:51:45 +00:00
Craig Topper 158d54d954 [X86] Add a missing return to combineGatherScatter after sucessful combine.
Not sure how to test this cause I think the worst that happens is that we don't revisit the node a second time to look for additional combines. We used UpdateNodeOperands so the updating the DAG work was already done.

llvm-svn: 321148
2017-12-20 06:44:50 +00:00
Hiroshi Inoue 11e571e0c6 [PowerPC] fix a bug in redundant compare elimination
This patch fixes a bug in the redundant compare elimination reported in https://reviews.llvm.org/rL320786 and re-enables the optimization.

The redundant compare elimination assumes that we can replace signed comparison with unsigned comparison for the equality check. But due to the difference in the sign extension behavior we cannot change the opcode if the comparison is against an immediate and the most significant bit of the immediate is one.

Differential Revision: https://reviews.llvm.org/D41385

llvm-svn: 321147
2017-12-20 05:18:19 +00:00
Craig Topper aee3acb9a8 [X86] Remove code from combineSext that looks for MVT::i1 after operation legalization which can never happen.
Type legalization guarantees this to be impossible since MVT::i1 isn't a legal type.

llvm-svn: 321132
2017-12-20 01:00:01 +00:00
Dan Gohman b5f53449e4 [WebAssembly] Disable tee_local optimizations when targeting the ELF ABI.
These optimizations depend on the ExplicitLocals pass to lower TEE
instructions, which is disabled in the ELF ABI, so disable them too.

llvm-svn: 321131
2017-12-20 00:59:28 +00:00
Craig Topper fbdb236a8a [X86] Add an assert to indicate that there is only once specific VT allowed at a certain point in LowerMULH.
Helps with code readability a little.

llvm-svn: 321118
2017-12-19 22:38:09 +00:00
Adrian Prantl 0e6694d111 Silence a bunch of implicit fallthrough warnings
llvm-svn: 321114
2017-12-19 22:05:25 +00:00
Mark Searles e4f067ebe2 [AMDGPU] Turn off MergeConsecutiveStores() before Instruction Selection for AMDGPU. Commit dbbb6c5fc3642987430866dffdf710df4f616ac7 turned on MergeConsecutiveStores() before Instruction Selection for all targets. Enough AMDGPU compiles go into an infinite loop ( MergeConsecutiveStores() merges two stores; LegalizeStoreOps() un-merges; MergeConsecutiveStores() re-merges, etc. ) to warrant turning it off until the issues can be addressed.
Differential Revision: https://reviews.llvm.org/D41377

llvm-svn: 321100
2017-12-19 19:26:23 +00:00
Simon Pilgrim d873b6f6ba [X86][AVX512] Attempt target shuffle combining to different types instead of early-out
We try to prevent shuffle combining to value types that would stop the folding of masked operations, but by just returning early, we were failing to try different shuffle types.

The TODOs are all still relevant here to improve codegen but we're lacking test examples.

llvm-svn: 321085
2017-12-19 16:54:07 +00:00
Simon Pilgrim 3feaf2a207 [X86] Fix uninitialized variable sanitizer warning from rL321074
llvm-svn: 321076
2017-12-19 14:34:35 +00:00
Simon Pilgrim fd5df639a3 [X86][SSE] Add cpu feature for aggressive combining to variable shuffles
As mentioned in D38318 and D40865, modern Intel processors prefer to combine multiple shuffles to a variable shuffle mask (PSHUFB/VPERMPS etc.) instead of having multiple stage 'fixed' shuffles which put more pressure on Port 5 (at the expense of extra shuffle mask loads).

This patch provides a FeatureFastVariableShuffle target flag for Haswell+ CPUs that prefers combining 2 or more fixed shuffles to a single variable shuffle (default is 3 shuffles).

The long term aim is to drive more of this from schedule data (probably via the MC) but we're not close to being ready for that yet.

Differential Revision: https://reviews.llvm.org/D41323

llvm-svn: 321074
2017-12-19 13:16:43 +00:00
David Green 110844d21c [ARM] Register the Thumb2SizeReducePass. NFC
Also adds a simple test case.

llvm-svn: 321072
2017-12-19 12:19:08 +00:00
Simon Pilgrim f6d4ab6daf [X86][SSE] Use (V)PHMINPOSUW for vXi8 SMAX/SMIN/UMAX/UMIN horizontal reductions (PR32841)
Extension to D39729 which performed this for vXi16, with the same bit flipping to handle SMAX/SMIN/UMAX cases, vXi8 UMIN horizontal reductions can be performed.

This makes use of the fact that by performing a pair-wise i8 SHUFFLE/UMIN before PHMINPOSUW, we both get the UMIN of each pair but also zero-extend the upper bits ready for v8i16.

Differential Revision: https://reviews.llvm.org/D41294

llvm-svn: 321070
2017-12-19 12:02:40 +00:00
Simon Dardis 1ade566c45 [mips] Handle the emission of microMIPSr6 sll instruction when used as a nop.
This instruction is encoded as zero, so we have handle that case when checking
for unimplemented opcodes when producing the encoding for an instruction.

llvm-svn: 321066
2017-12-19 11:16:22 +00:00
Craig Topper 13142b10d5 [X86] Don't extend v16i8 non-uniform shifts to v16i32 if we have BWI. Use v16i16 instead.
BWI supports shifting by word amounts. Even if VLX isn't support we can still widen to v32i16 and extract the lower half. For SKX its preferrable to not use 512-bit vector if we can.

llvm-svn: 321059
2017-12-19 06:59:10 +00:00
Craig Topper 6e3091c265 [X86] Use a specific list of MVTs in combineShiftRightArithmetic instead of iterating over every integer VT and checking their size.
Previously, we were checking for MVTs with sizes betwen 8 and 64 which only includes i8, i16, i32, and i64 today. But I don't think we should assume that and should list the types that are legal for x86. I also don't think we need i64 since type legalization is guaranteed to split those up.

llvm-svn: 321058
2017-12-19 06:29:00 +00:00
Craig Topper eb13a418e1 [X86] Remove unnecessary check for integer VT from combineShiftRightArithmetic.
I doubt there's any way to create a ashr for an FP type.

llvm-svn: 321057
2017-12-19 06:28:58 +00:00
Craig Topper da853a9c2f [X86] Remove dead code for turning vector shifts by large amounts into a zero vector.
Pretty sure these are handled by a target independent DAG combine that turns them into undef these days.

llvm-svn: 321056
2017-12-19 05:21:50 +00:00
Craig Topper ad3a554889 [X86] Use ZERO_EXTEND instead of ANY_EXTEND when extending the shift amount for a non-uniform shift.
My reading of the SDM says that all bits of the shift amount are used. If the value of the element is larger than the number of bits the result the shift result is zero. So I think we need to zero_extend here to avoid garbage in the upper bits.

In reality we lower any_extend as zero_extend so in most cases it would be hard to hit this.

llvm-svn: 321055
2017-12-19 04:52:04 +00:00
Reid Kleckner 73177e71bf Fix Wasm as a follow up to r321035 and the other one
This array is tightly coupled with the .def file. Someone should look
into fixing that.

llvm-svn: 321050
2017-12-19 01:08:53 +00:00
Craig Topper f19121d647 [X86] Don't use NOPL when the assembler is passed an empty CPU string.
This recommits the change from r321026. I have a fix for the lld test now.

llvm-svn: 321038
2017-12-18 23:31:43 +00:00
Matthias Braun a4852d2c19 X86/AArch64/ARM: Factor out common sincos_stret logic; NFCI
Note:
- X86ISelLowering: setLibcallName(SINCOS) was superfluous as
  InitLibcalls() already does it.
- ARMISelLowering: Setting libcallnames for sincos/sincosf seemed
  superfluous as in the darwin case it wouldn't be used while for all
  other cases InitLibcalls already does it.

llvm-svn: 321036
2017-12-18 23:19:42 +00:00
Matthias Braun a92cecfbda AArch64/X86: Factor out common bzero logic; NFC
llvm-svn: 321035
2017-12-18 23:14:28 +00:00
Krzysztof Parzyszek e704583f23 [Hexagon] Cache loads to select to avoid traversing mutating DAG
llvm-svn: 321034
2017-12-18 23:13:27 +00:00
Craig Topper 46832126e1 Revert part of r321026 "[X86] Don't use NOPL when the assembler is passed an empty CPU string." while I investigate how to fix an lld test failure.
Looks like lld also needs to pass a -mcpu in some of its tests

llvm-svn: 321033
2017-12-18 22:20:10 +00:00
Jessica Paquette 8565d3af84 [MachineOutliner][NFC] Gardening: use std::any_of instead of bool + loop
River Riddle suggested to use std::any_of instead of the bool + loop thing on
r320229. This commit does that.

llvm-svn: 321028
2017-12-18 21:44:52 +00:00
Craig Topper 4802d4e23e [X86] Don't use NOPL when the assembler is passed an empty CPU string. Update tests to force a CPU with NOPL
Empty string should be equivalent to "generic" which doesn't allow NOPL. Force tests to use specificy 'pentiumpro' to guarantee NOPL.

Fixes PR35686

llvm-svn: 321026
2017-12-18 21:37:27 +00:00
Reid Kleckner 37517a2ddd Revert "[AArch64][SVE] Asm" changes, they broke libjpeg_turbo
This reverts changes r320992, r320986, r320973, and r320970.

r320970 by itself breaks the test case, and the rest depend on it.

Test case will land soon.

llvm-svn: 321024
2017-12-18 20:58:25 +00:00
Dimitry Andric e4f5d01033 Fix more inconsistent line endings. NFC.
llvm-svn: 321016
2017-12-18 19:46:56 +00:00
Jessica Paquette 02c124d644 [MachineOutliner] Recommit r320229
LR was undefined entering outlined functions that contain calls. This made the
machine verifier unhappy when expensive checks were enabled. This fixes that.

llvm-svn: 321014
2017-12-18 19:33:21 +00:00
Benjamin Kramer efc7c88ea8 [PPC] Also disable the pre-emit version of reg+reg to reg+imm transformation.
This has the same issue as the early pass disabled in r321010.

llvm-svn: 321013
2017-12-18 19:21:56 +00:00
Benjamin Kramer f4cc67acb6 [PPC] Disable reg+reg to reg+imm transformation.
It creates invalid instructions. PR35688.

llvm-svn: 321010
2017-12-18 18:56:57 +00:00
Dimitry Andric e44dea9f6b Fix inconsistent line endings in HexagonVectorLoopCarriedReuse.cpp. NFC.
llvm-svn: 321009
2017-12-18 18:56:00 +00:00
Krzysztof Parzyszek eba8c0c61b [Hexagon] Higher versions of HVX imply presence of lower versions
The code in Hexagon_MC::completeHVXFeatures wasn't setting all HVX-
related features correctly.

llvm-svn: 321008
2017-12-18 18:51:57 +00:00
Dimitry Andric ca5b0f3f12 Fix inconsistent line endings in ARCDisassembler.cpp. NFC.
llvm-svn: 321006
2017-12-18 18:45:37 +00:00
Krzysztof Parzyszek 7259263790 i[Hexagon] ANY_EXTEND_VECTOR_INREG should be Custom, not Legal in r321004
llvm-svn: 321005
2017-12-18 18:41:52 +00:00
Krzysztof Parzyszek 6b589e593d [Hexagon] Generate HVX code for vector sign-, zero- and any-extends
Implement any-extend as zero-extend.

llvm-svn: 321004
2017-12-18 18:32:27 +00:00
Krzysztof Parzyszek 5439a70d97 [Hexagon] Prefer to widen HVX vectors instead of promoting
llvm-svn: 321002
2017-12-18 18:21:01 +00:00
Sander de Smalen 09f56a54d0 [AArch64][SVE] Asm: Improve diagnostics further when +sve is not specified
Summary: Patch [4/4] in a series to add parsing of predicates and properly parse SVE ZIP1/ZIP2 instructions. This patch further improves diagnostic messages for when the SVE feature is not specified.

Reviewers: rengolin, fhahn, olista01, echristo, efriedma

Reviewed By: fhahn

Subscribers: sdardis, aemerson, javed.absar, tschuett, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D40363

llvm-svn: 320992
2017-12-18 16:48:53 +00:00
Simon Dardis fd8c65e868 Reland "[mips] Fix the target specific instruction verifier"
Fix an off by one error in the bounds checking for 'dinsu' and update
the ranges in the test comments so that they are accurate.

This version has the correct commit message.

Reviewers: atanasyan

Differential Revision: https://reviews.llvm.org/D41183

llvm-svn: 320991
2017-12-18 15:56:40 +00:00
Sean Fertile 5fb624a3b8 [Memcpy Loop Lowering] Remove the fixed int8 lowering.
Switch over to the lowering that uses target supplied operand types.

Differential Revision: https://reviews.llvm.org/D41201

llvm-svn: 320989
2017-12-18 15:31:14 +00:00
Diana Picus 8ee540c01a [ARM GlobalISel] Fix G_(UN)MERGE_VALUES handling after r319524
r319524 has made more G_MERGE_VALUES/G_UNMERGE_VALUES pairs legal than
are supported by the rest of the pipeline. Restrict that to only the
cases that we can currently handle: packing 32-bit values into 64-bit
ones, when we have hardware FP.

llvm-svn: 320980
2017-12-18 13:22:28 +00:00
Simon Dardis f70af977af Revert "[mips] Fix the target specific instruction verifier"
This reverts commit r320974. The commit message lacked the Differential Revison: line.

llvm-svn: 320975
2017-12-18 12:30:34 +00:00
Simon Dardis c3c0d4590b [mips] Fix the target specific instruction verifier
Fix an off by one error in the bounds checking for 'dinsu' and update
the ranges in the test comments so that they are accurate.

Reviewers: atanasyan

https://reviews.llvm.org/D41183

llvm-svn: 320974
2017-12-18 12:24:17 +00:00
Sander de Smalen fce0c1c45b [AArch64][SVE] Asm: Add ZIP1/ZIP2 instructions (predicate/data vectors)
Summary: Patch [2/4] in a series to add parsing of predicates and properly parse SVE ZIP1/ZIP2 instructions.

Reviewers: rengolin, kristof.beyls, fhahn, mcrosier, evandro

Reviewed By: fhahn

Subscribers: aemerson, javed.absar, llvm-commits, tschuett

Differential Revision: https://reviews.llvm.org/D40361

llvm-svn: 320973
2017-12-18 11:29:59 +00:00
Sander de Smalen ce1e0975f4 [AArch64][SVE] Asm: Add SVE predicate register definitions and parsing support
Summary: Patch [1/4] in a series to add parsing of predicates and properly parse SVE ZIP1/ZIP2 instructions.

Reviewers: rengolin, kristof.beyls, fhahn, mcrosier, evandro, echristo, efriedma

Reviewed By: fhahn

Subscribers: aemerson, javed.absar, llvm-commits, tschuett

Differential Revision: https://reviews.llvm.org/D40360

llvm-svn: 320970
2017-12-18 11:26:34 +00:00
Tim Northover 9097a07e4e AArch64: work around how Cyclone handles "movi.2d vD, #0".
For Cylone, the instruction "movi.2d vD, #0" is executed incorrectly in some rare
circumstances. Work around the issue conservatively by avoiding the instruction entirely.

This patch changes CodeGen so that problematic instructions are never
generated, and the AsmParser so that an equivalent instruction is used (with a
warning).

llvm-svn: 320965
2017-12-18 10:36:00 +00:00
Craig Topper 8e2837cc6e [X86] Fix mistake that I made when splitting up the setOperationAction calls recently.
The block I moved things that need BWI and 512-bit or VLX is incorrectly qualified with just hasBWI || hasVLX. Here I've qualified it with hasBWI && (hasAVX512 || hasVLX) where the hasAVX512 will be replaced with allowing 512-bit vectors in an upcoming patch.

llvm-svn: 320957
2017-12-18 04:50:05 +00:00
Craig Topper fd8d040820 [X86] Make the code that creates fmaddsub from build_vector of extracts and inserts functional and add tests.
Summary:
We had no tests for this and we couldn't do the optimization because of a bad use count check. We need to know how many non-undef pieces of the build vector were filled in and ensure our use count is equal to that. But on the shuffle combine version we need the use count to be 2.

The missing coverage was noticed during the review of D40335.

Reviewers: RKSimon, zvi, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41133

llvm-svn: 320950
2017-12-17 18:23:45 +00:00
Simon Pilgrim b1b30286bf Remove superfluous break after a return. NFCI.
llvm-svn: 320941
2017-12-17 11:01:33 +00:00
Craig Topper 5992535e1a [X86DomainReassignment] Store legal domains in a std::bitset instead of using a SmallVector that really only ever has one element as a set.
llvm-svn: 320940
2017-12-17 03:16:23 +00:00
Craig Topper ee1e71e576 [X86] Use extract_vector_elt instead of X86ISD::VEXTRACT for isel of vXi1 extractions.
llvm-svn: 320937
2017-12-17 01:35:48 +00:00
Craig Topper c0c2d19e08 [X86] Canonicalize extract_vector_elt from vXi1 to always return MVT::i32.
This allows us to remove some isel patterns that allowed MVT::i8 result type.

llvm-svn: 320936
2017-12-17 01:35:47 +00:00
Craig Topper c609dc8f55 [X86] Don't create X86ISD::VEXTRACT nodes directly. Use EXTRACT_VECTOR_ELT and allow that to be legaized to VEXTRACT.
I think we can remove the VEXTRACT node completely and use a canonicalized EXTRACT_VECTOR_ELT instead. This is a first step.

llvm-svn: 320935
2017-12-17 01:35:44 +00:00
Simon Pilgrim 5c0c93ed4c Fix unused variable warning.
llvm-svn: 320934
2017-12-16 23:37:51 +00:00
Simon Pilgrim 4c9e8215e9 [X86][AVX] lowerVectorShuffleAsBroadcast - aggressively peek through BITCASTs
Assuming we can safely adjust the broadcast index for the new type to keep it suitably aligned, then peek through BITCASTs when looking for the broadcast source.

Fixes PR32007

llvm-svn: 320933
2017-12-16 23:32:18 +00:00
Simon Pilgrim 88c10bc969 [X86][AVX] Use extract128BitVector helper. NFCI.
llvm-svn: 320932
2017-12-16 23:09:57 +00:00
Simon Pilgrim f3b6da00f5 [X86][AVX] Fix failed broadcast fold
Strip excess BITCASTs from EXTRACT_SUBVECTOR input

llvm-svn: 320930
2017-12-16 22:57:17 +00:00
Craig Topper 849b717c86 [X86] Don't pass a zero input to the passthru operand of getVectorMaskingNode/getScalarMaskingNode when its going to emit an ISD::OR/ISD::AND. NFCI
In those cases, the pass thru operand of the methods isn't used. The calls to the scalar version were passing a MVT::i1 zero, which is an illegal type at the stage this code runs.

llvm-svn: 320928
2017-12-16 21:12:24 +00:00
Craig Topper 93253e189c [X86] Have getVectorMaskingNode return an ISD::AND for X86ISD::VPSHUFBITQMB instead of creating a select with one input being 0.
llvm-svn: 320927
2017-12-16 21:12:23 +00:00
Craig Topper 1260a4e826 [X86] When using vpopcntdq for ctpop of v8i16 vectors, only promote to v8i32.
Previously we promoted to v8i64, but we don't need to go all the way to 512-bits. If we have VLX we can use the 256-bit instruction. And even if we don't have VLX we can widen v8i32 to v16i32 and drop the upper half.

llvm-svn: 320926
2017-12-16 19:31:36 +00:00
Craig Topper a42a2ba221 [X86] Combine some more scheduler model entries using regular expressions.
We had a lot of separate 32 and 64 instructions that had the same scheduling data. This merges them into the same regular expression. This is pretty consistent with a lot of other instructions.

llvm-svn: 320924
2017-12-16 18:35:31 +00:00
Craig Topper 17a311831c [X86] Use instrs instead of instregex for gather/scatter instructions in the scheduler models. Combine into single InstrRW entries.
The reduces the number of scheduler groups in subtarget info.

llvm-svn: 320923
2017-12-16 18:35:29 +00:00
Craig Topper 1c7d07c601 [X86] Remove unneeded code for handling the old kunpck intrinsics.
llvm-svn: 320917
2017-12-16 06:58:30 +00:00
Hal Finkel e86a8b79b5 [PowerPC, AsmParser] Enable the mnemonic spell corrector
r307148 added an assembly mnemonic spelling correction support and enabled it
on ARM. This enables that support on PowerPC as well.

Patch by Dmitry Venikov, thanks!

Differential Revision: https://reviews.llvm.org/D40552

llvm-svn: 320911
2017-12-16 02:42:18 +00:00
Craig Topper c08960597c [X86] Add 128 and 256-bit VPOPCNTDQ instructions. Adjust some tablegen classes LZCNT/POPCNT.
I think when this instruction was first published it was only for a Knights CPU and thus VLX version was missing.

llvm-svn: 320910
2017-12-16 02:40:28 +00:00
Craig Topper 6b129fde5a [X86] Add back the assert from r320830 that was reverted in r320850
Hopefully r320864 has fixed the offending case that failed the assert.

llvm-svn: 320898
2017-12-16 00:33:16 +00:00
David Blaikie 2110924909 Fix WebAssembly backend for some LLVM API changes
llvm-svn: 320893
2017-12-15 23:52:06 +00:00
Matthias Braun f1caa2833f MachineFunction: Return reference from getFunction(); NFC
The Function can never be nullptr so we can return a reference.

llvm-svn: 320884
2017-12-15 22:22:58 +00:00
Krzysztof Parzyszek 058d3cec15 [Hexagon] Remove recursion in visitUsesOf, replace with use queue
This is primarily to reduce stack usage, but ordering the use queue
according to the position in the code (earlier instructions visited
before later ones) reduces the number of unnecessary bottoms due to
visiting instructions out of order, e.g.
  %reg1 = copy %reg0
  %reg2 = copy %reg0
  %reg3 = and %reg1, %reg2
Here, reg3 should be known to be same as reg0-2, but if reg3 is
evaluated after reg1 is updated, but before reg2 is updated, the two
inputs to the and will appear different, causing reg3 to become
bottom.

llvm-svn: 320866
2017-12-15 21:34:05 +00:00
Krzysztof Parzyszek 266d6f03a1 [Hexagon] Handle concat_vectors of all allowed HVX types
llvm-svn: 320865
2017-12-15 21:23:12 +00:00
Craig Topper 6b8ac481f1 [X86] Use AND32ri8 instead of AND64ri8 in Asan code in EmitCallAsanReport for 32-bit mode.
This seemed to work due to a quirk in the X86 MC encoder that didn't emit a REX byte that the AND64ri8 implies when in 32-bit mode. This made the encoding the same as AND32ri8. I tried to add an assert to catch the dropped REX prefix that caught this.

llvm-svn: 320864
2017-12-15 21:18:06 +00:00
Craig Topper 422ed23298 [X86] In LowerVectorCTPOP use ISD::ZERO_EXTEND/ISD::TRUNCATE instead of the target specific nodes.
The target independent nodes will get legalized to the target specific nodes by their own legalization process. Someday I'd like to stop using a target specific for zero extends and truncates of legal types so the less places we reference the target specific opcode the better.

llvm-svn: 320863
2017-12-15 21:18:05 +00:00
Craig Topper f08ab74ae3 [X86] Remove unnecessary TODO.
When I wrote it I thought we were missing a potential optimization for KNL. But investigating further shows that for KNL we still do the optimal thing by widening to v4f32 and then using special isel patterns to widen again to zmm a register.

llvm-svn: 320862
2017-12-15 20:57:18 +00:00
Krzysztof Parzyszek 29832a6c8b [Hexagon] Fix operand-swapping PatFrag for atomic stores
PatFrag now has the atomicity information stored as bit fields. They
need to be copied to the new PatFrag.

llvm-svn: 320855
2017-12-15 20:13:57 +00:00
Craig Topper df2521a638 [X86] Remove assert in X86MCCodeEmitter.cpp that was added in r320830.
It seems to be failing real code which is concerning, but we were silently getting away with it. I'll investigate further.

llvm-svn: 320850
2017-12-15 19:38:14 +00:00
Craig Topper 3fb8386685 [SelectionDAG][X86] Fix insert_vector_elt lowering for v32i1/v64i1 with non-constant index
Summary:
Currently we don't handle v32i1/v64i1 insert_vector_elt correctly as we fail to look at the number of elements closely and assume it can only be v16i1 or v8i1.

We also can't type legalize v64i1 insert_vector_elt correctly on KNL due to the type not being byte addressable as required by the legalizing through memory accesses path requires.

For the first issue, the patch now tries to pick a 512-bit register with the correct number of elements and promotes to that.

For the second issue, we now extend the vector to a byte addressable type, do the stores to memory, load the two halves, and then truncate the halves back to the original type. Technically since we changed the type, we may not need two loads, but actually checking that is more work and for the v64i1 case we do need them.

Reviewers: RKSimon, delena, spatel, zvi

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D40942

llvm-svn: 320849
2017-12-15 19:35:22 +00:00
Craig Topper 23c348850f [X86] Add 'Requires<[In64BitMode]>' to a bunch of instructions that only have memory and immediate operands.
The asm parser wasn't preventing these from being accepted in 32-bit mode. Instructions that use a GR64 register are protected by the parser rejecting the register in 32-bit mode.

llvm-svn: 320846
2017-12-15 19:01:51 +00:00
Craig Topper 914b1d524c [X86] Change BNDLDX to use anymem instead of i64mem for itsmemory operand.
This instruction doesn't access memory. It juse use a similar looking memory encoding. Don't require Intel syntax to put "qword ptr" in front of it.

llvm-svn: 320845
2017-12-15 19:01:50 +00:00
Craig Topper 446f3e2084 [X86] Remove the 'Requires' In64BitMode/Not64BitMode from the LWP instructions.
These aren't doing anything due to a top level "let Predicates =". I think the GR32/GR64 register class protects these anyway.

llvm-svn: 320844
2017-12-15 19:01:49 +00:00
Craig Topper 365e8aa5d5 [X86] Remove the 'Requires<[In64BitMode]>' from SHSTK instructions.
This has no effect due to a top level "let Predicates =" around the instructions. But its also not required because the GR64 usage in the instruction guarantees it can never match.

llvm-svn: 320843
2017-12-15 19:01:48 +00:00
Evandro Menezes a9134e86f1 [AArch64] Fix typo in the ASIMD instruction optimization pass
Fix typo in the representative instruction replacement.

Also, fix formatting and reword some comments.

llvm-svn: 320839
2017-12-15 18:26:54 +00:00
Andrew V. Tischenko 22f0742dda Fix for bug PR35549 - Repeated schedule comments.
Differential Revision: https://reviews.llvm.org/D40960

llvm-svn: 320837
2017-12-15 18:13:05 +00:00
Craig Topper a16395008c [X86] Fix XSAVE64 and similar instructions to not be allowed by the assembler in 32-bit mode.
There was a top level "let Predicates =" in the .td file that was overriding the Requires on each instruction.

I've added an assert to the code emitter to catch more cases like this. I'm sure this isn't the only place where the right predicates aren't being applied. This assert already found that we don't block btq/btsq/btrq in 32-bit mode.

llvm-svn: 320830
2017-12-15 17:22:58 +00:00
Francis Visoiu Mistrih 0b5bdceabf [CodeGen] Print stack object references as %(fixed-)stack.0 in both MIR and debug output
Work towards the unification of MIR and debug output by printing
`%stack.0` instead of `<fi#0>`, and `%fixed-stack.0` instead of
`<fi#-4>` (supposing there are 4 fixed stack objects).

Only debug syntax is affected.

Differential Revision: https://reviews.llvm.org/D41027

llvm-svn: 320827
2017-12-15 16:33:45 +00:00
Craig Topper ad9221d684 [X86] Widen (v2i32 (fp_to_uint v2f64)) to (v8i32 (fp_to_uint v8f64)) during legalization if we have AVX512F, but not VLX. NFC
Previously we widened it using isel patterns.

llvm-svn: 320824
2017-12-15 16:22:20 +00:00
Nemanja Ivanovic 6ab32dea12 Fix the second build bot break introduced by r320791.
llvm-svn: 320811
2017-12-15 14:17:45 +00:00
Nemanja Ivanovic 1794cdc481 Fix code causing fallthrough warnings in the PPC back end.
llvm-svn: 320806
2017-12-15 11:47:48 +00:00
Alex Bradbury 0ad4c265d7 [RISCV] Change shift amount operand of RVC shift instructions to uimmlog2xlennonzero
c.slli/c.srli/c.srai allow a 5-bit shift in RV32C and a 6-bit shift in RV64C.
This patch adds uimmlog2xlennonzero to reflect this constraint as well as
tests.

Differential Revision: https://reviews.llvm.org/D41216

Patch by Shiva Chen.

llvm-svn: 320799
2017-12-15 10:20:51 +00:00
Nemanja Ivanovic 74ecf59cc0 Fix the build bot break introduced by r320791.
llvm-svn: 320798
2017-12-15 09:51:34 +00:00
Alex Bradbury 59136ffab1 [RISCV] Enable emission of alias instructions by default
This patch switches the default for -riscv-no-aliases to false
and updates all affected MC and CodeGen tests. As recommended in
D41071, MC tests use the canonical instructions and the CodeGen
tests use the aliases.

Additionally, for the f and d instructions with rounding mode,
the tests for the aliased versions are moved and tightened such
that they can actually detect if alias emission is enabled.
(see D40902 for context)

Differential Revision: https://reviews.llvm.org/D41225

Patch by Mario Werner.

llvm-svn: 320797
2017-12-15 09:47:01 +00:00
Nemanja Ivanovic 6995e5dae7 [PowerPC] Convert r+r instructions to r+i (pre and post RA)
This patch adds the necessary infrastructure to convert instructions that
take two register operands to those that take a register and immediate if
the necessary operand is produced by a load-immediate. Furthermore, it uses
this infrastructure to perform such conversions twice - first at MachineSSA
and then pre-emit.

There are a number of reasons we may end up with opportunities for this
transformation, including but not limited to:
- X-Form instructions chosen since the exact offset isn't available at ISEL time
- Atomic instructions with constant operands (we will add patterns for this
  in the future)
- Tail duplication may duplicate code where one block contains this redundancy
- When emitting compare-free code in PPCDAGToDAGISel, we don't handle constant
  comparands specially

Furthermore, this patch moves the initialization of PPCMIPeepholePass so that
it can be used for MIR tests.

llvm-svn: 320791
2017-12-15 07:27:53 +00:00
Craig Topper 7cfacbf6ea [X86] Fix a couple bugs in my recent changes to vXi1 insert_subvector lowering.
A couple places didn't use the same SDValue variables to connect everything all the way through.

I don't have a test case for a bug in insert into the lower bits of a non-zero, non-undef vector. Not sure the best way to create that. We don't create the case when lowering concat_vectors which is the main way to get insert_subvectors.

llvm-svn: 320790
2017-12-15 07:16:41 +00:00
Yaxun Liu c41e2f6e7b Recommit CodeGen: Fix assertion in machine inst sheduler due to llvm.dbg.value
The regression on ppc64 was not due to this commit.

llvm-svn: 320788
2017-12-15 03:56:57 +00:00
Nemanja Ivanovic 0d47d32caa Disabling r312514 as it causes miscompiles that show up on bootstrap
The compare elimination peephole introduced in https://reviews.llvm.org/rL312514
causes a miscompile in AMDGPUInstrInfo.cpp which in turn causes some AMDGPU
test case failures in stage2 bootstrap testing. This miscompile didn't cause any
test case failures until https://reviews.llvm.org/rL320614, so it appeared as if
that patch caused these failures.
Disabling this transformation for now to bring the build bots back to green and
the author of the patch will investigate the miscompile.

llvm-svn: 320786
2017-12-15 01:38:03 +00:00
Craig Topper 1a1e6d6cf6 [X86] Add a TODO about v8i1 CONCAT_VECTORS.
llvm-svn: 320784
2017-12-15 01:03:46 +00:00
Craig Topper 5ebf3ac9c2 [X86] Further rearrange the setOperationAction calls to separate the ones that require 512-bit registers OR VLX into separate sections. NFCI
We have several instructions that were introduced in AVX512F that are only available in 512-bit form on KNL. We still make use of them for 128/256 by artificially widening and extracting during isel.

This commit separates these operations from the true 512-bit operations. This way we can qualify the normal 512-bit operations with needing 512-bit register support. And these special operations will get qualified with needing 512-bit registers OR VLX.

The 512-bit register qualification will be introduced in a future patch this just gets everything grouped to minimize deltas on that patch.

llvm-svn: 320782
2017-12-15 01:03:43 +00:00
Craig Topper 07a28f777e [X86] Group setOperationActions related to vXi1 masks together. NFCI
Previously they were sort of interleaved in with XMM/YMM/ZMM action related code.

Trying to separate things so its easier to split 512-bit vectors later.

llvm-svn: 320781
2017-12-15 01:03:42 +00:00
Craig Topper b89bc20a64 [X86] Make ISD::INSERT_SUBVECTOR v8i1 legal with AVX512F because we should be custom lowering inserting v1i1 into v8i1 under this.
I don't have a test case at the moment. Just noticed while auditing things.

llvm-svn: 320780
2017-12-15 01:03:40 +00:00
Craig Topper 212070486d [X86] Move some of the hasVLX qualified code out of the main hasAVX512 block in the X86ISelLowering constructor. NFCI
Move it into the separate hasVLX block later in the constructor.

I'm trying to separate 128/256 and 512-bit related code so we can eventually qualify the hasAVX512 block with support for 512-bit vectors required by the prefer-vector-width feature support being talked about in D41096.

llvm-svn: 320779
2017-12-15 01:03:38 +00:00