Commit Graph

45459 Commits

Author SHA1 Message Date
Simon Atanasyan d41feef40f [mips] Provide correct descriptions of asm constraints in the comments. NFC
llvm-svn: 321566
2017-12-29 19:18:30 +00:00
Simon Atanasyan 970f686faa [mips] Replace assert by an error message
Initially, if the `c` constraint applied to the wrong data type that
causes LLVM to assert. This commit replaces the assert by an error
message.

llvm-svn: 321565
2017-12-29 19:18:24 +00:00
Matt Arsenault e19bc2ee0f AMDGPU: Use unique PSVs for buffer resources
Also fixes using the wrong memory type for some
intrinsics when custom lowering them.

llvm-svn: 321557
2017-12-29 17:18:21 +00:00
Matt Arsenault d94b63d765 AMDGPU: Remove mayLoad/hasSideEffects from MIMG stores
Atomics still have hasSideEffects set on them because
of the mess that is the memory properties.

llvm-svn: 321556
2017-12-29 17:18:18 +00:00
Matt Arsenault 905f3518ba AMDGPU: Implement getTgtMemIntrinsic for images
Currently all images are lowered to have a single
image PseudoSourceValue. Image stores happen to have
overly strict mayLoad/mayStore/hasSideEffects flags
set on them, so this happens to work. When these
are fixed to be correct, the scheduler breaks
this because the identical PSVs are assumed to
be the same address. These need to be unique
to the image resource value.

llvm-svn: 321555
2017-12-29 17:18:14 +00:00
Simon Pilgrim c701596e86 [X86][SSE] Match PSHUFLW/PSHUFHW + PSHUFD vXi16 shuffle patterns (PR34686)
As noted in PR34686, we are relying on a PSHUFD+PSHUFLW+PSHUFHW shuffle chain for most general vXi16 unary shuffles.

This patch checks for simpler PSHUFLW+PSHUFD and PSHUFHW+PSHUFD cases beforehand, building on some existing code that just handled splat shuffles.

By doing so we also prevent premature use of PSHUFB shuffles which can be slower and require the creation/loading of constant shuffle masks.

We now have the 'fast-variable-shuffle' option for hardware that prefers combining 2 or more shuffles to VPSHUFB etc.

Differential Revision: https://reviews.llvm.org/D38318

llvm-svn: 321553
2017-12-29 14:41:50 +00:00
Dmitry Preobrazhensky 414e05383f [AMDGPU][MC] Incorrect parsing of flat/global atomic modifiers
See bug 35730: https://bugs.llvm.org/show_bug.cgi?id=35730

Differential Revision: https://reviews.llvm.org/D41598

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 321552
2017-12-29 13:55:11 +00:00
Nemanja Ivanovic 4e1f5e0734 [PowerPC] Fix for PR35688 - handle out-of-range values for r+r to r+i conversion
Revision 320791 introduced a pass that transforms reg+reg instructions to
reg+imm if they're fed by "load immediate". However, it didn't
handle out-of-range shifts correctly as reported in PR35688.
This patch fixes that and therefore the PR.

Furthermore, there was undefined behaviour in the patch where the RHS of an
initialization expression was 32 bits and constant `1` was shifted left 32
bits. This was fixed by ensuring the RHS is 64 bits just like the LHS.

Differential Revision: https://reviews.llvm.org/D41369

llvm-svn: 321551
2017-12-29 12:22:27 +00:00
Andrew V. Tischenko 03ddad853d Fix incorrect operand sizes for some MMX instructions: punpcklwd, punpcklbw and punpckldq.
Differential Revision: https://reviews.llvm.org/D41595

llvm-svn: 321549
2017-12-29 08:31:01 +00:00
Craig Topper 55cf880900 [X86] When lowering extending loads from v2i1/v4i1, if we have VLX, use a narrower extend.
Previously we used an extend from v8i1 to v8i32/v8i64. Then extracted to the final width. But if we have VLX we should extract first. This way we don't end up with an overly large extend.

This allows us to use vcmpeq to make all ones for the sign extend when DQI isn't available. Otherwise we get a VPTERNLOG.

If we make v2i1/v4i1 legal like proposed in D41560, we could always do this and rely on the lowering of the extend to widen when necessary.

llvm-svn: 321538
2017-12-28 19:46:11 +00:00
Craig Topper c0b6cb1e47 [X86] Use ISD::CONCAT_VECTORS when splitting 256-bit loads in combineLoad.
llvm-svn: 321537
2017-12-28 19:46:06 +00:00
Craig Topper 4b311da3a4 [X86] Fix inconsistencies in different places where we split loads/stores.
-Use MinAlign instead of std::min.
-Use SelectionDAG::getMemBasePlusOffset.
-Apply offset to the pointer info for the second load/store created.

llvm-svn: 321536
2017-12-28 19:46:03 +00:00
Craig Topper 05cf1f338f [X86] Emit ISD::TRUNCATE instead of X86ISD::VTRUNC from LowerZERO_EXTEND_Mask/LowerSIGN_EXTEND_Mask.
The truncate will be lowered X86ISD::VTRUNC later.

llvm-svn: 321534
2017-12-28 19:45:58 +00:00
Craig Topper 88e26a99f8 [X86] Remove unnecessary patterns for sign extending vXi1 without VLX.
The custom lowering already widens the result type to 512-bits if VLX isn't supported.

llvm-svn: 321533
2017-12-28 19:45:55 +00:00
Reid Kleckner a2d119a059 [WinEH] Don't emit state stores or EH thunks for available_externally functions
The exception handler thunk needs to reference the LSDA of the parent
function, which won't be emitted if it's available_externally.

Fixes PR35736. ThinLTO ends up producing available_externally functions
that use _CxxFrameHandler3.

llvm-svn: 321532
2017-12-28 18:41:31 +00:00
Benjamin Kramer 3a13ed60ba Avoid int to string conversion in Twine or raw_ostream contexts.
Some output changes from uppercase hex to lowercase hex, no other functionality change intended.

llvm-svn: 321526
2017-12-28 16:58:54 +00:00
Simon Pilgrim 62411e4d4f [X86][SSE] Use PMADDWD for v4i32 multiplies with 17 or more leading zeros
If there are 17 or more leading zeros to the v4i32 elements, then we can use PMADD for the integer multiply when PMULLD is unavailable or slow.

The 17 bits need to be zero as the PMADDWD performs a v8i16 signed-mul-extend + pairwise-add - the upper 16 so we're adding a zero pair and the 17th bit so we don't incorrectly sign extend.

Differential Revision: https://reviews.llvm.org/D41484

llvm-svn: 321516
2017-12-28 10:05:49 +00:00
Craig Topper 55cfa89f20 [X86] Add CLWB to icelake.
Per Table 1-1 in October 2017 edition of Intel® Architecture Instruction Set Extensions and Future Features

llvm-svn: 321501
2017-12-27 22:04:04 +00:00
Craig Topper 72bbbeb2a7 [X86] Reimplement r321437 using custom lowering instead of as a DAG combine.
My original implementation ran as a DAG combine post type legalization, but it turns out we don't run that DAG combine step if type legalization didn't change anything. Attempts to make the combine run before type legalization as well hit other issues.

So just do it in LowerMUL where we can catch more cases.

llvm-svn: 321496
2017-12-27 19:09:40 +00:00
Matthew Simpson 9439f54902 [AArch64] Change order of candidate FMLS patterns
r319980 added new patterns to the machine combiner for transforming (fsub (fmul
x y) z) into (fmla (fneg z) x y). That is, fsub's where the first source
operand is an fmul are transformed. We previously only matched the case where
the second source operand of an fsub was an fmul, transforming (fsub z (fmul x
y)) into (fmls z x y). Now, if we have an fsub where both source operands are
fmuls, both of the above patterns are applicable.

However, the order in which we add the patterns to the list of candidates
determines the transformation that takes place, since only the first pattern
that matches will be used. This patch changes the order these two patterns are
added to the list of candidates such that we prefer the case where the second
source operand is an fmul (the fmls case), rather than the other one (the
fmla/fneg case). When both source operands are fmuls, this ordering results in
fewer instructions.

Differential Revision: https://reviews.llvm.org/D41587

llvm-svn: 321491
2017-12-27 15:25:01 +00:00
Benjamin Kramer 293f34301e [X86] Fix vmul combine for AVX1 targets.
v8i32 is legal von AVX1, but it doesn't have pmuludq for it.

llvm-svn: 321490
2017-12-27 13:31:50 +00:00
Craig Topper 428d87e559 [X86] Return SDValue(N, 0) instead of an SDValue() after a successful combine.
Returning SDValue() means nothing changed, SDValue(N,0) means there was a change but the worklist management was taken care of.

I don't know if this has a real effect other than making sure the combine counter in the DAG combiner gets updated, but it is the correct thing to do.

llvm-svn: 321463
2017-12-26 22:22:58 +00:00
Andrew V. Tischenko 1dd7856af5 It's a fix for Bug 35741 - can't use comments after x86 prefixes.
Differential Revision: https://reviews.llvm.org/D41579

llvm-svn: 321459
2017-12-26 18:29:52 +00:00
Craig Topper 162439dcdf [X86] Pass itins.rr/itins.rm through properly for some instructions.
llvm-svn: 321452
2017-12-26 05:43:05 +00:00
Craig Topper 9b800c692e [X86] Use SSE_INTMUL_ITINS_P for the AVX-512 MUL instructions to match their SSE/AVX counterparts.
llvm-svn: 321451
2017-12-26 05:43:04 +00:00
Craig Topper e0b9b5ef2b [X86] Fix typo in assert message.
llvm-svn: 321450
2017-12-26 05:43:02 +00:00
Craig Topper 705fef3ef3 [X86] Add a DAG combines to turn vXi64 muls into VPMULDQ/VPMULUDQ if the upper bits are all sign bits or zeros.
Normally we catch this during lowering, but vXi64 mul is considered legal when we have AVX512DQ.

This DAG combine allows us to avoid PMULLQ with AVX512DQ if we can prove its unnecessary. PMULLQ is 3 uops that take 4 cycles each. While pmuldq/pmuludq is only one 4 cycle uop.

llvm-svn: 321437
2017-12-25 06:47:10 +00:00
Craig Topper fabeb27e36 [X86] Make some helper methods static functions instead. NFC
llvm-svn: 321433
2017-12-25 00:54:53 +00:00
Craig Topper b2cd8485dc [X86] Use SelectionDAG::getFPExtendOrRound to simplify some code.
llvm-svn: 321432
2017-12-25 00:54:51 +00:00
Benjamin Kramer 802e6255b2 Make helpers static. No functionality change.
llvm-svn: 321425
2017-12-24 12:46:22 +00:00
Simon Pilgrim e0434fad16 [X86][X87] Mark pseudo memory fold instructions as load/sideeffects (PR21160, PR34080, PR34454).
Match regular x87 memory fold instructions with load/sideeffects tags, to prevent the schedulers from re-ordering them across the fnstcw/fldcw sequences for truncating stores while they are still pseudo during the stack conversion pass.

llvm-svn: 321424
2017-12-24 12:20:21 +00:00
Craig Topper 2d1d9a11c1 [X86] Fix (v2f64 (s/uint_to_fp (v2i1))) to avoid scalarization without AVX512DQ.
Previously we extended v2i1 to v2f64 and then tried to use cvtuqq2pd/cvtqq2pd, but that only works with avx512dq. So we ended up scalarizing it. Now we widen to v4i1 first and extend to v4i32.

llvm-svn: 321420
2017-12-24 06:51:36 +00:00
Craig Topper 2e308a9b2f [X86] Add assembler predicates to BITALG/VBMI2/VNNI features to be consistent with the other AVX512 ISAs.
llvm-svn: 321416
2017-12-24 02:05:17 +00:00
Craig Topper 62fd123731 [X86] Teach WidenMaskArithmetic to handle any constant buildvector on the RHS not just all zeros/ones.
llvm-svn: 321415
2017-12-24 01:03:31 +00:00
Craig Topper 06dad14797 [X86] Remove type restrictions from WidenMaskArithmetic.
This can help AVX-512 code where mask types are legal allowing us to remove extends and truncates to/from mask types.

llvm-svn: 321408
2017-12-23 18:53:05 +00:00
Craig Topper e79a7a4b2e [X86] In WidenMaskArithmetic, make sure we check the input type of a truncate on N1.
Later in the code we explicitly bypass the truncate so we should be checking its type to make sure that it's safe.

llvm-svn: 321407
2017-12-23 18:53:03 +00:00
Craig Topper dbbbb8532c [X86] Remove unneeded EVT variable. NFC
Immediately after it is created we check if its equal to another EVT. Then we inconsistently use one or the other variables in the code below.

Instead do the equality check directly on the getValueType result and remove the variable. Use the origina VT variable throughout the remaining code.

llvm-svn: 321406
2017-12-23 18:53:01 +00:00
Simon Pilgrim fc01bf86d5 [X86][X87] Wrap FpI_ pseudo to use PseudoI. NFCI.
llvm-svn: 321405
2017-12-23 17:25:59 +00:00
Simon Pilgrim 730cbc8f8e [X86] Add default InstrItinClass to PseudoI
This will be used to help tidyup existing pseudos that we've added scheduling info to.

llvm-svn: 321401
2017-12-23 10:47:21 +00:00
Craig Topper b8e7ab8231 [X86] Pass the right VT to the getZeroExtendInReg introduced in r321398
Apparently we don't have tests for this which I didn't realize before. I'll try to fix that but wanted to fix the obvious bug.

llvm-svn: 321399
2017-12-23 06:52:03 +00:00
Craig Topper ed4a87f6a8 [X86] Use SelectionDAG::getZeroExtendInReg instead of implementing it manually.
llvm-svn: 321398
2017-12-23 02:54:52 +00:00
Craig Topper d6a8f2e67d [SelectionDAG][X86] Don't use ->getValueType(0) after a call to getOperand to get the type of the operand.
getOperand returns an SDValue that contains the node and the result number. There is no guarantee that the result number if 0. By using the -> operator we are calling SDNode::getValueType rather than SDValue::getValueType. This requires supplying a result number and we shouldn't assume it was 0.

I don't have a test case. Just noticed while cleaning up some other code and saw that it occurred in other places.

llvm-svn: 321397
2017-12-23 02:54:50 +00:00
Sanjoy Das 26d11ca4b0 (Re-landing) Expose a TargetMachine::getTargetTransformInfo function
Re-land r321234.  It had to be reverted because it broke the shared
library build.  The shared library build broke because there was a
missing LLVMBuild dependency from lib/Passes (which calls
TargetMachine::getTargetIRAnalysis) to lib/Target.  As far as I can
tell, this problem was always there but was somehow masked
before (perhaps because TargetMachine::getTargetIRAnalysis was a
virtual function).

Original commit message:

This makes the TargetMachine interface a bit simpler.  We still need
the std::function in TargetIRAnalysis to avoid having to add a
dependency from Analysis to Target.

See discussion:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html

I avoided adding all of the backend owners to this review since the
change is simple, but let me know if you feel differently about this.

Reviewers: echristo, MatzeB, hfinkel

Reviewed By: hfinkel

Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits

Differential Revision: https://reviews.llvm.org/D41464

llvm-svn: 321375
2017-12-22 18:21:59 +00:00
Dmitry Preobrazhensky 471adf7fdc [AMDGPU][MC] Corrected handling of negative expressions
See bug 35716: https://bugs.llvm.org/show_bug.cgi?id=35716

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D41488

llvm-svn: 321372
2017-12-22 18:03:35 +00:00
Craig Topper 576335f998 [X86] When lowering insert_vector_elt/extract_vector_elt of vXi1 with a non-constant index just use either a 128-bit type or the vXi8 type with the correct number of elements.
Despite what the comment said there isn't better codegen for 512-bit vectors. The 128/256/512 bit implementation jus stores to memory and loads an element. There's no advantage to doing that with a larger size. In fact in many cases it causes a stack realignment and generates worse code.

llvm-svn: 321369
2017-12-22 17:18:11 +00:00
Craig Topper eff84ed204 [X86] Improve the printing of address mode during isel matching.
Fix some inconsistent new line behavior and only print the FrameIndex when the address mode is a FrameIndexBase addressing mode.

llvm-svn: 321368
2017-12-22 17:18:10 +00:00
Dmitry Preobrazhensky c5b0c172f6 [AMDGPU][MC] Corrected parsing of optional operands for ds_swizzle_b32
See bug 35645: https://bugs.llvm.org/show_bug.cgi?id=35645

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D41186

llvm-svn: 321367
2017-12-22 17:13:28 +00:00
Dmitry Preobrazhensky 2713495318 [AMDGPU][MC] Added support of 256- and 512-bit tuples of ttmp registers
See bug 35561: https://bugs.llvm.org/show_bug.cgi?id=35561

This patch also affects implementation of SGPR and VGPR registers though changes are cosmetic.

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D41437

llvm-svn: 321359
2017-12-22 15:18:06 +00:00
Diana Picus 28a6d0e639 [ARM GlobalISel] Support G_INTTOPTR and G_PTRTOINT for s32
Mark conversions between pointers and 32-bit scalars as legal, map them
to the GPR and select to a simple COPY.

llvm-svn: 321356
2017-12-22 13:05:51 +00:00
Diana Picus 68773859c8 [ARM GlobalISel] Support pointer constants
Pointer constants are pretty rare, since we usually represent them as
integer constants and then cast to pointer. One notable exception is the
null pointer constant, which is represented directly as a G_CONSTANT 0
with pointer type. Mark it as legal and make sure it is selected like
any other integer constant.

llvm-svn: 321354
2017-12-22 11:09:18 +00:00