llvm-project

Commit Graph

Author	SHA1	Message	Date
Mircea Trofin	499a66ecc0	Silence warning in assert introduced in rL349973. Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56030 llvm-svn: 349975	2018-12-21 23:02:10 +00:00
Mircea Trofin	b53eeb6f4c	[llvm] API for encoding/decoding DWARF discriminators. Summary: Added a pair of APIs for encoding/decoding the 3 components of a DWARF discriminator described in http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html: the base discriminator, the duplication factor (useful in profile-guided optimization) and the copy index (used to identify copies of code in cases like loop unrolling) The encoding packs 3 unsigned values in 32 bits. This CL addresses 2 issues: - communicates overflow back to the user - supports encoding all 3 components together. Current APIs assume a sequencing of events. For example, creating a new discriminator based on an existing one by changing the base discriminator was not supported. Reviewers: davidxl, danielcdh, wmi, dblaikie Reviewed By: dblaikie Subscribers: zzheng, dmgreen, aprantl, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D55681 llvm-svn: 349973	2018-12-21 22:48:50 +00:00
Craig Topper	e58cd9cbc6	[X86] Add isel patterns to match BMI/TBMI instructions when lowering has turned the root nodes into one of the flag producing binops. This fixes the patterns that have or/and as a root. 'and' is handled differently since they usually have a CMP wrapped around them. I had to look for uses of the CF flag because all these nodes have non-standard CF flag behavior. A real or/xor would always clear CF. In practice we shouldn't be using the CF flag from these nodes as far as I know. Differential Revision: https://reviews.llvm.org/D55813 llvm-svn: 349962	2018-12-21 21:42:43 +00:00
Craig Topper	62ec024d3b	[X86] Don't allow optimizeCompareInstr to replace a CMP with BEXTR if the sign flag is used. The BEXTR instruction documents the SF bit as undefined. The TBM BEXTR instruction has the same issue, but I'm not sure how to test it. With the control being an immediate we can determine the sign bit is 0 or the BEXTR would have been removed. Fixes PR40060 Differential Revision: https://reviews.llvm.org/D55807 llvm-svn: 349956	2018-12-21 21:16:26 +00:00
Sanjay Patel	80187b8a17	[x86] add movddup specialization for build vector lowering (PR37502) This is admittedly a narrow fix for the problem: https://bugs.llvm.org/show_bug.cgi?id=37502 ...but as the XOP restriction shows, it's a maze to get this right. In the motivating example, note that we have movddup before SSE4.1 and again with AVX2. That's because insertps isn't available pre-SSE41 and vbroadcast is (more generally) available with AVX2 (and the splat is reduced to movddup via isel pattern). Differential Revision: https://reviews.llvm.org/D55898 llvm-svn: 349937	2018-12-21 18:48:32 +00:00
Simon Pilgrim	57733507fe	[X86] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the old KnownBits output parameter version. llvm-svn: 349902	2018-12-21 14:25:14 +00:00
Simon Pilgrim	5d403f6bf8	[X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic intrinsics (llvm) This auto upgrades the signed SSE saturated math intrinsics to SADD_SAT/SSUB_SAT generic intrinsics. Clang counterpart: https://reviews.llvm.org/D55890 Differential Revision: https://reviews.llvm.org/D55894 llvm-svn: 349892	2018-12-21 09:04:14 +00:00
Craig Topper	54f1a7be13	[X86] Refactor hasNoCarryFlagUses and hasNoSignFlagUses in X86ISelDAGToDAG.cpp to translate opcode to condition code using the helpers in X86InstrInfo.cpp. This shortens the switches in X86ISelDAGToDAG.cpp to only need to check condition code instead of a list of opcodes. This also fixes a bug where the memory forms of SETcc were missing from hasNoCarryFlagUses. llvm-svn: 349868	2018-12-21 01:14:25 +00:00
Craig Topper	e0cff10289	[X86] Add memory forms of some SETCC instructions to hasNoCarryFlagUses. Found while working on another patch llvm-svn: 349867	2018-12-21 01:14:23 +00:00
Simon Pilgrim	2a25360ae3	[X86] Auto upgrade XOP/AVX512 rotation intrinsics to generic funnel shift intrinsics (llvm) This emits FSHL/FSHR generic intrinsics for the XOP VPROT and AVX512 VPROL/VPROR rotation intrinsics. Clang counterpart: https://reviews.llvm.org/D55937 Differential Revision: https://reviews.llvm.org/D55938 llvm-svn: 349795	2018-12-20 19:01:07 +00:00
Simon Pilgrim	09c081176a	[X86][AVX512] Don't custom lower v16i8 rotations. As discussed on D55747, the expansion to (wider) shifts is better on all AVX512 cases, not just BWI. llvm-svn: 349763	2018-12-20 14:38:35 +00:00
Clement Courbet	36a3480385	Re-land r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads." Update PPC IR following GEP->bitcast to bitcast->GEP->bitcast change. llvm-svn: 349747	2018-12-20 13:01:04 +00:00
Clement Courbet	e22cf4d7cb	Revert r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads." Forgot to update PowerPC tests for the GEP->bitcast change. llvm-svn: 349733	2018-12-20 09:58:33 +00:00
Clement Courbet	1bb6e1b0f2	[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads. Summary: This allows expanding {7,11,13,14,15,21,22,23,25,26,27,28,29,30,31}-byte memcmp in just two loads on X86. These were previously calling memcmp. Reviewers: spatel, gchatelet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55263 llvm-svn: 349731	2018-12-20 09:13:47 +00:00
Craig Topper	9ca2f5605e	[X86] Disable custom widening of signed/unsigned add/sub saturation intrinsics under -x86-experimental-vector-widening-legalization. Generic legalization should take care of this. llvm-svn: 349714	2018-12-20 01:32:06 +00:00
Craig Topper	217b3b20d8	[X86] Remove TLI variable from ReplaceNodeResults. NFC We're already in X86TargetLowering which is a derived class of TargetLowering. We can just call methods directly. llvm-svn: 349695	2018-12-19 23:13:03 +00:00
Craig Topper	d16da2b479	[X86] Remove a bunch of 'else' after returns in reduceVMULWidth. NFC This reduces indentation and makes it obvious this function always returns something. llvm-svn: 349671	2018-12-19 19:39:34 +00:00
Craig Topper	84a00bd98a	[X86] Don't match TESTrr from (cmp (and X, Y), 0) during isel. Defer to post processing The (cmp (and X, Y) 0) pattern is greedy and ends up forming a TESTrr and consuming the and when it might be better to use one of the BMI/TBM like BLSR or BLSI. This patch removes the pattern from isel and adds a post processing check to combine TESTrr+ANDrr into just a TESTrr. With this patch we are able to select the BMI/TBM instructions, but we'll also emit a TESTrr when the result is compared to 0. In many cases the peephole pass will be able to use optimizeCompareInstr to remove the TEST, but it's probably not perfect. Differential Revision: https://reviews.llvm.org/D55870 llvm-svn: 349661	2018-12-19 18:49:13 +00:00
Craig Topper	291470347a	[X86] Fix assert fails in pass X86AvoidSFBPass Fixes https://bugs.llvm.org/show_bug.cgi?id=38743 The function removeRedundantBlockingStores is supposed to remove any blocking stores contained in each other in lockingStoresDispSizeMap. But it currently looks only at the previous one, which will miss some cases that result in an assert. This patch refines the function to check all previous layouts until it finds the uncontained one. So all redundant stores will be removed. Patch by Pengfei Wang Differential Revision: https://reviews.llvm.org/D55642 llvm-svn: 349660	2018-12-19 18:45:57 +00:00
Craig Topper	18a9d545e1	[X86] Add BSR to isUseDefConvertible. We already had BSF here as part of __builtin_ffs improvements and I was just wondering yesterday whether we should have BSR there. This addresses one issue from PR40090. llvm-svn: 349531	2018-12-18 20:03:54 +00:00
Craig Topper	8434ef7d1e	[X86] Don't use SplitOpsAndApply to create ISD::UADDSAT/ISD::USUBSAT nodes. Let type legalization and op legalization deal with it. Now that we've switched to target independent nodes we can rely on generic infrastructure to do the legalization for us. llvm-svn: 349526	2018-12-18 19:29:08 +00:00
Nikita Popov	f6058ff140	[X86] Use SADDSAT/SSUBSAT instead of ADDS/SUBS Migrate the X86 backend from X86ISD opcodes ADDS and SUBS to generic ISD opcodes SADDSAT and SSUBSAT. This also improves codegen for @llvm.sadd.sat() and @llvm.ssub.sat() intrinsics. This is a followup to D55787 and part of PR40056. Differential Revision: https://reviews.llvm.org/D55833 llvm-svn: 349520	2018-12-18 18:28:22 +00:00
Craig Topper	20a6db5a84	[X86] Create PSUBUS from (add (umax X, C), -C) InstCombine seems to canonicalize our PSUB pattern into a max with the constant and an add with an inverse of the constant. This patch recognizes this pattern and turns it into PSUBUS. Future work could improve undef element handling. Fixes some of PR40053 Differential Revision: https://reviews.llvm.org/D55780 llvm-svn: 349519	2018-12-18 18:26:25 +00:00
Simon Pilgrim	1411917431	[X86][SSE] Don't use 'sign bit select' vXi8 ROTL lowering for constant rotation amounts Noticed by @spatel on D55747 - we get much better codegen if we use the regular shift expansion. llvm-svn: 349510	2018-12-18 17:31:11 +00:00
Simon Pilgrim	e9effe9744	[X86][SSE] Don't use 'sign bit select' vXi8 ROTL lowering for splat rotation amounts Noticed by @spatel on D55747 - we get much better codegen if we use the regular shift expansion. llvm-svn: 349500	2018-12-18 16:02:23 +00:00
Nikita Popov	665ab08178	[X86] Use UADDSAT/USUBSAT instead of ADDUS/SUBUS Replace the X86ISD opcodes ADDUS and SUBUS with generic ISD opcodes UADDSAT and USUBSAT. As a side-effect, this also makes codegen for the @llvm.uadd.sat and @llvm.usub.sat intrinsics reasonable. This only replaces use in the X86 backend, and does not move any of the ADDUS/SUBUS X86 specific combines into generic codegen. Differential Revision: https://reviews.llvm.org/D55787 llvm-svn: 349481	2018-12-18 13:23:03 +00:00
Simon Pilgrim	8488a44c34	[X86][SSE] Move VSRAI sign extend in reg fold into SimplifyDemandedBits (VSRAI (VSHLI X, C1), C1) --> X iff NumSignBits(X) > C1 This works better as part of SimplifyDemandedBits than part of the general combine. llvm-svn: 349462	2018-12-18 09:11:34 +00:00
Simon Pilgrim	26c630f416	[X86][SSE] Replace (VSRLI (VSRAI X, Y), 31) -> (VSRLI X, 31) fold. This fold was incredibly specific - replace with a SimplifyDemandedBits fold to remove a VSRAI if only the original sign bit is demanded (it's guaranteed to stay the same). Test change is merely a rescheduling. llvm-svn: 349459	2018-12-18 08:55:47 +00:00
Craig Topper	1ff7356f96	[X86] Const correct some helper functions in X86InstrInfo.cpp. NFC llvm-svn: 349440	2018-12-18 04:58:05 +00:00
Simon Pilgrim	7e2975a44c	[X86][SSE] Improve immediate vector shift known bits handling. Convert VSRAI to VSRLI if the sign bit is known zero and improve KnownBits output for all shift instructions. Fixes the poor codegen comments in D55768. llvm-svn: 349407	2018-12-17 22:09:47 +00:00
Craig Topper	8c9d772991	[X86] Add T1MSKC and TZMSK to isDefConvertible used by optimizeCompareInstr. These seem to have been missed when the other TBM instructions were added. llvm-svn: 349404	2018-12-17 21:50:06 +00:00
Simon Pilgrim	6b5e0b7b2b	[X86][SSE] Split SimplifyDemandedBitsForTargetNode X86ISD::VSRLI/VSRAI handling. First step towards adding more capable combines to fix comments in D55768. llvm-svn: 349400	2018-12-17 21:36:17 +00:00
Craig Topper	728cbc0378	Convert (CMP (srl/shl X, C), 0) to (CMP (and X, C'), 0) when only the zero flag is used. This allows a TEST to be used and can be combined with any AND that may already exist as an input to the shift. This was already done in EmitTest, but was easily tricked by multiple uses because the setcc might be used by multiple instructions. Once the SETCC and users are legalized then we can look for the shift to be used by a single CMP, but the CMP itself can have multiple users. This appears to fix the case in PR39968. llvm-svn: 349385	2018-12-17 20:02:16 +00:00
Simon Pilgrim	9274f17a5e	[TargetLowering] Add DemandedElts mask to SimplifyDemandedBits (PR40000) This is an initial patch to add the necessary support for a DemandedElts argument to SimplifyDemandedBits, more closely matching computeKnownBits and to help improve vector codegen. I've added only a small amount of the changes necessary to get at least one test to update - a lot more can be done but I'd like to add these methodically with proper test coverage, at the same time the hope is to slowly move some/all of SimplifyDemandedVectorElts into SimplifyDemandedBits as well. Differential Revision: https://reviews.llvm.org/D55768 llvm-svn: 349374	2018-12-17 18:43:43 +00:00
Tim Northover	256a16d031	FastIsel: take care to update iterators when removing instructions. We keep a few iterators into the basic block we're selecting while performing FastISel. Usually this is fine, but occasionally code wants to remove already-emitted instructions. When this happens we have to be careful to update those iterators so they're not pointing at dangling memory. llvm-svn: 349365	2018-12-17 17:25:53 +00:00
Craig Topper	fa4907d671	[X86] Fix bad operand lookup for cmov introduced in r349315 The CC is operand 2 not operand 3. llvm-svn: 349330	2018-12-17 06:40:35 +00:00
Simon Pilgrim	d0c9e43b1c	[X86] Pull out constant splat rotation detection. We had 3 different approaches - consistently use getTargetConstantBitsFromNode and allow undef elts. llvm-svn: 349319	2018-12-16 19:46:04 +00:00
Craig Topper	10f8892837	[X86] Remove truncation handling from EmitTest. Replace it with a DAG combine. I'd like to try to move a lot of the flag matching out of EmitTest and push it to isel or isel preprocessing. This is a step towards that. The test-shrink-bug.ll change is an improvement because we are no longer interfering with test shrink handling in isel. The pr34137.ll change is a regression, but the IR came from -O0 and was not reduced by InstCombine. So it contains a lot of redundancies like duplicate loads that made it combine poorly. llvm-svn: 349315	2018-12-16 18:35:55 +00:00
Sanjay Patel	13ac2f15b0	[x86] increment/decrement constant vector with min/max in vsetcc lowering (PR39859) This is part of fixing PR39859: https://bugs.llvm.org/show_bug.cgi?id=39859 We have a crippled vector ISA, so we have to invert a typical fold and create min/max here. As discussed in the bug report, we can probably do better by using saturating subtract when it's available, but we should have this improvement for the min/max patterns regardless. Alive proofs: https://rise4fun.com/Alive/zsf https://rise4fun.com/Alive/Qrl Differential Revision: https://reviews.llvm.org/D55515 llvm-svn: 349304	2018-12-16 15:05:48 +00:00
Simon Pilgrim	52c982406e	[X86] Begin cleaning up combineOr -> SHLD/SHRD. NFCI. In preparation for converting to funnel shifts. llvm-svn: 349286	2018-12-15 21:11:49 +00:00
Simon Pilgrim	ef7b5949e5	[X86] Lower to SHLD/SHRD on slow machines for optsize Use consistent rules for when to lower to SHLD/SHRD for slow machines - fixes a weird issue where funnel shift gets expanded but then X86ISelLowering's combineOr sees the optsize and combines to SHLD/SHRD, but now with the modulo amount guard...... llvm-svn: 349285	2018-12-15 19:43:44 +00:00
Craig Topper	1fc257d97f	[X86] Rename hasNoSignedComparisonUses to hasNoSignFlagUses. Add the instructions that only modify the O flag to the waiver list. The only caller of this turns CMP with 0 into TEST. CMP with 0 and TEST both set OF to 0 so we should have no issues with instructions that only use OF. Though I don't think there's any reason we would read just OF after a compare with 0 anyway. So this probably isn't an observable change. llvm-svn: 349223	2018-12-15 01:07:19 +00:00
Craig Topper	5c304eac41	[X86] Make hasNoCarryFlagUses/hasNoSignedComparisonUses take an SDValue that indicates which result is the flag result. NFCI hasNoCarryFlagUses hardcoded that the flag result is 1 and used that to filter which uses were of interest. hasNoSignedComparisonUses just assumes the only result is flags and checks whether any user of the node is a CopyToReg instruction. After this patch we now do a result number check in both and rely on the caller to provide the result number. This shouldn't change behavior it was just an odd difference between the two functions that I noticed. llvm-svn: 349222	2018-12-15 01:07:16 +00:00
Craig Topper	257ce3871e	[DAGCombiner][X86] Prevent visitSIGN_EXTEND from returning N when (sext (setcc)) already has the target desired type for the setcc Summary: If the setcc already has the target desired type we can reach the getSetCC/getSExtOrTrunc after the MatchingVecType check with the exact same types as the nodes we started with. This causes VsetCC to be CSEd to N0 and the getSExtOrTrunc will CSE to N. When we return N, the caller will think that meant we called CombineTo and did our own worklist management. But that's not what happened. This prevents target hooks from being called for the node. To fix this, I've now returned SDValue if the setcc is already the desired type. But to avoid some regressions in X86 I've had to disable one of the target combines that wasn't being reached before in the case of a (sext (setcc)). If we get vector widening legalization enabled that entire function will be deleted anyway so hopefully this is only for the short term. Reviewers: RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55459 llvm-svn: 349137	2018-12-14 08:28:24 +00:00
Craig Topper	178abc59ac	[X86] Demote EmitTest to a helper function of EmitCmp. Route all callers except EmitCmp through EmitCmp. This requires the two callers to manifest a 0 to make EmitCmp call EmitTest. I'm looking into changing how we combine TEST and flag setting instructions to not be part of lowering. And instead be part of DAG combine or isel. Which will mean EmitTest will probably become gutted and maybe disappear entirely. llvm-svn: 349094	2018-12-13 23:55:30 +00:00
Mircea Trofin	41c729e78e	[llvm] Address base discriminator overflow in X86DiscriminateMemOps Summary: Macros are expanded on a single line. In case of large expansions, with sufficiently many instructions with memory operands (and when -fdebug-info-for-profiling is requested), we may be unable to generate new base discriminator values - new values overflow (base discriminators may not be larger than 2^12). This CL warns instead of asserting in such a case. A subsequent CL will add APIs to check for overflow before creating new debug info. See https://bugs.llvm.org/show_bug.cgi?id=39890 Reviewers: davidxl, wmi, gbedwell Reviewed By: davidxl Subscribers: aprantl, llvm-commits Differential Revision: https://reviews.llvm.org/D55643 llvm-svn: 349075	2018-12-13 19:40:59 +00:00
Simon Pilgrim	b5aaa673c6	[X86][SSE] Add SSE vector imm/var shift support to SimplifyDemandedVectorEltsForTargetNode llvm-svn: 349057	2018-12-13 16:39:29 +00:00
Simon Pilgrim	b0b2f1503a	[X86][SSE] Fix all remaining modulo vector rotation amounts (PR38243) There's still a couple of minor SimplifyDemandedElts regressions in some of the shift amount splats that will be fixed in future patches. llvm-svn: 349052	2018-12-13 15:50:31 +00:00
Simon Pilgrim	ba91ff4a86	[X86][SSE] Fix modulo rotation amounts for v8i16/v16i16/v4i32 (PR38243) llvm-svn: 349047	2018-12-13 15:23:09 +00:00
Simon Pilgrim	7c84f7ae3a	[X86][SSE] Merge the vXi16/vXi32 vector rotation expansion cases. NFCI. Merged the repeated code into a single if(). llvm-svn: 349040	2018-12-13 14:51:28 +00:00
Simon Pilgrim	320fd7383f	[X86][BWI] Don't custom lower vXi8 rotations. We always expand to shifts anyhow - test changes are just different scheduling only. llvm-svn: 349034	2018-12-13 13:44:33 +00:00
Simon Pilgrim	ab973a45b9	[DAGCombine] Moved X86 rotate_amount % bitwidth == 0 early out to DAGCombiner Remove common code from custom lowering (code is still safe if somehow a zero value gets used). llvm-svn: 349028	2018-12-13 12:23:32 +00:00
Simon Pilgrim	77fc551d1a	[TargetLowering] Add ISD::ROTL/ROTR vector expansion Move existing rotation expansion code into TargetLowering and set it up for vectors as well. Ideally this would share more of the funnel shift expansion, but we handle the shift amount modulo quite differently at the moment. Begun removing x86 vector rotate custom lowering to use the expansion. llvm-svn: 349025	2018-12-13 11:20:48 +00:00
Craig Topper	a048d58de7	[X86] Remove assert leftover from when i1 was a legal type. Add more accurate assert. NFC llvm-svn: 349007	2018-12-13 06:14:25 +00:00
Craig Topper	d1c61861dd	[X86] Don't emit MULX by default with BMI2 MULX has somewhat improved register allocation constraints compared to the legacy MUL instruction. Both output registers are encoded instead of fixed to EAX/EDX, but EDX is used as input. It also doesn't touch flags. Unfortunately, the encoding is longer. Preferring it whenever BMI2 is enabled is probably not optimal. Choosing it should somehow be a function of register allocation constraints like converting adds to three address. gcc and icc definitely don't pick MULX by default. Not sure what if any rules they have for using it. Differential Revision: https://reviews.llvm.org/D55565 llvm-svn: 348975	2018-12-12 21:21:31 +00:00
Craig Topper	4937adf75f	[X86] Emit SBB instead of SETCC_CARRY from LowerSELECT. Break false dependency on the SBB input. I'm hoping we can just replace SETCC_CARRY with SBB. This is another step towards that. I've explicitly used zero as the input to the setcc to avoid a false dependency that we've had with the SETCC_CARRY. I changed one of the patterns that used NEG to instead use an explicit compare with 0 on the LHS. We needed the zero anyway to avoid the false dependency. The negate would clobber its input register. By using a CMP we can avoid that which could be useful. Differential Revision: https://reviews.llvm.org/D55414 llvm-svn: 348959	2018-12-12 19:20:21 +00:00
Simon Pilgrim	eb508f8ccb	[SelectionDAG] Add a generic isSplatValue function This patch introduces a generic function to determine whether a given vector type is known to be a splat value for the specified demanded elements, recursing up the DAG looking for BUILD_VECTOR or VECTOR_SHUFFLE splat patterns. It also keeps track of the elements that are known to be UNDEF - it returns true if all the demanded elements are UNDEF (as this may be useful under some circumstances), so this needs to be handled by the caller. A wrapper variant is also provided that doesn't take the DemandedElts or UndefElts arguments for cases where we just want to know if the SDValue is a splat or not (with/without UNDEFS). I had hoped to completely remove the X86 local version of this function, but I'm seeing some regressions in shift/rotate codegen that will take a little longer to fix and I hope to get this in sooner so I can continue work on PR38243 which needs more capable splat detection. Differential Revision: https://reviews.llvm.org/D55426 llvm-svn: 348953	2018-12-12 18:32:29 +00:00
Sanjay Patel	44eaa492b8	[x86] allow 8-bit adds to be promoted by convertToThreeAddress() to form LEA This extends the code that handles 16-bit add promotion to form LEA to also allow 8-bit adds. That allows us to combine add ops with register moves and save some instructions. This is another step towards allowing add truncation in generic DAGCombiner (see D54640). Differential Revision: https://reviews.llvm.org/D55494 llvm-svn: 348946	2018-12-12 17:58:27 +00:00
Craig Topper	1fe466689b	[X86] Combine vpmovdw+vpacksswb into vpmovdb. This is similar to the combine we already have for vpmovdw+vpackuswb. llvm-svn: 348910	2018-12-12 05:56:01 +00:00
Craig Topper	b51283bfd7	Fix incorrect imm operand assertion for SUB32ri in X86CondBrFolding::analyzeCompare Summary: When running X86CondBrFolding::analyzeCompare, it can meet a SUB32ri instruction like the one below that uses a global address as its operand: %733:gr32 = SUB32ri %62:gr32(tied-def 0), @img2buf_normal, implicit-def $eflags JNE_1 %bb.41, implicit $eflags So the assertion "assert(MI.getOperand(ValueIndex).isImm() && "Expecting Imm operand")" is not correct. Change the assert to an if that makes X86CondBrFolding::analyzeCompare return false, as no compare is found in this case. Patch by Jianping Chen Reviewers: smaslov, LuoYuanke, liutianle, Jianping Reviewed By: Jianping Subscribers: lebedev.ri, llvm-commits Differential Revision: https://reviews.llvm.org/D54250 llvm-svn: 348853	2018-12-11 15:32:14 +00:00
Sanjay Patel	05e36982dd	[x86] clean up code for converting 16-bit ops to LEA; NFC As discussed in D55494, we want to extend this to handle 8-bit ops too, but that could be extended further to enable this on 32-bit systems too. llvm-svn: 348851	2018-12-11 15:29:40 +00:00
Sanjay Patel	9765ba5f86	[x86] remove dead code for 16-bit LEA formation; NFC As discussed in: D55494 ...this code has been disabled/dead for a long time (the code references Athlon and Pentium 4), and there's almost no chance that it will be used given the last decade of uarch evolution. Also, in SDAG we promote 16-bit ops to 32-bit, so there's almost no way to test this code any more. llvm-svn: 348845	2018-12-11 14:05:03 +00:00
Amara Emerson	5ec146046c	[GlobalISel] Restrict G_MERGE_VALUES capability and replace with new opcodes. This patch restricts the capability of G_MERGE_VALUES, and uses the new G_BUILD_VECTOR and G_CONCAT_VECTORS opcodes instead in the appropriate places. This patch also includes AArch64 support for selecting G_BUILD_VECTOR of <4 x s32> and <2 x s64> vectors. Differential Revisions: https://reviews.llvm.org/D53629 llvm-svn: 348788	2018-12-10 18:44:58 +00:00
Sanjay Patel	134f56e702	[x86] fix formatting; NFC This should really be generalized to allow increment and/or we should replace it by using ISD::matchUnaryPredicate(). See D55515 for context. llvm-svn: 348776	2018-12-10 17:23:44 +00:00
Cameron McInally	872ed41a1e	[AVX512] Update typo in comment Should be "Sae" for "Suppress All Exceptions". NFC llvm-svn: 348763	2018-12-10 15:21:35 +00:00
Nikita Popov	e79477895e	[X86] Fix AvoidStoreForwardingBlocks pass for negative displacements Fixes https://bugs.llvm.org/show_bug.cgi?id=39926. The size of the first copy was computed as std::abs(std::abs(LdDisp2) - std::abs(LdDisp1)), which results in skipped bytes if the signs of LdDisp2 and LdDisp1 differ. As far as I can see, this should just be LdDisp2 - LdDisp1. The case where LdDisp1 > LdDisp2 is already handled in the code above, in which case LdDisp2 is set to LdDisp1 and this subtraction will evaluate to Size1 = 0, which is the correct value to skip an overlapping copy. Differential Revision: https://reviews.llvm.org/D55485 llvm-svn: 348750	2018-12-10 10:16:50 +00:00
Craig Topper	02b614abc8	[X86] Merge addcarryx/addcarry intrinsic into a single addcarry intrinsic. Both intrinsics do the exact same thing so we really only need one. Earlier in the 8.0 cycle we changed the signature of this intrinsic without renaming it. But it looks difficult to get the autoupgrade code to allow me to merge the intrinsics and change the signature at the same time. So I've renamed the intrinsic slightly for the new merged intrinsic. I'm skipping autoupgrading from the previous new to 8.0 signature. I've also renamed the subborrow for consistency. llvm-svn: 348737	2018-12-10 06:07:50 +00:00
Craig Topper	2b09d17d93	[X86] If the carry input to an addcarry/subborrow intrinsic is known to be 0, emit a flag setting ADD/SUB instead of ADC/SBB. Previously we had to take the carry in and add -1 to it to set the carry flag so we could use it with ADC/SBB. But if we know it's 0 then we don't need to bother. This should go a long way towards fixing PR24545. llvm-svn: 348727	2018-12-09 18:02:37 +00:00
Nico Weber	b961661977	Remove unneeded dependency from lib/Target/X86/Utils/ to lib/IR (aka Core). The dependency was added in r213995 in response to r213986 which did make X86/Utils depend on IR, but r256680 later removed that dependency again. llvm-svn: 348724	2018-12-09 15:15:13 +00:00
Sanjay Patel	19bc850220	[x86] don't try to convert add with undef operands to LEA The existing code tries to handle an undef operand while transforming an add to an LEA, but it's incomplete because we will crash on the i16 test with the debug output shown below. It's better to just give up instead. Really, GlobalIsel should have folded these before we could get into trouble. # Machine code for function add_undef_i16: NoPHIs, TracksLiveness, Legalized, RegBankSelected, Selected bb.0 (%ir-block.0): liveins: $edi %1:gr32 = COPY killed $edi %0:gr16 = COPY %1.sub_16bit:gr32 %5:gr64_nosp = IMPLICIT_DEF %5.sub_16bit:gr64_nosp = COPY %0:gr16 %6:gr64_nosp = IMPLICIT_DEF %6.sub_16bit:gr64_nosp = COPY %2:gr16 %4:gr32 = LEA64_32r killed %5:gr64_nosp, 1, killed %6:gr64_nosp, 0, $noreg %3:gr16 = COPY killed %4.sub_16bit:gr32 $ax = COPY killed %3:gr16 RET 0, implicit killed $ax # End machine code for function add_undef_i16. * Bad machine code: Reading virtual register without a def * - function: add_undef_i16 - basic block: %bb.0 (0x7fe6cd83d940) - instruction: %6.sub_16bit:gr64_nosp = COPY %2:gr16 - operand 1: %2:gr16 LLVM ERROR: Found 1 machine code errors. Differential Revision: https://reviews.llvm.org/D54710 llvm-svn: 348722	2018-12-09 14:40:37 +00:00
Simon Pilgrim	e9d8275e43	[X86] Extend pfm counter coverage for llvm-exegesis Extension to rL348617, turns out llvm-exegesis doesn't need to match the perf counter name against a scheduler model resource name - so I've added a few more counters that I could find in the libpfm4 source code (and fix a typo in the knl/knm retired_uops counter - which uses 'all' instead of 'any'). llvm-svn: 348721	2018-12-09 13:45:15 +00:00
Simon Pilgrim	9b8fdab26c	[X86] Replace instregex with instrs list. NFCI. llvm-svn: 348626	2018-12-07 18:47:05 +00:00
Craig Topper	ba3ab78291	[X86] Initialize and Register X86CondBrFoldingPass This allows X86CondBrFoldingPass to be run with the --run-pass option, which makes it possible to test a wrong assertion in the analyzeCompare function for SUB32ri when its operand is not an immediate. Patch by Jianping Chen Differential Revision: https://reviews.llvm.org/D55412 llvm-svn: 348620	2018-12-07 18:10:34 +00:00
Simon Pilgrim	6155b32250	[X86] Improve pfm counter coverage for llvm-exegesis This patch attempts to improve pfm perf counter coverage for all the x86 CPUs that libpfm4 supports. Intel/AMD CPU families tend to share names for cycle/uops counters so even if they don't have a scheduler model yet they can at least use the default values (checked against the libpfm4 source code). The remaining CPUs (where their port/pipe resource counters are known) I've tried to add to the existing model mappings. These are untested but don't represent a regression to current llvm-exegesis behaviour for these CPUs. Differential Revision: https://reviews.llvm.org/D55432 llvm-svn: 348617	2018-12-07 17:48:40 +00:00
David Green	ca29c271d2	[Targets] Add errors for tiny and kernel codemodel on targets that don't support them Adds fatal errors for any target that does not support the Tiny or Kernel codemodels by rejigging the getEffectiveCodeModel calls. Differential Revision: https://reviews.llvm.org/D50141 llvm-svn: 348585	2018-12-07 12:10:23 +00:00
Simon Pilgrim	9c7d85bc62	[X86] Add ivybridge to llvm-exegesis PFM counter mappings llvm-svn: 348575	2018-12-07 09:27:35 +00:00
Craig Topper	2c7a9476e0	[X86] Directly create ADC/SBB nodes instead of using ADD/SUB with (and SETCC_CARRY, 1) This addresses a FIXME and avoids depending on an isel pattern match I think. I've removed the isel patterns too since we have no lit tests left that cover them. Hopefully that really means they are unused. I'm trying to decide if we need SETCC_CARRY. This removes one of its usages. Differential Revision: https://reviews.llvm.org/D55355 llvm-svn: 348536	2018-12-06 22:26:59 +00:00
Simon Pilgrim	bb650daeaf	[X86] Refactored IsSplatVector to use switch. NFCI. Initial step towards making the function more generic (and probably move into SelectionDAG). This is necessary to avoid massive codegen bloat for PR38243 (Add modulo rotate support to LowerRotate). llvm-svn: 348498	2018-12-06 16:29:14 +00:00
Craig Topper	6a6d77b851	[X86] Remove some leftover code for handling an i1 setcc type. NFC We should only need to handle i8 now. llvm-svn: 348460	2018-12-06 07:00:02 +00:00
Chandler Carruth	71c14a36a2	[SLH] Fix a nasty bug in SLH. Whenever we effectively take the address of a basic block we need to manually update that basic block to reflect that fact or later passes such as tail duplication and tail merging can break the invariants of the code. =/ Sadly, there doesn't appear to be any good way of automating this or even writing a reasonable assert to catch it early. The change seems trivially and obviously correct, but sadly the only really good test case I have is 1000s of basic blocks. I've tried directly writing a test case that happens to make tail duplication do something that crashes later on, but this appears to require an amazingly complex set of conditions that I've not yet reproduced. The change is technically covered by the tests because we mark the blocks as having their address taken, but that doesn't really count as properly testing the functionality. llvm-svn: 348374	2018-12-05 15:42:11 +00:00
Simon Pilgrim	32483668d7	[X86][SSE] Begun adding modulo rotate support to LowerRotate Prep work for PR38243 - mainly adding comments on where we need to add modulo support (doing so at the moment causes massive codegen regressions). I've also consistently added support for modulo folding for uniform constants (although at the moment we have no way to trigger this) and removed the old assertions. llvm-svn: 348366	2018-12-05 14:46:37 +00:00
Simon Pilgrim	180639afe5	[SelectionDAG] Initial support for FSHL/FSHR funnel shift opcodes (PR39467) This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions. Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code. Differential Revision: https://reviews.llvm.org/D54698 llvm-svn: 348353	2018-12-05 11:12:12 +00:00
Nirav Dave	ce26c27b2a	[SelectionDAG] Redefine isGAPlusOffset in terms of unwrapAddress. NFCI. llvm-svn: 348288	2018-12-04 17:59:43 +00:00
Simon Pilgrim	07843640d5	[X86][SSE] Add SimplifyDemandedBitsForTargetNode handling for MOVMSK Moves existing SimplifyDemandedBits call out of combineMOVMSK and add SimplifyDemandedVectorElts call based on the sign bits we need. llvm-svn: 348282	2018-12-04 16:52:32 +00:00
Simon Pilgrim	b1d6db7693	[X86] Remove unnecessary peekThroughEXTRACT_SUBVECTORs call. The GetSplatValue/IsSplatVector call will call this anyhow and the later code is just for a v2i64 type so doesn't need it. llvm-svn: 348253	2018-12-04 12:21:43 +00:00
Simon Pilgrim	0add090e24	[TargetLowering] expandFP_TO_UINT - avoid FPE due to out of range conversion (PR17686) PR17686 demonstrates that for some targets FP exceptions can fire in cases where the FP_TO_UINT is expanded using a FP_TO_SINT instruction. The existing code converts both the inrange and outofrange cases using FP_TO_SINT and then selects the result; this patch changes this for 'strict' cases to pre-select the FP_TO_SINT input and the offset adjustment. The X87 cases don't need the strict flag but generate much nicer code with it. Differential Revision: https://reviews.llvm.org/D53794 llvm-svn: 348251	2018-12-04 11:21:30 +00:00
Craig Topper	35585aff34	[X86] Remove custom DAG combine for SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG. We only needed this because it provided really aggressive constant folding even through constant pool entries created from build_vectors. The main case was for vXi8 MULH legalization which was happening as part of legalize DAG instead of as part of legalize vector ops. Now it's part of vector op legalization and we've added special handling for build vectors of all constants there. This has removed the need for this code on the lit tests we have. llvm-svn: 348237	2018-12-04 04:51:07 +00:00
Sanjay Patel	d24f63477d	[DAGCombiner] narrow truncated vector binops when legal This is the smallest vector enhancement I could find to D54640. Here, we're allowing narrowing to only legal vector ops because we'll see regressions without that. All of the test diffs are wins from what I can tell. With AVX/AVX512, we can shrink ymm/zmm ops to xmm. x86 vector multiplies are the problem case that we're avoiding due to the patchwork ISA, and it's not clear to me if we can dance around those regressions using TLI hooks or if we need preliminary patches to plug those holes. Differential Revision: https://reviews.llvm.org/D55126 llvm-svn: 348195	2018-12-03 21:57:35 +00:00
Craig Topper	5440b63fa8	[X86] Teach LowerMUL/LowerMULH for vXi8 to unpack constant RHS. Summary: We need to unpackl and unpackh the operands to use two vXi16 multiplies. Previously it looks like the low unpack would get constant folded at least in the 128-bit case after shuffle lowering turned the unpackl into ZERO_EXTEND_VECTOR_INREG and X86 custom DAG combined it. The same doesn't happen for the high half. So we'd load a constant and then shuffle it. But the low half would just be loaded and used by the multiply directly. After this patch we now end up with a constant pool entry for the low and high unpacks separately with no shuffle operations. This is a step towards removing custom constant folding for ZERO_EXTEND_VECTOR_INREG/SIGN_EXTEND_VECTOR_INREG in the X86 backend. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55165 llvm-svn: 348159	2018-12-03 18:26:27 +00:00
Craig Topper	e35b01f8ea	[X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb that truncates v8i16->v8i8. Summary: Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with a smaller than 128 bit vector type results are custom type legalized by promoting the result to a 128 bit vector by promoting the elements, inserting an assertzext/assertsext, then truncating back to original type. The truncate will be further legalized to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but under avx512 this will result in a truncate instruction and a packuswb instruction. But we should be able to get away with a single truncate instruction. The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54836 llvm-svn: 348158	2018-12-03 18:26:24 +00:00
Craig Topper	959b415e2f	[X86] Add a DAG combine to turn stores of vXi1 on pre-avx512 targets into a bitcast and a store of an iX scalar. llvm-svn: 348104	2018-12-02 19:47:14 +00:00
Craig Topper	6f54ff57fd	[X86] Fix bad comment. NFC llvm-svn: 348103	2018-12-02 19:47:13 +00:00
Craig Topper	204e4110e0	[X86] Simplify LowerBITCAST code for v2i32/v4i16/v8i8/i64->mmx/i64/f64 bitcast. Previously this code generated its own extracts and build_vector. But we can use a simpler concat_vectors or scalar_to_vector operation and let type legalization do additional legalization of those operations. llvm-svn: 348087	2018-12-02 07:52:39 +00:00
Craig Topper	4bb077910a	[X86] Add custom type legalization for v2i32/v4i16/v8i8->mmx bitcasts to avoid a store/load to/from the stack. Widen the input to a 128 bit vector by padding with undef elements. Then use a movdq2q to convert from xmm register to mmx register. llvm-svn: 348086	2018-12-02 05:46:50 +00:00
Craig Topper	ec096a1dae	[X86] Custom type legalize v2i32/v4i16/v8i8->i64 bitcasts in 64-bit mode similar to what's done when the destination is f64. The generic legalizer will fall back to a stack spill that uses a truncating store. That store will get expanded into a shuffle and non-truncating store on pre-avx512 targets. Once that happens the stack store/load pair will be combined away leaving behind the shuffle and bitcasts. On avx512 targets the truncating store is legal so doesn't get folded away. By custom legalizing it we can avoid this churn and maybe produce better code. llvm-svn: 348085	2018-12-02 05:46:48 +00:00
Craig Topper	f4b13927e7	[X86] Don't use zero_extend_vector_inreg for mulhu lowering with sse 4.1 Summary: With sse4.1 we use two zero_extend_vector_inreg and a pshufd to expand the v16i8 input into two v8i16 vectors for the multiply. That's 3 shuffles to extend one operand. The other operand is usually constant as this is mostly used by division by constant optimization. Pre sse4.1 we use a punpckhbw and a punpcklbw with a zero vector. That's two shuffles and an xor and a copy due to tied register constraints. That seems maybe better than the 3 shuffles. With AVX we avoid the copy so that's obviously better. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55138 llvm-svn: 348079	2018-12-01 19:26:31 +00:00
Craig Topper	4d80f199e8	[X86] Change vXi8 MULHU lowering to unpack high and low half of lanes instead of extracting and concating low and high half registers. This reduces the number of shuffle operations that need to be done. The splitting strategy requires the shuffle unit for the extraction and the extension. With the unpack strategy the unpacks accomplish a splitting and extending in one operation. llvm-svn: 348019	2018-11-30 18:43:18 +00:00
Craig Topper	8191307d09	[X86] Prefer lowerVectorShuffleAsBitMask over using an avx512 masked operation when avx512bw/avx512vl is enabled. This does require a constant pool load instead of loading an immediate into a gpr, moving to a k register and masking. But it's fewer instructions and more consistent with previous ISAs. It probably opens up more combine opportunities as one of the test cases demonstrates. llvm-svn: 348018	2018-11-30 18:43:15 +00:00
Craig Topper	a2133061c0	[X86] Emit PACKUS directly from the v16i8 LowerMULH code instead of using a shuffle. llvm-svn: 347967	2018-11-30 08:32:05 +00:00
Craig Topper	6e4b266a0d	[X86] Change the pre-sse4.1 code in the v16i8 MULHU lowering to be what we get after DAG combine cleans it up. Previously we emitted a punpcklbw/punpckhbw to move the byte elements into the upper half of 16 bit elements then shifted right by 8 to zero the upper bits. After DAG combine we end up with punpcklbw/punpckhbw into the lower bits with zeros in the upper bits and no shifts. So just emit that directly. llvm-svn: 347966	2018-11-30 08:32:01 +00:00
Craig Topper	0850e8a6b6	[X86] Fix a couple types in SimplifyDemandedVectorEltsForTargetNode. NFCI We had an EVT variable capturing the result of getSimpleValueType which returns an MVT. Another place was using EVT that could have been MVT. And an 'int' that should be 'unsigned'. llvm-svn: 347959	2018-11-30 06:23:55 +00:00
Mircea Trofin	5e0b21fb45	Fix build warnings introduced in rL347938 Summary: Suppressed warnings in release builds due to variable used only in assert statement. Subscribers: llvm-commits, eraman, mgorny Differential Revision: https://reviews.llvm.org/D55100 llvm-svn: 347939	2018-11-30 01:53:17 +00:00
Mircea Trofin	f1a49e8525	Revert "Revert r347596 "Support for inserting profile-directed cache prefetches"" Summary: This reverts commit d8517b96dfbd42e6a8db33c50d1fa1e58e63fbb9. Fix: correct the use of DenseMap. Reviewers: davidxl, hans, wmi Reviewed By: wmi Subscribers: mgorny, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D55088 llvm-svn: 347938	2018-11-30 01:01:52 +00:00
Craig Topper	73c1d75d58	[X86] Change the pre-type legalization DAG combine added in r347898 into a custom type legalization operation instead. This seems to produce the same results on the tests we have. llvm-svn: 347912	2018-11-29 20:18:58 +00:00
Craig Topper	129d529ab3	[SelectionDAG][AArch64][X86] Move legalization of vector MULHS/MULHU from LegalizeDAG to LegalizeVectorOps I believe we should be legalizing these with the rest of vector binary operations. If any custom lowering is required for these nodes, this will give the DAG combine between LegalizeVectorOps and LegalizeDAG to run on the custom code before constant build_vectors are lowered in LegalizeDAG. I've moved MULHU/MULHS handling in AArch64 from Lowering to isel. Moving the lowering earlier caused build_vector+extract_subvector simplifications to kick in which made the generated code worse. Differential Revision: https://reviews.llvm.org/D54276 llvm-svn: 347902	2018-11-29 19:36:17 +00:00
Craig Topper	6cd0b17078	[X86] Add a DAG combine pre type legalization to widen division by constant splat on narrow vectors to avoid scalarization This is another patch for -x86-experimental-vector-widening. This pre widens narrow division by constants so that we can get past the legal type check in the generic DAG combiner. Otherwise we end up scalarizing. I've restricted this to splats for now because it was easy to just call DAG.getConstant. Not sure what we should do for non-splat? Increase the element size? Widen the constant vector by padding with 1? Differential Revision: https://reviews.llvm.org/D54919 llvm-svn: 347898	2018-11-29 19:13:38 +00:00
Hans Wennborg	6e3be9d12e	Revert r347596 "Support for inserting profile-directed cache prefetches" It causes asserts building BoringSSL. See https://crbug.com/91009#c3 for repro. This also reverts the follow-ups: Revert r347724 "Do not insert prefetches with unsupported memory operands." Revert r347606 "[X86] Add dependency from X86 to ProfileData after rL347596" Revert r347607 "Add new passes to X86 pipeline tests" llvm-svn: 347864	2018-11-29 13:58:02 +00:00
Andrea Di Biagio	373a4ccf6c	[llvm-mca][MC] Add the ability to declare which processor resources model load/store queues (PR36666). This patch adds the ability to specify via tablegen which processor resources are load/store queue resources. A new tablegen class named MemoryQueue can be optionally used to mark resources that model load/store queues. Information about the load/store queue is collected at 'CodeGenSchedule' stage, and analyzed by the 'SubtargetEmitter' to initialize two new fields in struct MCExtraProcessorInfo named `LoadQueueID` and `StoreQueueID`. Those two fields are identifiers for buffered resources used to describe the load queue and the store queue. Field `BufferSize` is interpreted as the number of entries in the queue, while the number of units is a throughput indicator (i.e. number of available pickers for loads/stores). At construction time, LSUnit in llvm-mca checks for the presence of extra processor information (i.e. MCExtraProcessorInfo) in the scheduling model. If that information is available, and fields LoadQueueID and StoreQueueID are set to a value different than zero (i.e. the invalid processor resource index), then LSUnit initializes its LoadQueue/StoreQueue based on the BufferSize value declared by the two processor resources. With this patch, we more accurately track dynamic dispatch stalls caused by the lack of LS tokens (i.e. load/store queue full). This is also shown by the differences in two BdVer2 tests. Stalls that were previously classified as generic SCHEDULER FULL stalls are now correctly classified either as "load queue full" or "store queue full". About the differences in the -scheduler-stats view: those differences are expected, because entries in the load/store queue are not released at instruction issue stage. Instead, those are released at instruction executed stage. This is the main reason why, for the modified tests, the load/store queues get full before PdEx is full. Differential Revision: https://reviews.llvm.org/D54957 llvm-svn: 347857	2018-11-29 12:15:56 +00:00
Craig Topper	c2540995ed	[X86] Correct comment. NFC llvm-svn: 347835	2018-11-29 05:56:03 +00:00
Sanjay Patel	2de209313e	[x86] try select simplification for target-specific nodes This failed to select (which might be a separate bug) in X86ISelDAGToDAG because we try to create a select node that can be simplified away after rL347227. This change avoids the problem by simplifying the SHRUNKBLEND node sooner. In the test case, we manage to realize that the true/false values of the select (SHRUNKBLEND) are the same thing, so it simplifies away completely. llvm-svn: 347818	2018-11-28 22:51:04 +00:00
Craig Topper	81f1b4a361	[X86] Make X86TTIImpl::getCastInstrCost properly handle the case where AVX512 is enabled, but 512-bit vectors aren't legal. Unlike most cost model functions this code makes a lot of table lookups without using the results from getTypeLegalizationCost. This means 512-bit vectors can be looked up even when the type isn't legal. This patch adds a check around the two tables that contain 512-bit types to make sure that neither of the types would be split by type legalization. Meaning 512 bit types are illegal. I wanted to write this in a somewhat generic way that uses type legalization query hooks. But if preferred, I can switch to just using is512BitVector and the subtarget feature. Differential Revision: https://reviews.llvm.org/D54984 llvm-svn: 347786	2018-11-28 18:11:42 +00:00
Craig Topper	d3bb036bc9	[X86] Add some cost model entries for sext/zext for avx512bw This fixes some of scalarization costs reported for sext/zext using avx512bw. This does not fix all scalarization costs being reported. Just the worst. I've restricted this only to combinations of types that are legal with avx512bw like v32i1/v64i1/v32i16/v64i8 and conversions between vXi1 and vXi8/vXi16 with legal vXi8/vXi16 result types. Differential Revision: https://reviews.llvm.org/D54979 llvm-svn: 347785	2018-11-28 18:11:39 +00:00
Craig Topper	f3b6f583e2	[X86] Add a combine for back to back VSRAI instructions Expansion of SIGN_EXTEND_INREG can create a VSRAI instruction. If there is already a VSRAI after it, we should combine them into a larger VSRAI Differential Revision: https://reviews.llvm.org/D54959 llvm-svn: 347784	2018-11-28 18:03:38 +00:00
Francis Visoiu Mistrih	d7eebd6d83	[CodeGen][NFC] Make `TII::getMemOpBaseImmOfs` return a base operand Currently, instructions doing memory accesses through a base operand that is not a register can not be analyzed using `TII::getMemOpBaseRegImmOfs`. This means that functions such as `TII::shouldClusterMemOps` will bail out on instructions using an FI as a base instead of a register. The goal of this patch is to refactor all this to return a base operand instead of a base register. Then in a separate patch, I will add FI support to the mem op clustering in the MachineScheduler. Differential Revision: https://reviews.llvm.org/D54846 llvm-svn: 347746	2018-11-28 12:00:20 +00:00
Mircea Trofin	35f0e5cd2d	Do not insert prefetches with unsupported memory operands. Summary: Ignore advices where the memory operand of the 'anchor' instruction uses unsupported register types. Reviewers: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54983 llvm-svn: 347724	2018-11-28 01:08:45 +00:00
Evandro Menezes	9ef79c884a	[TableGen] Refactor macro names (NFC) Make the names for the macros for `TargetInstrInfo` uniform. llvm-svn: 347706	2018-11-27 20:58:27 +00:00
Craig Topper	7ceef03dc9	[X86] Replace an APInt that is guaranteed to be 8-bits with just an 'unsigned' We're already mixing this APInt with other 'unsigned' variables. This allows us to use regular comparison operators instead of needing to use APInt::ult or APInt::uge. And it removes a later conversion from APInt to unsigned. I might be adding another combine to this function and this will probably simplify the logic required for that. llvm-svn: 347684	2018-11-27 18:24:56 +00:00
Craig Topper	5fb34b5498	[X86] Add cascade lake arch in X86 target. This is skylake-avx512 with the addition of avx512vnni ISA. Patch by Jianping Chen Differential Revision: https://reviews.llvm.org/D54785 llvm-svn: 347681	2018-11-27 18:05:00 +00:00
Craig Topper	196fd31e33	[X86] Use getUnpackl/getUnpackh instead of directly creating UNPCKL/UNPCKH nodes. llvm-svn: 347642	2018-11-27 06:24:56 +00:00
Craig Topper	4325505f05	[X86] Prevent DAG combine from folding a bitcast from vXi1 to iX with a store on pre-AVX512 targets. If we fold the bitcast into the store we'll end up creating a truncating store to vXi1 that will get scalarized. Instead allow the bitcast to be turned into a movmsk. We probably need to do something if the store itself is a vXi1 type, but I'll leave that til a testcase appears. llvm-svn: 347632	2018-11-27 02:57:27 +00:00
Sterling Augustine	9cc1ffadc5	Notify the linker when a TU compiled with split-stack has a function without a prologue. More context here: https://go-review.googlesource.com/c/go/+/148819/ llvm-svn: 347614	2018-11-26 23:26:31 +00:00
Mircea Trofin	183df14520	Add new passes to X86 pipeline tests Summary: Fixes test failures introduced by rL347596. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54916 llvm-svn: 347607	2018-11-26 22:49:17 +00:00
Fangrui Song	82ddb8154e	[X86] Add dependency from X86 to ProfileData after rL347596 llvm-svn: 347606	2018-11-26 22:16:19 +00:00
Mircea Trofin	cfbc1788d6	Support for inserting profile-directed cache prefetches Summary: Support for profile-driven cache prefetching (X86) This change is part of a larger system, consisting of a cache prefetches recommender, create_llvm_prof (https://github.com/google/autofdo), and LLVM. A proof of concept recommender is DynamoRIO's cache miss analyzer. It processes memory access traces obtained from a running binary and identifies patterns in cache misses. Based on them, it produces a csv file with recommendations. The expectation is that, by leveraging such recommendations, we can reduce the amount of clock cycles spent waiting for data from memory. A microbenchmark based on the DynamoRIO analyzer is available as a proof of concept: https://goo.gl/6TM2Xp. The recommender makes prefetch recommendations in terms of: * the binary offset of an instruction with a memory operand; * a delta; * and a type (nta, t0, t1, t2) meaning: a prefetch of that type should be inserted right before the instruction at that binary offset, and the prefetch should be for an address delta away from the memory address the instruction will access. For example: 0x400ab2,64,nta and assuming the instruction at 0x400ab2 is: movzbl (%rbx,%rdx,1),%edx means that the recommender determined it would be beneficial for a prefetchnta instruction to be inserted right before this instruction, as such: prefetchnta 0x40(%rbx,%rdx,1) movzbl (%rbx, %rdx, 1), %edx The workflow for prefetch cache instrumentation is as follows (the proof of concept script details these steps as well): 1. build binary, making sure -gmlt -fdebug-info-for-profiling is passed. The latter option will enable the X86DiscriminateMemOps pass, which ensures instructions with memory operands are uniquely identifiable (this causes ~2% size increase in total binary size due to the additional debug information). 2. collect memory traces, run analysis to obtain recommendations (see above-referenced DynamoRIO demo as a proof of concept). 3. use create_llvm_prof to convert recommendations to reference insertion locations in terms of debug info locations. 4. rebuild binary, using the exact same set of arguments used initially, to which -mllvm -prefetch-hints-file=<file> needs to be added, using the afdo file obtained at step 3. Note that if sample profiling feedback-driven optimization is also desired, that happens before step 1 above. In this case, the sample profile afdo file that was used to produce the binary at step 1 must also be included in step 4. The data needed by the compiler in order to identify prefetch insertion points is very similar to what is needed for sample profiles. For this reason, and given that the overall approach (memory tracing-based cache recommendation mechanisms) is under active development, we use the afdo format as a syntax for capturing this information. We avoid confusing semantics with sample profile afdo data by feeding the two types of information to the compiler through separate files and compiler flags. Should the approach prove successful, we can investigate improvements to this encoding mechanism. Reviewers: davidxl, wmi, craig.topper Reviewed By: davidxl, wmi, craig.topper Subscribers: davide, danielcdh, mgorny, aprantl, eraman, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D54052 llvm-svn: 347596	2018-11-26 21:36:18 +00:00
Craig Topper	b955bf382c	[LegalizeVectorTypes][X86][ARM][AArch64][PowerPC] Don't use SplitVecOp_TruncateHelper for FP_TO_SINT/UINT. SplitVecOp_TruncateHelper tries to promote the result type while splitting FP_TO_SINT/UINT. It then concatenates the result and introduces a truncate to the original result type. But it does this without inserting the AssertZExt/AssertSExt that the regular result type promotion would insert. Nor does it turn FP_TO_UINT into FP_TO_SINT the way normal result type promotion for these operations does. This is bad on X86 which doesn't support FP_TO_SINT until AVX512. This patch disables the use of SplitVecOp_TruncateHelper for these operations and just lets normal promotion handle it. I've tweaked a couple things in X86ISelLowering to avoid a few obvious regressions there. I believe all the changes on X86 are improvements. The other targets look neutral. Differential Revision: https://reviews.llvm.org/D54906 llvm-svn: 347593	2018-11-26 21:12:39 +00:00
Than McIntosh	30c804bbb1	[CodeGen] Support custom format of stack maps Summary: Add a hook to the GCMetadataPrinter for emitting stack maps in custom format. The hook will be called at stack map generation time. The default stack map format is used if there is no hook. For this to be useful a few data structures and accessors are exposed from the StackMaps class, so the custom printer can access the stack map data. This patch authored by Cherry Zhang <cherryyz@google.com>. Reviewers: thanm, apilipenko, reames Reviewed By: reames Subscribers: reames, apilipenko, nemanjai, javed.absar, kbarton, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D53892 llvm-svn: 347584	2018-11-26 18:43:48 +00:00
Sanjay Patel	d31220e0de	[x86] promote all multiply i8 by constant to i32 We have these 2 "isDesirable" promotion hooks (I'm not sure why we need both of them, but that's independent of this patch), and we can adjust them to promote "mul i8 X, C" to i32. Then, all of our existing LEA and other multiply expansion magic happens as it would for i32 ops. Some of the test diffs show that we could end up with an actual 32-bit mul instruction here because we choose not to expand to simpler ops. That instruction could be slower depending on the subtarget. On the plus side, this means we don't need a separate instruction to load the constant operand and possibly an extra instruction to move the result. If we need to tune mul i32 further, we could add a later transform that tries to shrink it back to i8 based on subtarget timing. I did not bother to duplicate all of the 32-bit test file RUNs and target settings that exist to test whether LEA expansion is cheap or not. The diffs here assume a default target, so that means LEA is generally cheap. Differential Revision: https://reviews.llvm.org/D54803 llvm-svn: 347557	2018-11-26 15:22:30 +00:00
Sanjay Patel	7336e7c67a	[x86] limit transform for select-of-fp-constants This should likely be adjusted to limit this transform further, but these diffs should be clear wins. If we have blendv/conditional move, then we should assume those are cheap ops. The loads become independent of the compare, so those can be speculated before we need to use the values in the blend/mov. llvm-svn: 347526	2018-11-25 17:27:02 +00:00
Sanjay Patel	cadf62f360	[x86] fix predicate for avoiding vblendv It only makes sense to produce the logic ops when 1 of the constants is +0.0. Otherwise, go with vblendv to reduce code. llvm-svn: 347403	2018-11-21 18:02:50 +00:00
Simon Pilgrim	66bae9aee8	[X86][AVX] Remove BROADCAST if we only need the 0'th element We don't catch this with target shuffle simplification if the src/dst types are different. llvm-svn: 347386	2018-11-21 11:00:09 +00:00
Craig Topper	e9b4001a82	[X86] In getScalarMaskingNode, replace scalar_to_vector with a bitcast to v8i1 and an extract_subvector to convert i8 to v1i1. The bitcast can be nicely merged with any i8 loads that exist for argument passing in 32 mode for example. llvm-svn: 347380	2018-11-21 07:01:22 +00:00
Craig Topper	27a5896fe8	[X86] Correct 256 vpmovzx/vpmovsx isel patterns to check HasAVX2 instead of HasAVX to prevent fast-isel from using them incorrectly. These are AVX2 instructions, but have been incorrectly marked in tablegen for a while. This wasn't a problem until r346784 switched the patterns to use target independent ISD opcodes. This made the patterns visible to fast isel. Fixes PR39733 llvm-svn: 347375	2018-11-21 01:39:38 +00:00
Craig Topper	aa52ee2770	[X86] Emit a PACKUS instead of a VECTOR_SHUFFLE from LowerTRUNCATE for v16i16->v16i8. We can't guarantee that demanded bits passing through the vector shuffle won't cause the AND in front of this to be removed. This would prevent the PACKUS from being matched during shuffle lowering. Unfortunately, this adds a packuswb to one of the vector-reduce-mul.ll tests since we were removing the shuffle via SimplifyDemandedVectorElts. We appear to have similar issues with vpmovwb on the same test case on other targets. llvm-svn: 347361	2018-11-20 22:57:48 +00:00
Craig Topper	24b346da42	[X86] Emit a single shuffle for the v16i8->v4i32 step of a SIGN_EXTEND_VECTOR_INREG lowering on pre-sse4.1 targets. Previously we emitted two separate shuffles, one for unpcklbw and one for unpcklwd. Instead emit a single shuffle equivalent to both of the original shuffles. Shuffle lowering seems able to handle it. This avoids a bitcast between the two shuffles which seems helpful to DAG combine. Remove the custom type legalization for v8i8->v8i32. I had put that in to avoid some almost duplicate punpcklbw instructions I was seeing, but this lowering change seems to fix that. It also fixes some duplicate shuffles seen in vector-sext.ll llvm-svn: 347348	2018-11-20 21:21:52 +00:00
Simon Pilgrim	ee8b96f253	[X86][SSE] Add computeKnownBits/ComputeNumSignBits support for PACKSS/PACKUS instructions. Pull out getPackDemandedElts demanded elts remapping helper from computeKnownBitsForTargetNode and use in computeKnownBits/ComputeNumSignBits. llvm-svn: 347303	2018-11-20 13:23:37 +00:00
Simon Pilgrim	ed7e2fda18	[X86][SSE] XFormVExtractWithShuffleIntoLoad - getVectorShuffle won't accept SM_SentinelZero Noticed while working on improving demanded elts target shuffle shuffle combining llvm-svn: 347302	2018-11-20 12:17:50 +00:00
Simon Pilgrim	a6fb85ffa7	[X86][SSE] Lower immediately to PACKUS instead of VECTOR_SHUFFLE. As discussed on rL347240, this avoids some regressions on D54679 and also helps some combines to kick in a bit earlier. llvm-svn: 347300	2018-11-20 11:46:37 +00:00
Simon Pilgrim	7198506ba8	[X86][SSE] Add SimplifyDemandedVectorElts support for PACKSS/PACKUS instructions. As discussed on rL347240. llvm-svn: 347299	2018-11-20 11:09:46 +00:00
Craig Topper	17fa42a69b	[X86] Preserve undef information when creating a punpckl/hbw from a v16i8 where all the even or odd elements are undef. Previously if V2 was unused we ended up using V1 for both inputs as part of the code that follows the new code. By using lowerVectorShuffleWithUNPCK we keep the undef nature of V2 in the output. As near as I can tell this makes v16i8 behavior consistent with every other VT now. This does mean that we give the register allocator freedom to fill in random registers now and create false dependencies. But like I said we're already doing that for other types. llvm-svn: 347296	2018-11-20 09:04:01 +00:00
Craig Topper	b06d1aa3a1	[X86] Add custom type legalization for v8i8->v8i32 sign extend pre-SSE4.1 This helps with a future patch and makes us less reliant on DAG combine merging shuffles. llvm-svn: 347295	2018-11-20 09:03:58 +00:00
Craig Topper	c733c7bf94	[X86] Replace more calls to getZeroVector with regular getConstant. getZeroVector produces a specifically canonicalized zero vector, but we can just let DAG legalization take care of it. The test changes are because MULH lowering happens later than it should and this change gave us the opportunity to constant fold away a multiply during a DAG combine before the build_vector got legalized with a bitcast. llvm-svn: 347290	2018-11-20 06:54:01 +00:00
Craig Topper	808d0dd689	[X86] Rename combineVSZext->combineExtendVectorInreg. NFC Now that we no longer have target specific vector extend nodes let's make the function name match the nodes we do use. llvm-svn: 347268	2018-11-19 22:18:47 +00:00
Craig Topper	a5e0380c30	[X86][CostModel] Don't lookup intrinsic cost tables if the intrinsic isn't one we care about We're seeing some issues internally where we sent some intrinsics into the cost model that the getTypeLegalizationCost call fails on, but X86 specific tables don't care about. Our base class implementation takes care of them. We'd just like X86 backend to ignore them. This patch makes sure the switch returned something X86 cares about and skips the table lookups and type legalization call if not. Probably more efficient too since we don't go scanning the tables for every intrinsic we could possibly see. Differential Revision: https://reviews.llvm.org/D54711 llvm-svn: 347248	2018-11-19 18:57:31 +00:00
Simon Pilgrim	c4861ab170	[X86][SSE] Remove unnecessary bit-and in pshufb vector ctlz (PR39703) SSE PSHUFB vector ctlz lowering works at the i4 nibble level. As detailed in PR39703, we were masking the lower nibble off but we only actually use it in the case where the upper nibble is known to be zero, making it safe to remove the mask and save an instruction. Differential Revision: https://reviews.llvm.org/D54707 llvm-svn: 347242	2018-11-19 18:40:59 +00:00
Craig Topper	311bbcd535	[X86] Attempt to improve v32i8/v64i8 multiply lowering by applying the v16i8 non-avx2 algorithm to each 128-bit lane. Previously we split the vectors in half to allow the two halves to be any extended then concatenated the results back together. This patch instead extends the v16i8 sse algorithm to extend half of each 128-bit lane using punpcklbw/punpckhbw. Multiplies all the low half lanes and high half lanes together in separate operations. Then merges the half lane results back together using packuswb. Unfortunately, some of the cases in vector-reduce-mul.ll regress because we aren't narrowing the vector width of the multiplies as we reduce. The splitting was somewhat making up for that before by causing halves to be discarded after the split. Differential Revision: https://reviews.llvm.org/D54668 llvm-svn: 347240	2018-11-19 18:32:53 +00:00
Craig Topper	8b22bcd39f	[X86] Use a pcmpgt with 0 instead of psrad 31, to fill elements with the sign bit in v4i32 MULH lowering. The shift requires a copy to avoid clobbering a register. Comparing with 0 uses an xor to produce 0 that will be overwritten with the compare results. So still requires 2 instructions, but should be one byte shorter since it doesn't need to encode an immediate. llvm-svn: 347185	2018-11-19 07:22:26 +00:00
Craig Topper	3616891046	[X86] Use compare with 0 to fill an element with sign bits when sign extending to v2i64 pre-sse4.1 Previously we used an arithmetic shift right by 31, but that requires a copy to preserve the input. So we might as well materialize a zero and compare to it since the comparison will overwrite the register that contains the zeros. This should be one byte shorter. llvm-svn: 347181	2018-11-19 04:33:20 +00:00
Craig Topper	053f1eea96	[X86] Remove most of the SEXTLOAD Custom setOperationAction calls under -x86-experimental-vector-widening-legalization. Leave just the v4i8->v4i64 and v8i8->v8i64, but only enable them on pre-sse4.1 targets when 64-bit mode is enabled. In those cases we end up creating sext loads that get scalarized to code that looks better than what we get from loading into a vector register and doing a multiple step sign extend using unpacks and shifts. llvm-svn: 347180	2018-11-19 00:33:16 +00:00
Simon Pilgrim	7f92efa5a9	[X86][SSE] Add SimplifyDemandedVectorElts support for SSE packed i2fp conversions. llvm-svn: 347177	2018-11-18 22:13:31 +00:00
Craig Topper	0468c860b7	[X86] Add custom type legalization for extending v4i8/v4i16->v4i64. Pre-SSE4.1 sext_invec for v2i64 is complicated because we don't have a v2i64 sra instruction. So instead we sign extend to i32 using unpack and sra, then copy the elements and do a v4i32 sra to fill with sign bits, then interleave the i32 sign extend and the sign bits. So really we're doing two sign extends but only using half of the v4i32 intermediate result. When the result is more than 128 bits, default type legalization would prefer to split the destination type all the way down to v2i64 with shuffles followed by v16i8/v8i16->v2i64 sext_inreg operations. This results in more instructions than necessary because we are only utilizing the lower 2 elements of the v4i32 intermediate result. Instead we can custom split a v4i8/v4i16->v4i64 sign_extend. Then we can sign extend v4i8/v4i16->v4i32 invec producing a full v4i32 result. Create the sign bit vector as a v4i32 then split and interleave with the sign bits using a punpckldq and punpckhdq. llvm-svn: 347176	2018-11-18 21:28:50 +00:00
Simon Pilgrim	b31bdbd2e9	[X86][SSE] Add SimplifyDemandedVectorElts support for SSE splat-vector-shifts. SSE vector shifts only use the bottom 64-bits of the shift amount vector. llvm-svn: 347173	2018-11-18 20:21:52 +00:00
Craig Topper	11d50948e2	[X86] Disable combineToExtendVectorInReg under -x86-experimental-vector-widening-legalization. Add custom type legalization for extends. If we widen illegal types instead of promoting, we should be able to rely on the type legalizer to create the vector_inreg operations for us with some caveats. This patch disables combineToExtendVectorInReg when we are using widening. I've enabled custom legalization for v8i8->v8i64 extends under avx512f since the type legalizer would want to create a vector_inreg with a v64i8 input type which isn't legal without avx512bw. So we go to v16i8 with custom code using the relaxation of rules we get from D54346. I've also enabled custom legalization of v8i64 and v16i32 operations with AVX. When the input type is 128 bits, the default splitting legalization would extend first 128->256, then do a split to two 128-bit pieces. Extend each half to 256 and then concat the result. The custom legalization I've added instead uses a 128->256 bit vector_inreg extend that only reads the lower 64-bits for the low half of the split. Then shuffles the high 64-bits to the low 64-bits and does another vector_inreg extend. llvm-svn: 347172	2018-11-18 18:11:25 +00:00
Craig Topper	bc8148f7b0	[X86] Lower v16i16->v8i16 truncate using an 'and' with 255, an extract_subvector, and a packuswb instruction. Summary: This is an improvement over the two pshufbs and punpcklqdq we'd get otherwise. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54671 llvm-svn: 347171	2018-11-18 17:59:28 +00:00
Simon Pilgrim	ec808cf541	Remove unused variable. NFCI. llvm-svn: 347169	2018-11-18 17:24:59 +00:00
Simon Pilgrim	50828c75d0	[X86][SSE] Split IsSplatValue into GetSplatValue and IsSplatVector Refactor towards making this recursive (necessary for PR38243 rotation splat detection). IsSplatVector returns the original vector source of the splat and the splat index. GetSplatValue returns the scalar splatted value as an extraction from IsSplatVector. llvm-svn: 347168	2018-11-18 17:15:06 +00:00
Simon Pilgrim	fec9f8657b	[X86][SSE] Relax IsSplatValue - remove the 'variable shift' limit on subtracts. Means we don't use the per-lane-shifts as much when we can cheaply use the older splat-variable-shifts. llvm-svn: 347162	2018-11-18 15:52:08 +00:00
Simon Pilgrim	cc1f5d2407	[X86][SSE] Use raw shuffle mask decode in SimplifyDemandedVectorEltsForTargetNode (PR39549) We were using the 'normalized' shuffle mask from resolveTargetShuffleInputs, which replaces zero/undef inputs with sentinel values. For SimplifyDemandedVectorElts we need the raw mask so we can correctly demand those 'zero' inputs that got normalized away, this requires an extra bit of logic to locally normalize undef inputs. llvm-svn: 347158	2018-11-18 13:34:53 +00:00
Craig Topper	cd94a7c227	[X86] Add -x86-experimental-vector-widening-legalization check to combineSelect and combineSetCC to cover vXi16/vXi8 promotion without BWI. I don't yet have any test cases for this, but it's the right thing to do based on log file inspection. llvm-svn: 347151	2018-11-18 08:30:09 +00:00
Craig Topper	b03f80a21c	[X86] Rename WidenMaskArithmetic->PromoteMaskArithmetic since we usually use widen to refer to adding elements not making elements larger. NFC llvm-svn: 347150	2018-11-18 07:35:08 +00:00
Craig Topper	f56a57518d	[X86] Don't use a pmaddwd for vXi32 multiply if the inputs are zero extends from i8 or smaller without SSE4.1. Prefer to shrink the mul instead. The zero extend will require two stages of unpacks to implement. So it's better to shrink the multiply using pmullw and then extend that result back to v4i32 using a single unpack. llvm-svn: 347149	2018-11-18 05:53:21 +00:00
Craig Topper	0438d791fa	[X86] Add support for matching PACKUSWB from a v64i8 shuffle. llvm-svn: 347143	2018-11-17 18:54:43 +00:00
Craig Topper	dd61f11642	[X86] Don't extend v32i8 multiplies to v32i16 with avx512bw and prefer-vector-width=256. llvm-svn: 347131	2018-11-17 02:36:07 +00:00
Craig Topper	b05ea28f1f	[X86] Use getUnpackl/getUnpackh instead of hardcoding a shuffle mask. llvm-svn: 347127	2018-11-17 02:18:12 +00:00
Fangrui Song	7570932977	Use llvm::copy. NFC llvm-svn: 347126	2018-11-17 01:44:25 +00:00
Craig Topper	ee0333b4a9	[X86] Add custom promotion of narrow fp_to_uint/fp_to_sint operations under -x86-experimental-vector-widening-legalization. This tries to force the result type to vXi32 followed by a truncate. This can help avoid scalarization that would otherwise occur. There are some annoying examples of an avx512 truncate instruction followed by a packus where we should really be able to just use one truncate. But overall this is still a net improvement. llvm-svn: 347105	2018-11-16 22:53:00 +00:00
Craig Topper	87bc07b3dd	[X86] Qualify part of the masked gather handling in ReplaceNodeResults with a getTypeAction call to know if we can use default legalization. If we managed to switch to -x86-experimental-vector-widening-legalization this block can be removed. llvm-svn: 347100	2018-11-16 22:04:29 +00:00
Craig Topper	567aaeb40d	[X86] Remove a branch on SSE4.1 from LowerLoad We should be able to use getExtendInVec with or without sse4.1 to produce a SIGN_EXTEND_VECTOR_INREG. llvm-svn: 347095	2018-11-16 21:05:00 +00:00
Craig Topper	7fff9a9aef	[X86] In LowerLoad, fix assert messages and rename a variable that used Zize instead of Size. NFC llvm-svn: 347093	2018-11-16 21:04:56 +00:00
Rong Xu	3a38175723	[X86] Disable Condbr_merge pass Disable Condbr_merge pass for now due to PR39658. Will reenable the pass once the bug is fixed. llvm-svn: 347079	2018-11-16 19:35:00 +00:00
Simon Pilgrim	66f42ea6e1	[SelectionDAG] Move (repeated) SDTIntShiftDOp double shift node def to common code. NFCI. Prep work for PR39467. llvm-svn: 347067	2018-11-16 17:50:59 +00:00
Simon Pilgrim	bcd6631a2a	[X86][SSE] Move number of input limit out of resolveTargetShuffleInputs. Only combineX86ShufflesRecursively needs this limit. llvm-svn: 347054	2018-11-16 15:01:05 +00:00
Roman Lebedev	90c5b3f78e	[X86] X86DAGToDAGISel::matchBitExtract(): extract 'lshr' from `X` Summary: As discussed in previous review, and noted in the FIXME, if `X` is actually an `lshr Y, Z` (logical!), we can fold the `Z` into `control`, and let the `BEXTR` do this too. We could just insert those 8 bits of shift amount into control, but it is better to instead zero-extend them, and 'or' them in place. We can only do this for `lshr`, not `ashr`, because we do not know that the mask covers only the bits of `Y`, and not any of the sign-extended bits. The obvious question is, is this actually legal to do? I believe it is. Relevant quotes, from `Intel® 64 and IA-32 Architectures Software Developer’s Manual`, `BEXTR — Bit Field Extract`: * `Bit 7:0 of the second source operand specifies the starting bit position of bit extraction.` * `A START value exceeding the operand size will not extract any bits from the second source operand.` * `Only bit positions up to (OperandSize -1) of the first source operand are extracted.` * `All higher order bits in the destination operand (starting at bit position LENGTH) are zeroed.` * `The destination register is cleared if no bits are extracted.` FIXME: if we can do this, i wonder if we should prefer `BEXTR` over `BZHI` in such cases. Reviewers: RKSimon, craig.topper, spatel, andreadb Reviewed By: RKSimon, craig.topper, andreadb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54095 llvm-svn: 347048	2018-11-16 13:04:54 +00:00
Craig Topper	079c37da58	[X86] Add custom type legalization for v2i8/v4i8/v8i8 mul under -x86-experimental-vector-widening. By early promoting the multiply to use an i16 element type we can avoid having op legalization emit a second multiply for the 8 upper elements of the v16i8 type we would otherwise get. llvm-svn: 347032	2018-11-16 06:15:21 +00:00
Craig Topper	5802b82b40	[X86] Use ANY_EXTEND instead of SIGN_EXTEND in the AVX2 and later path for legalizing vXi8 multiply. We aren't going to use the upper bits of the multiply result that the extend would affect. So we don't need a specific type of extend. This makes some reduction test cases shorter because we were previously trying to sign_extend a truncate which we can't eliminate. llvm-svn: 347011	2018-11-16 01:16:59 +00:00
Craig Topper	1acafd863f	[X86] Update a couple comments to remove a mention of a sign extending that no longer happens. NFC llvm-svn: 347010	2018-11-16 01:16:51 +00:00
Craig Topper	22bfa99448	[X86] Remove ANY_EXTEND special case from canReduceVMulWidth Removing this code doesn't affect any lit tests so it doesn't appear to be tested anymore. I assume it was when it was added, but I guess something else changed? Code coverage report also says it's unused. I mostly didn't like that it seemed to count the sign bits as if it was a sign_extend, but then set isPositive as if it was a zero_extend. It feels like we should have picked one interpretation? Differential Revision: https://reviews.llvm.org/D54596 llvm-svn: 346995	2018-11-15 21:19:32 +00:00
Craig Topper	b144c7a6fb	[X86] Minor cleanup to getExtendInVec. NFCI Use unsigned to calculate the subvector index to avoid a cast. Remove an unnecessary condition and replace it with a stronger assert. Use the InVT variable we updated when we extracted instead of grabbing it from the In SDValue. llvm-svn: 346983	2018-11-15 19:20:22 +00:00
Craig Topper	73bb04ab6f	[X86] Add -x86-experimental-vector-widening support to reduceVMULWidth and combineMulToPMADDWD In reduceVMULWidth, we no longer need to worry about extending the vector to 128 bits first. Regular widening of extends, muls and shuffles will take care of that for us. In combineMulToPMADDWD, we can handle v2i32 multiplies and allow the VPMADDWD to be widened to v4i32 during type legalization by adding custom widening like we do have for AVG/ADDUS/SUBUS. I had to modify that code a little to allow different input and output VTs. Differential Revision: https://reviews.llvm.org/D54512 llvm-svn: 346980	2018-11-15 18:59:31 +00:00
Simon Pilgrim	0db8cb0147	[X86] Fix MCNullStreamer support for modules with a CodeView flag This fixes -filetype=null support when compiling for a Win32 target and the module has a CodeView flag. The only places changed are the uses of getTargetStreamer function - this patch guards both of them with null checks. Committed on behalf of @eush (Eugene Sharygin) Differential Revision: https://reviews.llvm.org/D54008 llvm-svn: 346962	2018-11-15 15:17:15 +00:00
Craig Topper	553ac560aa	[X86] Add some custom type legalization rules for truncate with -x86-experimental-vector-widening-legalization. This avoids some nasty shuffles when we have avx512. It will also prevent using zmm truncate instructions when a ymm instruction that zeroes part of an xmm register will do. Also avoid using avx512 truncate instructions when the input is 128 bits or less. These instructions are 2 uops on skx so we can probably find a better single uop shuffle like pshufb. llvm-svn: 346936	2018-11-15 08:23:40 +00:00
Craig Topper	ea6ced9d1a	[X86] Don't mark SEXTLOADS with narrow types as Custom with -x86-experimental-vector-widening-legalization. The narrow types end up requesting widening, but generic legalization will end up scalarizing and using a build_vector to do the widening. llvm-svn: 346916	2018-11-15 00:21:41 +00:00
Benjamin Kramer	6b7d6fe079	[X86] Remove unused variable llvm-svn: 346909	2018-11-14 23:13:27 +00:00
Craig Topper	0b2089da4b	[X86] Support v2i32/v4i16/v8i8 load/store using f64 on 32-bit targets under -x86-experimental-vector-widening-legalization. On 64-bit targets the type legalizer will use i64 to legalize these. But when i64 isn't legal, the type legalizer won't try an FP type. So do it manually instead. There are a few regressions in here due to some v2i32 operations like mul and div now being reassembled into a full vector just to store instead of storing the pieces. But this was already occurring in 64-bit mode so it's not a new issue. llvm-svn: 346908	2018-11-14 23:02:09 +00:00
Nirav Dave	1241dcb3cf	Bias physical register immediate assignments The machine scheduler currently biases register copies to/from physical registers to be closer to their point of use / def to minimize their live ranges. This change extends this to also cover physical register assignments from immediate values. This causes a reduction in overall register pressure and minor reduction in spills and indirectly fixes an out-of-registers assertion (PR39391). Most test changes are from minor instruction reorderings and register name selection changes and direct consequences of that. Reviewers: MatzeB, qcolombet, myatsina, pcc Subscribers: nemanjai, jvesely, nhaehnle, eraman, hiraditya, javed.absar, arphaman, jfb, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D54218 llvm-svn: 346894	2018-11-14 21:11:53 +00:00
Craig Topper	6c94264b1f	[X86] Allow pmulh to be formed from narrow vXi16 vectors under -x86-experimental-vector-widening-legalization Narrower vectors will be widened to 128 bits without changing the element size. And generic type legalization can already handle widening mulhu/mulhs. Differential Revision: https://reviews.llvm.org/D54513 llvm-svn: 346879	2018-11-14 18:16:21 +00:00
Simon Pilgrim	cdb170794b	[CostModel] Add generic expansion funnel shift cost support Add support for the expansion of funnelshift/rotates to getIntrinsicInstrCost. This also required us to move the X86 fshl/fshr costs to the same place as the rotates to avoid expansion and get correct scalarization vs vectorization costs. llvm-svn: 346854	2018-11-14 12:24:50 +00:00
Simon Pilgrim	7501780ec6	[X86][AVX512] Remove constant pool shuffle decoding from SelectionDAG This patch removes the last use of the constant pool shuffle decode helper and consistently uses the 'getTargetShuffleMaskIndices' versions instead. The constant pool versions are now purely used for assembly comments. The avx512vbmi intrinsic upgrades had to be altered as they were being decoded as broadcasts, similar to what I fixed in rL346032. I don't think the change is critical - although it's annoying that we lose the {k}{z} instruction test coverage as they are tricky to generate.... Differential Revision: https://reviews.llvm.org/D54083 llvm-svn: 346850	2018-11-14 11:26:35 +00:00
Craig Topper	aca8390216	[SelectionDAG][X86] Relax restriction on the width of an input to _EXTEND_VECTOR_INREG. Use them and regular _EXTEND to replace the X86 specific VSEXT/VZEXT opcodes Previously, the extend_vector_inreg opcode required their input register to be the same total width as their output. But this doesn't match up with how the X86 instructions are defined. For X86 the input just needs to be a legal type with at least enough elements to cover the output. This patch weakens the check on these nodes and allows them to be used as long as they have more input elements than output elements. I haven't changed type legalization behavior so it will still create them with matching input and output sizes. X86 will custom legalize these nodes by shrinking the input to be a 128 bit vector and once we've done that we treat them as legal operations. We still have one case during type legalization where we must custom handle v64i8 on avx512f targets without avx512bw where v64i8 isn't a legal type. In this case we will custom type legalize to a *extend_vector_inreg with a v16i8 input. After that the input is a legal type so type legalization should ignore the node and doesn't need to know about the relaxed restriction. We are no longer allowed to use the default expansion for these nodes during vector op legalization since the default expansion uses a shuffle which required the widths to match. Custom legalization for all types will prevent us from reaching the default expansion code. I believe DAG combine works correctly with the relaxed restriction because it doesn't check the number of input elements. The rest of the patch is changing X86 to use either the vector_inreg nodes or the regular zero_extend/sign_extend nodes. I had to add additional isel patterns to handle any_extend during isel since simplifydemandedbits can create them at any time so we can't legalize to zero_extend before isel. We don't yet create any_extend_vector_inreg in simplifydemandedbits. Differential Revision: https://reviews.llvm.org/D54346 llvm-svn: 346784	2018-11-13 19:45:21 +00:00
Simon Pilgrim	e827fe09b3	[CostModel][X86] Fix constant vector XOP right shifts We'll constant fold these cases so they are as cheap as vector left shift cases. Noticed while improving funnel shift costs. llvm-svn: 346760	2018-11-13 16:40:10 +00:00
Simon Pilgrim	72a7fbc1a3	Fix comment for XOP rotates. NFCI. llvm-svn: 346753	2018-11-13 12:09:27 +00:00
Simon Pilgrim	e565e5a962	[X86][SSE] Add lowerVectorShuffleAsByteRotateAndPermute (PR39387) This patch adds the ability to use a PALIGNR to rotate a pair of inputs to select a range containing all the referenced elements, followed by a single input permute to put them in the right location. Differential Revision: https://reviews.llvm.org/D54267 llvm-svn: 346706	2018-11-12 21:12:38 +00:00
Craig Topper	c48712b341	[X86] In LowerMULH, use generic truncate and vector shuffle nodes instead of directly emitting PACKUS. Truncate and shuffle lowering are already capable of matching to PACKUS using known bits analysis. This features one test change where we now prefer to extend v16i16->v16i32 then trunc v16i32->v16i8 over extract_subvector+packus when avx512f is available, but avx512bw is not. llvm-svn: 346697	2018-11-12 19:37:29 +00:00
Simon Pilgrim	93c64e5c76	[CostModel][X86] Add funnel shift rotation special case costs When we repeat the 2 shifting operands then this is a bit rotation - annoyingly this has to be done in the other getIntrinsicInstrCost than most intrinsics as we need to check the operands are the same. llvm-svn: 346688	2018-11-12 18:27:54 +00:00
Simon Pilgrim	49e93d2f0e	[CostModel][X86] Add SHLD/SHRD scalar funnel shift costs The costs match the typical reg-reg cases - the RMW case can be a lot slower but we don't model that at this level llvm-svn: 346683	2018-11-12 17:56:59 +00:00
Simon Pilgrim	f4cd292ba2	[CostModel][X86] SK_ExtractSubvector is cheap if the (legal) subvector is aligned within the source vector llvm-svn: 346664	2018-11-12 15:48:06 +00:00
Craig Topper	2eab39f77b	[X86] Use DAG.getConstant instead of getZeroVector. llvm-svn: 346605	2018-11-11 07:24:36 +00:00
Craig Topper	ef33a190bc	[X86] Replace calls to getOnesVector/getZeroVector with getConstant. getConstant will create a BUILD_VECTOR for us and use a legal type if necessary. So just create the simple node and let BUILD_VECTOR legalization do the canonicalization. llvm-svn: 346603	2018-11-11 01:40:04 +00:00
Benjamin Kramer	37c691e867	[X86] Remove unused variable llvm-svn: 346592	2018-11-10 18:11:11 +00:00
Craig Topper	7956a256e9	[X86] Remove apparently unneeded code from combineVSZext. No lit tests fail with this code removed. This is a pre-commit for D54346. llvm-svn: 346590	2018-11-10 17:44:28 +00:00
Simon Pilgrim	d3ca710ec9	[CostModel][X86] SK_ExtractSubvector costs must only be tested for vector types (PR39615) llvm-svn: 346589	2018-11-10 17:37:52 +00:00
Roman Lebedev	b428b8b214	[X86][BdVer2] Fix loads/stores throughput for Piledriver (PR39465) There are two AGU units, and per 1cy, there can be either two loads, or a load and a store; but not two stores, or two loads and a store. Additionally, loads shouldn't affect the store scheduler and vice versa. (but should affect the PdEX scheduler.) Required rL346545. Fixes https://bugs.llvm.org/show_bug.cgi?id=39465 llvm-svn: 346587	2018-11-10 14:31:43 +00:00
Craig Topper	a1b6667c6a	[X86] Use a MOVSX instruction instead of a MOVZX instruction in isel for an any_extend of the remainder from an 8-bit sdivrem. The sdivrem will emit its own MOVSX to move %ah to the low byte of a register. By using a MOVSX for an any_extend this allows a post-isel peephole to merge them. llvm-svn: 346581	2018-11-10 06:04:33 +00:00
Craig Topper	0364085281	[X86] In LowerHorizontalByteSum, emit vector_shuffle nodes instead of directly using X86ISD::UNPCKL/X86ISD::UNPCKH. This gives shuffle lowering the freedom to use zero_extend_vector_inreg for the unpckl shuffle. Shuffle combining usually makes this swap later, but not when AVX512 is enabled it seems. While there also use DAG.getConstant to create a 0 vector instead of using the helper that forces a specific BUILD_VECTOR. I don't think that helper is usually needed. We're basically free to create a constant build_vector anytime and it will be legalized on its own. llvm-svn: 346574	2018-11-10 00:26:42 +00:00
Craig Topper	17d64c71c5	[X86] Move the promotion of v16i16->v16i8 for avx512f but not avx512bw from lowering to isel. Change to use vpmovzx instead of vpmovsx. With avx512f but not avx512bw we need to extend to v16i32 then truncate that to v16i8. Previously we emitted both nodes during lowering, but I'm trying to switch to using target independent nodes and with that switched the extend+truncate wou This patch changes the implementation to what will be necessary with that patch which helps minimize test diffs. llvm-svn: 346552	2018-11-09 20:09:53 +00:00
Craig Topper	731ea7dbc1	[X86] Turn X86ISD::VSEXT into X86ISD::VZEXT if the upper bits aren't demanded. This makes X86ISD::VSEXT more similar to ISD::SIGN_EXTEND and ISD::ZERO_EXTEND. I'm hoping to replace X86ISD::VSEXT/VZEXT with target independent nodes. Making the target specific nodes similar to the target independent nodes helps minimize test diffs in that patch. llvm-svn: 346539	2018-11-09 19:05:51 +00:00
Simon Pilgrim	fc8f1d7da7	[CostModel][X86] SK_ExtractSubvector is free if the subvector is at the start of the source vector llvm-svn: 346538	2018-11-09 19:04:27 +00:00
Sanjay Patel	fa1c0fe478	[x86] try to form broadcast before widening shuffle elements I noticed that we weren't generating broadcasts as much as I thought we would with D54271, and this is part of the problem. Widening the shuffle elements means adding bitcasts and hiding the relationship between a splatted scalar and the vector. If we can form a broadcast, do that before going through the rest of the shuffle lowering because broadcasts should be cheap and can often be load-folded. Differential Revision: https://reviews.llvm.org/D54280 llvm-svn: 346498	2018-11-09 14:54:58 +00:00
Simon Pilgrim	ea51f98b9b	[X86] Add Subtarget to more lowerVectorShuffle functions. NFCI. This will be necessary for an update to D54267 llvm-svn: 346490	2018-11-09 13:19:03 +00:00
Clement Courbet	eee2e06e2a	[llvm-exegesis][NFC] Add a way to declare the default counter binding for unbound CPUs for a target. Summary: This simplifies the code and moves everything to tablegen for consistency. This also prepares the ground for adding issue counters. Reviewers: gchatelet, john.brawn, jsji Subscribers: nemanjai, mgorny, javed.absar, kbarton, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54297 llvm-svn: 346489	2018-11-09 13:15:32 +00:00
Clement Courbet	e6b727e552	[X86] Fix VZEROUPPER scheduling info on SNB,HSW,BDW,SXL,SKX. Summary: Starting from SNB, VZEROUPPER is handled by the renamer and uses no proc resources. After HSW, it also has zero latency. This fixes PR35606. To reproduce: Uops: llvm-exegesis -mode=uops -opcode-name=VZEROUPPER Latency: echo -e '#LLVM-EXEGESIS-DEFREG XMM0 1\n#LLVM-EXEGESIS-DEFREG XMM1 1\nvzeroupper' | /tmp/llvm-exegesis -mode=latency -snippets-file=- echo -e '#LLVM-EXEGESIS-DEFREG XMM0 1\n#LLVM-EXEGESIS-DEFREG XMM1 1\nvzeroupper\naddps %xmm0, %xmm1' | /tmp/llvm-exegesis -mode=latency -snippets-file=- Reviewers: RKSimon, craig.topper, andreadb Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D54107 llvm-svn: 346482	2018-11-09 09:49:06 +00:00
Sanjay Patel	b5535dc7b3	[x86] use shuffles for scalar insertion into high elements of a constant vector As discussed in D54073, we have a potential regression from more aggressive vector narrowing here, so let's try to avoid that by changing build-vector lowering slightly. Insert-vector-element lowering always does this since there's no "pinsr" for ymm/zmm: // If the vector is wider than 128 bits, extract the 128-bit subvector, insert // into that, and then insert the subvector back into the result. ...but we can sometimes do better for insert-into-constant-vector by using shuffle lowering. Differential Revision: https://reviews.llvm.org/D54271 llvm-svn: 346433	2018-11-08 19:16:27 +00:00
Than McIntosh	5bcdea5118	[X86] improve split-stack machine BB placement Summary: The conditional branch created to support -fsplit-stack for X86 is left unbiased/unhinted, resulting in less than ideal block placement: the __morestack call block is kept on the main hot path. Bias the branch to ensure that the stack allocation block is treated as a "cold" block during machine basic block placement. Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54123 llvm-svn: 346336	2018-11-07 17:41:57 +00:00
Sanjay Patel	de58e93666	fix typos aggressively; NFC llvm-svn: 346316	2018-11-07 14:35:36 +00:00
Andrea Di Biagio	4ae974e745	[X86][FixupLEA] Avoid checking target features for every single processed instruction. NFCI llvm-svn: 346309	2018-11-07 12:26:00 +00:00
Craig Topper	6428a2cd9a	[X86] Add custom promotion of v2i8/v2i16 fp_to_sint to avoid over promotion to v2i64 which would force scalarization. llvm-svn: 346259	2018-11-06 19:24:21 +00:00
Matthias Braun	c6613879ce	LivePhysRegs/IfConversion: Change some types from unsigned to MCPhysReg; NFC Change the type in a couple of lists and sets that only store physical registers from unsigned to MCPhysReg. The latter is only 16 bits and saves us a bit of memory. llvm-svn: 346254	2018-11-06 19:00:11 +00:00
Clement Courbet	54a1184fff	[X86][NFC] Fix comment. llvm-svn: 346226	2018-11-06 13:48:56 +00:00
Craig Topper	0b5f8169b0	[TargetLowering] Change TargetLoweringBase::getPreferredVectorAction to take an MVT instead of an EVT. NFC The main caller of this already has an MVT and several targets called getSimpleVT inside without checking isSimple. This makes the simpleness explicit. llvm-svn: 346180	2018-11-05 23:26:13 +00:00
Craig Topper	def82a81af	[X86] Don't turn any_extend from a mask register into a sign_extend during lowering. Add patterns to match any_extend during isel instead. SimplifyDemandedBits can turn a sign_extend back into an any_extend and trigger an infinite loop. So instead legalize it the same way as a sign_extend, but preserve the opcode. Then just pattern match it the same as sign_extend during isel. I don't have a reduced test case for such an infinite loop yet. llvm-svn: 346170	2018-11-05 22:08:17 +00:00
Craig Topper	30b627e5c9	[X86] Custom type legalize v2i8/v2i16/v2i32 mul to use pmuludq. v2i8/v2i16/v2i32 are promoted to v2i64. pmuludq takes a v2i64 input and produces a v2i64 output. Since we don't care about the upper bits of the type legalized multiply we can use the pmuludq to produce the multiply result for the bits we do care about. llvm-svn: 346115	2018-11-05 05:02:12 +00:00
Craig Topper	ed6a0a817f	[X86] Add vector shift by immediate to SimplifyDemandedBitsForTargetNode. Summary: This also enables some constant folding from KnownBits propagation. This helps in some vXi64 cases in 32-bit mode where constant vectors appear as vXi32 and a bitcast. This can prevent getNode from constant folding sra/shl/srl. Reviewers: RKSimon, spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54069 llvm-svn: 346102	2018-11-04 17:31:27 +00:00
Craig Topper	1ba86188cf	[SelectionDAG] Remove special methods for creating *_EXTEND_VECTOR_INREG nodes. Move asserts into getNode. These methods were just wrappers around getNode with additional asserts (identical and repeated 3 times). But getNode already has a switch that can be used to hold these asserts that allows them to be shared for all 3 opcodes. This also enables checking on the places that create these nodes without using the wrappers. The rest of the patch is just changing all callers to use getNode directly. llvm-svn: 346087	2018-11-04 02:10:18 +00:00
Craig Topper	7aed9e600b	[X86] Update comment I forgot to change in r346043. NFC llvm-svn: 346073	2018-11-03 19:49:13 +00:00
Reid Kleckner	2bcb288ade	[codeview] Let the X86 backend tell us the VFRAME offset adjustment Use MachineFrameInfo's OffsetAdjustment field to pass this information from the target to CodeViewDebug.cpp. The X86 backend doesn't use it for any other purpose. This fixes PR38857 in the case where there is a non-aligned quantity of CSRs and a non-aligned quantity of locals. llvm-svn: 346062	2018-11-03 00:41:52 +00:00
Craig Topper	f7108aef14	[X86] In LowerEXTEND_VECTOR_INREG, emit a vector shuffle instead of directly using X86ISD::UNPCKL The majority of the changes are because the rest of shuffle lowering/combining prefers to replace the undef input with the other operand. Using UNPCKL directly seemed to avoid this and just grabbed a randomish register for the undef which can create false dependencies. llvm-svn: 346050	2018-11-02 22:48:02 +00:00
Craig Topper	60c202a494	[X86] Don't emit *_extend_vector_inreg nodes when both the input and output types are legal with AVX1 We already have custom lowering for the AVX case in LegalizeVectorOps. So it's better to keep the regular extend op around as long as possible. I had to qualify one place in DAG combine that created illegal vector extending load operations. This change by itself had no effect on any tests which is why it's included here. I've made a few cleanups to the custom lowering. The sign extend code no longer creates an identity shuffle with undef elements. The zero extend code now emits a zero_extend_vector_inreg instead of an unpckl with a zero vector. For the high half of the custom lowering of zero_extend/any_extend, we're now using an unpckh with a zero vector or undef. Previously we used a pshufd to move the upper 64-bits to the lower 64-bits and then used a zero_extend_vector_inreg. I think the zero vector should require less execution resources and be smaller code size. Differential Revision: https://reviews.llvm.org/D54024 llvm-svn: 346043	2018-11-02 21:09:49 +00:00
Reid Kleckner	4dc0b1ac60	Fix clang -Wimplicit-fallthrough warnings across llvm, NFC This patch should not introduce any behavior changes. It consists of mostly one of two changes: 1. Replacing fall through comments with the LLVM_FALLTHROUGH macro 2. Inserting 'break' before falling through into a case block consisting of only 'break'. We were already using this warning with GCC, but its warning behaves slightly differently. In this patch, the following differences are relevant: 1. GCC recognizes comments that say "fall through" as annotations, clang doesn't 2. GCC doesn't warn on "case N: foo(); default: break;", clang does 3. GCC doesn't warn when the case contains a switch, but falls through the outer case. I will enable the warning separately in a follow-up patch so that it can be cleanly reverted if necessary. Reviewers: alexfh, rsmith, lattner, rtrieu, EricWF, bollu Differential Revision: https://reviews.llvm.org/D53950 llvm-svn: 345882	2018-11-01 19:54:45 +00:00
Simon Pilgrim	b34a052852	[LegalizeDAG] Add generic vector CTPOP expansion (PR32655) This patch adds support for expanding vector CTPOP instructions and removes the x86 'bitmath' lowering which replicates the same expansion. Differential Revision: https://reviews.llvm.org/D53258 llvm-svn: 345869	2018-11-01 18:22:11 +00:00
Simon Pilgrim	d5d7224355	[X86][X86FixupLEA] Rename processInstructionForSLM to processInstructionForSlowLEA (NFCI) The function isn't SLM specific (it's driven by the FeatureSlowLEA flag). Minor tidyup prior to PR38225. llvm-svn: 345836	2018-11-01 14:57:07 +00:00
Simon Pilgrim	1f0a8421ad	[X86][SSE] Move 2-input limit up from getFauxShuffleMask to resolveTargetShuffleInputs (reapplied) Reapplying an updated version of rL345395 (reverted in rL345451), now the issues noticed in PR39483 have been fixed. This patch allows resolveTargetShuffleInputs to remove UNDEF inputs from cases where we have more than 2 inputs. llvm-svn: 345824	2018-11-01 11:52:09 +00:00
Matthias Braun	a9f900561e	X86: Consistently declare pass initializers in X86.h; NFC This avoids declaring them twice: in X86TargetMachine.cpp and the file implementing the pass. llvm-svn: 345801	2018-11-01 00:38:01 +00:00
Craig Topper	6c3f1692c8	Revert r345165 "[X86] Bring back the MOV64r0 pseudo instruction" Google is reporting regressions on some benchmarks. llvm-svn: 345785	2018-10-31 21:53:24 +00:00
Andrea Di Biagio	3d2b7176fc	[tblgen][PredicateExpander] Add the ability to describe more complex constraints on instruction operands. Before this patch, class PredicateExpander only knew how to expand simple predicates that performed checks on instruction operands. In particular, the new scheduling predicate syntax was not rich enough to express checks like this one: Foo(MI->getOperand(0).getImm()) == ExpectedVal; Here, the immediate operand value at index zero is passed in input to function Foo, and ExpectedVal is compared against the value returned by function Foo. While this predicate pattern doesn't show up in any X86 model, it shows up in other upstream targets. So, being able to support those predicates is fundamental if we want to be able to modernize all the scheduling models upstream. With this patch, we allow users to specify if a register/immediate operand value needs to be passed in input to a function as part of the predicate check. Now, register/immediate operand checks all derive from base class CheckOperandBase. This patch also changes where TIIPredicate definitions are expanded by the instruction info emitter. Before, definitions were expanded in class XXXGenInstrInfo (where XXX is a target name). With the introduction of this new syntax, we may want to have TIIPredicates expanded directly in XXXInstrInfo. That is because functions used by the new operand predicates may only exist in the derived class (i.e. XXXInstrInfo). This patch is a non functional change for the existing scheduling models. In future, we will be able to use this richer syntax to better describe complex scheduling predicates, and expose them to llvm-mca. Differential Revision: https://reviews.llvm.org/D53880 llvm-svn: 345714	2018-10-31 12:28:05 +00:00
Dorit Nuzman	34da6dd696	[LV] Support vectorization of interleave-groups that require an epilog under optsize using masked wide loads Under Opt for Size, the vectorizer does not vectorize interleave-groups that have gaps at the end of the group (such as a loop that reads only the even elements: a[2*i]) because that implies that we'll require a scalar epilogue (which is not allowed under Opt for Size). This patch extends the support for masked-interleave-groups (introduced by D53011 for conditional accesses) to also cover the case of gaps in a group of loads; Targets that enable the masked-interleave-group feature don't have to invalidate interleave-groups of loads with gaps; they could now use masked wide-loads and shuffles (if that's what the cost model selects). Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53668 llvm-svn: 345705	2018-10-31 09:57:56 +00:00
Craig Topper	6958b5ffa9	[X86] In lowerVectorShuffleAsBroadcast, make peeking through CONCAT_VECTORS work correctly if we already walked through a bitcast that changed the element size. The CONCAT_VECTORS case was using the original mask element count to determine how to adjust the broadcast index. But if we looked through a bitcast the original mask size doesn't tell us anything about the concat_vectors. This patch switches to using the concat_vectors input element count directly instead. Differential Revision: https://reviews.llvm.org/D53823 llvm-svn: 345626	2018-10-30 18:48:42 +00:00
Francis Visoiu Mistrih	0e237d357e	[X86] Re-enable the machine verifier after fixing more tests Was disabled again in r345528. Hopefully this fixes the bots. llvm-svn: 345593	2018-10-30 12:20:17 +00:00
Roman Lebedev	b3a14208ac	[X86][BMI1] X86DAGToDAGISel: select BEXTR from x & (-1 >> (32 - y)) pattern Summary: The final pattern. There are no test changes: * We are looking for the pattern with one-use of its mask, * If the mask is one-use, D48768 will unfold it into pattern d. * Thus, the tests have extra-use on the mask. * Thus, only the BMI2 BZHI can be tested, and it already worked. * So there is no BMI1 test coverage, we just assume it works since it uses the same codepath. Reviewers: craig.topper, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53575 llvm-svn: 345584	2018-10-30 11:12:34 +00:00
Craig Topper	b293322cee	[LegalizeTypes] Teach PromoteIntRes_BITCAST to better handle a bitcast with vector output type and a vector input type that needs to be widened Summary: Previously if we had a bitcast vector output type that needs promotion and a vector input type that needs widening we would just do a stack store and load to handle the conversion. We can do a little better if we can widen the bitcast to a legal vector type the same size as the widened input type. Then we can do the bitcast between this widened type and the widened input type. Afterwards we can extract_subvector back to the original output and any_extend that. Type legalization will then circle back and handle promotion of the extract_subvector and the any_extend will just be removed. This will avoid going through the stack and allows us to remove a custom version of this legalization from X86. Reviewers: efriedma, RKSimon Reviewed By: efriedma Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D53229 llvm-svn: 345567	2018-10-30 03:27:15 +00:00
Craig Topper	67c2878501	[X86] Cleanup the code in LowerFABSorFNEG and LowerFCOPYSIGN a little. NFC Use SelectionDAG::EVTToAPFloatSemantics. Make the LogicVT calculation in LowerFABSorFNEG similar to LowerFCOPYSIGN. Use APInt::getSignedMaxValue instead of ~APInt::getSignMask. llvm-svn: 345565	2018-10-30 03:27:12 +00:00
Craig Topper	676d7a7a43	[X86] Stop changing f128 fand/for/fxor to v2i64. The additional patterns don't cost us much and it seems better than changing element widths. llvm-svn: 345564	2018-10-30 03:27:11 +00:00
Simon Pilgrim	090a444cb7	[X86] Set isMachineVerifierClean() back to false (PR27481) Put back the isMachineVerifierClean() override removed at rL345513 to fix Windows ThinLTO tests llvm-svn: 345528	2018-10-29 19:51:52 +00:00
Simon Pilgrim	3a2f3c2c0a	[X86][SSE] getFauxShuffleMask - Fix shuffle mask adjustment for multiple inserted subvectors Part of the issue discovered in PR39483, although it's not fully exposed until I reapply rL345395 (by reverting rL345451) llvm-svn: 345520	2018-10-29 18:25:48 +00:00
Craig Topper	220fd33522	[X86] Add AES to KNL CPUs to match clang. I believe this was lost from KNL when AES was pushed from Westmere to Skylake recently. KNL used to inherit from IVB. llvm-svn: 345519	2018-10-29 18:17:01 +00:00
Francis Visoiu Mistrih	61c9de7565	[X86] Enable the MachineVerifier by default The machine verifier was disabled for x86 by default. There are now only 9 tests failing, compared to what previously was between 20 and 30. This is a good opportunity to file bugs for all the remaining issues, then explicitly disable the failing tests and enable the machine verifier by default. This allows us to avoid adding new tests that break the verifier. PR27481 llvm-svn: 345513	2018-10-29 16:57:43 +00:00
Craig Topper	aa5eb2fbaa	[X86] Force floating point values in constant pool decoding to print in scientific notation so they can't be confused with integers. When the floating point constants are whole numbers they have no decimal point so look like integers, but mean something very different in something like an 'and' instruction. Ideally we would just print a decimal point and a 0, but I couldn't see how to make APFloat::toString do that. llvm-svn: 345488	2018-10-29 04:52:04 +00:00
Craig Topper	42aa87143d	[X86] Recognize constant splats in LowerFCOPYSIGN. llvm-svn: 345484	2018-10-28 23:51:35 +00:00
Simon Pilgrim	9b77f0c291	[VectorLegalizer] Enable TargetLowering::expandFP_TO_UINT support. Add vector support to TargetLowering::expandFP_TO_UINT. This exposes an issue in X86TargetLowering::LowerVSELECT which was assuming that the select mask was the same width as the LHS/RHS ops - as long as the result is a sign splat we can easily sext/trunc this. llvm-svn: 345473	2018-10-28 13:07:25 +00:00
Roman Lebedev	a5baf86744	AMD BdVer2 (Piledriver) Initial Scheduler model Summary: # Overview This is somewhat partial. * Latencies are good {F7371125} * All of these remaining inconsistencies //appear// to be noise/noisy/flaky. * NumMicroOps are somewhat good {F7371158} * Most of the remaining inconsistencies are from `Ld` / `Ld_ReadAfterLd` classes * Actual unit occupation (pipes, `ResourceCycles`) are undiscovered lands, i did not really look there. They are basically verbatum copy from `btver2` * Many `InstRW`. And there are still inconsistencies left... To be noted: I think this is the first new schedule profile produced with the new next-gen tools like llvm-exegesis! # Benchmark I realize that isn't what was suggested, but i'll start with some "internal" public real-world benchmark i understand - [[ https://github.com/darktable-org/rawspeed \| RawSpeed raw image decoding library ]]. Diff (the exact clang from trunk without/with this patch): ``` Comparing /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench Benchmark Time CPU Time Old Time New CPU Old CPU New ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_mean -0.0607 -0.0604 234 219 233 219 Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_median -0.0630 -0.0626 233 219 233 219 Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_stddev +0.2581 +0.2587 1 2 1 2 Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_mean -0.0770 -0.0767 144 133 144 133 Canon/EOS 5D Mark 
II/10.canon.sraw2.cr2/threads:8/real_time_median -0.0767 -0.0763 144 133 144 133 Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_stddev -0.4170 -0.4156 1 0 1 0 Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_mean -0.0271 -0.0270 463 450 463 450 Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_median -0.0093 -0.0093 453 449 453 449 Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_stddev -0.7280 -0.7280 13 4 13 4 Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_pvalue 0.0004 0.0004 U Test, Repetitions: 25 vs 25 Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_mean -0.0065 -0.0065 569 565 569 565 Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_median -0.0077 -0.0077 569 564 569 564 Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_stddev +1.0077 +1.0068 2 5 2 5 Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_pvalue 0.0220 0.0199 U Test, Repetitions: 25 vs 25 Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_mean +0.0006 +0.0007 312 312 312 312 Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_median +0.0031 +0.0032 311 312 311 312 Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_stddev -0.7069 -0.7072 4 1 4 1 Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_pvalue 0.0004 0.0004 U Test, Repetitions: 25 vs 25 Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_mean -0.0015 -0.0015 141 141 141 141 Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_median -0.0010 -0.0011 141 141 141 141 Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_stddev -0.1486 -0.1456 0 0 0 0 Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_pvalue 0.6139 0.8766 U Test, Repetitions: 25 vs 25 Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_mean -0.0008 -0.0005 60 60 60 60 Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_median -0.0006 -0.0002 60 60 60 60 Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_stddev -0.1467 -0.1390 0 0 0 0 Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_pvalue 0.0137 0.0137 U Test, Repetitions: 25 vs 25 
Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_mean +0.0002 +0.0002 275 275 275 275 Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_median -0.0015 -0.0014 275 275 275 275 Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_stddev +3.3687 +3.3587 0 2 0 2 Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_pvalue 0.4041 0.3933 U Test, Repetitions: 25 vs 25 Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_mean +0.0004 +0.0004 67 67 67 67 Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_median -0.0000 -0.0000 67 67 67 67 Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_stddev +0.1947 +0.1995 0 0 0 0 Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_pvalue 0.0074 0.0001 U Test, Repetitions: 25 vs 25 Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_mean -0.0092 +0.0074 547 542 25 25 Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_median -0.0054 +0.0115 544 541 25 25 Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_stddev -0.4086 -0.3486 8 5 0 0 Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_pvalue 0.3320 0.0000 U Test, Repetitions: 25 vs 25 Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_mean +0.0015 +0.0204 218 218 12 12 Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_median +0.0001 +0.0203 218 218 12 12 Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_stddev +0.2259 +0.2023 1 1 0 0 GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_pvalue 0.0000 0.0001 U Test, Repetitions: 25 vs 25 GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_mean -0.0209 -0.0179 96 94 90 88 GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_median -0.0182 -0.0155 95 93 90 88 GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_stddev -0.6164 -0.2703 2 1 2 1 Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_mean -0.0098 -0.0098 176 175 176 175 Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_median -0.0126 -0.0126 176 174 176 174 Kodak/DCS Pro 
14nx/D7465857.DCR/threads:8/real_time_stddev +6.9789 +6.9157 0 2 0 2 Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_mean -0.0237 -0.0238 474 463 474 463 Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_median -0.0267 -0.0267 473 461 473 461 Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_stddev +0.7179 +0.7178 3 5 3 5 Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_pvalue 0.6837 0.6554 U Test, Repetitions: 25 vs 25 Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_mean -0.0014 -0.0013 1375 1373 1375 1373 Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_median +0.0018 +0.0019 1371 1374 1371 1374 Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_stddev -0.7457 -0.7382 11 3 10 3 Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_mean -0.0080 -0.0289 22 22 10 10 Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_median -0.0070 -0.0287 22 22 10 10 Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_stddev +1.0977 +0.6614 0 0 0 0 Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_mean +0.0132 +0.0967 35 36 10 11 Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_median +0.0132 +0.0956 35 36 10 11 Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_stddev -0.0407 -0.1695 0 0 0 0 Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_mean +0.0331 +0.1307 13 13 6 6 Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_median +0.0430 +0.1373 12 13 6 6 Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_stddev -0.9006 
-0.8847 1 0 0 0 Pentax/645Z/IMGP2837.PEF/threads:8/real_time_pvalue 0.0016 0.0010 U Test, Repetitions: 25 vs 25 Pentax/645Z/IMGP2837.PEF/threads:8/real_time_mean -0.0023 -0.0024 395 394 395 394 Pentax/645Z/IMGP2837.PEF/threads:8/real_time_median -0.0029 -0.0030 395 394 395 393 Pentax/645Z/IMGP2837.PEF/threads:8/real_time_stddev -0.0275 -0.0375 1 1 1 1 Phase One/P65/CF027310.IIQ/threads:8/real_time_pvalue 0.0232 0.0000 U Test, Repetitions: 25 vs 25 Phase One/P65/CF027310.IIQ/threads:8/real_time_mean -0.0047 +0.0039 114 113 28 28 Phase One/P65/CF027310.IIQ/threads:8/real_time_median -0.0050 +0.0037 114 113 28 28 Phase One/P65/CF027310.IIQ/threads:8/real_time_stddev -0.0599 -0.2683 1 1 0 0 Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_mean +0.0206 +0.0207 405 414 405 414 Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_median +0.0204 +0.0205 405 414 405 414 Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_stddev +0.2155 +0.2212 1 1 1 1 Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_mean -0.0109 -0.0108 147 145 147 145 Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_median -0.0104 -0.0103 147 145 147 145 Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_stddev -0.4919 -0.4800 0 0 0 0 Samsung/NX3000/_3184416.SRW/threads:8/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 25 vs 25 Samsung/NX3000/_3184416.SRW/threads:8/real_time_mean -0.0149 -0.0147 220 217 220 217 Samsung/NX3000/_3184416.SRW/threads:8/real_time_median -0.0173 -0.0169 221 217 220 217 Samsung/NX3000/_3184416.SRW/threads:8/real_time_stddev +1.0337 +1.0341 1 3 1 3 Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_pvalue 0.0001 0.0001 U Test, Repetitions: 25 vs 25 
Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_mean -0.0019 -0.0019 194 193 194 193 Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_median -0.0021 -0.0021 194 193 194 193 Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_stddev -0.4441 -0.4282 0 0 0 0 Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_pvalue 0.0000 0.4263 U Test, Repetitions: 25 vs 25 Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_mean +0.0258 -0.0006 81 83 19 19 Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_median +0.0235 -0.0011 81 82 19 19 Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_stddev +0.1634 +0.1070 1 1 0 0 ``` {F7443905} If we look at the `_mean`s, the time column, the biggest win is `-7.7%` (`Canon/EOS 5D Mark II/10.canon.sraw2.cr2`), and the biggest loss is `+3.3%` (`Panasonic/DC-GH5S/P1022085.RW2`); Overall: mean `-0.7436%`, median `-0.23%`, `cbrt(sum(time^3))` = `-8.73%` Looks good so far I'd say. llvm-exegesis details: {F7371117} {F7371125} {F7371128} {F7371144} {F7371158} Reviewers: craig.topper, RKSimon, andreadb, courbet, avt77, spatel, GGanesh Reviewed By: andreadb Subscribers: javed.absar, gbedwell, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D52779 llvm-svn: 345463	2018-10-27 20:46:30 +00:00
Simon Pilgrim	a365719a24	[X86][SSE] LowerVSELECT - pull out repeated getOperand(). NFCI. llvm-svn: 345458	2018-10-27 18:37:59 +00:00
Simon Pilgrim	88116e905e	Revert rL345395: [X86][SSE] Move 2-input limit up from getFauxShuffleMask to resolveTargetShuffleInputs Makes no difference to actual shuffle decoding yet, but merges all the existing limits in one place for when proper support is fixed. ........ It's been reported that this is causing out of trunk failures. llvm-svn: 345451	2018-10-27 07:10:48 +00:00

18374 Commits