llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	3521367ff3	[X86][SSE] getTargetShuffleMaskIndices - allow opt-in support for whole undef shuffle mask elements Enable this for PSHUFB constant mask decoding and remove the ConstantPool DecodePSHUFBMask llvm-svn: 344931	2018-10-22 18:09:02 +00:00
Simon Pilgrim	5dff767c25	[X86] getTargetConstantBitsFromNode - handle extraction from larger constant pool entries First step towards removing X86ShuffleDecodeConstantPool usage from X86ISelLowering.cpp llvm-svn: 344924	2018-10-22 17:43:33 +00:00
Craig Topper	8d8dcfe690	Revert r344877 "[X86] Stop promoting integer loads to vXi64" Sam McCall reported miscompiles in some tensorflow code. Reverting while I try to figure out. llvm-svn: 344921	2018-10-22 16:59:24 +00:00
Simon Pilgrim	6f5cd7c67f	[X86][SSE] getTargetShuffleMask - pull out repeated shuffle mask element size. NFCI. llvm-svn: 344910	2018-10-22 15:33:30 +00:00
Roman Lebedev	898808504d	[X86] X86DAGToDAGISel: handle BZHI selection too, not just BEXTR. Summary: As discussed in D52304 / IRC, we now have pattern matching for 'bit extract' in two places - tablegen and `X86DAGToDAGISel`. There are 4 patterns. And we will have a problem with the `x & (-1 >> (32 - y))` pattern. * If the mask is one-use, then it is always unfolded into `x << (32 - y) >> (32 - y)` first. Thus, the existing test coverage is already broken. * If it is not one-use, then it is not unfolded, and is matched as BZHI. * If it is not one-use, we will not match it as BEXTR. And if it is one-use, it will have been unfolded already. So we will either not handle that pattern for BEXTR, or not have test coverage for it. This is bad. As discussed with @craig.topper, let's unify this matching, and do everything in `X86DAGToDAGISel`. Then we will not have code duplication, and will have proper test coverage. This indeed does not affect any tests, and this is great. It means that for these two patterns, the `X86DAGToDAGISel` is identical to the tablegen version. Please review carefully; I'm not fully sure about that intrinsic change, and introduction of the new `X86ISD` opcode. Reviewers: craig.topper, RKSimon, spatel Reviewed By: craig.topper Subscribers: llvm-commits, craig.topper Differential Revision: https://reviews.llvm.org/D53164 llvm-svn: 344904	2018-10-22 14:12:44 +00:00
Roman Lebedev	13c5ab2e27	[X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ((1 << nbits) + (-1)) pattern Summary: Trivial continuation of D52304. While this pattern is not canonical, we do select it in the BZHI case, so this should not be any different. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52348 llvm-svn: 344902	2018-10-22 13:54:17 +00:00
Craig Topper	290c081d91	[X86] Add patterns for vector and/or/xor/andn with other types than vXi64. This makes fast isel treat all legal vector types the same way. Previously only vXi64 was in the fast-isel tables. This unfortunately prevents matching of andn by fast-isel for these types since that requires SelectionDAG. But we already had this issue for vXi64. So at least we're consistent now. Interestingly, it looks like fast-isel can't handle instructions with constant vector arguments, so the not part of the andn patterns is selected with SelectionDAG. This explains why VPTERNLOG shows up in some of the tests. This is a subset of D53268. As I make progress on that, I will try to reduce the number of lines in the tablegen files. llvm-svn: 344884	2018-10-22 06:30:22 +00:00
Craig Topper	321df5b0d4	[X86] Stop promoting integer loads to vXi64 Summary: Theoretically this was done to reduce the number of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to remove the bitcast. I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has shrunk by ~12000 bytes. So it looks like this promotion was hurting us more than helping. I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the load size. I'm hoping this patch will reduce the number of patterns needed to remove the and/or/xor promotion. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits, RKSimon Differential Revision: https://reviews.llvm.org/D53306 llvm-svn: 344877	2018-10-21 21:30:26 +00:00
Craig Topper	8de07b4db1	Revert r344873 "foo" Rebase gone wrong left this in my tree. llvm-svn: 344875	2018-10-21 21:08:37 +00:00
Craig Topper	5eea94edd4	[X86] Remove SDIVREM8_SEXT_HREG/UDIVREM8_ZEXT_HREG and their associated DAG combine and target bits support. Use a post isel peephole instead. Summary: These nodes exist to overcome an isel problem where we can generate a zero extend of an AH register followed by an extract subreg, and another zero extend. The first zero extend exists to avoid a partial register update copying the AH register into the low 8-bits. The second zero extend exists if the user wanted the remainder zero extended. To make this work we had a DAG combine to morph the DIVREM opcode to a special opcode that included the extend. But then we had to add the new node to computeKnownBits and computeNumSignBits to process the extension portion. This patch instead removes all of that and adds a late peephole to detect the two extends. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53449 llvm-svn: 344874	2018-10-21 21:07:27 +00:00
Craig Topper	e367039fe5	foo llvm-svn: 344873	2018-10-21 21:07:25 +00:00
Simon Pilgrim	eb806d5f30	[X86][AVX] Enable lowerVectorShuffleAsLanePermuteAndPermute v16i16/v32i8 unary shuffle lowering llvm-svn: 344868	2018-10-21 17:07:50 +00:00
Simon Pilgrim	abc24fdb94	[X86] Only extract constant pool shuffle mask data with zero offsets D53306 exposes an issue where we sometimes use constant pool data from bigger vectors than the target shuffle mask. This should be safe to do, but we have to be certain that we're using the bottom-most part of the vector, as the shuffle mask decoders have no way to peek into subvectors with non-zero offsets. llvm-svn: 344867	2018-10-21 11:55:56 +00:00
Craig Topper	5ed1099962	[X86] Remove some left over code from when MVT:i1 was a legal type for AVX512. llvm-svn: 344813	2018-10-19 20:44:33 +00:00
Craig Topper	5c81c68385	[X86] In PostprocessISelDAG, start from allnodes_end, not the root. There is no guarantee the root is at the end if isel created any nodes without morphing them. This includes the nodes created by manual isel from C++ code in X86ISelDAGToDAG. This is similar to r333415 from PowerPC which is where I originally stole the peephole loop from. I don't have a test case, but without this a future patch doesn't work which is how I found it. llvm-svn: 344808	2018-10-19 19:24:42 +00:00
Kristina Brooks	312fcc116b	[X86] Support for the mno-tls-direct-seg-refs flag Allows disabling direct TLS segment access (%fs or %gs). GCC supports a similar flag. It can be useful in some circumstances, e.g. when a thread context block needs to be updated directly from user space. More info and specific use cases: https://bugs.llvm.org/show_bug.cgi?id=16145 There is another revision for clang as well. Related: D53102 All X86 CodeGen tests appear to pass: ``` [46/47] Running lit suite /SourceCache/llvm-trunk-8.0/test/CodeGen Testing Time: 23.17s Expected Passes : 3801 Expected Failures : 15 Unsupported Tests : 8021 ``` Reviewed by: Craig Topper. Patch by nruslan (Ruslan Nikolaev). Differential Revision: https://reviews.llvm.org/D53103 llvm-svn: 344723	2018-10-18 03:14:37 +00:00
Craig Topper	e0a992918b	[X86] Match (cmp (and (shr X, C), mask), 0) to BEXTR+TEST. Without this we match the CMP+AND to a TEST and then match the SHR separately. I'm trusting analyzeCompare to remove the TEST during the peephole pass. Otherwise we need to check the flag users to see if they only use the Z flag. This recovers a case lost by r344270. Differential Revision: https://reviews.llvm.org/D53310 llvm-svn: 344649	2018-10-16 22:29:36 +00:00
Simon Pilgrim	7d27cfdcb2	[X86] Fix Skylake ReadAfterLd for PADDrm etc. Missed in rL343868 due to their custom InstrRW. llvm-svn: 344600	2018-10-16 09:50:16 +00:00
Craig Topper	e70c560b6d	[X86] Remove some isel patterns that shouldn't be possible. These included a bitcast of a load from v4f32 to v2f64, but DAG combine should have already changed the type of the load to remove the cast. llvm-svn: 344573	2018-10-15 23:34:58 +00:00
Craig Topper	2909a3d9d0	[X86] Fix a bad bitcast in the load form of vXi16 uniform shift patterns for EVEX encoded instructions. llvm-svn: 344563	2018-10-15 21:51:32 +00:00
Chandler Carruth	edb12a838a	[TI removal] Make variables declared as `TerminatorInst` and initialized by `getTerminator()` calls instead be declared as `Instruction`. This is the biggest remaining chunk of the usage of `getTerminator()` that insists on the narrow type and so is an easy batch of updates. Several files saw more extensive updates where this would cascade to requiring API updates within the file to use `Instruction` instead of `TerminatorInst`. All of these were trivial in nature (pervasively using `Instruction` instead just worked). llvm-svn: 344502	2018-10-15 10:04:59 +00:00
Craig Topper	06aea1720a	[X86] Move promotion of vector and/or/xor from legalization to DAG combine Summary: I've noticed that the bitcasts we introduce for these make computeKnownBits and computeNumSignBits not work well in LegalizeVectorOps. LegalizeVectorOps legalizes bottom up while LegalizeDAG legalizes top down. The bottom up strategy for LegalizeVectorOps means operands are legalized before their uses. So we promote and/or/xor before we legalize the nodes that use them, making computeKnownBits/computeNumSignBits in places like LowerTruncate suboptimal. I looked at changing LegalizeVectorOps to be top down as well, but that was more disruptive and caused some regressions. I also looked at just moving promotion of binops to LegalizeDAG, but that had a few issues, one around matching AND, ANDN, OR into VSELECT: I had to create ANDN as vXi64, but the other nodes hadn't been legalized yet. I didn't look too hard at fixing that. This patch seems to produce better results overall than my other attempts. We now form broadcasts of constants better in some cases. For at least some of them the AND was being introduced in LegalizeDAG, promoted to vXi64, and the BUILD_VECTOR was also legalized there. I think we got bad ordering of that. Now the promotion is out of the legalizer so we handle this better. In the longer term I think we really should evaluate whether we should be doing this promotion at all. It's really there to reduce isel pattern count, but I'm wondering if we'd be better served just eating the pattern cost or doing C++ based isel for vector and/or/xor in X86ISelDAGToDAG. The masked and/or/xor will definitely be difficult in patterns if a bitcast gets between the vselect and the and/or/xor node. That becomes a lot of permutations to cover. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53107 llvm-svn: 344487	2018-10-15 01:51:58 +00:00
Craig Topper	671779456a	[X86] Add 128 MOVDDUP to the constant pool printing in X86AsmPrinter::EmitInstruction. We use this instruction to broadcast a single 64-bit value to a v2i64/v2f64 vector. llvm-svn: 344486	2018-10-15 01:51:53 +00:00
Simon Pilgrim	861cd0ba44	[X86][AVX] Enable lowerVectorShuffleAsLanePermuteAndPermute v16i16/v32i8 shuffle lowering Extends D53148 from v4f64 now that we have test coverage for v16i16/v32i8 shuffles. llvm-svn: 344481	2018-10-14 17:34:20 +00:00
Dorit Nuzman	38bbf81ade	recommit 344472 after fixing build failure on ARM and PPC. llvm-svn: 344475	2018-10-14 08:50:06 +00:00
Dorit Nuzman	5118c68cde	revert 344472 due to failures. llvm-svn: 344473	2018-10-14 07:21:20 +00:00
Dorit Nuzman	8174368955	[IAI,LV] Add support for vectorizing predicated strided accesses using masked interleave-group The vectorizer currently does not attempt to create interleave-groups that contain predicated loads/stores; predicated strided accesses can currently be vectorized only using masked gather/scatter or scalarization. This patch makes predicated loads/stores candidates for forming interleave-groups during the Loop-Vectorizer's analysis, and adds the proper support for masked-interleave-groups to the Loop-Vectorizer's planning and transformation stages. The patch also extends the TTI API to allow querying the cost of masked interleave groups (which each target can control); targets that support masked vector loads/stores may choose to enable this feature and allow vectorizing predicated strided loads/stores using masked wide loads/stores and shuffles. Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53011 llvm-svn: 344472	2018-10-14 07:06:16 +00:00
Craig Topper	20fa085d74	[X86] Fix bad indentation. NFC llvm-svn: 344471	2018-10-14 04:01:40 +00:00
Craig Topper	ec4b75f47a	[X86] Type legalize v2f32 stores by widening to v4f32, casting to v2f64, extracting f64 and storing. Summary: This is similar to what D52528 did for loads. It should match what generic type legalization does in 64-bit mode where it uses a v2i64 cast and an i64 store. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53173 llvm-svn: 344470	2018-10-14 03:36:27 +00:00
Benjamin Kramer	c55e997556	Move some helpers from the global namespace into anonymous ones. llvm-svn: 344468	2018-10-13 22:18:22 +00:00
Simon Pilgrim	c5d7c6e5f6	[X86][SSE] Remove most of vector CTTZ custom lowering and use LegalizeDAG instead. There is one remnant - AVX1 custom splitting of 256-bit vectors - which is due to a regression where the X86ISD::ANDNP is still performed as a YMM. I've also tightened the CTLZ or CTPOP lowering in SelectionDAGLegalize::ExpandBitCount to require a legal CTLZ - it doesn't affect existing users and fixes an issue with AVX512 codegen. llvm-svn: 344457	2018-10-13 16:11:15 +00:00
Simon Pilgrim	1c2051ead7	[X86][SSE] Begin removing vector CTTZ custom lowering and use LegalizeDAG instead. Adds CTTZ vector legalization support and begins the removal of the X86/SSE custom lowering. llvm-svn: 344453	2018-10-13 15:16:55 +00:00
Simon Pilgrim	1c6d320351	[X86][SSE] combineIncDecVector - use isConstantSplat Use isConstantSplat instead of ISD::isConstantSplatVector to let us peek through to illegal types (in this case for i686 targets to recognise i64 constants). llvm-svn: 344452	2018-10-13 14:45:44 +00:00
Simon Pilgrim	a03379527a	[X86] Pull out target constant splat helper function. NFCI. The code in LowerScalarImmediateShift is just a more powerful version of ISD::isConstantSplatVector. llvm-svn: 344451	2018-10-13 14:28:40 +00:00
Simon Pilgrim	10434cbae1	Pull out repeated getOperand(). NFCI. llvm-svn: 344450	2018-10-13 13:33:32 +00:00
Simon Pilgrim	bc141724c0	Remove unused variable. NFCI. llvm-svn: 344449	2018-10-13 13:30:10 +00:00
Simon Pilgrim	f64e654d62	[X86][SSE] Improve CTTZ lowering when CTLZ is legal If we have better CTLZ support than CTPOP, then use cttz(x) = width - ctlz(~x & (x - 1)) - and remove the CTTZ_ZERO_UNDEF handling as it no longer gives better codegen. Similar to rL344447, this is also closer to LegalizeDAG's approach llvm-svn: 344448	2018-10-13 13:05:19 +00:00
Simon Pilgrim	afead139cf	[X86][SSE] Change CTTZ vector lowering to cttz(x) = ctpop(~x & (x - 1)) This patch changes the vector CTTZ lowering from: cttz(x) = ctpop((x & -x) - 1) to: cttz(x) = ctpop(~x & (x - 1)) Not only does this make better use of the PANDN instruction, but it also matches the LegalizeDAG method which should allow us to remove the x86 specific code at some point in the future (we need to fix some issues with the bitcasted logic ops and CTPOP lowering first). Differential Revision: https://reviews.llvm.org/D53214 llvm-svn: 344447	2018-10-13 12:12:06 +00:00
Simon Pilgrim	f3952413f7	[X86][AVX] Add lowerVectorShuffleAsLanePermuteAndPermute for v4f64 shuffles (PR39161) Add shuffle lowering for the case where we can shuffle the lanes into place followed by an in-lane permute. This is mainly for cases where we can have non-repeating permutes in each lane, but for now I've just enabled it for v4f64 unary shuffles to fix PR39161 - there is no test coverage for other shuffles that might benefit yet. We now have several cross-lane shuffle lowering methods that all do something similar - I've looked at merging some of these (notably by making the repeated mask mechanism in lowerVectorShuffleByMerging128BitLanes optional), but there are a lot of assertions/assumptions in the way that make this tricky - I ended up going for adding yet another relatively simple method instead. Differential Revision: https://reviews.llvm.org/D53148 llvm-svn: 344446	2018-10-13 11:38:10 +00:00
Craig Topper	3e76b2d736	[X86] Improve type legalization of (v2i32/v4i16/v8i16 (bitcast (v2f32))) to avoid a stack temporary. llvm-svn: 344425	2018-10-12 22:00:04 +00:00
Craig Topper	c693a23025	[X86] Simplify the end of custom type legalization for (v2i32/v4i16/v8i8 (bitcast (f64))) by just emitting an EXTRACT_SUBVECTOR instead of a BUILD_VECTOR. Generic legalization should be able to finish legalizing the EXTRACT_SUBVECTOR probably by turning it into a BUILD_VECTOR. But we should emit the simplest sequence. llvm-svn: 344424	2018-10-12 22:00:00 +00:00
Craig Topper	a8a44f1bec	[X86] Skip (v2i32/v4i16/v8i8 (bitcast (f64))) handling in ReplaceNodeResults if the dest type can be widened by generic legalization. NFCI The algorithm we would do previously was identical to generic legalization. If we ever switch to legalizing integer vectors via widening we'll be able to kill off the code since it now only runs for promotion. llvm-svn: 344423	2018-10-12 21:59:58 +00:00
Sanjay Patel	e28c8ecd72	[x86] add and use fast horizontal vector math subtarget feature This is the planned follow-up to D52997. Here we are reducing horizontal vector math codegen by default. AMD Jaguar (btver2) should have no difference with this patch because it has fast-hops. (If we want to set that bit for other CPUs, let me know.) The code changes are small, but there are many test diffs. For files that are specifically testing for hops, I added RUNs to distinguish fast/slow, so we can see the consequences side-by-side. For files that are primarily concerned with codegen other than hops, I just updated the CHECK lines to reflect the new default codegen. To recap the recent horizontal op story: 1. Before rL343727, we were producing hops for all subtargets for a variety of patterns. Hops were likely not optimal for all targets though. 2. The IR improvement in r343727 exposed a hole in the backend hop pattern matching, so we reduced hop codegen for all subtargets. That was bad for Jaguar (PR39195). 3. We restored the hop codegen for all targets with rL344141. Good for Jaguar, but probably bad for other CPUs. 4. This patch allows us to distinguish when we want to produce hops, so everyone can be happy. I'm not sure if we have the best predicate here, but the intent is to undo the extra hop-iness that was enabled by r344141. Differential Revision: https://reviews.llvm.org/D53095 llvm-svn: 344361	2018-10-12 16:41:02 +00:00
Eric Liu	55ab86b72b	Fix unused variable warning after r344348 llvm-svn: 344350	2018-10-12 15:01:11 +00:00
Simon Pilgrim	78b5a3c3ef	[X86][SSE] LowerVectorCTPOP - pull out repeated byte sum stage. Pull out repeated byte sum stage for popcount of vector elements > 8bits. This allows us to simplify the LUT/BITMATH popcnt code to always assume vXi8 vectors, and also improves avx512bitalg codegen which only has access to vpopcntb/vpopcntw. llvm-svn: 344348	2018-10-12 14:18:47 +00:00
Simon Pilgrim	29279f29c8	[X86][SSE] Add extract_subvector(PSHUFB) -> PSHUFB(extract_subvector()) combine Fixes PR32160 by reducing the size of PSHUFB if we only use one of the lanes. This approach can probably be generalized to handle any target shuffle (and any subvector index) but we have no test coverage at the moment. llvm-svn: 344336	2018-10-12 12:10:34 +00:00
Andrea Di Biagio	6eebbe0a97	[tblgen][llvm-mca] Add the ability to describe move elimination candidates via tablegen. This patch adds the ability to identify instructions that are "move elimination candidates". It also allows scheduling models to describe processor register files that allow move elimination. A move elimination candidate is an instruction that can be eliminated at the register renaming stage. Each subtarget can specify which instructions are move elimination candidates with the help of tablegen class "IsOptimizableRegisterMove" (see llvm/Target/TargetInstrPredicate.td). For example, on X86, BtVer2 allows both GPR and MMX/SSE moves to be eliminated. The definition of 'IsOptimizableRegisterMove' for BtVer2 looks like this: ``` def : IsOptimizableRegisterMove<[ InstructionEquivalenceClass<[ // GPR variants. MOV32rr, MOV64rr, // MMX variants. MMX_MOVQ64rr, // SSE variants. MOVAPSrr, MOVUPSrr, MOVAPDrr, MOVUPDrr, MOVDQArr, MOVDQUrr, // AVX variants. VMOVAPSrr, VMOVUPSrr, VMOVAPDrr, VMOVUPDrr, VMOVDQArr, VMOVDQUrr ], CheckNot<CheckSameRegOperand<0, 1>> > ]>; ``` Definitions of IsOptimizableRegisterMove from processor models of the same Target are processed by the SubtargetEmitter to auto-generate a target-specific override for each of the following predicate methods: ``` bool TargetSubtargetInfo::isOptimizableRegisterMove(const MachineInstr *MI) const; bool MCInstrAnalysis::isOptimizableRegisterMove(const MCInst &MI, unsigned CPUID) const; ``` By default, those methods return false (i.e. conservatively assume that there are no move elimination candidates). Tablegen class RegisterFile has been extended with the following information: - The set of register classes that allow move elimination. - Maximum number of moves that can be eliminated every cycle. - Whether move elimination is restricted to moves from registers that are known to be zero. This patch is structured in three parts: A first part (which is mostly boilerplate) adds the new 'isOptimizableRegisterMove' target hooks, and extends existing register file descriptors in MC by introducing new fields to describe properties related to move elimination. A second part uses the new tablegen constructs to describe move elimination in the BtVer2 scheduling model. A third part teaches llvm-mca how to query the new 'isOptimizableRegisterMove' hook to mark instructions that are candidates for move elimination. It also teaches class RegisterFile how to describe constraints on move elimination at PRF granularity. llvm-mca tests for btver2 show differences before/after this patch. Differential Revision: https://reviews.llvm.org/D53134 llvm-svn: 344334	2018-10-12 11:23:04 +00:00
Simon Pilgrim	c844bc84dd	[X86] Ignore float/double non-temporal loads (PR39256) Scalar non-temporal loads were asserting instead of just being ignored. Reduced from https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=10895 llvm-svn: 344331	2018-10-12 10:20:16 +00:00
Matthias Braun	d6131c9633	X86/TargetTransformInfo: Report div/rem constant immediate costs as TCC_Free DIV/REM by constants should always be expanded into mul/shift/etc. patterns. Unfortunately the ConstantHoisting pass runs too early, at a point where the pattern isn't expanded yet. And after ConstantHoisting has hoisted an immediate, the DIV/REM may no longer be expanded. Also the hoisting typically doesn't make sense because it operates on immediates that will change completely during the expansion. Report DIV/REM as TCC_Free so ConstantHoisting will not touch them. Differential Revision: https://reviews.llvm.org/D53174 llvm-svn: 344315	2018-10-11 23:14:35 +00:00
Richard Trieu	dfd1760b5f	Inline variable into assert to avoid unused variable warning. llvm-svn: 344308	2018-10-11 22:42:41 +00:00
Craig Topper	35d513c7e4	[X86] Type legalize v2f32 loads by using an f64 load and a scalar_to_vector. On 64-bit targets the generic legalize will use an i64 load and a scalar_to_vector for us. But on 32-bit targets i64 isn't legal and the generic legalizer will end up emitting two 32-bit loads. We have DAG combines that try to put those two loads back together with pretty good success. This patch instead uses f64 to avoid the splitting entirely. I've made it do the same for 64-bit mode for consistency and to keep the load in the fp domain. There are a few things in here that look like regressions in 32-bit mode, but I believe they bring us closer to the 64-bit mode codegen. And that the 64-bit mode code could be better. I think those issues should be looked at separately. Differential Revision: https://reviews.llvm.org/D52528 llvm-svn: 344291	2018-10-11 20:36:06 +00:00
Craig Topper	fb2ac8969e	[X86] Restore X86ISelDAGToDAG::matchBEXTRFromAnd. Teach address matching to create a BEXTR pattern from a (shl (and X, mask >> C1) if C1 can be folded into addressing mode. This is an alternative to D53080 since I think using a BEXTR for a shifted mask is definitely an improvement when the shl can be absorbed into addressing mode. The other cases I'm less sure about. We already have several tricks for handling an and of a shift in address matching. This adds a new case for BEXTR. I've moved the BEXTR matching code back to X86ISelDAGToDAG to allow it to match. I suppose alternatively we could directly emit a X86ISD::BEXTR node that isel could pattern match. But I'm trying to view BEXTR matching as an isel concern so DAG combine can see 'and' and 'shift' operations that are well understood. We did lose a couple cases from tbm_patterns.ll, but I think there are ways to recover that. I've also put back the manual load folding code in matchBEXTRFromAnd that I removed a few months ago in r324939. This gives us some more freedom to make decisions based on the ability to fold a load. I haven't done anything with that yet. Differential Revision: https://reviews.llvm.org/D53126 llvm-svn: 344270	2018-10-11 18:06:07 +00:00
Roman Lebedev	4225f4adff	[X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ~(-1 << nbits) pattern Summary: As discussed in D48491, we can't really do this in the TableGen, since we need to produce two instructions. This only implements one single pattern. The other 3 patterns will be in follow-ups. I'm not sure yet if we want to also fuse shift into here (i.e `(x >> start) & ...`) Reviewers: RKSimon, craig.topper, spatel Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D52304 llvm-svn: 344224	2018-10-11 07:51:13 +00:00
Craig Topper	b5421c498d	[X86] Prevent non-temporal loads from folding into instructions by blocking them in X86DAGToDAGISel::IsProfitableToFold rather than with a predicate. Remove tryFoldVecLoad since tryFoldLoad would call IsProfitableToFold and pick up the new check. This saves about 5K out of ~600K on the generated isel table. llvm-svn: 344189	2018-10-10 21:48:34 +00:00
Roman Lebedev	33d84c6dac	[X86] Move X86DAGToDAGISel::matchBEXTRFromAnd() into X86ISelLowering Summary: As discussed in PR38938 (https://bugs.llvm.org/show_bug.cgi?id=38938), we fail to emit `BEXTR` if the mask is shifted. We can't deal with that in `X86DAGToDAGISel` `before the address mode for the inc is selected`, and we can't really do it in the normal DAGCombine, because we don't have a generic `ISD::BitFieldExtract` node, and if we simply turn the shifted mask into a normal mask + shift-left, it will be folded back. So it would seem X86ISelLowering is the place to handle this. This patch only moves the matchBEXTRFromAnd() from X86DAGToDAGISel to X86ISelLowering. It does not add support for the 'shifted mask' pattern. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52426 llvm-svn: 344179	2018-10-10 20:40:12 +00:00
Sanjay Patel	6cca8af227	[x86] allow single source horizontal op matching (PR39195) This is intended to restore horizontal codegen to what it looked like before IR demanded elements improved in: rL343727 As noted in PR39195: https://bugs.llvm.org/show_bug.cgi?id=39195 ...horizontal ops can be worse for performance than a shuffle+regular binop, so I've added a TODO. Ideally, we'd solve that in a machine instruction pass, but a quicker solution will be adding a 'HasFastHorizontalOp' feature bit to deal with it here in the DAG. Differential Revision: https://reviews.llvm.org/D52997 llvm-svn: 344141	2018-10-10 13:39:59 +00:00
Simon Pilgrim	5cb3a82892	[TargetLowering] Add root node back to work list after successful SimplifyDemandedBits/SimplifyDemandedVectorElts Similar to what already happens in the DAGCombiner wrappers, this patch adds the root nodes back onto the worklist if the DCI wrappers' SimplifyDemandedBits/SimplifyDemandedVectorElts were successful. Differential Revision: https://reviews.llvm.org/D53026 llvm-svn: 344132	2018-10-10 10:44:15 +00:00
Craig Topper	02c62aa58a	[X86] Remove FeatureRTM from Skylake processor list Summary: There are a LOT of Skylakes and later without TSX-NI. Examples: - SKL: https://ark.intel.com/products/136863/Intel-Core-i3-8121U-Processor-4M-Cache-up-to-3-20-GHz- - KBL: https://ark.intel.com/products/97540/Intel-Core-i7-7560U-Processor-4M-Cache-up-to-3-80-GHz- - KBL-R: https://ark.intel.com/products/149091/Intel-Core-i7-8565U-Processor-8M-Cache-up-to-4-60-GHz- - CNL: https://ark.intel.com/products/136863/Intel-Core-i3-8121U-Processor-4M-Cache-up-to-3_20-GHz This feature seems to be present only on high-end desktop and server chips (I can't find any SKX without). This commit leaves it disabled for all processors, but can be re-enabled for specific builds with -mrtm. Patch by Thiago Macieira Reviewers: erichkeane, craig.topper Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53041 llvm-svn: 344116	2018-10-10 07:43:35 +00:00
Rong Xu	5c7bf1a756	[X86] Fix sanitizer bot failure from 344085 Fix the memory issue exposed by sanitizer. llvm-svn: 344092	2018-10-09 23:10:56 +00:00
Rong Xu	3d2efdfdea	Recommit r343993: [X86] condition branches folding for three-way conditional codes Fix the memory issue exposed by sanitizer. llvm-svn: 344085	2018-10-09 22:03:40 +00:00
Craig Topper	f6d8400869	[X86] When lowering unsigned v2i64 setcc without SSE42, flip the sign bits in the v2i64 type then bitcast to v4i32. This may give slightly better opportunities for DAG combine to simplify with the operations before the setcc. It also matches the type the xors will eventually be promoted to anyway so it saves a legalization step. Almost all of the test changes are because our constant pool entry is now v2i64 instead of v4i32 on 64-bit targets. On 32-bit targets getConstant should be emitting a v4i32 build_vector and a v4i32->v2i64 bitcast. There are a couple of test cases where it appears we now combine a bitwise not with one of these xors which caused a new constant vector to be generated. This prevented a constant pool entry from being shared. But if that's an issue we're concerned about, it seems we need to address it another way than just relying on a bitcast to hide it. This came about from experiments I've been trying with pushing the promotion of and/or/xor to vXi64 later than LegalizeVectorOps where it is today. We run LegalizeVectorOps in a bottom up order. So the and/or/xor are promoted before their users are legalized. The bitcasts added for the promotion act as a barrier to computeKnownBits if we try to use it during vector legalization of a later operation. So by moving the promotion out we can hopefully get better results from computeKnownBits/computeNumSignBits like in LowerTruncate on AVX512. I've also looked at running LegalizeVectorOps in a top down order like LegalizeDAG, but that's showing some other issues. llvm-svn: 344071	2018-10-09 19:05:50 +00:00
Sanjay Patel	f5fac1826a	[x86] use demanded bits to simplify masked store codegen As noted in D52747, if we prefer IR to use trunc for bool vectors rather than and+icmp, we can expose codegen shortcomings as seen here with masked store. Replace a hard-coded PCMPGT simplification with the more general demanded bits call to improve things. Differential Revision: https://reviews.llvm.org/D52964 llvm-svn: 344048	2018-10-09 14:04:14 +00:00
Simon Pilgrim	720db8ed7b	[X86][AVX1] Enable _EXTEND_VECTOR_INREG lowering of 256-bit vectors As discussed on D52964, this adds 256-bit _EXTEND_VECTOR_INREG lowering support for AVX1 targets to help improve SimplifyDemandedBits handling. Differential Revision: https://reviews.llvm.org/D52980 llvm-svn: 344019	2018-10-09 07:42:01 +00:00
Rong Xu	47fd015163	[X86] Revert r343993 condition branches folding for three-way conditional codes Some buildbots failed. llvm-svn: 343998	2018-10-08 22:08:43 +00:00
Craig Topper	ff9f02580d	[X86] Prefer isTypeLegal over checking isSimple in a DAG combine. Simple types are a superset of the types that any in-tree LLVM target could possibly have as legal. This means the behavior of using isSimple to check for a supported type for X86 could change over time. For example, this would change if a v256i1 type was added to MVT in the future. llvm-svn: 343995	2018-10-08 20:02:59 +00:00
Rong Xu	67b1b328f7	[X86] condition branches folding for three-way conditional codes This patch implements a pass that optimizes condition branches on x86 by taking advantage of the three-way conditional code generated by compare instructions. Currently, it tries to hoist EQ and NE conditional branches to a dominant conditional branch condition where the same EQ/NE conditional code is computed. An example: bb_0: cmp %0, 19 jg bb_1 jmp bb_2 bb_1: cmp %0, 40 jg bb_3 jmp bb_4 bb_4: cmp %0, 20 je bb_5 jmp bb_6 Here we could combine the two compares in bb_0 and bb_4 and have the following code: bb_0: cmp %0, 20 jg bb_1 jl bb_2 jmp bb_5 bb_1: cmp %0, 40 jg bb_3 jmp bb_6 For the case of %0 == 20 (bb_5), we eliminate two jumps, and the control height for bb_6 is also reduced. bb_4 is gone after the optimization. This optimization is motivated by the branch pattern generated by the switch lowering: we always have pivot-1 compare for the inner nodes and we do a pivot compare against the leaf (like above pattern). This pass currently is enabled on Intel's Sandybridge and later arches. Some reviewers pointed out that on some arches (like AMD Jaguar), this pass may increase branch density to the point where it hurts the performance of the branch predictor. Differential Revision: https://reviews.llvm.org/D46662 llvm-svn: 343993	2018-10-08 18:52:39 +00:00
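The before/after blocks in this message can be checked with a small Python model of the control flow. It is only a sketch: each function returns the name of the block the branches reach for a given value of %0, assuming signed compares as implied by jg/jl. It says nothing about how the pass itself is implemented.

```python
def original_cfg(x: int) -> str:
    # bb_0: cmp %0, 19 ; jg bb_1 ; jmp bb_2
    if not x > 19:
        return "bb_2"
    # bb_1: cmp %0, 40 ; jg bb_3 ; jmp bb_4
    if x > 40:
        return "bb_3"
    # bb_4: cmp %0, 20 ; je bb_5 ; jmp bb_6
    return "bb_5" if x == 20 else "bb_6"


def folded_cfg(x: int) -> str:
    # bb_0: a single compare against 20 feeds three branches (jg / jl / jmp).
    if x > 20:
        # bb_1: cmp %0, 40 ; jg bb_3 ; jmp bb_6
        return "bb_3" if x > 40 else "bb_6"
    if x < 20:
        return "bb_2"
    return "bb_5"  # x == 20


# Both versions reach the same block for every value in a range around the pivots.
for value in range(-5, 60):
    assert original_cfg(value) == folded_cfg(value)
```

In the folded version the x == 20 path executes one compare instead of three, and bb_4 no longer exists.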
Simon Pilgrim	6fc8d05565	[X86][AVX2] Enable ZERO_EXTEND_VECTOR_INREG lowering of 256-bit vectors Some necessary yak shaving before lowering *_EXTEND_VECTOR_INREG 256-bit vectors on AVX1 targets as suggested by D52964. Differential Revision: https://reviews.llvm.org/D52970 llvm-svn: 343991	2018-10-08 18:40:50 +00:00
Sanjay Patel	43bf9917cc	[x86] make horizontal binop matching clearer; NFCI The instructions are complicated, so this code will probably never be very obvious, but hopefully this makes it better. As shown in PR39195: https://bugs.llvm.org/show_bug.cgi?id=39195 ...we need to improve the matching to not miss cases where we're h-opping on 1 source vector, and that should be a small patch after this rearranging. llvm-svn: 343989	2018-10-08 18:08:02 +00:00
Alexander Ivchenko	1aedf203dd	[GlobalIsel][X86] Support G_UDIV/G_UREM/G_SREM Support G_UDIV/G_UREM/G_SREM. The instruction selection code is taken from FastISel with only minor tweaks to adapt for GlobalISel. Differential Revision: https://reviews.llvm.org/D49781 llvm-svn: 343966	2018-10-08 13:40:34 +00:00
Simon Pilgrim	9fa1c66421	[X86] getFauxShuffleMask - Handle undef + sentinel values in subvector insertion llvm-svn: 343926	2018-10-06 22:13:44 +00:00
Simon Pilgrim	a30e8d23e2	[X86][AVX] Ensure resolveTargetShuffleInputs shuffle masks are the correct width Don't handle ZERO_EXTEND style shuffles until we support bitcasts. Found by inspection. llvm-svn: 343924	2018-10-06 17:18:41 +00:00
Simon Pilgrim	62d199f4e5	[X86] combinePMULDQ - add op back to worklist if SimplifyDemandedBits succeeds on either operand Prevents missing other simplifications that may occur deep in the operand chain where CommitTargetLoweringOpt won't add the PMULDQ back to the worklist itself llvm-svn: 343922	2018-10-06 14:51:14 +00:00
Simon Pilgrim	0cc0a24b55	[X86][SSE] SimplifyDemandedVectorEltsForTargetNode - simplify PSHUFB masks Attempt to simplify PSHUFB masks (even non-constant ones) - we should probably be able to simplify other variable shuffles as well as the need arises. llvm-svn: 343919	2018-10-06 13:49:31 +00:00
Simon Pilgrim	ae78d709b4	[X86] Use the SimplifyDemandedBits wrappers where possible. NFCI. Leave the wrapper to handle TargetLowering::TargetLoweringOpt and CommitTargetLoweringOpt. llvm-svn: 343918	2018-10-06 13:29:08 +00:00
Matthias Braun	81578e9f77	X86, AArch64, ARM: Do not attach debug location to spill/reload instructions This rebases and recommits r343520. hwasan should be fixed now and this shouldn't break the tests anymore. Spill/reload instructions are artificially generated by the compiler and have no relation to the original source code. So the best thing to do is not attach any debug location to them (instead of just taking the next debug location we find on following instructions). Differential Revision: https://reviews.llvm.org/D52125 llvm-svn: 343895	2018-10-05 22:00:13 +00:00
Simon Pilgrim	dc97118efe	[X86][AVX] Limit getFauxShuffleMask INSERT_SUBVECTOR support to 2 inputs rL343853 didn't limit the number of subinputs, but we don't currently support faux shuffles with more than 2 total inputs, so put a limiter in place until this is fixed. Found by Artem Dergachev. llvm-svn: 343891	2018-10-05 21:44:19 +00:00
Craig Topper	0ed892da70	[X86] Don't promote i16 compares to i32 if the immediate will fit in 8 bits. The comments in this code say we were trying to avoid 16-bit immediates, but if the immediate fits in 8-bits this isn't an issue. This avoids creating a zero extend that probably won't go away. The movmskb related changes are interesting. The movmskb instruction writes a 32-bit result, but fills the upper bits with 0. So the zero_extend we were previously emitting was free, but we turned a -1 immediate that would fit in 8-bits into a 32-bit immediate so it was still bad. llvm-svn: 343871	2018-10-05 18:13:36 +00:00
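x86 compare instructions accept a sign-extended 8-bit immediate, so what matters is whether the constant, read at the compare's width, lies in [-128, 127]. The minimal Python sketch below uses a hypothetical helper name. It shows why promoting a 16-bit all-ones movmskb-style mask to 32 bits hurts: 0xFFFF is -1 at i16 but 65535 at i32.

```python
def fits_in_simm8(value: int, width: int) -> bool:
    """True if `value`, truncated and read as a signed `width`-bit integer,
    can be encoded as a sign-extended 8-bit immediate."""
    mask = (1 << width) - 1
    v = value & mask
    if v & (1 << (width - 1)):
        v -= 1 << width  # two's-complement reinterpretation
    return -128 <= v <= 127


# A 16-bit compare against 0xFFFF uses the immediate -1, which fits in imm8 ...
print(fits_in_simm8(0xFFFF, 16))  # True
# ... but after zero-extending the operands to 32 bits the constant is 65535.
print(fits_in_simm8(0xFFFF, 32))  # False
```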
Simon Pilgrim	f09fc3bc12	[X86] Move ReadAfterLd functionality into X86FoldableSchedWrite (PR36957) Currently we hardcode instructions with ReadAfterLd if the register operands don't need to be available until the folded load has completed. This doesn't take into account the different load latencies of different memory operands (PR36957). This patch adds a ReadAfterFold def into X86FoldableSchedWrite to replace ReadAfterLd, allowing us to specify the load latency at a scheduler class level. I've added ReadAfterVec*Ld classes that match the XMM/Scl, XMM and YMM/ZMM WriteVecLoad classes that we currently use, we can tweak these values in future patches once this infrastructure is in place. Differential Revision: https://reviews.llvm.org/D52886 llvm-svn: 343868	2018-10-05 17:57:29 +00:00
Simon Pilgrim	6c5ab48fe7	[X86][AVX] getFauxShuffleMask - add support for INSERT_SUBVECTOR subvector shuffles Decode subvector shuffles from INSERT_SUBVECTOR(SRC0, SHUFFLE(EXTRACT_SUBVECTOR(SRC1))). This was found necessary while investigating PR39161 llvm-svn: 343853	2018-10-05 14:41:00 +00:00
Jonas Paulsson	faad1b3056	[TargetRegisterInfo] Remove temporary hook enableMultipleCopyHints() Finally all targets are enabling multiple regalloc hints, so the hook to disable this can now be removed. NFC. Review: Simon Pilgrim https://reviews.llvm.org/D52316 llvm-svn: 343851	2018-10-05 14:23:11 +00:00
Craig Topper	7d2155e3f9	[X86][LegalizeVectorOps] Use MERGE_VALUES to return two results from LowerLoad. Remove special case code in LegalizeVectorOps that allowed us to only return one result. Previously we replaced the chain use ourselves and returned the data result. LegalizeVectorOps then detected that we'd done this and assumed the chain had already been handled. This commit instead returns a MERGE_VALUES node with two results joined from nodes. This allows LegalizeVectorOps to do all the replacements for us without any special casing. The MERGE_VALUES will be removed by DAG combine. llvm-svn: 343817	2018-10-04 21:24:24 +00:00
Martin Storsjo	37b742e208	[COFF] [X86] Don't use llvm_unreachable for unsupported relocation types This can happen if assembling a reference to _GLOBAL_OFFSET_TABLE_. While it doesn't make sense to try to assemble that for COFF, the fact that we previously used llvm_unreachable meant that the code had undefined behaviour if something tried to assemble that. The configure script of libgmp would try to assemble such a snippet (which should signal a failure). If llvm is built without assertions, the undefined behaviour meant a (near) infinite loop. Differential Revision: https://reviews.llvm.org/D52903 llvm-svn: 343811	2018-10-04 20:43:38 +00:00
David Greene	4f916df29e	[X86] Set correct MMO offset on scalarized load pieces When scalarizing a load, be sure to update the offset in the MachineMemOperand for each scalar load. llvm-svn: 343776	2018-10-04 14:07:59 +00:00
Craig Topper	8b3c46f0a8	[X86] Merge matchANDXORWithAllOnesAsANDNP into combineANDXORWithAllOnesIntoANDNP. NFCI It's the only caller and the logic is pretty easy to combine. llvm-svn: 343754	2018-10-04 06:13:27 +00:00
Craig Topper	a65c2dbfd6	[X86] Stop promoting vector ISD::SELECT to vXi64. The additional patterns needed for this aren't overwhelming and introducing extra bitcasts during lowering limits our ability to do computeNumSignBits. Not that I have a good example of that for select. I'm just becoming increasingly grumpy about promotion of AND/OR/XOR. SELECT was just a lot easier to fix. llvm-svn: 343723	2018-10-03 21:10:29 +00:00
Craig Topper	c39dc41b63	[X86] Add CMOV_VK2/VK4 pseudos and remove lowering code that turned v2i1/v4i1 SELECT into v8i1. llvm-svn: 343713	2018-10-03 20:28:43 +00:00
Craig Topper	703fbde3cb	[X86] Add CMOV pseudos for VR128X and VR256X register classes. Use them when AVX512VL is enabled. This allows the phi nodes to be generated with the correct register class when expanded. llvm-svn: 343710	2018-10-03 19:48:26 +00:00
Craig Topper	4b62c2dbda	[X86] Don't break CMOV pseudo instructions down by type. Just by register class. The register class is all that's important for the pseudo instructions. We can use patterns to handle the different types. llvm-svn: 343709	2018-10-03 19:48:23 +00:00
Simon Pilgrim	aabd99c27a	[X86] PUSH/POP 'mem-mem' instructions are not RMW - these are 2 different addresses This patch adds a 'WriteCopy' [WriteLoad, WriteStore] schedule sequence instead to better model the behaviour Found by @andreadb during llvm-mca testing on btver2 which was crashing on "zero uop" WriteRMW only instructions llvm-svn: 343708	2018-10-03 19:02:38 +00:00
Simon Pilgrim	b80d27a916	[X86] Move Atomic binops to use WriteALURMW schedule class These were being tagged as <WriteALULd, WriteRMW> instead of properly using the RMW sequence llvm-svn: 343705	2018-10-03 18:38:28 +00:00
Simon Pilgrim	0b451a2983	[X86][Btver2] Fix MMX PSHUFB schedule Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343701	2018-10-03 18:18:50 +00:00
Simon Pilgrim	a400612aed	[X86] Move Atomic CMPXCHG to WriteCMPXCHGRMW schedule class llvm-svn: 343700	2018-10-03 18:05:01 +00:00
Simon Pilgrim	2c59475c06	[X86] Add SkylakeClient uops counter - same as the other Intel models. llvm-svn: 343697	2018-10-03 16:45:26 +00:00
Nirav Dave	925b64be64	[X86] Correctly use SSE registers if no-x87 is selected. Fix use of SSE1 registers for f32 ops in no-x87 mode. Notably, allow use of SSE instructions for f32 operations in 64-bit mode (but not 32-bit which is disallowed by calling convention). Also avoid translating memset/memcopy/memmove into SSE registers without X87 for 32-bit mode. This fixes PR38738. Reviewers: nickdesaulniers, craig.topper Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D52555 llvm-svn: 343689	2018-10-03 14:13:30 +00:00
Simon Pilgrim	c68cc4efbe	[X86][Btver2] Most RMW instructions don't require an additional uop Remove uop on WriteRMW and move it into the few instructions that need it. Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343671	2018-10-03 10:28:43 +00:00
Simon Pilgrim	d11015861c	[X86] ALU/ADC RMW instructions should use the WriteRMW sequence class I was expecting this to be a nfc but Silvermont seems to be set up a little differently: // A folded store needs a cycle on MEC_RSV for the store data, but it does not need an extra port cycle to recompute the address. def : WriteRes<WriteRMW, [SLM_MEC_RSV]>; So moving from WriteStore to WriteRMW reduces predicted port pressure, confirmed by @craig.topper that this is correct. Differential Revision: https://reviews.llvm.org/D52740 llvm-svn: 343670	2018-10-03 10:01:13 +00:00
Matt Morehouse	4b1ec17fb0	Revert "X86, AArch64, ARM: Do not attach debug location to spill/reload instructions" This reverts r343520 due to breakage of HWASan tests on Android. llvm-svn: 343616	2018-10-02 18:35:44 +00:00
Craig Topper	49225d0915	[X86][Disassembler] Add bizarro versions of the MOVSXD instruction that sign extend from a GR32 to GR32 or GR16. The 0x63 opcodes in 64-bit mode have a fixed source size of 32-bits, but the destination size is controlled by REX.W and the 0x66 opsize prefix. This instruction is normally used with a REX.W prefix which provides the desired behavior. The other encodings are interpreted as valid by the processor, but aren't useful. This patch makes us recognize them for the disassembler to match objdump. llvm-svn: 343614	2018-10-02 18:16:19 +00:00
Reid Kleckner	d5e4ec74e3	[codeview] Fix 32-bit x86 variable locations in realigned stack frames Add the .cv_fpo_stackalign directive so that we can define $T0, or the VFRAME virtual register, with it. This was overlooked in the initial implementation because unlike MSVC, we push CSRs before allocating stack space, so this value is only needed to describe local variable locations. Variables that the compiler now addresses via ESP are instead described as being stored at offsets from VFRAME, which for us is ESP after alignment in the prologue. This adds tests that show that we use the VFRAME register properly in our S_DEFRANGE records, and that we emit the correct FPO data to define it. Fixes PR38857 llvm-svn: 343603	2018-10-02 16:43:52 +00:00
Simon Pilgrim	860cb5c071	[X86][Btver2] Fix BLENDV and AESDEC schedules Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343597	2018-10-02 15:13:18 +00:00
Simon Pilgrim	201bbe3993	[X86] Remove unnecessary BT(C/R/S)m(i/r) scheduler overrides Some SchedAlias remain due to some badly setup RMW tags - but at least the overrides are all removed llvm-svn: 343586	2018-10-02 13:11:59 +00:00
Simon Pilgrim	271bcb9397	[X86] Add APInt constant assembly printer helper llvm-svn: 343577	2018-10-02 11:32:33 +00:00
Simon Pilgrim	ad23f270db	[X86] Standardize floating point assembly comments Consistently try to use APFloat::toString for floating point constant comments to get rid of differences between Constant / ConstantDataSequential values - it should help stop some of the linux-windows buildbot failures matching NaN/INF etc. as well. Differential Revision: https://reviews.llvm.org/D52702 llvm-svn: 343562	2018-10-02 09:08:51 +00:00
Reid Kleckner	9ea2c01264	[codeview] Emit S_FRAMEPROC and use S_DEFRANGE_FRAMEPOINTER_REL Summary: Before this change, LLVM would always describe locals on the stack as being relative to some specific register, RSP, ESP, EBP, ESI, etc. Variables in stack memory are pretty common, so there is a special S_DEFRANGE_FRAMEPOINTER_REL symbol for them. This change uses it to reduce the size of our debug info. On top of the size savings, there are cases on 32-bit x86 where local variables are addressed from ESP, but ESP changes across the function. Unlike in DWARF, there is no FPO data to describe the stack adjustments made to push arguments onto the stack and pop them off after the call, which makes it hard for the debugger to find the local variables in frames further up the stack. To handle this, CodeView has a special VFRAME register, which corresponds to the $T0 variable set by our FPO data in 32-bit. Offsets to local variables are instead relative to this value. This is part of PR38857. Reviewers: hans, zturner, javed.absar Subscribers: aprantl, hiraditya, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D52217 llvm-svn: 343543	2018-10-01 21:59:45 +00:00
Craig Topper	42cd8cd862	Recommit r343499 "[X86] Enable load folding in the test shrinking code" Original message: This patch adds load folding support to the test shrinking code. This was noticed missing in the review for D52669 llvm-svn: 343540	2018-10-01 21:35:28 +00:00
Craig Topper	f06a57fc89	Recommit r343498 "[X86] Improve test instruction shrinking when the sign flag is used and the output of the and is truncated." This includes a fix to prevent i16 compares with i32/i64 ands from being shrunk if bit 15 of the and is set and the sign bit is used. Original commit message: Currently we skip looking through truncates if the sign flag is used. But that's overly restrictive. It's safe to look through the truncate as long as we ensure one of the 3 things when we shrink. Either the MSB of the mask at the shrunken size isn't set. If the mask bit is set then either the shrunk size needs to be equal to the compare size or the sign flag needs to be unused. There are still missed opportunities to shrink a load and fold it in here. This will be fixed in a future patch. llvm-svn: 343539	2018-10-01 21:35:26 +00:00
Matthias Braun	3e081703c3	X86, AArch64, ARM: Do not attach debug location to spill/reload instructions Spill/reload instructions are artificially generated by the compiler and have no relation to the original source code. So the best thing to do is not attach any debug location to them (instead of just taking the next debug location we find on following instructions). Differential Revision: https://reviews.llvm.org/D52125 llvm-svn: 343520	2018-10-01 18:56:39 +00:00
Craig Topper	e072934d28	Revert r343499 and r343498. X86 test improvements There's a subtle bug in the handling of truncate from i32/i64 to i32 without minsize. I'll be adding more test cases and trying to find a fix. llvm-svn: 343516	2018-10-01 18:40:44 +00:00
Craig Topper	aa84e1bba2	[X86] Enable load folding in the test shrinking code This patch adds load folding support to the test shrinking code. This was noticed missing in the review for D52669 Differential Revision: https://reviews.llvm.org/D52699 llvm-svn: 343499	2018-10-01 17:10:50 +00:00
Craig Topper	2b587ad071	[X86] Improve test instruction shrinking when the sign flag is used and the output of the and is truncated Currently we skip looking through truncates if the sign flag is used. But that's overly restrictive. It's safe to look through the truncate as long as we ensure one of the 3 things when we shrink. Either the MSB of the mask at the shrunken size isn't set. If the mask bit is set then either the shrunk size needs to be equal to the compare size or the sign flag needs to be unused. There are still missed opportunities to shrink a load and fold it in here. This will be fixed in a future patch. Differential Revision: https://reviews.llvm.org/D52669 llvm-svn: 343498	2018-10-01 17:10:45 +00:00
Simon Pilgrim	e0d2019052	[X86][Btver2] Fix BT(C|R|S)mr & BT(C|R|S)mi schedule latency + uop counts Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343494	2018-10-01 16:31:30 +00:00
Simon Pilgrim	683e35527b	[X86] Create schedule classes for BT(C|R|S)mi and BT(C|R|S)mr instructions llvm-svn: 343490	2018-10-01 16:12:44 +00:00
Simon Pilgrim	4334912c1c	[X86] Remove unnecessary BTmi/BTmr scheduler overrides llvm-svn: 343487	2018-10-01 15:01:00 +00:00
Simon Pilgrim	6ddc4e821c	[X86][Btver2] Fix BTmr schedule uop counts Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343484	2018-10-01 14:42:16 +00:00
Simon Pilgrim	43737a3df4	[X86] Create schedule classes for BTmi and BTmr instructions llvm-svn: 343478	2018-10-01 14:23:37 +00:00
Simon Pilgrim	a982236e59	[X86][Btver2] Fix masked load schedule JFPU01 resource usage should match JFPX Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343468	2018-10-01 13:12:05 +00:00
Andrea Di Biagio	24ea163007	[X86][BtVer2] Teach how to identify zero-idiom VPERM2F128rr instructions. This patch adds another variant class to identify zero-idiom VPERM2F128rr instructions. On Jaguar, a VPERM2F128 with bits 3 and 7 of the mask set is a zero-idiom. Differential Revision: https://reviews.llvm.org/D52663 llvm-svn: 343452	2018-10-01 10:35:13 +00:00
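A simplified Python model of the VPERM2F128 immediate illustrates the zero idiom. Each 256-bit source is represented as a (low, high) pair of 128-bit lanes, using plain integers here. Per Intel's documentation of the instruction, imm[1:0] and imm[5:4] pick the source lane for the low and high result halves, and imm[3] and imm[7] force the corresponding half to zero. When both zeroing bits are set, the result is zero whatever the inputs hold. The function names are hypothetical.

```python
def vperm2f128(src1, src2, imm: int):
    """Return the (low, high) 128-bit lanes of the result; sources are (low, high) pairs."""
    lanes = [src1[0], src1[1], src2[0], src2[1]]
    low = 0 if imm & 0x08 else lanes[imm & 0x3]
    high = 0 if imm & 0x80 else lanes[(imm >> 4) & 0x3]
    return (low, high)


def is_vperm2f128_zero_idiom(imm: int) -> bool:
    """Both zeroing bits (3 and 7) set: the result is all zeros regardless of inputs."""
    return (imm & 0x88) == 0x88


a, b = (0x1111, 0x2222), (0x3333, 0x4444)
assert vperm2f128(a, b, 0x31) == (0x2222, 0x4444)  # high lane of a, high lane of b
assert vperm2f128(a, b, 0x88) == (0, 0)            # zero idiom
```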
Clement Courbet	a933fb237e	[X86][Sched] Update scheduling information for VZEROALL on HWS, BDW, SKX, SNB. Summary: While looking at PR35606, I found out that the scheduling info is incorrect. One can check that it's really a P5+P6 and not a 2*P56 with: echo -e 'vzeroall\nvandps %xmm1, %xmm2, %xmm3' | ./bin/llvm-exegesis -mode=uops -snippets-file=- (vandps executes on P5 only) Reviewers: craig.topper, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52541 llvm-svn: 343447	2018-10-01 08:37:48 +00:00
Clement Courbet	dac60b9837	[X86][Sched] Add pfm uop counter definitions for SNB,BDW,SKX. llvm-svn: 343446	2018-10-01 08:37:37 +00:00
Craig Topper	67d9dbdbdd	[X86] Stop X86DomainReassignment from creating copies between GR8/GR16 physical registers and k-registers. We can only copy between a k-register and a GR32/GR64 register. This patch detects that the copy will be illegal and prevents the domain reassignment from happening for that closure. This probably isn't the best fix, and we should probably figure out how to handle this correctly. Fixes PR38803. llvm-svn: 343443	2018-10-01 07:08:41 +00:00
Craig Topper	1d1dca6a6f	[X86] Change an llvm_unreachable to a report_fatal_error so the optimizer will stop making us reach the other report_fatal_error in this function. There's a conditional report_fatal_error just above this llvm_unreachable. The optimizer when seeing the unreachable removes the conditional and just makes any other error trigger the existing report_fatal_error. llvm-svn: 343428	2018-09-30 23:43:30 +00:00
Simon Pilgrim	f21083870d	[X86] Fix scheduler class for BTmi instructions This wasn't treated as a folded load instruction llvm-svn: 343424	2018-09-30 20:19:16 +00:00
Craig Topper	99ad2a5723	[X86] Copy memrefs when folding a load for division instruction selection. llvm-svn: 343419	2018-09-30 17:47:18 +00:00
Simon Pilgrim	4f5693ac8d	[X86][Btver2] Fix PCmpIStrI/PCmpIStrM schedules Missing JFPU0 pipe and double JFPU1 pipe (to match JVALU1) resources Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343413	2018-09-30 16:38:38 +00:00
Simon Pilgrim	9cec221a1c	[X86][BtVer2] Add the ability to add additional uops for folded instructions Some instructions take an extra load uop - but not consistently..... llvm-svn: 343410	2018-09-30 15:58:56 +00:00
Craig Topper	1709829fed	[X86] Disable BMI BEXTR in X86DAGToDAGISel::matchBEXTRFromAnd unless we're compiling for a CPU with single uop BEXTR Summary: This function turns (X >> C1) & C2 into a BMI BEXTR or TBM BEXTRI instruction. For BMI BEXTR we have to materialize an immediate into a register to feed to the BEXTR instruction. The BMI BEXTR instruction is 2 uops on Intel CPUs. It looks like on SKL it's one port 0/6 uop and one port 1/5 uop, despite what Agner's tables say. I know one of the uops is a regular shift uop so it would have to go through the port 0/6 shifter unit. So that's the same or worse execution wise than the shift+and which is one 0/6 uop and one 0/1/5/6 uop. The move immediate into register is an additional 0/1/5/6 uop. For now I've limited this transform to AMD CPUs which have a single uop BEXTR. It may also make sense if we can fold a load or if the and immediate is larger than 32-bits and can't be encoded as a sign extended 32-bit value or if LICM or CSE can hoist the move immediate and share it. But we'd need to look more carefully at that. In the regression I looked at it doesn't look like load folding or large immediates were occurring so the regression isn't caused by the loss of those. So we could try to be smarter here if we find a compelling case. Reviewers: RKSimon, spatel, lebedev.ri, andreadb Reviewed By: RKSimon Subscribers: llvm-commits, andreadb, RKSimon Differential Revision: https://reviews.llvm.org/D52570 llvm-svn: 343399	2018-09-30 03:01:46 +00:00
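The (X >> C1) & C2 pattern maps onto BEXTR when C2 is a contiguous run of low set bits. BMI BEXTR takes the start bit from bits 7:0 of its control operand and the field length from bits 15:8. The Python sketch below models the 64-bit register form plus a hypothetical matcher that builds the control value. It is not the X86DAGToDAGISel code. The control value it returns is the immediate that, per the message, has to be materialized into a register for BMI BEXTR.

```python
from typing import Optional

WIDTH = 64


def bextr(src: int, control: int) -> int:
    """Model of 64-bit BMI BEXTR: extract `length` bits starting at bit `start`."""
    src &= (1 << WIDTH) - 1
    start = control & 0xFF          # control[7:0]
    length = (control >> 8) & 0xFF  # control[15:8]
    shifted = src >> start          # bits past the top of the operand read as zero
    return shifted if length >= WIDTH else shifted & ((1 << length) - 1)


def match_bextr_control(shift: int, mask: int) -> Optional[int]:
    """Hypothetical matcher: return the BEXTR control for (x >> shift) & mask,
    or None when mask is not a non-empty run of low set bits."""
    if mask == 0 or (mask & (mask + 1)) != 0:
        return None
    return shift | (mask.bit_length() << 8)


x = 0xDEADBEEF
control = match_bextr_control(8, 0xFFF)  # start = 8, length = 12
assert bextr(x, control) == (x >> 8) & 0xFFF
```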
Simon Pilgrim	a2efe82b81	[X86] SimplifyDemandedVectorEltsForTargetNode - remove identity target shuffles before simplifying inputs By removing demanded target shuffles that simplify to zero/undef/identity before simplifying its inputs we improve chances of further simplification, as only the immediate parent user of the combined node is added back to the work list - this still doesn't help us if it's passed through other ops though (bitcasts....). llvm-svn: 343390	2018-09-29 18:15:26 +00:00
Simon Pilgrim	a93407fadf	[X86][SSE] LowerScalarImmediateShift - remove 32-bit vXi64 special case handling. This is all handled generally by getTargetConstantBitsFromNode now llvm-svn: 343387	2018-09-29 17:36:22 +00:00
Simon Pilgrim	b5737007cd	Fix signed/unsigned mismatch warning. NFCI. llvm-svn: 343385	2018-09-29 17:11:19 +00:00
Simon Pilgrim	d633e290c8	[X86] getTargetConstantBitsFromNode - add support for rearranging constant bits via shuffles Exposed an issue that recursive calls to getTargetConstantBitsFromNode don't handle changes to EltSizeInBits yet. llvm-svn: 343384	2018-09-29 17:01:55 +00:00
Simon Pilgrim	ae34ae12ef	[X86][SSE] LowerScalarImmediateShift - use getTargetConstantBitsFromNode to get immediate data Don't just attempt to find a splat build vector. First step towards getting rid of all the 32-bit special case code. llvm-svn: 343383	2018-09-29 16:40:35 +00:00
Simon Pilgrim	a731940c60	[X86] getTargetConstantBitsFromNode - fix self-move assertions from gcc builds due to rL343375 llvm-svn: 343377	2018-09-29 14:51:09 +00:00
Simon Pilgrim	22d51014af	[X86] getTargetConstantBitsFromNode - add support for peeking through ISD::EXTRACT_SUBVECTOR llvm-svn: 343375	2018-09-29 14:17:32 +00:00
Simon Pilgrim	aa77033a6b	[X86][SSE] Fixed issue with v2i64 variable shifts on 32-bit targets The shift amount might have peeked through an extract_subvector, altering the number of vector elements in the 'Amt' variable - so we were incorrectly calculating the ratio when peeking through bitcasts, resulting in incorrectly detecting splats. llvm-svn: 343373	2018-09-29 13:25:22 +00:00
Simon Pilgrim	428c1196d8	[X86][Btver2] PSUBS/PSUBUS instructions are zero-idioms Noticed during llvm-exegesis tests, the PSUBS/PSUBUS instructions have the same zero-idiom behaviour as PSUB llvm-svn: 343321	2018-09-28 14:20:42 +00:00
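A per-lane model shows why a same-register saturating subtract qualifies as a zero idiom. x - x is exactly 0, which always lies inside the saturation range, so the result never depends on the register's previous contents. The following is a Python sketch for byte lanes (PSUBSB / PSUBUSB); the function names are hypothetical.

```python
def psubsb_lane(a: int, b: int) -> int:
    """Signed saturating 8-bit subtract; a and b in [-128, 127]."""
    return max(-128, min(127, a - b))


def psubusb_lane(a: int, b: int) -> int:
    """Unsigned saturating 8-bit subtract; a and b in [0, 255]."""
    return max(0, a - b)


# Saturation does matter for distinct operands ...
assert psubsb_lane(-128, 1) == -128
assert psubusb_lane(3, 5) == 0
# ... but subtracting a value from itself is zero for every possible input.
assert all(psubsb_lane(x, x) == 0 for x in range(-128, 128))
assert all(psubusb_lane(x, x) == 0 for x in range(256))
```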
Simon Pilgrim	66da1ed29d	[X86][Btver2] CVTSS2I/CVTSD2I - add missing JFPU0 pipe We issue JFPU1->JSTC then JFPU0->JFPA then -> JALU0 (integer pipe) Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343314	2018-09-28 13:19:22 +00:00
Simon Pilgrim	17e5981ebf	[X86][Btver2] Fix BSF/BSR schedule Double throughput to account for 2 pipes + fix BSF's latency/uop counts Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343311	2018-09-28 10:26:48 +00:00
Simon Pilgrim	280af1c7f0	[X86][BtVer2] Fix PHMINPOS schedule resources typo PHMINPOS can run on either JFPU pipe llvm-svn: 343299	2018-09-28 08:21:39 +00:00
Simon Pilgrim	2a64d393ea	[X86] Remove BT/BTC/BTR/BTS rr/ri overrides llvm-svn: 343241	2018-09-27 17:29:13 +00:00
Simon Pilgrim	86c7b07ecd	[X86][Btver2] (V)MPSADBW instructions take 3uops not 1 llvm-svn: 343238	2018-09-27 17:13:57 +00:00
Simon Pilgrim	dd744f158a	[X86][Btver2] BTC/BTR/BTS instructions take 2uops not 1 llvm-svn: 343234	2018-09-27 16:39:52 +00:00
Simon Pilgrim	29cf499bca	[X86] Split BT and BTC/BTR/BTS scheduler classes llvm-svn: 343233	2018-09-27 16:24:42 +00:00
Simon Pilgrim	c2a88ea64e	[X86][Btver2] BLSI/BLSMSK/BLSR instructions take 2uops not 1 (same as TZCNT) llvm-svn: 343227	2018-09-27 14:57:57 +00:00
Simon Pilgrim	98f503a326	[X86][Btver2] TZCNT instructions take 2uops not 1 llvm-svn: 343200	2018-09-27 12:28:47 +00:00
Simon Pilgrim	7e4f154e79	[X86][Btver2] Add uops counter for exegesis reports llvm-svn: 343194	2018-09-27 11:40:26 +00:00
Fangrui Song	0cac726a00	llvm::sort(C.begin(), C.end(), ...) -> llvm::sort(C, ...) Summary: The convenience wrapper in STLExtras is available since rL342102. Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D52573 llvm-svn: 343163	2018-09-27 02:13:45 +00:00
Simon Pilgrim	ebabd79f43	[X86][SSE] canReduceVMulWidth - use ComputeNumSignBits/SignBitIsZero directly Don't reinvent the wheel for BUILD_VECTOR/ZERO_EXTEND - it's only the ANY_EXTEND special case that needs handling. llvm-svn: 343096	2018-09-26 11:48:52 +00:00
Clement Courbet	596c56ff9c	[llvm-exegesis] Add support for measuring NumMicroOps. Summary: Example output for vzeroall: --- mode: uops key: instructions: - 'VZEROALL' config: '' register_initial_values: cpu_name: haswell llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 10000 measurements: - { debug_string: HWPort0, value: 0.0006, per_snippet_value: 0.0006, key: '3' } - { debug_string: HWPort1, value: 0.0011, per_snippet_value: 0.0011, key: '4' } - { debug_string: HWPort2, value: 0.0004, per_snippet_value: 0.0004, key: '5' } - { debug_string: HWPort3, value: 0.0018, per_snippet_value: 0.0018, key: '6' } - { debug_string: HWPort4, value: 0.0002, per_snippet_value: 0.0002, key: '7' } - { debug_string: HWPort5, value: 1.0019, per_snippet_value: 1.0019, key: '8' } - { debug_string: HWPort6, value: 1.0033, per_snippet_value: 1.0033, key: '9' } - { debug_string: HWPort7, value: 0.0001, per_snippet_value: 0.0001, key: '10' } - { debug_string: NumMicroOps, value: 20.0069, per_snippet_value: 20.0069, key: NumMicroOps } error: '' info: '' assembled_snippet: C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C3 ... Reviewers: gchatelet Subscribers: tschuett, RKSimon, andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D52539 llvm-svn: 343094	2018-09-26 11:22:56 +00:00
Simon Pilgrim	5beaac433d	[X86][SSE] Use ISD::MULHS for constant vXi16 ISD::SRA lowering (PR38151) Similar to the existing ISD::SRL constant vector shifts from D49562, this patch adds ISD::SRA support with ISD::MULHS. As we're dealing with signed values, we have to handle shift by zero and shift by one special cases, so XOP+AVX2/AVX512 splitting/extension is still a better solution - really we should still use ISD::MULHS if one of the special cases are used but for now I've just left a TODO and filtered by isKnownNeverZero. Differential Revision: https://reviews.llvm.org/D52171 llvm-svn: 343093	2018-09-26 10:57:05 +00:00
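The MULHS approach rests on the identity x >> s == mulhs(x, 2^(16-s)) for signed 16-bit x. The high half of the 32-bit product is floor(x * 2^(16-s) / 2^16), which equals floor(x / 2^s). The multiplier only fits in a signed 16-bit lane for 2 <= s <= 15. s == 1 needs 2^15, which wraps to -32768, and s == 0 needs 2^16. These are the two special cases the message mentions. Below is a Python sketch of one lane, with hypothetical helper names.

```python
def to_i16(value: int) -> int:
    """Wrap an integer to the signed 16-bit range."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def mulhs16(a: int, b: int) -> int:
    """High 16 bits of the signed 16x16 -> 32-bit product (one PMULHW lane)."""
    return to_i16((a * b) >> 16)  # Python's >> on ints is an arithmetic (floor) shift


def sra_via_mulhs(x: int, s: int) -> int:
    """Arithmetic shift right of a signed 16-bit lane; valid only for 2 <= s <= 15."""
    if not 2 <= s <= 15:
        raise ValueError("shift amounts 0 and 1 need special handling")
    return mulhs16(x, 1 << (16 - s))


for x in range(-32768, 32768, 97):
    for s in range(2, 16):
        assert sra_via_mulhs(x, s) == x >> s

# For s == 1 the multiplier 1 << 15 wraps to -32768 and the identity breaks.
assert mulhs16(2, to_i16(1 << 15)) != 2 >> 1
```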
Craig Topper	12c18840fa	[X86] Allow movmskpd/ps ISD nodes to be created and selected with integer input types. This removes an int->fp bitcast between the surrounding code and the movmsk. I had already added a hack to combineMOVMSK to try to look through this bitcast to improve the SimplifyDemandedBits there. But I found an additional issue where the bitcast was preventing combineMOVMSK from being called again after earlier nodes in the DAG are optimized. The bitcast gets revisited, but not the user of the bitcast. By using integer types throughout, the bitcast doesn't get in the way. llvm-svn: 343046	2018-09-25 23:28:27 +00:00

17985 Commits