Commit Graph

1756 Commits

Author SHA1 Message Date
Craig Topper dea0b88b04 [X86] Remove X86ISD::MOVLPS and X86ISD::MOVLPD. NFCI
These ISD nodes try to select the MOVLPS and MOVLPD instructions, which are special load-only instructions. They load data and merge it into the lower 64 bits of an XMM register. They are logically equivalent to our MOVSD node plus a load.

There was only one place in X86ISelLowering that used MOVLPD and no places that selected MOVLPS. The one place that selected MOVLPD had to choose between it and MOVSD based on whether there was a load. But lowering is too early to tell if the load can really be folded. So in isel we have patterns that use MOVSD for MOVLPD if we can't find a load.

We also had patterns that select the MOVLPD instruction for a MOVSD if we can find a load, but didn't choose the MOVLPD ISD opcode for some reason.

So it seems better to just standardize on MOVSD ISD opcode and manage MOVSD vs MOVLPD instruction with isel patterns.

llvm-svn: 336728
2018-07-10 21:00:22 +00:00
Craig Topper db73f56489 [X86] Remove some seemingly unnecessary patterns.
We're missing the EVEX equivalents of these patterns and seem to get along fine.

I think we end up with X86vzload for the obvious IR cases that would produce this DAG.

llvm-svn: 336638
2018-07-10 05:31:42 +00:00
Craig Topper e9cff7d47b [X86] Remove some patterns that include a bitcast of a floating point load to an integer type.
DAG combine should have converted the type of the load.

llvm-svn: 336557
2018-07-09 16:03:02 +00:00
Craig Topper 16ee4b4957 [X86] Remove some patterns that seem to be unreachable.
These patterns mapped (v2f64 (X86vzmovl (v2f64 (scalar_to_vector FR64:$src)))) to a MOVSD and a zeroing XOR. But the complexity of a pattern for (v2f64 (X86vzmovl (v2f64))) that selects MOVQ is artificially high and hides this MOVSD pattern.

Weirder still, the SSE version of the pattern was explicitly blocked on SSE41, yet we had copied it to AVX and AVX512.

llvm-svn: 336556
2018-07-09 16:03:01 +00:00
Craig Topper 22330c700b [X86] Remove some seemingly unnecessary AddedComplexity lines.
Looking at the generated tables this didn't seem to make an obvious difference in pattern priority.

llvm-svn: 336555
2018-07-09 16:02:59 +00:00
Craig Topper c98c675f03 [X86] Remove an AddedComplexity line that seems unnecessary.
It only existed on the SSE and AVX versions. The AVX512 version didn't have it.

I checked the generated table and this didn't seem necessary to create a match preference.

llvm-svn: 336516
2018-07-08 22:57:33 +00:00
Craig Topper f61c631b25 [X86] Remove patterns for MOVLPD/MOVLPS nodes with integer types.
Lowering shouldn't generate these. If we need to use them for integer types, it should use a bitcast.

llvm-svn: 336458
2018-07-06 18:47:57 +00:00
Craig Topper 56440b9745 [X86] Don't use aligned load/store instructions for fp128 if the load/store isn't aligned.
Similarly, don't fold fp128 loads into SSE instructions if the load isn't aligned, unless we're targeting an AMD CPU that doesn't check alignment on arithmetic instructions.

Should fix PR38001

llvm-svn: 336121
2018-07-02 17:01:54 +00:00
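The alignment rule in the fp128 commit above (56440b9745) can be sketched as a small decision function. This is a minimal Python model, not LLVM code: the 16-byte threshold, the `MOVAPS`/`MOVUPS` mnemonics, and the `unaligned_sse_mem_ok` flag are illustrative assumptions standing in for whatever the real isel patterns and subtarget check use.

```python
FP128_BYTES = 16  # assumed natural alignment for an fp128 value


def pick_fp128_move(align_bytes):
    """Pick an aligned move only when the access is known to be aligned."""
    # Hypothetical mnemonics; the commit does not name the instructions.
    return "MOVAPS" if align_bytes >= FP128_BYTES else "MOVUPS"


def can_fold_fp128_load(align_bytes, unaligned_sse_mem_ok=False):
    """Fold a load into an SSE arithmetic instruction only if it is safe.

    unaligned_sse_mem_ok models a target (the commit mentions some AMD
    CPUs) that does not check alignment on arithmetic memory operands.
    """
    return align_bytes >= FP128_BYTES or unaligned_sse_mem_ok
```

Under this model an 8-byte-aligned fp128 load gets the unaligned move and is not folded unless the target tolerates unaligned arithmetic operands.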
Craig Topper 7ffa976993 [X86] Don't fold unaligned loads into SSE ROUNDPS/ROUNDPD for ceil/floor/nearbyint/rint/trunc.
Incorrect patterns were added in r334460. This changes them to check alignment properly for SSE.

llvm-svn: 335062
2018-06-19 17:51:42 +00:00
Mikhail Dvoretckii b1ce7765be [X86] VRNDSCALE* folding from masked and scalar ffloor and fceil patterns
This patch handles back-end folding of generic patterns created by lowering the
X86 rounding intrinsics to native IR in cases where the instruction isn't a
straightforward packed values rounding operation, but a masked operation or a
scalar operation.

Differential Revision: https://reviews.llvm.org/D45203

llvm-svn: 335037
2018-06-19 10:37:52 +00:00
Craig Topper 16fdde5e63 [X86] Add '.s' aliases to the assembler for the various redundant move encodings to match gas and our EVEX instructions.
We already have these aliases for EVEX encoded instructions, but not for the GPR, MMX, SSE, and VEX versions.

Also remove the vpextrw.s EVEX alias. That's not something gas implements.

llvm-svn: 334922
2018-06-18 05:00:50 +00:00
Craig Topper 29f22d7baa [X86] More additions to the load folding tables based on the autogenerated tables.
Including more additions for NotMemoryFoldable to remove some entries from the autogenerated table.

llvm-svn: 334898
2018-06-16 23:25:50 +00:00
Tomasz Krupa bcaab53d47 [X86] Lowering sqrt intrinsics to native IR
Summary: Complementary patch to lowering sqrt intrinsics in Clang.

Reviewers: craig.topper, spatel, RKSimon, DavidKreitzer, uriel.k

Reviewed By: craig.topper

Subscribers: tkrupa, mike.dvoretsky, llvm-commits

Differential Revision: https://reviews.llvm.org/D41599

llvm-svn: 334849
2018-06-15 18:05:24 +00:00
Craig Topper 9f829f76e8 [X86] Remove NotMemoryFoldable from some AVX/AVX512 scalar instructions.
Some of these instructions are already in the manual folding table so we should have them in the auto table too.

llvm-svn: 334725
2018-06-14 15:40:27 +00:00
Craig Topper b2552e1e08 [x86] fix mappings of cvttp2si/cvttp2ui x86 intrinsics to x86-specific nodes and isel patterns (PR37551)
Summary:
The tests in:
https://bugs.llvm.org/show_bug.cgi?id=37751
...show miscompiles because we wrongly mapped and folded x86-specific intrinsics into generic DAG nodes.

This patch corrects the mappings in X86IntrinsicsInfo.h and adds isel matching corresponding to the new patterns. The complete tests for the failure cases should be in avx-cvttp2si.ll and sse-cvttp2si.ll and avx512-cvttp2i.ll

Reviewers: RKSimon, gbedwell, spatel

Reviewed By: spatel

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D47993

llvm-svn: 334685
2018-06-14 03:16:58 +00:00
Craig Topper 957b738432 [X86] Add isel patterns for folding loads when creating ROUND instructions from ffloor/fnearbyint/fceil/frint/ftrunc.
We were missing packed isel folding patterns for all of sse41, avx, and avx512.

For some reason avx512 had scalar load folding patterns under optsize (due to partial/undef reg update), but we didn't have the equivalent sse41 and avx patterns.

Sometimes we would get load folding due to peephole pass anyway, but we're also missing avx512 instructions from the load folding table. I'll try to fix that in another patch.

Some of this was spotted in the review for D47993.

This patch adds all the folds to isel, adds a few spot tests, and disables the peephole pass on a few tests to ensure we're testing some of these patterns.

llvm-svn: 334460
2018-06-12 00:48:57 +00:00
Craig Topper 860562c915 [X86] Miscellaneous fixes to get the load folding table generator to work again.
llvm-svn: 334377
2018-06-10 21:48:24 +00:00
Nicolai Haehnle 01d261f18d TableGen: Streamline the semantics of NAME
Summary:
The new rules are straightforward. The main rules to keep in mind
are:

1. NAME is an implicit template argument of class and multiclass,
   and will be substituted by the name of the instantiating def/defm.

2. The name of a def/defm in a multiclass must contain a reference
   to NAME. If such a reference is not present, it is automatically
   prepended.

And for some additional subtleties, consider these:

3. defm with no name generates a unique name but has no special
   behavior otherwise.

4. def with no name generates an anonymous record, whose name is
   unique but undefined. In particular, the name won't contain a
   reference to NAME.

Keeping rules 1&2 in mind should allow a predictable behavior of
name resolution that is simple to follow.

The old "rules" were rather surprising: sometimes (but not always),
NAME would correspond to the name of the toplevel defm. They were
also plain bonkers when you pushed them to their limits, as the old
version of the TableGen test case shows.

Having NAME correspond to the name of the toplevel defm introduces
"spooky action at a distance" and breaks composability:
refactoring the upper layers of a hierarchy of nested multiclass
instantiations can cause unexpected breakage by changing the value
of NAME at a lower level of the hierarchy. The new rules don't
suffer from this problem.

Some existing .td files have to be adjusted because they ended up
depending on the details of the old implementation.

Change-Id: I694095231565b30f563e6fd0417b41ee01a12589

Reviewers: tra, simon_tatham, craig.topper, MartinO, arsenm, javed.absar

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D47430

llvm-svn: 333900
2018-06-04 14:26:05 +00:00
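Rules 1 and 2 from the TableGen NAME commit above (01d261f18d) can be illustrated with a toy name resolver. This is a Python sketch of the described behavior, not TableGen's implementation: it treats a def name as a plain string in which the literal token `NAME` marks the reference, and the function name `resolve_def_name` is hypothetical. Anonymous defs (rules 3 and 4) are not modelled.

```python
def resolve_def_name(template, instantiating_name):
    """Resolve the name of a def inside a multiclass.

    template           -- the def's name as written, possibly containing NAME
    instantiating_name -- the name of the def/defm that instantiates it
    """
    # Rule 2: if the name has no reference to NAME, one is prepended.
    if "NAME" not in template:
        template = "NAME" + template
    # Rule 1: NAME is substituted by the instantiating def/defm's name.
    return template.replace("NAME", instantiating_name)
```

For example, a def written as `rr` inside a multiclass instantiated by `defm ADD32` would resolve to `ADD32rr` under this model, while a def written as `Int_NAME` instantiated by `SQRT` would resolve to `Int_SQRT`.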
Alexander Ivchenko 96062eaa8e [X86] Scalar mask and scalar move optimizations
1. Introduction of mask scalar TableGen patterns.
2. Introduction of new scalar move TableGen patterns
   and refactoring of existing ones.
3. Folding of pattern created by introducing scalar
   masking in Clang header files.

Patch by tkrupa

Differential Revision: https://reviews.llvm.org/D47012

llvm-svn: 333419
2018-05-29 14:27:11 +00:00
Petar Jovanovic c051000b83 [X86][MIPS][ARM] New machine instruction property 'isMoveReg'
This property is needed in order to follow value movement between
registers. It is used in TII to implement a method that returns
true if a simple copy-like instruction is recognized, along with
the source and destination machine operands.

Patch by Nikola Prica.

Differential Revision: https://reviews.llvm.org/D45204

llvm-svn: 333093
2018-05-23 15:28:28 +00:00
Simon Pilgrim 1273f4ad93 [X86] Add GPR<->XMM Schedule Tags
BtVer2 - fix NumMicroOp and account for the Lat+6cy GPR->XMM and Lat+1cy XMM->GPR delays (see rL332737)

The high number of MOVD/MOVQ equivalent instructions meant that there were a number of missed patterns in SNB/Znver1:
SNB - add missing GPR<->MMX costs (taken from Agner / Intel AOM)
Znver1 - add missing GPR<->XMM MOVQ costs (taken from Agner)

llvm-svn: 332745
2018-05-18 17:58:36 +00:00
Simon Pilgrim c4b8d367a8 [X86][SSE] Ensure vector partial load/stores use the WriteVecLoad/WriteVecStore scheduler classes
Retag some instructions that were missed when we split off vector load/store/moves - MOVQ/MOVD etc.

Fixes BtVer2/SLM which have different behaviours for GPR stores.

llvm-svn: 332718
2018-05-18 14:08:01 +00:00
Simon Pilgrim e819199e2a [X86][AVX] VEXTRACTF128mr store is a WriteFStoreX not WriteFStore
llvm-svn: 332715
2018-05-18 13:17:51 +00:00
Simon Pilgrim d749b321b2 [X86][SSE] Ensure float load/stores use the WriteFLoad/WriteFStore scheduler classes
Retag some instructions that were missed when we split off vector load/store/moves - MOVSS/MOVSD/MOVHPD/MOVHPD/MOVLPD/MOVLPS etc.

Fixes BtVer2/SLM which have different behaviours for GPR stores.

llvm-svn: 332714
2018-05-18 13:13:59 +00:00
Craig Topper a2c5264718 [X86] Add OptForSize to a couple load folding patterns. Remove some bad FIXME comments.
The FIXME comments were about preventing load folding to avoid a partial xmm update. But these instructions use GPR as input when the load isn't folded. This won't help prevent a partial xmm update.

llvm-svn: 332573
2018-05-17 05:41:11 +00:00
Simon Pilgrim 5647e89f5a [X86] Split WriteCvtI2F/WriteCvtF2I into I<->F32 and I<->F64 scheduler classes
A lot of the models still have too many InstRW overrides for these new classes - this needs cleaning up but I wanted to get the classes in first

llvm-svn: 332451
2018-05-16 10:53:45 +00:00
Simon Pilgrim be9a206883 [X86] Split WriteCvtF2F into F32->F64 and F64->F32 scheduler classes
BtVer2 - Fixes schedules for (V)CVTPS2PD instructions

A lot of the Intel models still have too many InstRW overrides for these new classes - this needs cleaning up but I wanted to get the classes in first

llvm-svn: 332376
2018-05-15 17:36:49 +00:00
Simon Pilgrim 891ebcdbaa [X86] Split off F16C WriteCvtPH2PS/WriteCvtPS2PH scheduler classes
Btver2 - VCVTPH2PSYrm needs to double pump the AGU
Broadwell - missing VCVTPS2PH*mr stores extra latency

Allows us to remove the WriteCvtF2FSt conversion store class

llvm-svn: 332357
2018-05-15 14:12:32 +00:00
Simon Pilgrim 215ce4a1ca [X86] Add NT load/store scheduler classes
llvm-svn: 332274
2018-05-14 18:37:19 +00:00
Craig Topper 266b7ae55d [X86] Cleanup a multiclass that doesn't need as many parameters after recent intrinsic removals.
llvm-svn: 332207
2018-05-14 00:17:52 +00:00
Craig Topper 38b713d4a7 [X86] Add some load folding patterns for cvtsi2ss/sd into intrinsic instructions.
llvm-svn: 332189
2018-05-13 01:54:33 +00:00
Craig Topper df3a9cedff [X86] Remove and autoupgrade legacy cvtss2sd intrinsics.
llvm-svn: 332187
2018-05-13 00:29:40 +00:00
Craig Topper 38ad7ddabc [X86] Remove and autoupgrade cvtsi2ss/cvtsi2sd intrinsics to match what clang has used for a very long time.
llvm-svn: 332186
2018-05-12 23:14:39 +00:00
Simon Pilgrim ead11e4d4b [X86] Added scheduler helper classes to split move/load/store by size
Nothing uses this yet but this will allow us to specialize MMX/XMM/YMM/ZMM vector moves.

llvm-svn: 332090
2018-05-11 12:46:54 +00:00
Simon Pilgrim ab34aa8294 [X86] Cleanup WriteFStore/WriteVecStore schedules
MOVNTPD/MOVNTPS should be WriteFStore

Standardized BDW/HSW/SKL/SKX WriteFStore/WriteVecStore - fixes some missed instregex patterns. (V)MASKMOVDQU was already using the default; its cost gets increased but is still nowhere near the real cost of that nasty instruction.

llvm-svn: 331864
2018-05-09 11:01:16 +00:00
Simon Pilgrim b0a3be04ec [X86] Add vector masked load/store scheduler classes (PR32857)
Split off from existing vector load/store classes to remove InstRW overrides.

llvm-svn: 331760
2018-05-08 12:17:55 +00:00
Simon Pilgrim 210286ed8f [X86] Add SchedWriteFTest/SchedWriteVecTest TEST scheduler classes
Split off from SchedWriteVecLogic to remove InstRW overrides.

llvm-svn: 331757
2018-05-08 10:28:03 +00:00
Simon Pilgrim 1233e1234a [X86] Split WriteFAdd/WriteFCmp/WriteFMul schedule classes
Split to support single/double for scalar, XMM and YMM/ZMM instructions - removing InstRW overrides for these instructions.

Fixes Atom ADDSUBPD instruction and reclassifies VFPCLASS as WriteFCmp which is closer in behaviour.

llvm-svn: 331672
2018-05-07 20:52:53 +00:00
Simon Pilgrim e480ed0b9f [X86][AVX2] Tag VPMOVSX/VPMOVZX ymm instructions as WriteShuffle256
These are more like cross-lane shuffles than regular shuffles - we already do this for AVX512 equivalents.

Differential Revision: https://reviews.llvm.org/D46229

llvm-svn: 331659
2018-05-07 18:25:19 +00:00
Simon Pilgrim ac5d0a31ef [X86] Split WriteFDiv schedule classes to support single/double scalar, XMM and YMM/ZMM instructions.
This removes all InstRW overrides for these instructions - some x87 overrides remain but most use default (and realistic) values.

llvm-svn: 331643
2018-05-07 16:15:46 +00:00
Simon Pilgrim f3ae50fca2 [X86] Split WriteFRcp/WriteFRsqrt/WriteFSqrt schedule classes
WriteFRcp/WriteFRsqrt are split to support scalar, XMM and YMM/ZMM instructions.

WriteFSqrt is split into single/double/long-double sizes and scalar, XMM, YMM and ZMM instructions.

This removes all InstRW overrides for these instructions.

NOTE: There were a couple of typos in the Znver1 model - notably a 1cy throughput for SQRT that is highly unlikely and doesn't tally with Agner.

NOTE: I had to add Agner's numbers for several targets for WriteFSqrt80.

llvm-svn: 331629
2018-05-07 11:50:44 +00:00
Simon Pilgrim bf4c8c0ff2 [X86] Add WriteVecMOVMSKY scheduler class
llvm-svn: 331525
2018-05-04 14:54:33 +00:00
Simon Pilgrim be51b20127 [X86] Add SchedWriteFRnd fp rounding scheduler classes
Split off from SchedWriteFAdd for fp rounding/bit-manipulation instructions.

Fixes an issue on btver2 which only had the ymm version using the JSTC pipe instead of JFPA.

llvm-svn: 331515
2018-05-04 12:59:24 +00:00
Simon Pilgrim 542b20d656 [X86] Add WriteDPPD/WriteDPPS dot product scheduler classes
llvm-svn: 331489
2018-05-03 22:31:19 +00:00
Simon Pilgrim f2d2cedab4 [X86] Split WriteVecShift/WriteVarVecShift into MMX, XMM and YMM/ZMM scheduler classes
This took a bit of extra work as on Intel targets the old (V)PSLLDrr/(V)PSLLDrm style instructions act differently - I ended up creating WriteVecShiftImm classes for XMM/YMM/ZMM vector shift by immediate and retaining WriteVecShift as the default (used only by MMX) plus WriteVecShiftX/WriteVecShiftY. X86SchedWriteWidths hides most of this thank goodness.

llvm-svn: 331472
2018-05-03 17:56:43 +00:00
Simon Pilgrim f7dd6069a5 [X86] Split WriteVecALU/WritePHAdd into XMM and YMM/ZMM scheduler classes
llvm-svn: 331453
2018-05-03 13:27:10 +00:00
Simon Pilgrim e8671ef434 [X86] Convert most remaining uses of X86SchedWritePair scheduler classes to X86SchedWriteWidths.
We've dealt with the majority already.

llvm-svn: 331347
2018-05-02 12:27:54 +00:00
Simon Pilgrim c708868cb1 [X86] Split WriteFRcp/WriteFRsqrt/WriteFSqrt into XMM and YMM/ZMM scheduler classes
llvm-svn: 331290
2018-05-01 18:06:07 +00:00
Simon Pilgrim c546f9424f [X86] Split WriteFCmp into XMM and YMM/ZMM scheduler classes
Removes more WriteFCmp InstRW overrides

llvm-svn: 331283
2018-05-01 16:50:16 +00:00
Simon Pilgrim 1b7a80d80a [X86] Convert all uses of WriteFAdd to X86SchedWriteWidths.
In preparation of splitting WriteFAdd by vector width.

llvm-svn: 331273
2018-05-01 15:57:17 +00:00
Simon Pilgrim f6b81dae9e [X86] Convert all uses of WriteFShuffle to X86SchedWriteWidths.
In preparation of splitting WriteFShuffle by vector width.

llvm-svn: 331262
2018-05-01 14:14:42 +00:00
Simon Pilgrim 6f710a6440 [X86] Convert all uses of WriteFLogic/WriteVecLogic to X86SchedWriteWidths.
In preparation of splitting WriteVecLogic by vector width.

llvm-svn: 331256
2018-05-01 12:15:29 +00:00
Simon Pilgrim fc0c26f1a6 [X86] Tag PSLLDQ/PSRLDQ as WriteShuffle scheduler classes instead of shifts.
Although they are encoded similarly to bit shifts, the byte shifts behave like shuffles from a scheduling point of view.

llvm-svn: 331253
2018-05-01 11:05:42 +00:00
Simon Pilgrim 3c35408e48 [X86] Introduce X86SchedWriteWidths schedule wrapper for different vector widths.
We need to split most of the scheduler classes by vector width to remove more of the InstRW overrides, this patch should make this easier/tidier by allowing us to pass the X86SchedWriteWidths wrapper to multi-width multiclasses and then split as required.

I've included fields for Scl (scalar float/double), MMX (MMX integer), XMM, YMM and ZMM widths. These fields mostly share the same classes but it should give us the flexibility that we may need in the future.

This patch has replaced a set of example SSE/AVX512 instruction cases but isn't exhaustive as it gets very noisy before we really need the functionality.

Differential Revision: https://reviews.llvm.org/D46266

llvm-svn: 331208
2018-04-30 18:18:38 +00:00
Craig Topper 06624e1a93 [X86] Restrict many of the InstAliases to either att or intel syntax only. NFCI
Many of these aliases exist to give one syntax or the other a slightly different mnemonic, and the other variant gets a duplicate of its normal mnemonic.

This patch restricts a lot of these to only one variant so we don't get the duplication.

This removes a lot of duplicate entries from the matcher table. It also reduces the number of warnings printed when you enable the ambiguous match warning in tablegen.

llvm-svn: 331117
2018-04-28 18:46:11 +00:00
Simon Pilgrim 9f561dd54a [X86][SSE] Stop hard coding some instruction scheduler classes.
Make these arguments to the multiclass to allow easier specialization.

llvm-svn: 331107
2018-04-28 14:08:51 +00:00
Craig Topper d656410293 [X86] Make the STTNI flag intrinsics use the flags from pcmpestrm/pcmpistrm if the mask intrinsics are also used in the same basic block.
Summary:
Previously the flag intrinsics always used the index instructions even if a mask instruction also exists.

To fix this I've created a single ISD node type that returns index, mask, and flags. The SelectionDAG CSE process will merge all flavors of intrinsics with the same inputs to a single node. Then during isel we just have to look at which results are used to know what instruction to generate. If both mask and index are used we'll need to emit two instructions. But for all other cases we can emit a single instruction.

Since I had to do manual isel anyway, I've removed the pseudo instructions and custom inserter code that was working around tablegen limitations with multiple implicit defs.

I've also renamed the recently added sse42.ll test case to sttni.ll since it focuses on that subset of the sse4.2 instructions.

Reviewers: chandlerc, RKSimon, spatel

Reviewed By: chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D46202

llvm-svn: 331091
2018-04-27 22:15:33 +00:00
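The result-driven selection described in the STTNI commit above (d656410293) can be sketched as follows. This is a simplified Python model rather than the actual isel code: `PCMPxSTRI` and `PCMPxSTRM` are placeholder names covering the explicit- and implicit-length variants, and the choice to fall back to the index form when only flags are used is an assumption based on the commit's remark that the flag intrinsics previously always used the index instructions.

```python
def select_sttni(index_used, mask_used, flags_used):
    """Return the instructions needed for one merged STTNI node.

    Each argument says whether that result of the node has any users.
    """
    insts = []
    if mask_used:
        # The mask form also produces flags, so flag users can share it.
        insts.append("PCMPxSTRM")
    if index_used or (flags_used and not mask_used):
        # Index users need the index form; flags-only users are assumed
        # to default to it as well.
        insts.append("PCMPxSTRI")
    return insts
```

Only the case where both the mask and the index results are used yields two instructions; every other combination yields at most one.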
Simon Pilgrim 8a937e00d8 [X86] Split WriteFBlend/WriteFVarBlend/WriteFVarShuffle into XMM and YMM/ZMM scheduler classes
This removes all the WriteFBlend/WriteFVarBlend InstRW overrides - some WriteFVarShuffle remain to be fixed.

llvm-svn: 331065
2018-04-27 18:19:48 +00:00
Simon Pilgrim c3c767bf50 [X86] Split WriteFHadd into XMM and YMM/ZMM scheduler classes
This removes all the HADD/HSUB PS/PD InstRW overrides.

llvm-svn: 331054
2018-04-27 16:11:57 +00:00
Simon Pilgrim b2aa89c909 [X86][AVX] Split WriteFLogic into XMM and YMM/ZMM scheduler classes
This removes all the AND/ANDN/OR/XOR PS/PD InstRW overrides.

llvm-svn: 331051
2018-04-27 15:50:33 +00:00
Craig Topper b0227189fd [X86] Remove alignment restriction on load folding of pcmp[ei]str* during isel too.
This is a follow up to the changes in r330896 which enabled folding after isel during peephole and register allocation.

llvm-svn: 330897
2018-04-26 03:53:39 +00:00
Simon Pilgrim 27bc83e228 [X86] Split off PHMINPOSUW to their own schedule class
This also fixes Jaguar's schedule which was treating it as the WriteVecIMul default. 

llvm-svn: 330756
2018-04-24 18:49:25 +00:00
Simon Pilgrim f0945aa0e0 [X86][F16C] Add WriteCvtF2FSt scheduling class
Fixes the classification of VCVTPS2PHmr/VCVTPS2PHYmr which were tagged as WriteCvtF2FLd_WriteRMW (PR36887)

llvm-svn: 330737
2018-04-24 16:43:07 +00:00
Simon Pilgrim f7d2a93d5f [X86] Add vector element insertion/extraction scheduler classes
Split off pinsr/pextr and extractps instructions.

(Mostly) fixes PR36887.

Note: It might be worth adding a WriteFInsertLd class as well in the future.

Differential Revision: https://reviews.llvm.org/D45929

llvm-svn: 330714
2018-04-24 13:21:41 +00:00
Craig Topper 3f1d538165 [X86] Add VEX_WIG to VEX encoded version of VCMPPSY/VCMPPDY.
llvm-svn: 330563
2018-04-23 04:50:01 +00:00
Simon Pilgrim 2fd8269c6f [X86][MMX][SSE] Tag missed PHADD/PHSUB instructions with WritePHAdd
llvm-svn: 330545
2018-04-22 15:02:23 +00:00
Craig Topper e958c7270e [X86] Change TB to PS on LFENCE instruction.
This matches the other FENCE instructions.

llvm-svn: 330533
2018-04-22 03:15:02 +00:00
Simon Pilgrim 58ddaeabe2 [X86][AVX] VPERM2F128/VINSERTF128 should be a shuffle256 schedule like VPERM2I128/VINSERTI128
llvm-svn: 330522
2018-04-21 20:04:24 +00:00
Craig Topper 05242bf691 [X86] Add SchedWrites for LDMXCSR/STMXCSR.
llvm-svn: 330517
2018-04-21 18:07:36 +00:00
Simon Pilgrim d14d2e7b18 [X86] Add WriteFSign/WriteFLogic scheduler classes
Split the fp and integer vector logical instruction scheduler classes - older CPUs especially often handled these on different pipes.

This unearthed a couple of things that are also handled in this patch:

(1) We were tagging avx512 fp logic ops as WriteFAdd, probably because of the lack of WriteFLogic
(2) SandyBridge had integer logic ops only using Port5, when afaict they can use Ports015.
(3) Cleaned up x86 FCHS/FABS scheduling as they are typically treated as fp logic ops.

Differential Revision: https://reviews.llvm.org/D45629

llvm-svn: 330480
2018-04-20 21:16:05 +00:00
Gabor Buella 31fa8025ba [X86] WaitPKG instructions
Three new instructions:

umonitor - Sets up a linear address range to be
monitored by hardware and activates the monitor.
The address range should be a writeback memory
caching type.

umwait - A hint that allows the processor to
stop instruction execution and enter an
implementation-dependent optimized state
until occurrence of a class of events.

tpause - Directs the processor to enter an
implementation-dependent optimized state
until the TSC reaches the value in EDX:EAX.

Also modifying the description of the mfence
instruction, as the rep prefix (0xF3) was allowed
before, which would conflict with umonitor during
disassembly.

Before:
$ echo 0xf3,0x0f,0xae,0xf0 | llvm-mc -disassemble
.text
mfence

After:
$ echo 0xf3,0x0f,0xae,0xf0 | llvm-mc -disassemble
.text
umonitor        %rax

Reviewers: craig.topper, zvi

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D45253

llvm-svn: 330462
2018-04-20 18:42:47 +00:00
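The disassembly conflict described in the WaitPKG commit above (31fa8025ba) comes down to whether the 0xF3 prefix is accepted on the 0F AE /6 register form. A toy Python decoder for just these byte sequences illustrates the "after" behavior. It is a sketch under stated assumptions, not llvm-mc's decoder: it handles only 64-bit register names without REX, recognizes `mfence` only for ModRM 0xF0, and the function name `decode` is hypothetical.

```python
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode(byte_values):
    """Decode the 0F AE /6 register form with or without an F3 prefix."""
    bs = bytes(byte_values)
    has_f3 = bs[:1] == b"\xf3"
    body = bs[1:] if has_f3 else bs
    if len(body) != 3 or body[:2] != b"\x0f\xae":
        return None
    modrm = body[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 3 or reg != 6:
        return None
    if has_f3:
        # With the prefix, the encoding names umonitor and rm is its operand.
        return "umonitor %" + REGS[rm]
    # Without the prefix, only the canonical mfence byte is accepted here.
    return "mfence" if rm == 0 else None
```

With this model, the bytes from the commit's example, `0xf3,0x0f,0xae,0xf0`, decode as `umonitor %rax`, and the unprefixed `0x0f,0xae,0xf0` still decodes as `mfence`.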
Craig Topper e56a2fc5e7 [X86] Add separate scheduling class for PSADBW instruction.
llvm-svn: 330204
2018-04-17 19:35:19 +00:00
Simon Pilgrim 86e3c26924 [X86] Add FP comparison scheduler classes
Split VCMP/VMAX/VMIN instructions off to WriteFCmp and VCOMIS instructions off to WriteFCom instead of assuming they match WriteFAdd

Differential Revision: https://reviews.llvm.org/D45656

llvm-svn: 330179
2018-04-17 07:22:44 +00:00
Simon Pilgrim 21e89795cc [X86] Remove remaining OpndItins/SizeItins from all instruction defs (PR37093)
llvm-svn: 330022
2018-04-13 14:36:59 +00:00
Simon Pilgrim ae0c2711b6 [X86] Remove OpndItins/SizeItins from all sse instruction defs (PR37093)
llvm-svn: 330013
2018-04-13 12:50:31 +00:00
Simon Pilgrim 1f070c334c [X86] Remove unused MoveLoadStoreItins/ShiftOpndItins schedule class wrappers.
Was being used to move around empty/unused itineraries...

llvm-svn: 329970
2018-04-12 22:57:34 +00:00
Simon Pilgrim 6551d405dc [X86] Remove x86 InstrItinClass entries (PR37093)
This removes the last of the x86 schedule itineraries. I intend to clean up the remaining uses of NoItinerary/OpndItins/etc. before resolving PR37093.

llvm-svn: 329967
2018-04-12 22:44:47 +00:00
Simon Pilgrim 0e45634f4e [X86] Remove InstrItinClass entries from all x86 instruction defs (PR37093)
llvm-svn: 329953
2018-04-12 20:47:34 +00:00
Simon Pilgrim e9376b9fdc [X86] Remove InstrItinClass entries from SSE/AVX instructions defs (PR37093)
llvm-svn: 329945
2018-04-12 19:59:35 +00:00
Simon Pilgrim 577ae24feb [X86] Remove explicit SSE/AVX schedule itineraries from defs (PR37093)
llvm-svn: 329940
2018-04-12 19:25:07 +00:00
Simon Pilgrim 8904a86f65 [X86] Remove AES/CLMUL/CRC32/LDDQU/MOVNT/POPCNT/SHA schedule itineraries (PR37093)
llvm-svn: 329912
2018-04-12 14:31:42 +00:00
Simon Pilgrim 294556d40e [X86] Remove remaining system/special schedule itineraries (PR37093)
llvm-svn: 329906
2018-04-12 12:43:49 +00:00
Simon Pilgrim 89c8a10f7c [X86] Add variable shuffle schedule classes
Split variable index shuffles from immediate index shuffles

WriteFVarShuffle - variable 'in-lane' shuffles (VPERMILPS/VPERMIL2PS etc.)
WriteVarShuffle - variable 'in-lane' shuffles (PSHUFB/VPPERM etc.)

WriteFVarShuffle256 - variable 'cross-lane' shuffles (VPERMPS etc.)
WriteVarShuffle256 - variable 'cross-lane' shuffles (VPERMD etc.)

Differential Revision: https://reviews.llvm.org/D45404

llvm-svn: 329806
2018-04-11 13:49:19 +00:00
Simon Pilgrim 6131286553 [X86][SSE] Fix f32 mul/div itinerary groups typo
The RM folded itineraries were incorrectly using the f64 version.

llvm-svn: 329556
2018-04-09 10:45:53 +00:00
Craig Topper 6ecdb03f16 [X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256.
llvm-svn: 329310
2018-04-05 16:32:48 +00:00
Craig Topper 15303dda0d [X86] Revert r329251-329254
It's failing on the bots and I'm not sure why.

This reverts:

[X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents.
[X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256.
[X86] Remove some InstRWs for plain store instructions on Sandy Bridge.
[X86] Auto-generate complete checks. NFC

llvm-svn: 329256
2018-04-05 05:19:36 +00:00
Craig Topper 4b1fdd4921 [X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256.
llvm-svn: 329253
2018-04-05 04:42:02 +00:00
Craig Topper a30db995b3 [X86] Use the same predicate for the load for PMOVSXBQ and PMOVZXBQ.
These both use a 16-bit load, but one used loadi16_anyext and the other used extloadi32i16. The only difference between them is that loadi16_anyext checked that the load was at least 2 byte aligned and non-volatile. But the alignment doesn't matter here. Just use extloadi32i16 for both.

llvm-svn: 329154
2018-04-04 07:00:24 +00:00
Craig Topper dc74094398 [X86] Fix the SchedRW for AVX512 shift instructions.
It was being inadvertently defaulted to an FADD scheduler class.

llvm-svn: 328959
2018-04-02 03:15:02 +00:00
Craig Topper c90d906b16 [X86] Give VINSERTPS the same itinerary as INSERTPS.
llvm-svn: 328954
2018-04-02 00:48:11 +00:00
Craig Topper 13a0f83a05 [X86] Add SchedRW for PMULLD
Summary:
It seems many CPUs don't implement this instruction as well as the other vector multiplies, often using a multi-uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput.

This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet.

I also left a FIXME in the SandyBridge model because its use as the "generic" model is too optimistic for the 256/512-bit versions, since those are multiple uops on all known CPUs.

Reviewers: RKSimon, GGanesh, courbet

Reviewed By: RKSimon

Subscribers: gchatelet, gbedwell, andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D44972

llvm-svn: 328914
2018-03-31 04:54:32 +00:00
Craig Topper ee3c19fd7f [X86] Add ReadAfterLds to some 3 src instructions
Sometimes the operand comes after the memory operand so we need 5 ReadDefaults first.

I suspect we also need to do something for the mask operand for masked avx512 instructions? I'm not sure if the mask should be ReadAfterLd or not since it can mask faults. If it shouldn't be ReadAfterLd then we're probably wrong for zero masking instructions already.

Differential Revision: https://reviews.llvm.org/D44726

llvm-svn: 328834
2018-03-29 22:03:05 +00:00
Simon Pilgrim a2f26788a3 [X86] Add WriteFMOVMSK/WriteVecMOVMSK/WriteMMXMOVMSK scheduler classes
Currently MOVMSK instructions use the WriteVecLogic class, which is a very poor choice given that MOVMSK involves an SSE->GPR transfer.

Differential Revision: https://reviews.llvm.org/D44924

llvm-svn: 328664
2018-03-27 20:38:54 +00:00
Simon Pilgrim 28e7bcbba6 [X86] Add WriteCRC32 scheduler class
Currently CRC32 instructions use the WriteFAdd class. This patch splits them off into their own class; at the moment it is still mostly just a duplicate of WriteFAdd, but it can now be tweaked on a target-by-target basis.

Differential Revision: https://reviews.llvm.org/D44647

llvm-svn: 328582
2018-03-26 21:06:14 +00:00
Simon Pilgrim f33d905293 [X86] Add WriteBitScan/WriteLZCNT/WriteTZCNT/WritePOPCNT scheduler classes (PR36881)
Give the bit count instructions their own scheduler classes instead of forcing them into existing classes.

These were mostly overridden anyway, but I had to add in costs from Agner for silvermont and znver1 and the Fam16h SoG for btver2 (Jaguar).

Differential Revision: https://reviews.llvm.org/D44879

llvm-svn: 328566
2018-03-26 18:19:28 +00:00
Craig Topper 6f28d3c954 [X86] Fix the SchedRW for intrinsic register form of SQRT/RCP/RSQRT.
llvm-svn: 328474
2018-03-26 05:05:12 +00:00
Craig Topper fbf2d850e3 [X86] Add itinerary to intrinsic version of sqrtss, rcpss, and rsqrtss instructions.
llvm-svn: 328472
2018-03-26 04:20:36 +00:00
Craig Topper c049cb7823 [X86] Correct the itineraries for the dot product instructions.
llvm-svn: 328471
2018-03-26 02:17:15 +00:00
Craig Topper 4367874bc5 [X86] Use the same itinerary for VCVTDQ2PD as the SSE version so that the generated scheduler classes will merge.
llvm-svn: 328470
2018-03-26 02:17:14 +00:00
Craig Topper 659f85af14 [X86] Swap the itineraries on the memory and register forms of CVTDQ2PD.
They were backwards.

llvm-svn: 328469
2018-03-26 02:17:13 +00:00
Craig Topper 4bf23eddaf [X86] Give VMOVSX/ZX the same itinerary as the SSE version so they'll reuse the same generated scheduler class.
llvm-svn: 328468
2018-03-26 02:17:12 +00:00
Craig Topper 6e8d99bbea [X86] Give vpmsadbw the same itinerary as the SSE version so they'll be able to share the same generated scheduler class.
llvm-svn: 328466
2018-03-25 23:52:06 +00:00
Craig Topper 4529d3abcb [X86] Add itinerary to RCPSS*_Int and similar instructions.
llvm-svn: 328353
2018-03-23 19:15:05 +00:00
Craig Topper dfeea84d63 [X86] Give VPCMPEQQ the same itinerary as its SSE counterpart.
llvm-svn: 328296
2018-03-23 06:58:55 +00:00
Craig Topper 659c66dfc1 [X86] Match vpblendvb/vblendvps/vblendvpd itineraries to the SSE equivalent. Change pblendvb/blendvps/blendvpd to use WriteFVarBlend
llvm-svn: 328294
2018-03-23 06:41:41 +00:00
Craig Topper 7580a7997d [X86] Change VPSADBW itinerary to SSE_INTALU_ITINS_P to match the SSE version.
llvm-svn: 328293
2018-03-23 06:41:40 +00:00
Craig Topper d5ac3ae8d3 [X86] Give VLDDQUrm and LDDQUrm the same itinerary.
llvm-svn: 328292
2018-03-23 06:41:39 +00:00
Craig Topper 6ef55d1887 [X86] Fix the itinerary for vextractps to match extractps.
llvm-svn: 328289
2018-03-23 06:41:35 +00:00
Craig Topper 40d3b32e12 [X86] Rename VROUNDYPS* and VROUNDYPD* instructions to VROUNDPSY* and VROUNDPDY*. Fix itinerary mistake on all memory forms of VROUNDPD
This makes the Y position consistent with other instructions.

This should have been NFC, but while refactoring the multiclass I noticed that VROUNDPD memory forms were using the register itinerary.

llvm-svn: 328254
2018-03-22 21:55:20 +00:00
Simon Pilgrim 6bdd6b32fd [X86][CLMUL] Fix/add missing itinerary tags to (V)PCLMULQDQ instructions
PCLMULQDQrm was using the rr itinerary.

Difference in itineraries between PCLMULQDQ/VPCLMULQDQ variants was causing an unnecessary duplication of scheduler class entries.

llvm-svn: 328193
2018-03-22 13:36:06 +00:00
Craig Topper 591f44df54 [X86] Correct the SchedRW on (V)MOVAPSrr_REV and similar to match their non _REV counterparts.
llvm-svn: 327879
2018-03-19 19:00:26 +00:00
Simon Pilgrim fb7aa57bf1 [X86][SSE] Introduce Float/Vector WriteMove, WriteLoad and WriteStore scheduler classes
As discussed on D44428 and PR36726, this patch splits off WriteFMove/WriteVecMove, WriteFLoad/WriteVecLoad and WriteFStore/WriteVecStore scheduler classes to permit vectors to be handled separately from gpr/scalar types.

I've minimised the diff here by only moving various basic SSE/AVX vector instructions across - we can fix the rest when called for. This does fix the MOVDQA vs MOVAPS/MOVAPD discrepancies mentioned on D44428.

Differential Revision: https://reviews.llvm.org/D44471

llvm-svn: 327630
2018-03-15 14:45:30 +00:00
Simon Pilgrim d1c3c995c0 [X86][AVX] Use WriteFShuffleLd for broadcast reg-mem instructions
They shouldn't be treated as pure loads.

Found while investigating D44428

llvm-svn: 327524
2018-03-14 15:47:08 +00:00
Simon Pilgrim de995e6e37 [X86][SSE] Use WriteFShuffleLd for MOVDDUP/MOVSHDUP/MOVSLDUP reg-mem instructions
They shouldn't be treated as pure loads.

Found while investigating D44428

llvm-svn: 327505
2018-03-14 13:22:56 +00:00
Craig Topper a406796f5f [X86] Change X86::PMULDQ/PMULUDQ opcodes to take vXi64 type as input instead of vXi32.
This instruction can be thought of as reading either the even elements of a vXi32 input or the lower half of each element of a vXi64 input. We currently use the vXi32 interpretation, but vXi64 matches better with its broadcast behavior in EVEX.

I'm looking at moving MULDQ/MULUDQ creation to a DAG combine so we can do it when AVX512DQ is enabled without having to go through Custom lowering. But in some of the test cases we failed to use a broadcast load due to the size difference. This should help with that.

I'm also wondering if we can model these instructions in native IR and remove the intrinsics and I think using a vXi64 type will work better with that.
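
The two input interpretations described in this message can be illustrated with a small scalar model. This is only a sketch for the unsigned 128-bit case (PMULUDQ); the type aliases and function names are hypothetical and do not come from the LLVM sources.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of a 128-bit PMULUDQ; all names are illustrative.
using V4i32 = std::array<uint32_t, 4>;
using V2i64 = std::array<uint64_t, 2>;

// vXi32 view: multiply the even-indexed 32-bit elements (0 and 2).
V2i64 pmuludqAsV4i32(const V4i32 &a, const V4i32 &b) {
  return {static_cast<uint64_t>(a[0]) * b[0],
          static_cast<uint64_t>(a[2]) * b[2]};
}

// vXi64 view: multiply the low 32 bits of each 64-bit element.
V2i64 pmuludqAsV2i64(const V2i64 &a, const V2i64 &b) {
  const uint64_t lo = 0xFFFFFFFFull;
  return {(a[0] & lo) * (b[0] & lo), (a[1] & lo) * (b[1] & lo)};
}

// Reinterpret four 32-bit lanes as two 64-bit lanes in x86 lane order
// (lane 2i is the low half and lane 2i+1 the high half of 64-bit lane i).
V2i64 bitcastToV2i64(const V4i32 &v) {
  return {v[0] | (static_cast<uint64_t>(v[1]) << 32),
          v[2] | (static_cast<uint64_t>(v[3]) << 32)};
}
```

Both views compute the same products; only the type of the input changes. That is why switching to vXi64 does not change what the instruction does, while letting a 64-bit broadcast load match the element size.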

llvm-svn: 326991
2018-03-08 08:02:52 +00:00
Craig Topper 81c0eaf4c8 [X86] Allow int_x86_sse2_cvtps2dq and int_x86_avx_cvt_ps2dq_256 to select EVEX encoded instructions.
llvm-svn: 326041
2018-02-24 18:58:07 +00:00
Craig Topper dbddac0915 [X86] Remove 64/128/256 from MMX/SSE/AVX instruction names for overall consistency. NFC
MMX instructions all start with MMX_ so the 64 isn't needed for disambiguation.
SSE/AVX1 instructions are assumed 128-bit so we don't need to say 128.
AVX2 instructions should use a Y to indicate 256-bits.

llvm-svn: 323402
2018-01-25 04:45:30 +00:00
Craig Topper 05af43fbad [X86] Fix some inconsistencies in the itineraries and Sched for (V)PEXTRW/(V)PINSRW
The weirdest being that PEXTRWrr was tagged as a memory operation.

llvm-svn: 323353
2018-01-24 17:58:57 +00:00
Craig Topper b85b484fee [X86] Adjust names of PINSRW/PEXTRW instructions between MMX/SSE/AVX/AVX512 for consistency and to maybe enable more regular expression compaction in the scheduler models. NFCI
llvm-svn: 323352
2018-01-24 17:58:51 +00:00
Craig Topper 002657731b [X86] Move 'Int_' to the end of the name of the VCOMISS/VUCOMISS instructions to get them picked up by the scheduler model regexs.
All other intrinsic instructions put the _Int on the end. This makes these instructions consistent and gets the prefix instregexs in the scheduler models to pick them up.

llvm-svn: 323261
2018-01-23 21:37:51 +00:00
Marina Yatsina 6fc2aaae8d Separate ExecutionDepsFix into 4 parts:
1. ReachingDefsAnalysis - Identifies, for each instruction, the “closest” reaching def of a given register. Used by BreakFalseDeps (for clearance calculation) and ExecutionDomainFix (for arbitrating conflicting domains).
2. ExecutionDomainFix - Changes the variant of the instructions in order to minimize domain crossings.
3. BreakFalseDeps - Breaks false dependencies.
4. LoopTraversal - Creates a traversal order of the basic blocks that is optimal for loops (introduced in revision L293571). Both ExecutionDomainFix and ReachingDefsAnalysis use this to determine the order they will traverse the basic blocks.

This also included the following changes to the original ExecutionDepsFix logic:
1. BreakFalseDeps and ReachingDefsAnalysis logic no longer restricted by a register class.
2. ReachingDefsAnalysis tracks liveness of reg units instead of reg indices into a given reg class.

Additional changes in affected files:
1. X86 and ARM targets now inherit from ExecutionDomainFix instead of ExecutionDepsFix. BreakFalseDeps also was added to the passes they activate.
2. Comments and references to ExecutionDepsFix replaced with ExecutionDomainFix and BreakFalseDeps, as appropriate.

Additional refactoring changes will follow.

This commit is (almost) NFC.
The only functional change is that BreakFalseDeps will now break dependencies for all register classes.
Since no additional instructions were added to the list of instructions that have false dependencies, there is no actual change yet.
In a future commit several instructions (and tests) will be added.

This is the first of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
Most of the patches are aimed at refactoring the existing code.

Additional relevant reviews:
https://reviews.llvm.org/D40331
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40333
https://reviews.llvm.org/D40334

Differential Revision: https://reviews.llvm.org/D40330

Change-Id: Icaeb75e014eff96a8f721377783f9a3e6c679275
llvm-svn: 323087
2018-01-22 10:05:23 +00:00
Clement Courbet 36c7be664f [X86]Add missing predicates for VMOVDQUYrm,VMOVDQUYmr.
Summary:
Due to missing parentheses.

This is similar to https://reviews.llvm.org/D41983.

Reviewers: gchatelet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42062

llvm-svn: 322483
2018-01-15 13:37:05 +00:00
Craig Topper def1c30c66 [X86] Allow more cmpps/pd immediate encodings to be commuted during isel.
The code that checks the immediate wasn't masking to the lower 3 bits like the code in X86InstrInfo.cpp that's used by the peephole pass does.
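
As a rough illustration of the masking described here, a commutability check on a cmpps/cmppd immediate might look like the sketch below. The helper name is hypothetical and this is not the actual LLVM code; it assumes the legacy SSE encoding, where the low 3 bits of the immediate select the predicate.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not the actual LLVM function.
// Returns true if the comparison selected by the immediate gives the same
// result when its two source operands are swapped.
bool isCommutableCmpImm(uint8_t imm) {
  switch (imm & 0x7) { // only the low 3 bits select the SSE predicate
  case 0x00: // EQ
  case 0x03: // UNORD
  case 0x04: // NEQ
  case 0x07: // ORD
    return true;
  default: // LT, LE, NLT, NLE are not symmetric
    return false;
  }
}
```

Without the `& 0x7`, an immediate with higher bits set would miss every case and be treated as non-commutable, even when its low 3 bits select a symmetric predicate.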

llvm-svn: 322060
2018-01-09 07:09:34 +00:00
Craig Topper dffb98e03d [X86] Correct the execution domain for AVX1 VBROADCASTF128 to be FP instead of integer.
llvm-svn: 321821
2018-01-04 20:56:21 +00:00
Craig Topper 162439dcdf [X86] Pass itins.rr/itins.rm through properly for some instructions.
llvm-svn: 321452
2017-12-26 05:43:05 +00:00
Craig Topper e268598dd3 [X86] Add prefetchwt1 instruction and overhaul priorities and isel enabling for prefetch instructions.
Previously prefetch was only considered legal if sse was enabled, but it should be supported with 3dnow as well.

The prfchw flag now implies that at least some form of prefetch without the write hint is available, either the sse or 3dnow version. This is true even if 3dnow and sse are explicitly disabled.

Similarly, the prefetchwt1 feature implies availability of prefetchw and the prefetcht0/1/2/nta instructions. This way we can support _MM_HINT_ET0 using prefetchw and _MM_HINT_ET1 with prefetchwt1. And it's assumed that if we have levels for the write hint we would have levels for the non-write hint, which is why we enable the sse prefetch instructions.

I believe this behavior is consistent with gcc. I've updated the prefetch.ll to test all of these combinations.
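
The feature implications described in this message can be sketched as a few predicates. The struct and function names are hypothetical, not the real subtarget API. The sketch also assumes that prfchw falls back to the sse prefetch instructions when 3dnow is not enabled, which the message does not state explicitly.

```cpp
#include <cassert>

// Hypothetical feature flags; field names are illustrative only.
struct Features {
  bool sse;
  bool threeDNow;
  bool prfchw;
  bool prefetchwt1;
};

// prefetchwt1 implies that prefetchw is available.
bool hasPrefetchW(const Features &f) { return f.prfchw || f.prefetchwt1; }

// The sse prefetch instructions (prefetcht0/1/2/nta) are available with sse,
// with prefetchwt1, or (assumption) with prefetchw when 3dnow does not
// already provide a prefetch without the write hint.
bool hasSSEPrefetch(const Features &f) {
  return f.sse || f.prefetchwt1 || (hasPrefetchW(f) && !f.threeDNow);
}

// Some prefetch without the write hint exists, in sse or 3dnow form.
bool hasAnyReadPrefetch(const Features &f) {
  return hasSSEPrefetch(f) || f.threeDNow;
}
```

With only prfchw set, `hasAnyReadPrefetch` is still true, matching the statement that a non-write prefetch is available even when 3dnow and sse are disabled.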

llvm-svn: 321335
2017-12-22 02:30:30 +00:00
Craig Topper a0be5a06c1 [X86] Rename some instructions that start with Int_ to have the _Int at the end.
This matches the AVX512 version, is more consistent overall, and improves our scheduler models.

In some cases this adds _Int to instructions that didn't have any Int_ before. It's a side effect of the adjustments made to some of the multiclasses.

llvm-svn: 320325
2017-12-10 19:47:56 +00:00
Simon Pilgrim 49c74934dd Strip trailing whitespace. NFCI.
llvm-svn: 320306
2017-12-10 13:00:37 +00:00
Simon Pilgrim 91c159d841 [X86][AVX] Tag VZEROALL/VZEROUPPER instructions scheduler classes
llvm-svn: 320302
2017-12-10 12:26:35 +00:00
Simon Pilgrim 6de94a1adc [X86] Tag SSE4A instructions as SSE INTALU scheduler classes
llvm-svn: 320301
2017-12-10 12:08:04 +00:00
Simon Pilgrim 19d460b066 [X86][SHA] Tag SHA instructions scheduler classes
Put these under VecIMul itinerary classes for now - seems to be a good average value

llvm-svn: 320161
2017-12-08 16:38:41 +00:00
Simon Pilgrim ca63dcce7f [X86][SSE42] SSE42 string pseudo instructions don't need scheduling info
llvm-svn: 320043
2017-12-07 13:52:07 +00:00
Simon Pilgrim 9afbe77a91 [X86][AVX512] Tag mask reg op instruction scheduler classes
llvm-svn: 319945
2017-12-06 19:36:00 +00:00
Simon Pilgrim 809c024b3d [X86][AVX2] Tag MASKMOV instruction scheduler classes
llvm-svn: 319915
2017-12-06 18:24:48 +00:00
Simon Pilgrim df05251921 [X86][AVX512] Tag aligned/unaligned move instruction scheduler classes
llvm-svn: 319913
2017-12-06 17:59:26 +00:00
Simon Pilgrim b69dae42e3 [X86][AVX512] Tag GATHER/SCATTER instruction scheduler classes
NOTE: At the moment these use the WriteLoad/WriteStore classes, which severely underestimates the costs. This needs to be reviewed.
llvm-svn: 319829
2017-12-05 20:47:11 +00:00
Simon Pilgrim fd3a2632e5 [X86][AVX512] Tag scalar CVT and CMP instruction scheduler classes
llvm-svn: 319765
2017-12-05 13:49:44 +00:00
Simon Pilgrim 299a54c5b9 [X86][SSE] Cleanup float/int conversion scheduler itinerary classes
Makes it easier to grok where each is supposed to be used, mainly useful for adding to the AVX512 instructions but hopefully can be used more in SSE/AVX as well.

llvm-svn: 319614
2017-12-02 12:27:44 +00:00
Simon Pilgrim 2dc4ff1cde [X86][AVX512] Tag vshift/vpermv/pshufd/pshufb instructions scheduler classes
llvm-svn: 319540
2017-12-01 13:25:54 +00:00
Simon Pilgrim 3e5987cf8d [X86][AVX512] Tag RCP/RSQRT/GETEXP instructions scheduler classes
llvm-svn: 319418
2017-11-30 10:48:47 +00:00
Simon Pilgrim 4d2c703492 [X86][AVX512] Tag RCP/RSQRT/GETEXP instructions scheduler classes (REVERSION)
Accidental commit of incomplete patch

llvm-svn: 319346
2017-11-29 19:37:38 +00:00
Simon Pilgrim 87034cb498 [X86][AVX512] Tag RCP/RSQRT/GETEXP instructions scheduler classes
llvm-svn: 319338
2017-11-29 19:19:59 +00:00
Simon Pilgrim 1401a75341 [X86][AVX512] Tag VPERMILV instruction scheduler class
llvm-svn: 319316
2017-11-29 14:58:34 +00:00
Simon Pilgrim 756348c1c9 [X86][AVX512] Setup unary (PABS/VPLZCNT/VPOPCNT/VPCONFLICT/VMOV*DUP) instruction scheduler classes
llvm-svn: 319312
2017-11-29 13:49:51 +00:00
Simon Pilgrim e3291de2b8 [X86][SSE] Merged sse2_unpack and sse2_unpack_y PUNPCK instruction templates. NFCI.
llvm-svn: 319310
2017-11-29 12:12:27 +00:00
Simon Pilgrim da95772230 [X86][SSE] Merged sse2_pack and sse2_pack_y PACKSS/PACKUS instruction templates. NFCI.
llvm-svn: 319308
2017-11-29 11:35:45 +00:00
Simon Pilgrim f490c6efee [X86][SSE] Add SSE_SHUFP OpndItins
Update multi-classes to take the scheduling OpndItins instead of hard coding it.

Will be reused in the AVX512 equivalents.

llvm-svn: 319249
2017-11-28 23:09:18 +00:00
Simon Pilgrim 8f62394751 [X86][SSE] Add SSE_UNPCK/SSE_PUNPCK OpndItins
Update multi-classes to take the scheduling OpndItins instead of hard coding it.

Will be reused in the AVX512 equivalents.

llvm-svn: 319245
2017-11-28 22:55:08 +00:00
Simon Pilgrim 1bc7b0e148 [X86][SSE] Use SSE_PACK OpndItins in PACKSS/PACKUS instruction definitions
Update multi-classes to take the scheduling OpndItins instead of hard coding it.

SSE_PACK will be reused in the AVX512 equivalents.

llvm-svn: 319243
2017-11-28 22:47:45 +00:00
Simon Pilgrim d49bd0cd87 [X86][SSE] Add SSE_HADDSUB/SSE_PABS/SSE_PALIGN OpndItins
Update multi-classes to take the scheduling OpndItins instead of hard coding it.

Will be reused in the AVX512 equivalents.

llvm-svn: 319209
2017-11-28 19:39:47 +00:00