BtVer2 - Fixes schedules for (V)CVTPS2PD instructions
A lot of the Intel models still have too many InstRW overrides for these new classes - this needs cleaning up but I wanted to get the classes in first
llvm-svn: 332376
Btver2 - VCVTPH2PSYrm needs to double pump the AGU
Broadwell - missing VCVTPS2PH*mr stores extra latency
Allows us to remove the WriteCvtF2FSt conversion store class
llvm-svn: 332357
MOVNTPD/MOVNTPS should be WriteFStore
Standardized BDW/HSW/SKL/SKX WriteFStore/WriteVecStore - fixes some missed instregex patterns. (V)MASKMOVDQU was already using the default, its costs gets increased but is still nowhere near the real cost of that nasty instruction....
llvm-svn: 331864
This fixes a couple of BtVer2 missing instructions that weren't been handled in the override.
NOTE: There are still a lot of overrides that still need cleaning up!
llvm-svn: 331770
I've created the necessary classes but there are still a lot of overrides that need cleaning up.
NOTE: The Znver1 model was missing some div/idiv variants in the instregex patterns and wasn't setting the resource cycles at all in the overrides.
llvm-svn: 331767
Split to support single/double for scalar, XMM and YMM/ZMM instructions - removing InstrRW overrides for these instructions.
Fixes Atom ADDSUBPD instruction and reclassifies VFPCLASS as WriteFCmp which is closer in behaviour.
llvm-svn: 331672
These are more like cross-lane shuffles than regular shuffles - we already do this for AVX512 equivalents.
Differential Revision: https://reviews.llvm.org/D46229
llvm-svn: 331659
WriteFRcp/WriteFRsqrt are split to support scalar, XMM and YMM/ZMM instructions.
WriteFSqrt is split into single/double/long-double sizes and scalar, XMM, YMM and ZMM instructions.
This removes all InstrRW overrides for these instructions.
NOTE: There were a couple of typos in the Znver1 model - notably a 1cy throughput for SQRT that is highly unlikely and doesn't tally with Agner.
NOTE: I had to add Agner's numbers for several targets for WriteFSqrt80.
llvm-svn: 331629
Split off from SchedWriteFAdd for fp rounding/bit-manipulation instructions.
Fixes an issue on btver2 which only had the ymm version using the JSTC pipe instead of JFPA.
llvm-svn: 331515
This took a bit of extra work as on Intel targets the old (V)PSLLDrr/(V)PSLLDrm style instructions act differently - I ended up creating WriteVecShiftImm classes for XMM/YMM/ZMM vector shift by immediate and retaining WriteVecShift as the default (used only by MMX) plus WriteVecShiftX/WriteVecShiftY. X86SchedWriteWidths hides most of this thank goodness.
llvm-svn: 331472
Also retagged VDBPSADBW instructions as SchedWritePSADBW instead of SchedWriteVecIMul which matches the behaviour on SkylakeServer (the only thing that supports it...)
llvm-svn: 331445
It doesn't really exist. The instruction always writes 16-bits of memory. Putting a REX.w on it won't change anything.
While I was touching the encoding tests to remove it, I added some other missing register form test cases.
llvm-svn: 331135