Commit Graph

175 Commits

Author SHA1 Message Date
Simon Pilgrim f2d2cedab4 [X86] Split WriteVecShift/WriteVarVecShift into MMX, XMM and YMM/ZMM scheduler classes
This took a bit of extra work as on Intel targets the old (V)PSLLDrr/(V)PSLLDrm style instructions act differently - I ended up creating WriteVecShiftImm classes for XMM/YMM/ZMM vector shift by immediate and retaining WriteVecShift as the default (used only by MMX) plus WriteVecShiftX/WriteVecShiftY. X86SchedWriteWidths hides most of this thank goodness.

llvm-svn: 331472
2018-05-03 17:56:43 +00:00
Simon Pilgrim f7dd6069a5 [X86] Split WriteVecALU/WritePHAdd into XMM and YMM/ZMM scheduler classes
llvm-svn: 331453
2018-05-03 13:27:10 +00:00
Simon Pilgrim 93c878c76b [X86] Split WriteVecIMul/WriteVecPMULLD/WriteMPSAD/WritePSADBW into XMM and YMM/ZMM scheduler classes
Also retagged VDBPSADBW instructions as SchedWritePSADBW instead of SchedWriteVecIMul which matches the behaviour on SkylakeServer (the only thing that supports it...)

llvm-svn: 331445
2018-05-03 10:31:20 +00:00
Simon Pilgrim 6732f6ea51 [X86] Split WriteShuffle/WriteVarShuffle + WriteBlend/WriteVarBlend into XMM and YMM/ZMM scheduler classes
llvm-svn: 331386
2018-05-02 18:48:23 +00:00
Simon Pilgrim 819f218f07 [X86] Cleanup WriteFShuffle/WriteFVarShuffle (+256 variants) scheduler classes with more common default values
llvm-svn: 331380
2018-05-02 17:58:50 +00:00
Simon Pilgrim a53d330890 Fix line-endings. NFCI.
llvm-svn: 331367
2018-05-02 16:16:24 +00:00
Clement Courbet d2ff5fb536 Re-land rL331357 "[X86] Fix scheduling info for VMPSADBWYrmi."
Without the rebase mess.

https://reviews.llvm.org/D46356

llvm-svn: 331362
2018-05-02 14:35:48 +00:00
Simon Pilgrim 86d9f23ded [X86] Cleanup WriteFMul scheduler classes with more common default values
Intel models were targeting x87 instead of packed sse.

llvm-svn: 331360
2018-05-02 14:25:32 +00:00
Clement Courbet 0f1da8f365 Revert rL331355 "[X86] Fix scheduling info for VMPSADBWYrmi."
It contains unrelated changes.

llvm-svn: 331357
2018-05-02 13:54:38 +00:00
Clement Courbet eeb2123a83 [X86] Fix scheduling info for VMPSADBWYrmi.
https://reviews.llvm.org/D46356

llvm-svn: 331355
2018-05-02 13:40:48 +00:00
Simon Pilgrim e93fd5f1e4 [X86] Cleanup WriteFAdd/WriteFCmp scheduler classes with more common default values
Intel models were targeting x87 instead of packed sse.

Also fixes XOP's VFRCZ to use WriteFAdd/WriteFAddY.

llvm-svn: 331340
2018-05-02 09:18:49 +00:00
Simon Pilgrim 21caf0124f [X86] Split WriteFMul/WriteFDiv into XMM and YMM/ZMM scheduler classes
llvm-svn: 331293
2018-05-01 18:22:53 +00:00
Simon Pilgrim c708868cb1 [X86] Split WriteFRcp/WriteFRsqrt/WriteFSqrt into XMM and YMM/ZMM scheduler classes
llvm-svn: 331290
2018-05-01 18:06:07 +00:00
Simon Pilgrim c546f9424f [X86] Split WriteFCmp into XMM and YMM/ZMM scheduler classes
Removes more WriteFCmp InstRW overrides

llvm-svn: 331283
2018-05-01 16:50:16 +00:00
Simon Pilgrim 5269167f5b [X86] Split WriteFAdd into XMM and YMM/ZMM scheduler classes
Removes more WriteFAdd InstRW overrides

llvm-svn: 331276
2018-05-01 16:13:42 +00:00
Simon Pilgrim dd8eae128b [X86] Split WriteFShuffle into XMM and YMM/ZMM scheduler classes
Removes more WriteFShuffle InstRW overrides

llvm-svn: 331264
2018-05-01 14:25:01 +00:00
Simon Pilgrim 57f2b185ac [X86] Split WriteVecLogic into XMM and YMM/ZMM scheduler classes
This removes all the WriteVecLogic InstRW overrides.

llvm-svn: 331258
2018-05-01 12:39:17 +00:00
Simon Pilgrim fc0c26f1a6 [X86] Tag PSLLDQ/PSRLDQ as WriteShuffle scheduler classes instead of shifts.
Although they are encoded similar to bit shifts, the byte shifts behave like shuffles from a scheduling point of view.

llvm-svn: 331253
2018-05-01 11:05:42 +00:00
Simon Pilgrim d5ada498db [X86] Merge more instregex single matches to reduce InstrRW compile time.
llvm-svn: 331143
2018-04-29 15:33:15 +00:00
Craig Topper ebd3e4a69c [X86] Remove SLDT64m instruction.
It doesn't really exist. The instruction always writes 16-bits of memory. Putting a REX.w on it won't change anything.

While I was touching the encoding tests to remove it, I added some other missing register form test cases.

llvm-svn: 331135
2018-04-29 04:50:53 +00:00
Simon Pilgrim 8ee7d01dcf [X86] Merge some x87 instruction instregex single matches. NFCI.
llvm-svn: 331084
2018-04-27 21:14:19 +00:00
Simon Pilgrim 8a937e00d8 [X86] Split WriteFBlend/WriteFVarBlend/WriteFVarShuffle into XMM and YMM/ZMM scheduler classes
This removes all the WriteFBlend/WriteFVarBlend InstRW overrides - some WriteFVarShuffle remain to be fixed.

llvm-svn: 331065
2018-04-27 18:19:48 +00:00
Simon Pilgrim c3c767bf50 [X86] Split WriteFHadd into XMM and YMM/ZMM scheduler classes
This removes all the HADD/HSUB PS/PD InstRW overrides.

llvm-svn: 331054
2018-04-27 16:11:57 +00:00
Simon Pilgrim b2aa89c909 [X86][AVX] Split WriteFLogic into XMM and YMM/ZMM scheduler classes
This removes all the AND/ANDN/OR/XOR PS/PD InstRW overrides.

llvm-svn: 331051
2018-04-27 15:50:33 +00:00
Simon Pilgrim aef5ca7299 [X86] Replace some system instruction instregex single matches with instrs entry. NFCI.
llvm-svn: 331034
2018-04-27 13:32:42 +00:00
Simon Pilgrim dbd1ae7ddd [X86] Split WriteFMA into XMM, Scalar and YMM/ZMM scheduler classes
This removes all the FMA InstRW overrides.

If we ever get PR36924, then we can remove many of these declarations from models.

llvm-svn: 330820
2018-04-25 13:07:58 +00:00
Simon Pilgrim 27bc83e228 [X86] Split off PHMINPOSUW to their own schedule class
This also fixes Jaguar's schedule which was treating it as the WriteVecIMul default. 

llvm-svn: 330756
2018-04-24 18:49:25 +00:00
Simon Pilgrim f0945aa0e0 [X86][F16C] Add WriteCvtF2FSt scheduling class
Fixes the classification of VCVTPS2PHmr/VCVTPS2PHYmr which were tagged as WriteCvtF2FLd_WriteRMW (PR36887)

llvm-svn: 330737
2018-04-24 16:43:07 +00:00
Simon Pilgrim 16299273d0 [X86] Remove unnecessary FMA reg-mem InstRW scheduler overrides.
llvm-svn: 330720
2018-04-24 14:47:11 +00:00
Simon Pilgrim f7d2a93d5f [X86] Add vector element insertion/extraction scheduler classes
Split off pinsr/pextr and extractps instructions.

(Mostly) fixes PR36887.

Note: It might be worth adding a WriteFInsertLd class as well in the future.

Differential Revision: https://reviews.llvm.org/D45929

llvm-svn: 330714
2018-04-24 13:21:41 +00:00
Simon Pilgrim e5e4bf02d6 [X86] Remove unnecessary vector memory folded InstRW overrides.
We have test coverage for these with resources-sse*/avx*

llvm-svn: 330662
2018-04-23 22:45:04 +00:00
Simon Pilgrim ed09ebb48d [X86] Remove unnecessary WriteLEA InstRW overrides.
llvm-svn: 330648
2018-04-23 21:04:23 +00:00
Simon Pilgrim 8cd01aaa0f [X86] Replace x87 instregex with instrs if they only match one instruction
llvm-svn: 330611
2018-04-23 16:10:50 +00:00
Simon Pilgrim 0a334a8668 [X86] Remove unnecessary MMX reg-mem InstRW scheduler overrides.
llvm-svn: 330581
2018-04-23 11:57:15 +00:00
Simon Pilgrim 06e16541ba [X86] Remove unnecessary WriteFBlend/WriteBlend InstRW overrides.
Fixed a lot of the default classes which were being completely overridden.

llvm-svn: 330554
2018-04-22 18:35:53 +00:00
Simon Pilgrim 091680b6e7 [X86] Remove unnecessary WriteFMul/WriteFRcp/WriteFRsqrt InstRW overrides.
llvm-svn: 330553
2018-04-22 18:09:50 +00:00
Simon Pilgrim b362d02229 [X86] Remove unnecessary CVT instrw overrides.
llvm-svn: 330552
2018-04-22 17:54:58 +00:00
Simon Pilgrim ef8d3ae4b5 [X86] Fix (completely overridden) WriteFHAdd/WritePHAdd classes to allow us to remove unnecessary instrw overrides.
llvm-svn: 330546
2018-04-22 15:25:59 +00:00
Simon Pilgrim 96855ec39e [X86] Remove unnecessary WriteFVarBlend/WriteVarBlend InstRW overrides.
This also fixes some of the ReadAfterLd issues due to InstRW.

llvm-svn: 330544
2018-04-22 14:43:12 +00:00
Simon Pilgrim a41ae2f005 [X86] Fix WriteMPSAD/WritePSADBW values to allow us to remove unnecessary instrw overrides.
llvm-svn: 330542
2018-04-22 10:39:16 +00:00
Simon Pilgrim 37334ea67a [X86] Strip unnecessary prefetch + vector move/load instrw overrides from scheduler models.
llvm-svn: 330527
2018-04-21 21:59:36 +00:00
Simon Pilgrim 920802cc50 [X86] Strip unnecessary WriteCvtF2I instrw overrides from scheduler models.
llvm-svn: 330525
2018-04-21 21:16:44 +00:00
Simon Pilgrim 825ead950e [X86] Strip unnecessary broadcast/shuffle256 instrw overrides from scheduler models.
llvm-svn: 330523
2018-04-21 20:45:12 +00:00
Simon Pilgrim 74ccc6a303 [X86] Strip unnecessary vector integer math, shift-imm, extend, shuffle, pack/unpack instruction instrw overrides from scheduler models.
llvm-svn: 330521
2018-04-21 19:11:55 +00:00
Craig Topper 05242bf691 [X86] Add SchedWrites for LDMXCSR/STMXCSR.
llvm-svn: 330517
2018-04-21 18:07:36 +00:00
Simon Pilgrim a80df0999f [X86][Broadwell] Remove unnecessary VORPD/VORPS instrw override - missed in D45629
llvm-svn: 330513
2018-04-21 16:17:47 +00:00
Simon Pilgrim 93b102cd45 [X86] Strip unnecessary WriteFRcp/WriteFRsqrt instruction instrw overrides from scheduler models.
The required the default skylake schedules to be updated - these were being completely overriden by the InstRW and the existing values not used at all.

llvm-svn: 330510
2018-04-21 15:16:59 +00:00
Simon Pilgrim 2193524fb4 [X86] Strip unnecessary WriteFShuffle instruction instrw overrides from scheduler models.
llvm-svn: 330508
2018-04-21 14:56:56 +00:00
Simon Pilgrim 02fc375a22 [X86] Strip unnecessary MMX instruction instrw overrides from scheduler models.
llvm-svn: 330503
2018-04-21 12:15:42 +00:00
Simon Pilgrim c0f654f18e [X86] Strip unnecessary x87 instruction instrw overrides from scheduler models.
llvm-svn: 330501
2018-04-21 11:25:02 +00:00
Simon Pilgrim d14d2e7b18 [X86] Add WriteFSign/WriteFLogic scheduler classes
Split the fp and integer vector logical instruction scheduler classes - older CPUs especially often handled these on different pipes.

This unearthed a couple of things that are also handled in this patch:

(1) We were tagging avx512 fp logic ops as WriteFAdd, probably because of the lack of WriteFLogic
(2) SandyBridge had integer logic ops only using Port5, when afaict they can use Ports015.
(3) Cleaned up x86 FCHS/FABS scheduling as they are typically treated as fp logic ops.

Differential Revision: https://reviews.llvm.org/D45629

llvm-svn: 330480
2018-04-20 21:16:05 +00:00
Craig Topper b5f2659130 [X86] Correct the scheduling data for register forms of XCHG and XADD on Intel CPUs.
The XCHG16rr/XCHG32rr/XCHG64rr instructions should be 3 uops just like XCHG8rr. I believe they're just implemented as 3 move uops with a temporary register.

XADD is probably 2 moves and an add also using a temporary register.

Change the latency for both from 2 cycles to 3 cycles. Only 2 of the uops are serialized in their execution, the move into the temporary and the move out of the temporary. The move from one GPR to the other should be able to go in parallel with this if there are ALU resources available.

llvm-svn: 330349
2018-04-19 18:00:17 +00:00
Simon Pilgrim 5e492d29a3 [X86] Merge some MMX instregex
There's a lot more but I'd prefer focussing on removing unnecessary InstRWs first.

llvm-svn: 330347
2018-04-19 17:32:10 +00:00
Simon Pilgrim 3c06617f0e [X86][FMA] Remove FMA reg-reg InstRW scheduler overrides.
These are all already handled identically by WriteFMA.

llvm-svn: 330319
2018-04-19 11:37:26 +00:00
Craig Topper f846e2d1b1 [X86] Scrub scheduling information for MUL/IMUL on Intel CPUs.
This removes a bunch of unnecessary InstRW overrides. It also cleans up the missing information from the Sandy Bridge model. Other fixes to other models.

llvm-svn: 330308
2018-04-19 05:34:05 +00:00
Craig Topper dfccafe18a [X86][Broadwell] Remove some unnecessary InstRW overrides and add some FIXMEs.
llvm-svn: 330241
2018-04-18 06:41:25 +00:00
Craig Topper e56a2fc5e7 [X86] Add separate scheduling class for PSADBW instruction.
llvm-svn: 330204
2018-04-17 19:35:19 +00:00
Craig Topper 655e1db722 [X86] Remove unnecessary InstRW overrides. Add somes FIXMEs/TODOs.
llvm-svn: 330203
2018-04-17 19:35:14 +00:00
Simon Pilgrim 86e3c26924 [X86] Add FP comparison scheduler classes
Split VCMP/VMAX/VMIN instructions off to WriteFCmp and VCOMIS instructions off to WriteFCom instead of assuming they match WriteFAdd

Differential Revision: https://reviews.llvm.org/D45656

llvm-svn: 330179
2018-04-17 07:22:44 +00:00
Simon Pilgrim 89c8a10f7c [X86] Add variable shuffle schedule classes
Split variable index shuffles from immediate index shuffles

WriteFVarShuffle - variable 'in-lane' shuffles (VPERMILPS/VPERMIL2PS etc.)
WriteVarShuffle - variable 'in-lane' shuffles (PSHUFB/VPPERM etc.)

WriteFVarShuffle256 - variable 'cross-lane' shuffles (VPERMPS etc.)
WriteVarShuffle256 - variable 'cross-lane' shuffles (VPERMD etc.)

Differential Revision: https://reviews.llvm.org/D45404

llvm-svn: 329806
2018-04-11 13:49:19 +00:00
Andrea Di Biagio 486358c153 [X86][Broadwell] HWPort5 should not be added to BroadwellModelProcResources.
The BroadwellModelProcResources had an entry for HWPort5, which is a Haswell
resource, and not a Broadwell processor resource. That entry was added to the
Broadwell model because variable blends were consuming it.

This was clearly a typo (the resource name should have been BWPort5), which
unfortunately was never caught before. It was not reported as an error because
HWPort5 is a resource defined by the Haswell model. It has been found when
testing some code with llvm-mca: the list of resources in the resource pressure
view was odd.

This patch fixes the issue; now variable blend instructions consume 2 cycles on
BWPort5 instead of HWPort5. This is enough to get rid of the extra (spurious)
entry in the BroadWellModelProcResources table.

llvm-svn: 329686
2018-04-10 10:49:41 +00:00
Craig Topper b7baa358f6 [X86] Add SchedWrites for CMOV and SETCC. Use them to remove InstRWs.
Summary:
Cmov and setcc previously used WriteALU, but on Intel processors at least they are more restricted than basic ALU ops.

This patch adds new SchedWrites for them and removes the InstRWs. I had to leave some InstRWs for CMOVA/CMOVBE and SETA/SETBE because those have an extra uop relative to the other condition codes on Intel CPUs.

The test changes are due to fixing a missing ZnAGU dependency on the memory form of setcc.

Reviewers: RKSimon, andreadb, GGanesh

Reviewed By: RKSimon

Subscribers: GGanesh, llvm-commits

Differential Revision: https://reviews.llvm.org/D45380

llvm-svn: 329539
2018-04-08 17:53:18 +00:00
Craig Topper c50570fb4f [X686] Add appropriate ReadAfterLd for the register input to memory forms of ADC/SBB.
llvm-svn: 329424
2018-04-06 17:12:18 +00:00
Craig Topper f0d042619b [X86] Attempt to model basic arithmetic instructions in the Haswell/Broadwell/Skylake scheduler models without InstRWs
Summary:
This patch removes InstRW overrides for basic arithmetic/logic instructions. To do this I've added the store address port to RMW. And used a WriteSequence to make the latency additive. It does not cover ADC/SBB because they have different latency.

Apparently we were inconsistent about whether the store has latency or not thus the test changes.

I've also left out Sandy Bridge because the load latency there is currently 4 cycles and should be 5.

Reviewers: RKSimon, andreadb

Reviewed By: andreadb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45351

llvm-svn: 329416
2018-04-06 16:16:48 +00:00
Craig Topper f131b60049 [X86] Add an extra store address cycle to WriteRMW in the Sandy Bridge/Broadwell/Haswell/Skylake scheduler model.
Even those the address was calculated for the load, its calculated again for the store.

llvm-svn: 329415
2018-04-06 16:16:46 +00:00
Craig Topper fbe3132f67 [X86] Separate CDQ and CDQE in the scheduler model.
According to Agner's data, CDQE is closer to CWDE.

llvm-svn: 329354
2018-04-05 21:56:19 +00:00
Craig Topper 3b0b96c591 [X86] Add LEAVE instruction to the scheduler models using the same data as LEAVE64. Make LEAVE/LEAVE64 more correct on Sandy Bridge.
This is the 32-bit mode version of LEAVE64. It should be at least somewhat similar to LEAVE64.

The Sandy Bridge version was missing a load port use.

llvm-svn: 329347
2018-04-05 21:16:26 +00:00
Craig Topper c6bb36a3d0 [X86] Remove some InstRWs for plain store instructions on Sandy Bridge.
We were forcing the latency of these instructions to 5 cycles, but every other scheduler model had them as 1 cycle. I'm sure I didn't get everything, but this gets a big portion.

llvm-svn: 329339
2018-04-05 20:04:06 +00:00
Craig Topper 15303dda0d [X86] Revert r329251-329254
It's failing on the bots and I'm not sure why.

This reverts:

[X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents.
[X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256.
[X86] Remove some InstRWs for plain store instructions on Sandy Bridge.
[X86] Auto-generate complete checks. NFC

llvm-svn: 329256
2018-04-05 05:19:36 +00:00
Craig Topper 5c36557426 [X86] Auto-generate complete checks. NFC
llvm-svn: 329251
2018-04-05 04:41:59 +00:00
Craig Topper 498875fab0 [X86] Separate BSWAP32r and BSWAP64r scheduling data in SandyBridge/Haswell/Broadwell/Skylake scheduler models.
The BSWAP64r version is 2 uops and BSWAP32r is only 1 uop. The regular expressions also looked for a non-existant BSWAP16r.

llvm-svn: 329211
2018-04-04 17:54:19 +00:00
Craig Topper 8104f266a4 [X86] Correct the throughput for divide instructions in Sandy Bridge/Haswell/Broadwell/Skylake scheduler models.
Fixes most of PR36898. Still need to fix the 512-bit instructions, but Agner's tables don't have those.

llvm-svn: 328960
2018-04-02 05:33:28 +00:00
Craig Topper 9f834810ea [X86] Give ADC8/16/32/64mi the same scheduling information as ADC8/16/32/64mr and SBB8/16/32/64mi.
It doesn't make a lot of sense that it would be different.

llvm-svn: 328946
2018-04-01 21:54:24 +00:00
Craig Topper 13a0f83a05 [X86] Add SchedRW for PMULLD
Summary:
It seems many CPUs don't implement this instruction as well as the other vector multiplies. Often using a multi uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput.

This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet.

I also left a FIXME in the SandyBridge model because it being used for the "generic" model is too optimistic for the 256/512-bit versions since those are multiple uops on all known CPUs.

Reviewers: RKSimon, GGanesh, courbet

Reviewed By: RKSimon

Subscribers: gchatelet, gbedwell, andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D44972

llvm-svn: 328914
2018-03-31 04:54:32 +00:00
Craig Topper 89310f56c8 [X86] Correct the placement of ReadAfterLd in BEXTR and BZHI. Add dedicated SchedRW for BEXTR/BZHI.
These instructions have the memory operand before the register operand. So we need to put ReadDefault for all the load ops first. Then the ReadAfterLd

Differential Revision: https://reviews.llvm.org/D44838

llvm-svn: 328823
2018-03-29 20:41:39 +00:00
Simon Pilgrim a2f26788a3 [X86] Add WriteFMOVMSK/WriteVecMOVMSK/WriteMMXMOVMSK scheduler classes
Currently MOVMSK instructions use the WriteVecLogic class, which is a very poor choice given that MOVMSK involves a SSE->GPR transfer.

Differential Revision: https://reviews.llvm.org/D44924

llvm-svn: 328664
2018-03-27 20:38:54 +00:00
Simon Pilgrim 28e7bcbba6 [X86] Add WriteCRC32 scheduler class
Currently CRC32 instructions use the WriteFAdd class, this patch splits them off into their own, at the moment it is still mostly just a duplicate of WriteFAdd but it can now be tweaked on a target by target basis.

Differential Revision: https://reviews.llvm.org/D44647

llvm-svn: 328582
2018-03-26 21:06:14 +00:00
Simon Pilgrim f33d905293 [X86] Add WriteBitScan/WriteLZCNT/WriteTZCNT/WritePOPCNT scheduler classes (PR36881)
Give the bit count instructions their own scheduler classes instead of forcing them into existing classes.

These were mostly overridden anyway, but I had to add in costs from Agner for silvermont and znver1 and the Fam16h SoG for btver2 (Jaguar).

Differential Revision: https://reviews.llvm.org/D44879

llvm-svn: 328566
2018-03-26 18:19:28 +00:00
Craig Topper cdfcf8ecda [X86] Merge the SSE and AVX versions of fp divs and sqrts in the SandyBridge/Haswell/Broadwell/Skylake scheduler models.
I've used Agner's data as best I could to get the values to converge on.

llvm-svn: 328473
2018-03-26 05:05:10 +00:00
Simon Pilgrim e3547af7be [X86] Add the ability to override memory folding latency to schedules and add 1uop for memory folds for Intel models
The Intel models need an extra 1uop for memory folded instructions, plus a lot of instructions take a non-default memory latency which should allow us to use the multiclass a lot more to tidy things up.

Differential Revision: https://reviews.llvm.org/D44840

llvm-svn: 328446
2018-03-25 10:21:19 +00:00
Simon Pilgrim c21deec37b [X86][Broadwell] Merge xmm/ymm instructions instregex entries to reduce regex matches to reduce compile time
llvm-svn: 328434
2018-03-24 19:37:28 +00:00
Craig Topper 40d3b32e12 [X86] Rename VROUNDYPS* and VROUNDYPD* instructions to VROUNDPSY* and VROUNDPDY*. Fix itinerary mistake on all memory forms of VROUNDPD
This makes the Y position consistent with other instructions.

This should have been NFC, but while refactoring the multiclass I noticed that VROUNDPD memory forms were using the register itinerary.

llvm-svn: 328254
2018-03-22 21:55:20 +00:00
Craig Topper 4a3be6e578 [X86] Correct the scheduling data for some of the 32 and 64 bit multiplies to as best as I understand how they are implemented.
llvm-svn: 328231
2018-03-22 19:22:51 +00:00
Simon Pilgrim 53b2c3329a [X86][SSE42] Use the default PCMPEST/PCMPIST scheduler classes directly. NFCI.
Models were completely overriding all SSE42 strins instructions when the default classes could be used for exactly the same coverage.

llvm-svn: 328203
2018-03-22 14:56:18 +00:00
Simon Pilgrim 3b2ff1faa9 [X86][CLMUL] Use the default CLMUL scheduler classes directly. NFCI.
Models were completely overriding all CLMUL instructions when the WriteCLMUL default classes could be used for exactly the same coverage.

llvm-svn: 328194
2018-03-22 13:37:30 +00:00
Simon Pilgrim 7684e055b3 [X86] Use the default AES scheduler classes directly. NFCI.
Models were completely overriding all AES instructions when the WriteAES default classes could be used for exactly the same coverage.

Removes 6 unnecessary scheduler classes from every model.

Note: Still looking for a way for tblgen to warn when this is happening - often the override is more complete than the default.
llvm-svn: 328192
2018-03-22 13:18:08 +00:00
Craig Topper 5a69a0011b [X86][Broadwell] Merge multiple InstrRW entries that map to the same SchedWriteRes group (NFCI) (PR35955)
llvm-svn: 328076
2018-03-21 06:28:42 +00:00
Craig Topper 3e9462607e [X86] Add TEST16mi/TEST32mi/TEST64mi32 to the Sandybridge/Haswell/Broadwell/Skylake scheduler models.
Move it from a load+store group on SNB to a load only group, the same group as CMP.

llvm-svn: 327944
2018-03-20 03:02:03 +00:00
Craig Topper b4c7873f8c [X86] Add JCXZ/JECXZ to Sandybridge/Haswell/Broadwell/Skylake scheduler models.
JRCXZ was already present, but not the others.

We never codegen this instruction so this doesn't affect much just trying to get them all into a single generated scheduler class in the output.

llvm-svn: 327881
2018-03-19 19:00:32 +00:00
Craig Topper 836cfb3a4c [X86] Add the rest of the TEST with immediate instructions to the scheduler models to match their 8-bit counterpart.
llvm-svn: 327874
2018-03-19 17:58:41 +00:00
Craig Topper 645e531a69 [X86] Add MOV16ri*/MOV32ri*/MOV64ri* to scheduler models to match MOV8ri. Correct SchedRW and itinerary for MOV32ri64.
llvm-svn: 327872
2018-03-19 17:46:59 +00:00
Simon Pilgrim 30c38c3849 [X86] Generalize schedule classes to support multiple stages
Currently the WriteResPair style multi-classes take a single pipeline stage and latency, this patch generalizes this to make it easier to create complex schedules with ResourceCycles and NumMicroOps be overriden from their defaults.

This has already been done for the Jaguar scheduler to remove a number of custom schedule classes and adding it to the other x86 targets will make it much tidier as we add additional classes in the future to try and replace so many custom cases.

I've converted some instructions but a lot of the models need a bit of cleanup after the patch has been committed - memory latencies not being consistent, the class not actually being used when we could remove some/all customs, etc. I'd prefer to keep this as NFC as possible so later patches can be smaller and target specific.

Differential Revision: https://reviews.llvm.org/D44612

llvm-svn: 327855
2018-03-19 14:46:07 +00:00
Craig Topper d10ceffa5f [X86] Add ADD16i16/ADD32i32/ADD64i32 and similar to the scheduler models to match ADD8i8.
Also move ADC8i8 and SBB8i8 in the Sandy Bridge model to the same class as ADC8ri and SBB8ri. That seems more accurate since its the 8i8 is just the register forced to AL instead of coming from modrm.

llvm-svn: 327820
2018-03-19 04:21:40 +00:00
Craig Topper 13a1650d8a [X86] Merge 8-bit instructions into instregex with 16/32/64 instructions in the scheduler models as much as possible. NFCI
This reduces the total number of generated scheduler classes from 5404 to 5316.

llvm-svn: 327815
2018-03-19 00:56:09 +00:00
Craig Topper 2d451e73f9 [X86] Fix a bunch of overlapping regular expressions in the scheduler models.
llvm-svn: 327787
2018-03-18 08:38:06 +00:00
Craig Topper 89dcda3e90 [X86] Remove MMX_MASKMOVQ64 and VMASKMOVDQU from scheduler models.
The information was so wildly inaccurate and incomplete its better to just remove it.

MMX_MASKMOVQ64 showed up twice in several scheduler models. In Haswell and Broadwell they were on adjacent lines. On Skylake the copies had different information.

MMX_MASKMOVQ and MASKMOVDQU were completely missing.

MMX_MASKMOVQ64 was listed on Haswell/Broadwell as 1 cycle on port 1 despite it being a store instruction.

Filed PR36780 to track fixing this right.

llvm-svn: 327783
2018-03-18 03:24:42 +00:00
Simon Pilgrim fb7aa57bf1 [X86][SSE] Introduce Float/Vector WriteMove, WriteLoad and Writetore scheduler classes
As discussed on D44428 and PR36726, this patch splits off WriteFMove/WriteVecMove, WriteFLoad/WriteVecLoad and WriteFStore/WriteVecStore scheduler classes to permit vectors to be handled separately from gpr/scalar types.

I've minimised the diff here by only moving various basic SSE/AVX vector instructions across - we can fix the rest when called for. This does fix the MOVDQA vs MOVAPS/MOVAPD discrepancies mentioned on D44428.

Differential Revision: https://reviews.llvm.org/D44471

llvm-svn: 327630
2018-03-15 14:45:30 +00:00
Clement Courbet 327fac4d75 [X86] Add IMUL scheduling info on sandybridge, fix it on >=haswell.
Summary:
Only IMUL16rri uses an extra P0156. IMUL32* and IMUL16rr only use
P1.
This was computed using https://github.com/google/EXEgesis/blob/master/exegesis/tools/compute_itineraries.cc

This can easily be validated by running perf on the following code:

```
int main(int argc, char**argv) {
  int a = argc;
  int b = argc;
  int c = argc;
  int d = argc;

  for (int i = 0; i < LOOP_ITERATIONS; ++i) {
    asm volatile(
      R"(
        .rept 10000
        imull $0x2, %%edx, %%eax
        imull $0x2, %%ecx, %%ebx
        imull $0x2, %%eax, %%edx
        imull $0x2, %%ebx, %%ecx
        .endr
      )"
      : "+a"(a), "+b"(b), "+c"(c), "+d"(d)
      :
      :);
  }
  return a+b+c+d;
}
```
-> test.cc

perf stat -x, -e cycles --pfm-events=uops_executed_port:port_0:u,uops_executed_port:port_1:u,uops_executed_port:port_2:u,uops_executed_port:port_3:u,uops_executed_port:port_4:u,uops_executed_port:port_5:u,uops_executed_port:port_6:u,uops_executed_port:port_7:u test

Reviewers: craig.topper, RKSimon, gadi.haber

Subscribers: llvm-commits, gchatelet, chandlerc

Differential Revision: https://reviews.llvm.org/D43460

llvm-svn: 326877
2018-03-07 08:14:02 +00:00
Craig Topper b369cdbaad [X86] Expand IMUL/MUL instregexs in Intel scheduler models. Add load latency to some of them in SkylakeClient model.
The regular expressions and the imul names caused some instructions to be matched by multiple regexs creating unpredictable results.

This changes them all to use explicit instrs instead.

While doing this I also found that some instructions in Skylake were missing load latency so I fixed that too.

llvm-svn: 323406
2018-01-25 06:57:42 +00:00
Craig Topper 066e73762d [X86] Name the MMX phaddd instruction with 3 Ds instead of just 2. NFC
llvm-svn: 323403
2018-01-25 04:45:32 +00:00