Commit Graph

86 Commits

Author SHA1 Message Date
Simon Pilgrim a41ae2f005 [X86] Fix WriteMPSAD/WritePSADBW values to allow us to remove unnecessary instrw overrides.
llvm-svn: 330542
2018-04-22 10:39:16 +00:00
Simon Pilgrim 37334ea67a [X86] Strip unnecessary prefetch + vector move/load instrw overrides from scheduler models.
llvm-svn: 330527
2018-04-21 21:59:36 +00:00
Simon Pilgrim 920802cc50 [X86] Strip unnecessary WriteCvtF2I instrw overrides from scheduler models.
llvm-svn: 330525
2018-04-21 21:16:44 +00:00
Simon Pilgrim 825ead950e [X86] Strip unnecessary broadcast/shuffle256 instrw overrides from scheduler models.
llvm-svn: 330523
2018-04-21 20:45:12 +00:00
Simon Pilgrim 74ccc6a303 [X86] Strip unnecessary vector integer math, shift-imm, extend, shuffle, pack/unpack instruction instrw overrides from scheduler models.
llvm-svn: 330521
2018-04-21 19:11:55 +00:00
Craig Topper 05242bf691 [X86] Add SchedWrites for LDMXCSR/STMXCSR.
llvm-svn: 330517
2018-04-21 18:07:36 +00:00
Simon Pilgrim a80df0999f [X86][Broadwell] Remove unnecessary VORPD/VORPS instrw override - missed in D45629
llvm-svn: 330513
2018-04-21 16:17:47 +00:00
Simon Pilgrim 93b102cd45 [X86] Strip unnecessary WriteFRcp/WriteFRsqrt instruction instrw overrides from scheduler models.
The required the default skylake schedules to be updated - these were being completely overriden by the InstRW and the existing values not used at all.

llvm-svn: 330510
2018-04-21 15:16:59 +00:00
Simon Pilgrim 2193524fb4 [X86] Strip unnecessary WriteFShuffle instruction instrw overrides from scheduler models.
llvm-svn: 330508
2018-04-21 14:56:56 +00:00
Simon Pilgrim 02fc375a22 [X86] Strip unnecessary MMX instruction instrw overrides from scheduler models.
llvm-svn: 330503
2018-04-21 12:15:42 +00:00
Simon Pilgrim c0f654f18e [X86] Strip unnecessary x87 instruction instrw overrides from scheduler models.
llvm-svn: 330501
2018-04-21 11:25:02 +00:00
Simon Pilgrim d14d2e7b18 [X86] Add WriteFSign/WriteFLogic scheduler classes
Split the fp and integer vector logical instruction scheduler classes - older CPUs especially often handled these on different pipes.

This unearthed a couple of things that are also handled in this patch:

(1) We were tagging avx512 fp logic ops as WriteFAdd, probably because of the lack of WriteFLogic
(2) SandyBridge had integer logic ops only using Port5, when afaict they can use Ports015.
(3) Cleaned up x86 FCHS/FABS scheduling as they are typically treated as fp logic ops.

Differential Revision: https://reviews.llvm.org/D45629

llvm-svn: 330480
2018-04-20 21:16:05 +00:00
Craig Topper b5f2659130 [X86] Correct the scheduling data for register forms of XCHG and XADD on Intel CPUs.
The XCHG16rr/XCHG32rr/XCHG64rr instructions should be 3 uops just like XCHG8rr. I believe they're just implemented as 3 move uops with a temporary register.

XADD is probably 2 moves and an add also using a temporary register.

Change the latency for both from 2 cycles to 3 cycles. Only 2 of the uops are serialized in their execution, the move into the temporary and the move out of the temporary. The move from one GPR to the other should be able to go in parallel with this if there are ALU resources available.

llvm-svn: 330349
2018-04-19 18:00:17 +00:00
Simon Pilgrim 5e492d29a3 [X86] Merge some MMX instregex
There's a lot more but I'd prefer focussing on removing unnecessary InstRWs first.

llvm-svn: 330347
2018-04-19 17:32:10 +00:00
Simon Pilgrim 3c06617f0e [X86][FMA] Remove FMA reg-reg InstRW scheduler overrides.
These are all already handled identically by WriteFMA.

llvm-svn: 330319
2018-04-19 11:37:26 +00:00
Craig Topper f846e2d1b1 [X86] Scrub scheduling information for MUL/IMUL on Intel CPUs.
This removes a bunch of unnecessary InstRW overrides. It also cleans up the missing information from the Sandy Bridge model. Other fixes to other models.

llvm-svn: 330308
2018-04-19 05:34:05 +00:00
Craig Topper dfccafe18a [X86][Broadwell] Remove some unnecessary InstRW overrides and add some FIXMEs.
llvm-svn: 330241
2018-04-18 06:41:25 +00:00
Craig Topper e56a2fc5e7 [X86] Add separate scheduling class for PSADBW instruction.
llvm-svn: 330204
2018-04-17 19:35:19 +00:00
Craig Topper 655e1db722 [X86] Remove unnecessary InstRW overrides. Add somes FIXMEs/TODOs.
llvm-svn: 330203
2018-04-17 19:35:14 +00:00
Simon Pilgrim 86e3c26924 [X86] Add FP comparison scheduler classes
Split VCMP/VMAX/VMIN instructions off to WriteFCmp and VCOMIS instructions off to WriteFCom instead of assuming they match WriteFAdd

Differential Revision: https://reviews.llvm.org/D45656

llvm-svn: 330179
2018-04-17 07:22:44 +00:00
Simon Pilgrim 89c8a10f7c [X86] Add variable shuffle schedule classes
Split variable index shuffles from immediate index shuffles

WriteFVarShuffle - variable 'in-lane' shuffles (VPERMILPS/VPERMIL2PS etc.)
WriteVarShuffle - variable 'in-lane' shuffles (PSHUFB/VPPERM etc.)

WriteFVarShuffle256 - variable 'cross-lane' shuffles (VPERMPS etc.)
WriteVarShuffle256 - variable 'cross-lane' shuffles (VPERMD etc.)

Differential Revision: https://reviews.llvm.org/D45404

llvm-svn: 329806
2018-04-11 13:49:19 +00:00
Andrea Di Biagio 486358c153 [X86][Broadwell] HWPort5 should not be added to BroadwellModelProcResources.
The BroadwellModelProcResources had an entry for HWPort5, which is a Haswell
resource, and not a Broadwell processor resource. That entry was added to the
Broadwell model because variable blends were consuming it.

This was clearly a typo (the resource name should have been BWPort5), which
unfortunately was never caught before. It was not reported as an error because
HWPort5 is a resource defined by the Haswell model. It has been found when
testing some code with llvm-mca: the list of resources in the resource pressure
view was odd.

This patch fixes the issue; now variable blend instructions consume 2 cycles on
BWPort5 instead of HWPort5. This is enough to get rid of the extra (spurious)
entry in the BroadWellModelProcResources table.

llvm-svn: 329686
2018-04-10 10:49:41 +00:00
Craig Topper b7baa358f6 [X86] Add SchedWrites for CMOV and SETCC. Use them to remove InstRWs.
Summary:
Cmov and setcc previously used WriteALU, but on Intel processors at least they are more restricted than basic ALU ops.

This patch adds new SchedWrites for them and removes the InstRWs. I had to leave some InstRWs for CMOVA/CMOVBE and SETA/SETBE because those have an extra uop relative to the other condition codes on Intel CPUs.

The test changes are due to fixing a missing ZnAGU dependency on the memory form of setcc.

Reviewers: RKSimon, andreadb, GGanesh

Reviewed By: RKSimon

Subscribers: GGanesh, llvm-commits

Differential Revision: https://reviews.llvm.org/D45380

llvm-svn: 329539
2018-04-08 17:53:18 +00:00
Craig Topper c50570fb4f [X686] Add appropriate ReadAfterLd for the register input to memory forms of ADC/SBB.
llvm-svn: 329424
2018-04-06 17:12:18 +00:00
Craig Topper f0d042619b [X86] Attempt to model basic arithmetic instructions in the Haswell/Broadwell/Skylake scheduler models without InstRWs
Summary:
This patch removes InstRW overrides for basic arithmetic/logic instructions. To do this I've added the store address port to RMW. And used a WriteSequence to make the latency additive. It does not cover ADC/SBB because they have different latency.

Apparently we were inconsistent about whether the store has latency or not thus the test changes.

I've also left out Sandy Bridge because the load latency there is currently 4 cycles and should be 5.

Reviewers: RKSimon, andreadb

Reviewed By: andreadb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45351

llvm-svn: 329416
2018-04-06 16:16:48 +00:00
Craig Topper f131b60049 [X86] Add an extra store address cycle to WriteRMW in the Sandy Bridge/Broadwell/Haswell/Skylake scheduler model.
Even those the address was calculated for the load, its calculated again for the store.

llvm-svn: 329415
2018-04-06 16:16:46 +00:00
Craig Topper fbe3132f67 [X86] Separate CDQ and CDQE in the scheduler model.
According to Agner's data, CDQE is closer to CWDE.

llvm-svn: 329354
2018-04-05 21:56:19 +00:00
Craig Topper 3b0b96c591 [X86] Add LEAVE instruction to the scheduler models using the same data as LEAVE64. Make LEAVE/LEAVE64 more correct on Sandy Bridge.
This is the 32-bit mode version of LEAVE64. It should be at least somewhat similar to LEAVE64.

The Sandy Bridge version was missing a load port use.

llvm-svn: 329347
2018-04-05 21:16:26 +00:00
Craig Topper c6bb36a3d0 [X86] Remove some InstRWs for plain store instructions on Sandy Bridge.
We were forcing the latency of these instructions to 5 cycles, but every other scheduler model had them as 1 cycle. I'm sure I didn't get everything, but this gets a big portion.

llvm-svn: 329339
2018-04-05 20:04:06 +00:00
Craig Topper 15303dda0d [X86] Revert r329251-329254
It's failing on the bots and I'm not sure why.

This reverts:

[X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents.
[X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256.
[X86] Remove some InstRWs for plain store instructions on Sandy Bridge.
[X86] Auto-generate complete checks. NFC

llvm-svn: 329256
2018-04-05 05:19:36 +00:00
Craig Topper 5c36557426 [X86] Auto-generate complete checks. NFC
llvm-svn: 329251
2018-04-05 04:41:59 +00:00
Craig Topper 498875fab0 [X86] Separate BSWAP32r and BSWAP64r scheduling data in SandyBridge/Haswell/Broadwell/Skylake scheduler models.
The BSWAP64r version is 2 uops and BSWAP32r is only 1 uop. The regular expressions also looked for a non-existant BSWAP16r.

llvm-svn: 329211
2018-04-04 17:54:19 +00:00
Craig Topper 8104f266a4 [X86] Correct the throughput for divide instructions in Sandy Bridge/Haswell/Broadwell/Skylake scheduler models.
Fixes most of PR36898. Still need to fix the 512-bit instructions, but Agner's tables don't have those.

llvm-svn: 328960
2018-04-02 05:33:28 +00:00
Craig Topper 9f834810ea [X86] Give ADC8/16/32/64mi the same scheduling information as ADC8/16/32/64mr and SBB8/16/32/64mi.
It doesn't make a lot of sense that it would be different.

llvm-svn: 328946
2018-04-01 21:54:24 +00:00
Craig Topper 13a0f83a05 [X86] Add SchedRW for PMULLD
Summary:
It seems many CPUs don't implement this instruction as well as the other vector multiplies. Often using a multi uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput.

This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet.

I also left a FIXME in the SandyBridge model because it being used for the "generic" model is too optimistic for the 256/512-bit versions since those are multiple uops on all known CPUs.

Reviewers: RKSimon, GGanesh, courbet

Reviewed By: RKSimon

Subscribers: gchatelet, gbedwell, andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D44972

llvm-svn: 328914
2018-03-31 04:54:32 +00:00
Craig Topper 89310f56c8 [X86] Correct the placement of ReadAfterLd in BEXTR and BZHI. Add dedicated SchedRW for BEXTR/BZHI.
These instructions have the memory operand before the register operand. So we need to put ReadDefault for all the load ops first. Then the ReadAfterLd

Differential Revision: https://reviews.llvm.org/D44838

llvm-svn: 328823
2018-03-29 20:41:39 +00:00
Simon Pilgrim a2f26788a3 [X86] Add WriteFMOVMSK/WriteVecMOVMSK/WriteMMXMOVMSK scheduler classes
Currently MOVMSK instructions use the WriteVecLogic class, which is a very poor choice given that MOVMSK involves a SSE->GPR transfer.

Differential Revision: https://reviews.llvm.org/D44924

llvm-svn: 328664
2018-03-27 20:38:54 +00:00
Simon Pilgrim 28e7bcbba6 [X86] Add WriteCRC32 scheduler class
Currently CRC32 instructions use the WriteFAdd class, this patch splits them off into their own, at the moment it is still mostly just a duplicate of WriteFAdd but it can now be tweaked on a target by target basis.

Differential Revision: https://reviews.llvm.org/D44647

llvm-svn: 328582
2018-03-26 21:06:14 +00:00
Simon Pilgrim f33d905293 [X86] Add WriteBitScan/WriteLZCNT/WriteTZCNT/WritePOPCNT scheduler classes (PR36881)
Give the bit count instructions their own scheduler classes instead of forcing them into existing classes.

These were mostly overridden anyway, but I had to add in costs from Agner for silvermont and znver1 and the Fam16h SoG for btver2 (Jaguar).

Differential Revision: https://reviews.llvm.org/D44879

llvm-svn: 328566
2018-03-26 18:19:28 +00:00
Craig Topper cdfcf8ecda [X86] Merge the SSE and AVX versions of fp divs and sqrts in the SandyBridge/Haswell/Broadwell/Skylake scheduler models.
I've used Agner's data as best I could to get the values to converge on.

llvm-svn: 328473
2018-03-26 05:05:10 +00:00
Simon Pilgrim e3547af7be [X86] Add the ability to override memory folding latency to schedules and add 1uop for memory folds for Intel models
The Intel models need an extra 1uop for memory folded instructions, plus a lot of instructions take a non-default memory latency which should allow us to use the multiclass a lot more to tidy things up.

Differential Revision: https://reviews.llvm.org/D44840

llvm-svn: 328446
2018-03-25 10:21:19 +00:00
Simon Pilgrim c21deec37b [X86][Broadwell] Merge xmm/ymm instructions instregex entries to reduce regex matches to reduce compile time
llvm-svn: 328434
2018-03-24 19:37:28 +00:00
Craig Topper 40d3b32e12 [X86] Rename VROUNDYPS* and VROUNDYPD* instructions to VROUNDPSY* and VROUNDPDY*. Fix itinerary mistake on all memory forms of VROUNDPD
This makes the Y position consistent with other instructions.

This should have been NFC, but while refactoring the multiclass I noticed that VROUNDPD memory forms were using the register itinerary.

llvm-svn: 328254
2018-03-22 21:55:20 +00:00
Craig Topper 4a3be6e578 [X86] Correct the scheduling data for some of the 32 and 64 bit multiplies to as best as I understand how they are implemented.
llvm-svn: 328231
2018-03-22 19:22:51 +00:00
Simon Pilgrim 53b2c3329a [X86][SSE42] Use the default PCMPEST/PCMPIST scheduler classes directly. NFCI.
Models were completely overriding all SSE42 strins instructions when the default classes could be used for exactly the same coverage.

llvm-svn: 328203
2018-03-22 14:56:18 +00:00
Simon Pilgrim 3b2ff1faa9 [X86][CLMUL] Use the default CLMUL scheduler classes directly. NFCI.
Models were completely overriding all CLMUL instructions when the WriteCLMUL default classes could be used for exactly the same coverage.

llvm-svn: 328194
2018-03-22 13:37:30 +00:00
Simon Pilgrim 7684e055b3 [X86] Use the default AES scheduler classes directly. NFCI.
Models were completely overriding all AES instructions when the WriteAES default classes could be used for exactly the same coverage.

Removes 6 unnecessary scheduler classes from every model.

Note: Still looking for a way for tblgen to warn when this is happening - often the override is more complete than the default.
llvm-svn: 328192
2018-03-22 13:18:08 +00:00
Craig Topper 5a69a0011b [X86][Broadwell] Merge multiple InstrRW entries that map to the same SchedWriteRes group (NFCI) (PR35955)
llvm-svn: 328076
2018-03-21 06:28:42 +00:00
Craig Topper 3e9462607e [X86] Add TEST16mi/TEST32mi/TEST64mi32 to the Sandybridge/Haswell/Broadwell/Skylake scheduler models.
Move it from a load+store group on SNB to a load only group, the same group as CMP.

llvm-svn: 327944
2018-03-20 03:02:03 +00:00
Craig Topper b4c7873f8c [X86] Add JCXZ/JECXZ to Sandybridge/Haswell/Broadwell/Skylake scheduler models.
JRCXZ was already present, but not the others.

We never codegen this instruction so this doesn't affect much just trying to get them all into a single generated scheduler class in the output.

llvm-svn: 327881
2018-03-19 19:00:32 +00:00