Commit Graph

129 Commits

Author SHA1 Message Date
Simon Pilgrim 2864b46469 [X86] Split off WriteIMul64 from WriteIMul schedule class (PR36931)
This fixes a couple of BtVer2 missing instructions that weren't been handled in the override.

NOTE: There are still a lot of overrides that still need cleaning up!
llvm-svn: 331770
2018-05-08 14:55:16 +00:00
Simon Pilgrim 2580554333 [X86] Split WriteIDiv into div/idiv 8/16/32/64 implementations (PR36930)
I've created the necessary classes but there are still a lot of overrides that need cleaning up.

NOTE: The Znver1 model was missing some div/idiv variants in the instregex patterns and wasn't setting the resource cycles at all in the overrides.
llvm-svn: 331767
2018-05-08 13:51:45 +00:00
Simon Pilgrim b0a3be04ec [X86] Add vector masked load/store scheduler classes (PR32857)
Split off from existing vector load/store classes to remove InstRW overrides.

llvm-svn: 331760
2018-05-08 12:17:55 +00:00
Simon Pilgrim 210286ed8f [X86] Add SchedWriteFTest/SchedWriteVecTest TEST scheduler classes
Split off from SchedWriteVecLogic to remove InstRW overrides.

llvm-svn: 331757
2018-05-08 10:28:03 +00:00
Simon Pilgrim 1233e1234a [X86] Split WriteFAdd/WriteFCmp/WriteFMul schedule classes
Split to support single/double for scalar, XMM and YMM/ZMM instructions - removing InstrRW overrides for these instructions.

Fixes Atom ADDSUBPD instruction and reclassifies VFPCLASS as WriteFCmp which is closer in behaviour.

llvm-svn: 331672
2018-05-07 20:52:53 +00:00
Simon Pilgrim ac5d0a31ef [X86] Split WriteFDiv schedule classes to support single/double scalar, XMM and YMM/ZMM instructions.
This removes all InstrRW overrides for these instructions - some x87 overrides remain but most use default (and realistic) values.

llvm-svn: 331643
2018-05-07 16:15:46 +00:00
Simon Pilgrim f3ae50fca2 [X86] Split WriteFRcp/WriteFRsqrt/WriteFSqrt schedule classes
WriteFRcp/WriteFRsqrt are split to support scalar, XMM and YMM/ZMM instructions.

WriteFSqrt is split into single/double/long-double sizes and scalar, XMM, YMM and ZMM instructions.

This removes all InstrRW overrides for these instructions.

NOTE: There were a couple of typos in the Znver1 model - notably a 1cy throughput for SQRT that is highly unlikely and doesn't tally with Agner.

NOTE: I had to add Agner's numbers for several targets for WriteFSqrt80.
llvm-svn: 331629
2018-05-07 11:50:44 +00:00
Simon Pilgrim 0e51a125ea [X86] Add WriteEMMS scheduler class
Filled in the missing values from Btver2 SoG or Agner

llvm-svn: 331546
2018-05-04 18:16:13 +00:00
Simon Pilgrim d7ffbc5c7e [X86] Finish splitting WriteVecShift and WriteVecIMul to remove InstRW overrides.
llvm-svn: 331543
2018-05-04 17:47:46 +00:00
Simon Pilgrim 67cc246dca [X86] Cleanup SchedWriteFMA classes and use X86SchedWriteWidths directly.
Rename scalar and XMM versions, this is to match/simplify an upcoming change to split MUL/DIV/SQRT scalar/xmm/ymm/zmm classes.

llvm-svn: 331531
2018-05-04 15:20:18 +00:00
Simon Pilgrim bf4c8c0ff2 [X86] Add WriteVecMOVMSKY scheduler class
llvm-svn: 331525
2018-05-04 14:54:33 +00:00
Simon Pilgrim be51b20127 [X86] Add SchedWriteFRnd fp rounding scheduler classes
Split off from SchedWriteFAdd for fp rounding/bit-manipulation instructions.

Fixes an issue on btver2 which only had the ymm version using the JSTC pipe instead of JFPA.

llvm-svn: 331515
2018-05-04 12:59:24 +00:00
Simon Pilgrim 542b20d656 [X86] Add WriteDPPD/WriteDPPS dot product scheduler classes
llvm-svn: 331489
2018-05-03 22:31:19 +00:00
Simon Pilgrim f2d2cedab4 [X86] Split WriteVecShift/WriteVarVecShift into MMX, XMM and YMM/ZMM scheduler classes
This took a bit of extra work as on Intel targets the old (V)PSLLDrr/(V)PSLLDrm style instructions act differently - I ended up creating WriteVecShiftImm classes for XMM/YMM/ZMM vector shift by immediate and retaining WriteVecShift as the default (used only by MMX) plus WriteVecShiftX/WriteVecShiftY. X86SchedWriteWidths hides most of this thank goodness.

llvm-svn: 331472
2018-05-03 17:56:43 +00:00
Simon Pilgrim f7dd6069a5 [X86] Split WriteVecALU/WritePHAdd into XMM and YMM/ZMM scheduler classes
llvm-svn: 331453
2018-05-03 13:27:10 +00:00
Simon Pilgrim 93c878c76b [X86] Split WriteVecIMul/WriteVecPMULLD/WriteMPSAD/WritePSADBW into XMM and YMM/ZMM scheduler classes
Also retagged VDBPSADBW instructions as SchedWritePSADBW instead of SchedWriteVecIMul which matches the behaviour on SkylakeServer (the only thing that supports it...)

llvm-svn: 331445
2018-05-03 10:31:20 +00:00
Clement Courbet 6794660828 [TableGen][NFC] Make ResourceCycles definitions more explicit.
https://reviews.llvm.org/D46356

llvm-svn: 331439
2018-05-03 06:08:47 +00:00
Simon Pilgrim 6732f6ea51 [X86] Split WriteShuffle/WriteVarShuffle + WriteBlend/WriteVarBlend into XMM and YMM/ZMM scheduler classes
llvm-svn: 331386
2018-05-02 18:48:23 +00:00
Simon Pilgrim 21caf0124f [X86] Split WriteFMul/WriteFDiv into XMM and YMM/ZMM scheduler classes
llvm-svn: 331293
2018-05-01 18:22:53 +00:00
Simon Pilgrim c708868cb1 [X86] Split WriteFRcp/WriteFRsqrt/WriteFSqrt into XMM and YMM/ZMM scheduler classes
llvm-svn: 331290
2018-05-01 18:06:07 +00:00
Simon Pilgrim c546f9424f [X86] Split WriteFCmp into XMM and YMM/ZMM scheduler classes
Removes more WriteFCmp InstRW overrides

llvm-svn: 331283
2018-05-01 16:50:16 +00:00
Simon Pilgrim 5269167f5b [X86] Split WriteFAdd into XMM and YMM/ZMM scheduler classes
Removes more WriteFAdd InstRW overrides

llvm-svn: 331276
2018-05-01 16:13:42 +00:00
Simon Pilgrim dd8eae128b [X86] Split WriteFShuffle into XMM and YMM/ZMM scheduler classes
Removes more WriteFShuffle InstRW overrides

llvm-svn: 331264
2018-05-01 14:25:01 +00:00
Simon Pilgrim 57f2b185ac [X86] Split WriteVecLogic into XMM and YMM/ZMM scheduler classes
This removes all the WriteVecLogic InstRW overrides.

llvm-svn: 331258
2018-05-01 12:39:17 +00:00
Simon Pilgrim 8a937e00d8 [X86] Split WriteFBlend/WriteFVarBlend/WriteFVarShuffle into XMM and YMM/ZMM scheduler classes
This removes all the WriteFBlend/WriteFVarBlend InstRW overrides - some WriteFVarShuffle remain to be fixed.

llvm-svn: 331065
2018-04-27 18:19:48 +00:00
Simon Pilgrim c3c767bf50 [X86] Split WriteFHadd into XMM and YMM/ZMM scheduler classes
This removes all the HADD/HSUB PS/PD InstRW overrides.

llvm-svn: 331054
2018-04-27 16:11:57 +00:00
Simon Pilgrim b2aa89c909 [X86][AVX] Split WriteFLogic into XMM and YMM/ZMM scheduler classes
This removes all the AND/ANDN/OR/XOR PS/PD InstRW overrides.

llvm-svn: 331051
2018-04-27 15:50:33 +00:00
Simon Pilgrim dbd1ae7ddd [X86] Split WriteFMA into XMM, Scalar and YMM/ZMM scheduler classes
This removes all the FMA InstRW overrides.

If we ever get PR36924, then we can remove many of these declarations from models.

llvm-svn: 330820
2018-04-25 13:07:58 +00:00
Simon Pilgrim 27bc83e228 [X86] Split off PHMINPOSUW to their own schedule class
This also fixes Jaguar's schedule which was treating it as the WriteVecIMul default. 

llvm-svn: 330756
2018-04-24 18:49:25 +00:00
Simon Pilgrim f0945aa0e0 [X86][F16C] Add WriteCvtF2FSt scheduling class
Fixes the classification of VCVTPS2PHmr/VCVTPS2PHYmr which were tagged as WriteCvtF2FLd_WriteRMW (PR36887)

llvm-svn: 330737
2018-04-24 16:43:07 +00:00
Simon Pilgrim 828ef9e013 [X86][BtVer2] Fix VCVTPS2PHmr/VCVTPS2PHYmr latencies
These are stores, not loads, so don't need to account for load latency.

llvm-svn: 330735
2018-04-24 16:26:51 +00:00
Simon Pilgrim f7d2a93d5f [X86] Add vector element insertion/extraction scheduler classes
Split off pinsr/pextr and extractps instructions.

(Mostly) fixes PR36887.

Note: It might be worth adding a WriteFInsertLd class as well in the future.

Differential Revision: https://reviews.llvm.org/D45929

llvm-svn: 330714
2018-04-24 13:21:41 +00:00
Craig Topper 05242bf691 [X86] Add SchedWrites for LDMXCSR/STMXCSR.
llvm-svn: 330517
2018-04-21 18:07:36 +00:00
Simon Pilgrim d14d2e7b18 [X86] Add WriteFSign/WriteFLogic scheduler classes
Split the fp and integer vector logical instruction scheduler classes - older CPUs especially often handled these on different pipes.

This unearthed a couple of things that are also handled in this patch:

(1) We were tagging avx512 fp logic ops as WriteFAdd, probably because of the lack of WriteFLogic
(2) SandyBridge had integer logic ops only using Port5, when afaict they can use Ports015.
(3) Cleaned up x86 FCHS/FABS scheduling as they are typically treated as fp logic ops.

Differential Revision: https://reviews.llvm.org/D45629

llvm-svn: 330480
2018-04-20 21:16:05 +00:00
Simon Pilgrim df8fa6d734 [X86][BtVer2] Cleanup some old FIXMEs from the model. NFCI.
llvm-svn: 330428
2018-04-20 13:12:04 +00:00
Simon Pilgrim f21ace6cdd [X86][BtVer2] Remove SSE4A EXTRQ/EXTRQI InstRW overrides.
These are already handled identically by WriteALU.

llvm-svn: 330332
2018-04-19 14:38:36 +00:00
Simon Pilgrim 33dede9075 [X86][BtVer2] Remove 128-bit F16C InstRW overrides.
These are already handled identically by WriteCvtF2F.

llvm-svn: 330318
2018-04-19 11:16:33 +00:00
Craig Topper e56a2fc5e7 [X86] Add separate scheduling class for PSADBW instruction.
llvm-svn: 330204
2018-04-17 19:35:19 +00:00
Simon Pilgrim 86e3c26924 [X86] Add FP comparison scheduler classes
Split VCMP/VMAX/VMIN instructions off to WriteFCmp and VCOMIS instructions off to WriteFCom instead of assuming they match WriteFAdd

Differential Revision: https://reviews.llvm.org/D45656

llvm-svn: 330179
2018-04-17 07:22:44 +00:00
Simon Pilgrim 89c8a10f7c [X86] Add variable shuffle schedule classes
Split variable index shuffles from immediate index shuffles

WriteFVarShuffle - variable 'in-lane' shuffles (VPERMILPS/VPERMIL2PS etc.)
WriteVarShuffle - variable 'in-lane' shuffles (PSHUFB/VPPERM etc.)

WriteFVarShuffle256 - variable 'cross-lane' shuffles (VPERMPS etc.)
WriteVarShuffle256 - variable 'cross-lane' shuffles (VPERMD etc.)

Differential Revision: https://reviews.llvm.org/D45404

llvm-svn: 329806
2018-04-11 13:49:19 +00:00
Craig Topper b7baa358f6 [X86] Add SchedWrites for CMOV and SETCC. Use them to remove InstRWs.
Summary:
Cmov and setcc previously used WriteALU, but on Intel processors at least they are more restricted than basic ALU ops.

This patch adds new SchedWrites for them and removes the InstRWs. I had to leave some InstRWs for CMOVA/CMOVBE and SETA/SETBE because those have an extra uop relative to the other condition codes on Intel CPUs.

The test changes are due to fixing a missing ZnAGU dependency on the memory form of setcc.

Reviewers: RKSimon, andreadb, GGanesh

Reviewed By: RKSimon

Subscribers: GGanesh, llvm-commits

Differential Revision: https://reviews.llvm.org/D45380

llvm-svn: 329539
2018-04-08 17:53:18 +00:00
Simon Pilgrim 86588fc809 [X86][Btver2] Add vector extract costs
llvm-svn: 329524
2018-04-08 11:26:26 +00:00
Andrea Di Biagio c74ad502ce [MC][Tablegen] Allow models to describe the retire control unit for llvm-mca.
This patch adds the ability to describe properties of the hardware retire
control unit.

Tablegen class RetireControlUnit has been added for this purpose (see
TargetSchedule.td).

A RetireControlUnit specifies the size of the reorder buffer, as well as the
maximum number of opcodes that can be retired every cycle.

A zero (or negative) value for the reorder buffer size means: "the size is
unknown". If the size is unknown, then llvm-mca defaults it to the value of
field SchedMachineModel::MicroOpBufferSize.  A zero or negative number of
opcodes retired per cycle means: "there is no restriction on the number of
instructions that can be retired every cycle".

Models can optionally specify an instance of RetireControlUnit. There can only
be up-to one RetireControlUnit definition per scheduling model.

Information related to the RCU (RetireControlUnit) is stored in (two new fields
of) MCExtraProcessorInfo.  llvm-mca loads that information when it initializes
the DispatchUnit / RetireControlUnit (see Dispatch.h/Dispatch.cpp).

This patch fixes PR36661.

Differential Revision: https://reviews.llvm.org/D45259

llvm-svn: 329304
2018-04-05 15:41:41 +00:00
Andrea Di Biagio 9da4d6db33 [MC][Tablegen] Allow the definition of processor register files in the scheduling model for llvm-mca
This patch allows the description of register files in processor scheduling
models. This addresses PR36662.

A new tablegen class named 'RegisterFile' has been added to TargetSchedule.td.
Targets can optionally describe register files for their processors using that
class. In particular, class RegisterFile allows to specify:
 - The total number of physical registers.
 - Which target registers are accessible through the register file.
 - The cost of allocating a register at register renaming stage.

Example (from this patch - see file X86/X86ScheduleBtVer2.td)

  def FpuPRF : RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]>

Here, FpuPRF describes a register file for MMX/XMM/YMM registers. On Jaguar
(btver2), a YMM register definition consumes 2 physical registers, while MMX/XMM
register definitions only cost 1 physical register.

The syntax allows to specify an empty set of register classes.  An empty set of
register classes means: this register file models all the registers specified by
the Target.  For each register class, users can specify an optional register
cost. By default, register costs default to 1.  A value of 0 for the number of
physical registers means: "this register file has an unbounded number of
physical registers".

This patch is structured in two parts.

* Part 1 - MC/Tablegen *

A first part adds the tablegen definition of RegisterFile, and teaches the
SubtargetEmitter how to emit information related to register files.

Information about register files is accessible through an instance of
MCExtraProcessorInfo.
The idea behind this design is to logically partition the processor description
which is only used by external tools (like llvm-mca) from the processor
information used by the llvm machine schedulers.
I think that this design would make easier for targets to get rid of the extra
processor information if they don't want it.

* Part 2 - llvm-mca related *

The second part of this patch is related to changes to llvm-mca.

The main differences are:
 1) class RegisterFile now needs to take into account the "cost of a register"
when allocating physical registers at register renaming stage.
 2) Point 1. triggered a minor refactoring which lef to the removal of the
"maximum 32 register files" restriction.
 3) The BackendStatistics view has been updated so that we can print out extra
details related to each register file implemented by the processor.

The effect of point 3. is also visible in tests register-files-[1..5].s.

Differential Revision: https://reviews.llvm.org/D44980

llvm-svn: 329067
2018-04-03 13:36:24 +00:00
Simon Pilgrim 3b8ad346f9 [X86][Btver2] Add MMX_PSHUFB to the JWritePSHUFB InstRW entries
llvm-svn: 328918
2018-03-31 09:15:54 +00:00
Craig Topper 13a0f83a05 [X86] Add SchedRW for PMULLD
Summary:
It seems many CPUs don't implement this instruction as well as the other vector multiplies. Often using a multi uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput.

This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet.

I also left a FIXME in the SandyBridge model because it being used for the "generic" model is too optimistic for the 256/512-bit versions since those are multiple uops on all known CPUs.

Reviewers: RKSimon, GGanesh, courbet

Reviewed By: RKSimon

Subscribers: gchatelet, gbedwell, andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D44972

llvm-svn: 328914
2018-03-31 04:54:32 +00:00
Andrea Di Biagio dc97172b2f [X86][BtVer2] Fixed the number of micro opcodes for AVX vector converts and
VSQRT instructions.

There were still a few AVX instructions with an incorrect number of opcodes.
These should be fixed now.

llvm-svn: 328892
2018-03-30 18:53:47 +00:00
Andrea Di Biagio 3eaa26bb64 [X86][BtVer2] Fix the number of uOps for horizontal operations.
llvm-svn: 328886
2018-03-30 18:15:30 +00:00
Andrea Di Biagio 073a9d74ca [X86][BtVer2] Add missing ReadAfterLd to RM variants of AVX horizontal adds and
most vector logic instructions.

Fixed a few InstRW that forgot to specify a ReadAfterLd for the register input
operand.

llvm-svn: 328867
2018-03-30 14:48:08 +00:00
Craig Topper 89310f56c8 [X86] Correct the placement of ReadAfterLd in BEXTR and BZHI. Add dedicated SchedRW for BEXTR/BZHI.
These instructions have the memory operand before the register operand. So we need to put ReadDefault for all the load ops first. Then the ReadAfterLd

Differential Revision: https://reviews.llvm.org/D44838

llvm-svn: 328823
2018-03-29 20:41:39 +00:00