This took a bit of extra work as on Intel targets the old (V)PSLLDrr/(V)PSLLDrm style instructions act differently - I ended up creating WriteVecShiftImm classes for XMM/YMM/ZMM vector shift by immediate and retaining WriteVecShift as the default (used only by MMX) plus WriteVecShiftX/WriteVecShiftY. X86SchedWriteWidths hides most of this thank goodness.
llvm-svn: 331472
Also retagged VDBPSADBW instructions as SchedWritePSADBW instead of SchedWriteVecIMul which matches the behaviour on SkylakeServer (the only thing that supports it...)
llvm-svn: 331445
Split off pinsr/pextr and extractps instructions.
(Mostly) fixes PR36887.
Note: It might be worth adding a WriteFInsertLd class as well in the future.
Differential Revision: https://reviews.llvm.org/D45929
llvm-svn: 330714
Split the fp and integer vector logical instruction scheduler classes - older CPUs especially often handled these on different pipes.
This unearthed a couple of things that are also handled in this patch:
(1) We were tagging avx512 fp logic ops as WriteFAdd, probably because of the lack of WriteFLogic
(2) SandyBridge had integer logic ops only using Port5, when afaict they can use Ports015.
(3) Cleaned up x86 FCHS/FABS scheduling as they are typically treated as fp logic ops.
Differential Revision: https://reviews.llvm.org/D45629
llvm-svn: 330480
Split VCMP/VMAX/VMIN instructions off to WriteFCmp and VCOMIS instructions off to WriteFCom instead of assuming they match WriteFAdd
Differential Revision: https://reviews.llvm.org/D45656
llvm-svn: 330179
Summary:
Cmov and setcc previously used WriteALU, but on Intel processors at least they are more restricted than basic ALU ops.
This patch adds new SchedWrites for them and removes the InstRWs. I had to leave some InstRWs for CMOVA/CMOVBE and SETA/SETBE because those have an extra uop relative to the other condition codes on Intel CPUs.
The test changes are due to fixing a missing ZnAGU dependency on the memory form of setcc.
Reviewers: RKSimon, andreadb, GGanesh
Reviewed By: RKSimon
Subscribers: GGanesh, llvm-commits
Differential Revision: https://reviews.llvm.org/D45380
llvm-svn: 329539
This patch adds the ability to describe properties of the hardware retire
control unit.
Tablegen class RetireControlUnit has been added for this purpose (see
TargetSchedule.td).
A RetireControlUnit specifies the size of the reorder buffer, as well as the
maximum number of opcodes that can be retired every cycle.
A zero (or negative) value for the reorder buffer size means: "the size is
unknown". If the size is unknown, then llvm-mca defaults it to the value of
field SchedMachineModel::MicroOpBufferSize. A zero or negative number of
opcodes retired per cycle means: "there is no restriction on the number of
instructions that can be retired every cycle".
Models can optionally specify an instance of RetireControlUnit. There can only
be up-to one RetireControlUnit definition per scheduling model.
Information related to the RCU (RetireControlUnit) is stored in (two new fields
of) MCExtraProcessorInfo. llvm-mca loads that information when it initializes
the DispatchUnit / RetireControlUnit (see Dispatch.h/Dispatch.cpp).
This patch fixes PR36661.
Differential Revision: https://reviews.llvm.org/D45259
llvm-svn: 329304
This patch allows the description of register files in processor scheduling
models. This addresses PR36662.
A new tablegen class named 'RegisterFile' has been added to TargetSchedule.td.
Targets can optionally describe register files for their processors using that
class. In particular, class RegisterFile allows to specify:
- The total number of physical registers.
- Which target registers are accessible through the register file.
- The cost of allocating a register at register renaming stage.
Example (from this patch - see file X86/X86ScheduleBtVer2.td)
def FpuPRF : RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]>
Here, FpuPRF describes a register file for MMX/XMM/YMM registers. On Jaguar
(btver2), a YMM register definition consumes 2 physical registers, while MMX/XMM
register definitions only cost 1 physical register.
The syntax allows to specify an empty set of register classes. An empty set of
register classes means: this register file models all the registers specified by
the Target. For each register class, users can specify an optional register
cost. By default, register costs default to 1. A value of 0 for the number of
physical registers means: "this register file has an unbounded number of
physical registers".
This patch is structured in two parts.
* Part 1 - MC/Tablegen *
A first part adds the tablegen definition of RegisterFile, and teaches the
SubtargetEmitter how to emit information related to register files.
Information about register files is accessible through an instance of
MCExtraProcessorInfo.
The idea behind this design is to logically partition the processor description
which is only used by external tools (like llvm-mca) from the processor
information used by the llvm machine schedulers.
I think that this design would make easier for targets to get rid of the extra
processor information if they don't want it.
* Part 2 - llvm-mca related *
The second part of this patch is related to changes to llvm-mca.
The main differences are:
1) class RegisterFile now needs to take into account the "cost of a register"
when allocating physical registers at register renaming stage.
2) Point 1. triggered a minor refactoring which lef to the removal of the
"maximum 32 register files" restriction.
3) The BackendStatistics view has been updated so that we can print out extra
details related to each register file implemented by the processor.
The effect of point 3. is also visible in tests register-files-[1..5].s.
Differential Revision: https://reviews.llvm.org/D44980
llvm-svn: 329067
Summary:
It seems many CPUs don't implement this instruction as well as the other vector multiplies. Often using a multi uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput.
This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet.
I also left a FIXME in the SandyBridge model because it being used for the "generic" model is too optimistic for the 256/512-bit versions since those are multiple uops on all known CPUs.
Reviewers: RKSimon, GGanesh, courbet
Reviewed By: RKSimon
Subscribers: gchatelet, gbedwell, andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D44972
llvm-svn: 328914
These instructions have the memory operand before the register operand. So we need to put ReadDefault for all the load ops first. Then the ReadAfterLd
Differential Revision: https://reviews.llvm.org/D44838
llvm-svn: 328823
Similar to r328694. The number of micro opcodes should be 2 for those
instructions.
This was found when testing AVX code for BtVer2 using llvm-mca.
llvm-svn: 328698
The Jaguar backend natively supports 128-bit data types. Operations on YMM
registers are split into two COPs (complex operations). Each COP consumes a slot
in the dispatch group, and in the reorder buffer.
The scheduling model for Jaguar should mark those instructions as `let
NumMicroOps = 2`.
This was found when testing AVX code for BtVer2 using llvm-mca.
llvm-svn: 328694
Currently MOVMSK instructions use the WriteVecLogic class, which is a very poor choice given that MOVMSK involves a SSE->GPR transfer.
Differential Revision: https://reviews.llvm.org/D44924
llvm-svn: 328664
Currently CRC32 instructions use the WriteFAdd class, this patch splits them off into their own, at the moment it is still mostly just a duplicate of WriteFAdd but it can now be tweaked on a target by target basis.
Differential Revision: https://reviews.llvm.org/D44647
llvm-svn: 328582
Give the bit count instructions their own scheduler classes instead of forcing them into existing classes.
These were mostly overridden anyway, but I had to add in costs from Agner for silvermont and znver1 and the Fam16h SoG for btver2 (Jaguar).
Differential Revision: https://reviews.llvm.org/D44879
llvm-svn: 328566