Commit Graph

209 Commits

Author SHA1 Message Date
Simon Pilgrim 061096d2c2 [llvm-mca][x86] Remove addsubpd from SSE2 tests
llvm-svn: 331678
2018-05-07 21:10:48 +00:00
Simon Pilgrim 1233e1234a [X86] Split WriteFAdd/WriteFCmp/WriteFMul schedule classes
Split to support single/double for scalar, XMM and YMM/ZMM instructions - removing InstrRW overrides for these instructions.

Fixes Atom ADDSUBPD instruction and reclassifies VFPCLASS as WriteFCmp which is closer in behaviour.

llvm-svn: 331672
2018-05-07 20:52:53 +00:00
Simon Pilgrim e480ed0b9f [X86][AVX2] Tag VPMOVSX/VPMOVZX ymm instructions as WriteShuffle256
These are more like cross-lane shuffles than regular shuffles - we already do this for AVX512 equivalents.

Differential Revision: https://reviews.llvm.org/D46229

llvm-svn: 331659
2018-05-07 18:25:19 +00:00
Simon Pilgrim 763bf12085 [X86][Znver1] Remove WriteFMul/WriteFRcp InstRW overrides/aliases.
Fixes x87 schedules to more closely match Agner - AMD doesn't tend to "special case" x87 instructions as much as Intel.

llvm-svn: 331645
2018-05-07 16:34:26 +00:00
Simon Pilgrim ac5d0a31ef [X86] Split WriteFDiv schedule classes to support single/double scalar, XMM and YMM/ZMM instructions.
This removes all InstrRW overrides for these instructions - some x87 overrides remain but most use default (and realistic) values.

llvm-svn: 331643
2018-05-07 16:15:46 +00:00
Simon Pilgrim f3ae50fca2 [X86] Split WriteFRcp/WriteFRsqrt/WriteFSqrt schedule classes
WriteFRcp/WriteFRsqrt are split to support scalar, XMM and YMM/ZMM instructions.

WriteFSqrt is split into single/double/long-double sizes and scalar, XMM, YMM and ZMM instructions.

This removes all InstrRW overrides for these instructions.

NOTE: There were a couple of typos in the Znver1 model - notably a 1cy throughput for SQRT that is highly unlikely and doesn't tally with Agner.

NOTE: I had to add Agner's numbers for several targets for WriteFSqrt80.
llvm-svn: 331629
2018-05-07 11:50:44 +00:00
Simon Pilgrim 0e51a125ea [X86] Add WriteEMMS scheduler class
Filled in the missing values from Btver2 SoG or Agner

llvm-svn: 331546
2018-05-04 18:16:13 +00:00
Simon Pilgrim be51b20127 [X86] Add SchedWriteFRnd fp rounding scheduler classes
Split off from SchedWriteFAdd for fp rounding/bit-manipulation instructions.

Fixes an issue on btver2 which only had the ymm version using the JSTC pipe instead of JFPA.

llvm-svn: 331515
2018-05-04 12:59:24 +00:00
Simon Pilgrim 0aed731516 [X86][Znver1] Use SchedAlias to tag microcoded scheduler classes
Avoids extra entries in the class tables.

Found a typo that missed the MMX_PHSUBSW instruction.

llvm-svn: 331488
2018-05-03 22:12:23 +00:00
Simon Pilgrim 350c22c587 [X86][SNB] Fix scheduling of MMX integer multiply instructions.
The entries were being bound to the wrong class.

llvm-svn: 331388
2018-05-02 19:26:14 +00:00
Clement Courbet a1a3095d88 [X86] Fix scheduling info for (V?)SQRTPDm on silvermont.
https://reviews.llvm.org/D46356

llvm-svn: 331356
2018-05-02 13:46:14 +00:00
Andrea Di Biagio e047d3529b [llvm-mca] Correctly handle zero-latency stores that consume pipeline resources.
This fixes PR37293.

We can have scheduling classes with no write latency entries, that still consume
processor resources. We don't want to treat those instructions as zero-latency
instructions; they still have to be issued to the underlying pipelines, so they
still consume resource cycles.

This is likely to be a regression which I have accidentally introduced at
revision 330807. Now, if an instruction has a non-empty set of write processor
resources, we conservatively treat it as a normal (i.e. non zero-latency)
instruction.

llvm-svn: 331193
2018-04-30 15:55:04 +00:00
Andrea Di Biagio 77bd1c748a [llvm-mca] Regenerate test Atom/resources-sse3.s. NFC
Before this change, it wrongly specified -mcpu=slm instead of -mcpu=atom.

llvm-svn: 331170
2018-04-30 12:13:04 +00:00
Andrea Di Biagio e9384eb13b [llvm-mca] Support for in-order CPU for -instruction-tables testing.
Added Intel Atom tests to verify that the tool correctly generates instruction
tables even if the CPU is in-order.

Fixes PR37282.

llvm-svn: 331169
2018-04-30 12:05:34 +00:00
Simon Pilgrim 8962c344f9 [llvm-mca][X86] Add BT resource tests to all models
llvm-svn: 331144
2018-04-29 15:45:31 +00:00
Simon Pilgrim 2d569361fc [llvm-mca][X86] Add add/adc + sub/sbb resource tests to all models
llvm-svn: 331140
2018-04-29 11:03:25 +00:00
Simon Pilgrim 318e9d39ab [llvm-mca][X86] Add double shift resource tests to all relevant models
llvm-svn: 331109
2018-04-28 15:18:49 +00:00
Simon Pilgrim 4d0187c893 [llvm-mca][X86] Add shift/rotate resource tests to all relevant models
I intend to add further instruction tests to the resources-x86_64.s test file as required, but this initial commit is to help remove a load of unnecessary InstRW overrides in a future patch

llvm-svn: 331108
2018-04-28 14:56:18 +00:00
Simon Pilgrim 7574ffd7bc [llvm-mca][X86] Updated fma3 tests after rL330820
llvm-svn: 330822
2018-04-25 13:19:04 +00:00
Andrea Di Biagio 93c49d5e58 [llvm-mca] Default to the native host cpu if flag -mcpu is not specified.
llvm-svn: 330809
2018-04-25 10:18:25 +00:00
Simon Pilgrim 27bc83e228 [X86] Split off PHMINPOSUW to their own schedule class
This also fixes Jaguar's schedule which was treating it as the WriteVecIMul default. 

llvm-svn: 330756
2018-04-24 18:49:25 +00:00
Simon Pilgrim f0945aa0e0 [X86][F16C] Add WriteCvtF2FSt scheduling class
Fixes the classification of VCVTPS2PHmr/VCVTPS2PHYmr which were tagged as WriteCvtF2FLd_WriteRMW (PR36887)

llvm-svn: 330737
2018-04-24 16:43:07 +00:00
Simon Pilgrim 828ef9e013 [X86][BtVer2] Fix VCVTPS2PHmr/VCVTPS2PHYmr latencies
These are stores, not loads, so don't need to account for load latency.

llvm-svn: 330735
2018-04-24 16:26:51 +00:00
Simon Pilgrim f35b8ac196 [X86][IVB] Add F16C resource tests.
Note this is IvyBridge (which shares the model) NOT SandyBridge.

llvm-svn: 330734
2018-04-24 16:22:59 +00:00
Andrea Di Biagio 0626864fa4 [llvm-mca] Default the output asm dialect used by the instruction printer to the input asm dialect.
The instruction printer used by llvm-mca to generate the performance report now
defaults the output assembly format to the format used for the input assembly
file.

On x86, the asm format can be either AT&T or Intel, depending on the
presence/absence of directive `.intel_syntax`.

Users can still specify a different assembly dialect with the command line flag
-output-asm-variant=<uint>.

llvm-svn: 330733
2018-04-24 16:19:08 +00:00
Simon Pilgrim 16299273d0 [X86] Remove unnecessary FMA reg-mem InstRW scheduler overrides.
llvm-svn: 330720
2018-04-24 14:47:11 +00:00
Simon Pilgrim f7d2a93d5f [X86] Add vector element insertion/extraction scheduler classes
Split off pinsr/pextr and extractps instructions.

(Mostly) fixes PR36887.

Note: It might be worth adding a WriteFInsertLd class as well in the future.

Differential Revision: https://reviews.llvm.org/D45929

llvm-svn: 330714
2018-04-24 13:21:41 +00:00
Simon Pilgrim 87ba905fe9 [llvm-mca][X86] Add BMI/LZCNT/POPCNT resource tests to all relevant models
The SandyBridge BMI tests are actually run on IvyBridge as that's the first lowest CPU that actually support the ISAs (but still use the SandyBridge model).

llvm-svn: 330556
2018-04-22 20:42:24 +00:00
Simon Pilgrim 96855ec39e [X86] Remove unnecessary WriteFVarBlend/WriteVarBlend InstRW overrides.
This also fixes some of the ReadAfterLd issues due to InstRW.

llvm-svn: 330544
2018-04-22 14:43:12 +00:00
Simon Pilgrim 5e9f1da0cd [llvm-mca][X86] Add POPCNT resource test
llvm-svn: 330540
2018-04-22 09:58:00 +00:00
Simon Pilgrim e25aa02bc4 [llvm-mca][X86] Add AVX2 resource tests
llvm-svn: 330512
2018-04-21 16:12:42 +00:00
Simon Pilgrim d73bd154d9 [llvm-mca][X86] Add SSE resource tests to all models
llvm-svn: 330506
2018-04-21 14:16:57 +00:00
Simon Pilgrim 26178d4336 [llvm-mca][X86] Add MMX resource tests
llvm-svn: 330502
2018-04-21 11:28:59 +00:00
Simon Pilgrim 1264066cd7 [llvm-mca][X86] Add X87 resource tests
llvm-svn: 330499
2018-04-21 10:36:19 +00:00
Simon Pilgrim 1803bfb75f [llvm-mca][X86] Add MMX/SSE/AES/CLMUL resource SandyBridge tests
llvm-svn: 330486
2018-04-20 22:04:11 +00:00
Simon Pilgrim 0a6bfb1843 [llvm-mca][X86] Add prefetch instruction resource tests
llvm-svn: 330371
2018-04-19 22:11:58 +00:00
Simon Pilgrim 7209117868 [llvm-mca][FMA] Add FMA resource tests
llvm-svn: 330366
2018-04-19 21:32:22 +00:00
Simon Pilgrim 4a486c13fa [llvm-mca][X86] Add resource test for every out-of-order scheduler model
I've copied and regenerated a resource file from btver2 to every x86 scheduler model supported by llvm-mca so we have at least some basic coverage.

For most this has been the avx1 tests, but for silvermont I've used sse42 as thats the latest it supports.

More will be added later.

llvm-svn: 330352
2018-04-19 18:08:10 +00:00
Simon Pilgrim f209321d61 [llvm-mca][X86] Add mmx instruction to btver2 resource tests
Useful to see scheduler class deltas against xmm equivalents

llvm-svn: 330335
2018-04-19 15:09:46 +00:00
Simon Pilgrim c310bfa193 [llvm-mca][X86] Add mmx versions of SSSE3 instructions
Move PABS instructions incorrectly tested under SSE2 

llvm-svn: 330295
2018-04-18 20:47:48 +00:00
Greg Bedwell 90d141a295 [UpdateTestChecks] Add update_mca_test_checks.py script
This script can be used to regenerate tests in the
test/tools/llvm-mca directory (PR36904).

Regenerated a number of tests using the pattern: test/tools/llvm-mca/*/*/*.s

Differential Revision: https://reviews.llvm.org/D45369

llvm-svn: 330246
2018-04-18 10:27:45 +00:00
Craig Topper e56a2fc5e7 [X86] Add separate scheduling class for PSADBW instruction.
llvm-svn: 330204
2018-04-17 19:35:19 +00:00
Andrea Di Biagio c752616f30 [llvm-mca] Ensure that instructions with a schedule read-advance are always issued in the right order.
Normally, the Scheduler prioritizes older instructions over younger instructions
during the instruction issue stage. In one particular case where a dependent
instruction had a schedule read-advance associated to one of the input operands,
this rule was not correctly applied.

This patch fixes the issue and adds a test to verify that we don't regress that
particular case.

llvm-svn: 330032
2018-04-13 15:19:07 +00:00
Andrea Di Biagio f41ad5c59e [llvm-mca] Renamed BackendStatistics to RetireControlUnitStatistics.
Also, removed flag -verbose in favor of flag -retire-stats.

llvm-svn: 329794
2018-04-11 12:12:53 +00:00
Andrea Di Biagio 1cc29c045e [llvm-mca] Move the logic that prints scheduler statistics from BackendStatistics to its own view.
Added flag -scheduler-stats to print scheduler related statistics.

llvm-svn: 329792
2018-04-11 11:37:46 +00:00
Andrea Di Biagio 821f650bba [llvm-mca] Move the logic that prints dispatch unit statistics from BackendStatistics to its own view.
This patch moves the logic that collects and analyzes dispatch events to the
DispatchStatistics view.

Added flag -dispatch-stats to print statistics related to the dispatch logic.

llvm-svn: 329708
2018-04-10 14:55:14 +00:00
Andrea Di Biagio 074cef3dfb [llvm-mca] Increase the default number of iterations to 100.
llvm-svn: 329694
2018-04-10 12:50:03 +00:00
Andrea Di Biagio c9f409eb6f Reapply "[llvm-mca] Do not separate iterations with a newline in the timeline view."
This reapplies r329403 with a fix for the floating point rounding issue.

llvm-svn: 329680
2018-04-10 09:55:33 +00:00
Andrea Di Biagio c65901282b [llvm-mca] Add the ability to mark regions of code for analysis (PR36875)
This patch teaches llvm-mca how to parse code comments in search for special
"markers" used to select regions of code.

Example:

# LLVM-MCA-BEGIN My Code Region
  ....
# LLVM-MCA-END

The MCAsmLexer now delegates to an object of class MCACommentParser (i.e. an
AsmCommentConsumer) the parsing of code comments to search for begin/end code
region markers.

A comment starting with substring "LLVM-MCA-BEGIN" marks the beginning of a new
region of code.  A comment starting with substring "LLVM-MCA-END" marks the end
of the last region.

This implementation doesn't allow regions to overlap. Each region can have a
optional description; internally, each region is identified by a range of source
code locations (SMLoc).

MCInst objects are added to a region R only if the source location for the
MCInst is in the range of locations specified by R.

By default, the tool allocates an implicit "Default" code region which contains
every source location.  See new tests llvm-mca-marker-*.s for a few examples.

A new Backend object is created for every region. So, the analysis is conducted
on every parsed code region.  The final report is the union of the reports
generated for every code region.  Note that empty regions are skipped.

Special "[#] Code Region - ..." strings are used in the report to mark the
portion which is specific to a code region only. For example, see
llvm-mca-markers-5.s.

Differential Revision: https://reviews.llvm.org/D45433

llvm-svn: 329590
2018-04-09 16:39:52 +00:00
Hans Wennborg 6400c03e6a Revert r329403 "[llvm-mca] Do not separate iterations with a newline in the timeline view."
This made AArch64/CortexA57/direct-branch.s fail on Windows, e.g.
http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/builds/11251

> Also, update a few tests to minimize the diff in D45369.
> No functional change intended.

llvm-svn: 329569
2018-04-09 13:53:41 +00:00
Simon Pilgrim 86588fc809 [X86][Btver2] Add vector extract costs
llvm-svn: 329524
2018-04-08 11:26:26 +00:00
Andrea Di Biagio 85b8138bc6 [llvm-mca] Do not separate iterations with a newline in the timeline view.
Also, update a few tests to minimize the diff in D45369.
No functional change intended.

llvm-svn: 329403
2018-04-06 15:30:02 +00:00
Andrea Di Biagio c74ad502ce [MC][Tablegen] Allow models to describe the retire control unit for llvm-mca.
This patch adds the ability to describe properties of the hardware retire
control unit.

Tablegen class RetireControlUnit has been added for this purpose (see
TargetSchedule.td).

A RetireControlUnit specifies the size of the reorder buffer, as well as the
maximum number of opcodes that can be retired every cycle.

A zero (or negative) value for the reorder buffer size means: "the size is
unknown". If the size is unknown, then llvm-mca defaults it to the value of
field SchedMachineModel::MicroOpBufferSize.  A zero or negative number of
opcodes retired per cycle means: "there is no restriction on the number of
instructions that can be retired every cycle".

Models can optionally specify an instance of RetireControlUnit. There can only
be up-to one RetireControlUnit definition per scheduling model.

Information related to the RCU (RetireControlUnit) is stored in (two new fields
of) MCExtraProcessorInfo.  llvm-mca loads that information when it initializes
the DispatchUnit / RetireControlUnit (see Dispatch.h/Dispatch.cpp).

This patch fixes PR36661.

Differential Revision: https://reviews.llvm.org/D45259

llvm-svn: 329304
2018-04-05 15:41:41 +00:00
Simon Pilgrim 8139a88cb6 [X86][Btver2] Strip unnecessary check prefixes from resources tests
llvm-svn: 329192
2018-04-04 13:25:45 +00:00
Andrea Di Biagio 8dabf4f145 [llvm-mca] Move the logic that prints register file statistics to its own view. NFCI
Before this patch, the "BackendStatistics" view was responsible for printing the
register file usage (as well as many other statistics).

Now users can enable register file usage statistics using the command line flag
`-register-file-stats`. By default, the tool doesn't print register file
statistics.

llvm-svn: 329083
2018-04-03 16:46:23 +00:00
Andrea Di Biagio 9da4d6db33 [MC][Tablegen] Allow the definition of processor register files in the scheduling model for llvm-mca
This patch allows the description of register files in processor scheduling
models. This addresses PR36662.

A new tablegen class named 'RegisterFile' has been added to TargetSchedule.td.
Targets can optionally describe register files for their processors using that
class. In particular, class RegisterFile allows to specify:
 - The total number of physical registers.
 - Which target registers are accessible through the register file.
 - The cost of allocating a register at register renaming stage.

Example (from this patch - see file X86/X86ScheduleBtVer2.td)

  def FpuPRF : RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]>

Here, FpuPRF describes a register file for MMX/XMM/YMM registers. On Jaguar
(btver2), a YMM register definition consumes 2 physical registers, while MMX/XMM
register definitions only cost 1 physical register.

The syntax allows to specify an empty set of register classes.  An empty set of
register classes means: this register file models all the registers specified by
the Target.  For each register class, users can specify an optional register
cost. By default, register costs default to 1.  A value of 0 for the number of
physical registers means: "this register file has an unbounded number of
physical registers".

This patch is structured in two parts.

* Part 1 - MC/Tablegen *

A first part adds the tablegen definition of RegisterFile, and teaches the
SubtargetEmitter how to emit information related to register files.

Information about register files is accessible through an instance of
MCExtraProcessorInfo.
The idea behind this design is to logically partition the processor description
which is only used by external tools (like llvm-mca) from the processor
information used by the llvm machine schedulers.
I think that this design would make easier for targets to get rid of the extra
processor information if they don't want it.

* Part 2 - llvm-mca related *

The second part of this patch is related to changes to llvm-mca.

The main differences are:
 1) class RegisterFile now needs to take into account the "cost of a register"
when allocating physical registers at register renaming stage.
 2) Point 1. triggered a minor refactoring which lef to the removal of the
"maximum 32 register files" restriction.
 3) The BackendStatistics view has been updated so that we can print out extra
details related to each register file implemented by the processor.

The effect of point 3. is also visible in tests register-files-[1..5].s.

Differential Revision: https://reviews.llvm.org/D44980

llvm-svn: 329067
2018-04-03 13:36:24 +00:00
Andrea Di Biagio 6fd62feff8 [llvm-mca] Do not assume that implicit reads cannot be associated with ReadAdvance entries.
Before, the instruction builder incorrectly assumed that only explicit reads
could have been associated with ReadAdvance entries.
This patch fixes the issue and adds a test to verify it.

llvm-svn: 328972
2018-04-02 13:46:49 +00:00
Craig Topper 13a0f83a05 [X86] Add SchedRW for PMULLD
Summary:
It seems many CPUs don't implement this instruction as well as the other vector multiplies. Often using a multi uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput.

This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet.

I also left a FIXME in the SandyBridge model because it being used for the "generic" model is too optimistic for the 256/512-bit versions since those are multiple uops on all known CPUs.

Reviewers: RKSimon, GGanesh, courbet

Reviewed By: RKSimon

Subscribers: gchatelet, gbedwell, andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D44972

llvm-svn: 328914
2018-03-31 04:54:32 +00:00
Andrea Di Biagio dc97172b2f [X86][BtVer2] Fixed the number of micro opcodes for AVX vector converts and
VSQRT instructions.

There were still a few AVX instructions with an incorrect number of opcodes.
These should be fixed now.

llvm-svn: 328892
2018-03-30 18:53:47 +00:00
Andrea Di Biagio 3eaa26bb64 [X86][BtVer2] Fix the number of uOps for horizontal operations.
llvm-svn: 328886
2018-03-30 18:15:30 +00:00
Andrea Di Biagio 073a9d74ca [X86][BtVer2] Add missing ReadAfterLd to RM variants of AVX horizontal adds and
most vector logic instructions.

Fixed a few InstRW that forgot to specify a ReadAfterLd for the register input
operand.

llvm-svn: 328867
2018-03-30 14:48:08 +00:00
Andrea Di Biagio 42d8ea22c0 [X86][BtVer2] Add tests that show how ReadAfterLd is missing for some
instructions.

In the Btver2 model, there are a few InstRW overrides that don't specify a
ReadAfterLd for the register input operand.

As a result, a few AVX variants of horizontal operations and most vector logic
operations with a folded memory operand don't have a ReadAdvance info associated
to their input register operands.

llvm-svn: 328865
2018-03-30 14:29:33 +00:00
Andrea Di Biagio 01043625cf [X86] Add llvm-mca tests for r328834.
Verify that the ReadAfterLd is correctly applied to FMA and 4-ops variable blend
instructions.

As Craig pointed out in D44726, some Intel models still have to be fixed.

llvm-svn: 328861
2018-03-30 13:38:37 +00:00
Andrea Di Biagio 0823090843 [X86] Add tests to verify the presence of "ReadAfterLd" after r328823.
This change adds a couple of tests to verify the change introduced by revision
328823 ([X86] Correct the placement of ReadAfterLd in BEXTR and BZHI).

llvm-svn: 328859
2018-03-30 11:44:48 +00:00
Andrea Di Biagio 0a837ef6b1 [llvm-mca] Correctly set the ReadAdvance information for register use operands.
The tool was passing the wrong operand index to method
MCSubtargetInfo::getReadAdvanceCycles(). That method requires a "UseIdx", and
not the operand index. This was found when testing X86 code where instructions
had a memory folded operand.

This patch fixes the issue and adds test read-advance-1.s to ensure that
the ReadAfterLd (a ReadAdvance of 3cy) information is correctly used.

llvm-svn: 328790
2018-03-29 14:26:56 +00:00
Andrea Di Biagio 5076b98fb9 [X86][BtVer2] Fix the number of micro opcodes for AES[ENC|DEC] and other YMM instructions.
Similar to r328694. The number of micro opcodes should be 2 for those
instructions.

This was found when testing AVX code for BtVer2 using llvm-mca.

llvm-svn: 328698
2018-03-28 12:12:04 +00:00
Andrea Di Biagio 010924e35c [X86][BtVer2] Fix the number of micro opcodes for a bunch of YMM instructions.
The Jaguar backend natively supports 128-bit data types. Operations on YMM
registers are split into two COPs (complex operations). Each COP consumes a slot
in the dispatch group, and in the reorder buffer.

The scheduling model for Jaguar should mark those instructions as `let
NumMicroOps = 2`.

This was found when testing AVX code for BtVer2 using llvm-mca.

llvm-svn: 328694
2018-03-28 10:49:33 +00:00
Andrea Di Biagio 9ecb4011ca [llvm-mca] pass the correct set of used registers in checkRAT.
We were incorrectly initializing the array of used registers in method checkRAT.
As a consequence, the number of register file stalls was misreported.

Added a test to cover this case.

llvm-svn: 328629
2018-03-27 15:23:41 +00:00
Simon Pilgrim fcf49df21c [X86][Btver2] Add (U)COMISD/(U)COMISD scheduler costs
Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write)

llvm-svn: 328573
2018-03-26 19:01:06 +00:00
Simon Pilgrim 86ea53123d [X86][Btver2] Add CVTSI2SD/CVTSI2SS scheduler costs
We still need to account for how Jaguar passes data from GPR -> XMM, which isn't as clean as XMM -> GPR.....

llvm-svn: 328551
2018-03-26 17:02:02 +00:00
Simon Pilgrim 8815105cd5 [X86][Btver2] Add CVTSD2SS/CVTSS2SD scheduler costs
llvm-svn: 328541
2018-03-26 16:24:13 +00:00
Simon Pilgrim aa40148cae [X86][Btver2] Account for the "+i" integer pipe transfer costs (1cy use of JALU0 for GPR PRF write)
llvm-svn: 328536
2018-03-26 16:10:08 +00:00
Simon Pilgrim 0b73b29388 [X86][Btver2] Add CVTSD2SI/CVTSS2SI scheduler costs
Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write)

This also adds missing vcvttss2si tests 

llvm-svn: 328505
2018-03-26 15:30:47 +00:00
Simon Pilgrim 3aa9344605 [X86][Btver2] Fix YMM BLENDPD/BLENDPS + UNPCKPD/UNPCKP instructions costs
These should match the YMM MOVDUP/ PERMILPD/PERMILPS + SHUFPD/SHUFPS shuffles instead of using the WriteFShuffle defaults.

llvm-svn: 328501
2018-03-26 14:44:24 +00:00
Andrea Di Biagio 5ffd2c3cfc [llvm-mca] Fix how views are added to the InstructionTables.
This should fix the stack-use-after-scope reported by the asan buildbots after
revision 328493.

llvm-svn: 328499
2018-03-26 14:25:52 +00:00
Simon Pilgrim 67df1cf597 [X86][Btver2] Add (V)SQRTPD/(V)SQRTSD costs
The xmm sd/pd versions were using the WriteFSQRT default which is modelled on sqrtss/sqrtps

llvm-svn: 328497
2018-03-26 14:03:40 +00:00
Andrea Di Biagio ff9c1092b7 [llvm-mca] Add a flag -instruction-info to enable/disable the instruction info view.
llvm-svn: 328493
2018-03-26 13:44:54 +00:00
Simon Pilgrim caa203aed5 [X86][Btver2] Double the AGU and schedule pipe resources for YMM
Both the AGUs and schedule pipes are double pumped for 256-bit instructions as well as the functional units which we already model.

llvm-svn: 328491
2018-03-26 13:15:20 +00:00
Andrea Di Biagio d1569290ef [llvm-mca] Add flag -instruction-tables to print the theoretical resource pressure distribution for instructions (PR36874)
The goal of this patch is to address most of PR36874.  To fully fix PR36874 we
need to split the "InstructionInfo" view from the "SummaryView". That would make
easy to check the latency and rthroughput as well.

The patch reuses all the logic from ResourcePressureView to print out the
"instruction tables".

We have an entry for every instruction in the input sequence. Each entry reports
the theoretical resource pressure distribution. Resource pressure is uniformly
distributed across all the processor resource units of a group.

At the moment, the backend pipeline is not configurable, so the only way to fix
this is by creating a different driver that simply sends instruction events to
the resource pressure view.  That means, we don't use the Backend interface.
Instead, it is simpler to just have a different code-path for when flag
-instruction-tables is specified.

Once Clement addresses bug 36663, then we can port the "instruction tables"
logic into a stage of our configurable pipeline.

Updated the BtVer2 test cases (thanks Simon for the help). Now we pass flag
-instruction-tables to each modified test.

Differential Revision: https://reviews.llvm.org/D44839

llvm-svn: 328487
2018-03-26 12:04:53 +00:00
Simon Pilgrim 6c63e6c222 [X86][Btver2] Cleanup TEST instructions to use JFPA (+JFPX on ymms) function unit
llvm-svn: 328343
2018-03-23 17:59:22 +00:00
Simon Pilgrim e5c0a041ff [X86][Btver2] Cleanup MOVMSK instructions to use JFPA function unit
Add missing non-VEX and (V)PMOVMSKB instructions to the pattern

llvm-svn: 328338
2018-03-23 17:38:59 +00:00
Simon Pilgrim 256f149bf0 [X86][Btver2] Vector permutes use a JFPU01 scheduler pipe and JFPX/JVALU function unit
llvm-svn: 328331
2018-03-23 16:17:56 +00:00
Simon Pilgrim ee282b3160 [X86][Btver2] Vector store instructions use a JFPU1 scheduler pipe and JSAGU/JSTC function units
llvm-svn: 328328
2018-03-23 15:35:13 +00:00
Simon Pilgrim 1335b9c0ca [X86][Btver2] Cleanup DPPS/DPPD instructions to use JFPA/JFPM function units
llvm-svn: 328324
2018-03-23 15:17:50 +00:00
Simon Pilgrim 5792e10ffb [X86][Btver2] Fix MicroOps counts for DPPS/YMM memory folded instructions
This was due to a misunderstanding over what llvm calls a micro-op (retirement unit) is actually called a macro-op on the AMD/Jaguar target. Folded loads don't affect num macro ops.

llvm-svn: 328320
2018-03-23 14:45:03 +00:00
Simon Pilgrim 8619962c73 [X86][Btver2] Cleanup SSE42 PCMPISTR/PCMPESTR string instructions to correctly use JFPU1 scheduler pipe followed by JLAGU/JSAGU/JFPA/JVALU function units
Fixes throughput to match Agner/Fam16h-SoG as well.

llvm-svn: 328318
2018-03-23 14:27:26 +00:00
Simon Pilgrim a1e3ea01ef [X86][Btver2] Vector move/load/store instructions use a JFPU01 scheduler pipe and JFPX/JVALU function unit as well as the AGUs
llvm-svn: 328304
2018-03-23 11:27:31 +00:00
Craig Topper 659c66dfc1 [X86] Match vpblendvb/vblendvps/vblendvpd itineraries to the SSE equivalent. Change pblendvb/blendvps/blendvpd to use WriteFVarBlend
llvm-svn: 328294
2018-03-23 06:41:41 +00:00
Craig Topper 7580a7997d [X86] Change VPSADBW itinerary to SSE_INTALU_ITINS_P to match the SSE version.
llvm-svn: 328293
2018-03-23 06:41:40 +00:00
Simon Pilgrim bcb86bb927 [X86][Btver2] Conversion, MaskedLoad/MaskedStore and NTStores all are scheduled through the JFPU1 pipe
llvm-svn: 328226
2018-03-22 18:29:16 +00:00
Simon Pilgrim 0e031afa95 [X86][Btver2] FCMP (inc FMAX/FMIN) instructions use the JFPA functional pipe
The ymm instructions are double pumped as well.

llvm-svn: 328222
2018-03-22 17:43:12 +00:00
Simon Pilgrim e5b51f6786 [X86][Btver2] FMUL ymm instructions are double pumped on the JFPM functional pipe
llvm-svn: 328217
2018-03-22 17:25:38 +00:00
Andrea Di Biagio 12ef5260ea [llvm-mca] Move the logic that computes the register file usage to the BackendStatistics view.
With this patch, the "instruction dispatched" event now provides information
related to the number of microarchitectural registers used in each register
file. Similarly, the "instruction retired" event is now able to tell how may
registers are freed in each register file.

Currently, the BackendStatistics view is the only consumer of register
usage/pressure information. BackendStatistics uses that info to print out a few
general statistics (i.e. max number of mappings used; total mapping created).
Before this patch, the BackendStatistics was forced to query the Backend to
obtain the register pressure information.

This helps removes that dependency. Now views are completely independent from
the Backend.  As a consequence, it should be easier to address PR36663 and
further modularize the pipeline.

Added a couple of test cases in the BtVer2 specific directory.

llvm-svn: 328129
2018-03-21 18:11:05 +00:00
Simon Pilgrim 203876f104 [X86][Btver2] Fix crc32 schedule costs
The default is currently FAdd for some reason

llvm-svn: 327807
2018-03-18 19:54:42 +00:00
Simon Pilgrim 13cd3b0961 [X86][Btver2] Add crc32 resource tests
llvm-svn: 327805
2018-03-18 18:55:34 +00:00
Simon Pilgrim c3db8c7cda [X86][Btver2] FADD/FHADD ymm instructions are double pumped on the JFPA functional pipe
llvm-svn: 327804
2018-03-18 18:45:57 +00:00
Simon Pilgrim 036cc82622 [X86][Btver2] Float bitwise ymm instructions are double pumped on the JFPX (JFPA/JFPM) functional pipes
llvm-svn: 327803
2018-03-18 17:10:12 +00:00
Simon Pilgrim 87d2f7463f [X86][Btver2] F16C instructions are performed on the JSTC functional pipe
llvm-svn: 327801
2018-03-18 15:59:51 +00:00
Simon Pilgrim 40f6d6ad0b [X86][Btver2] SSE4A EXTRQ/INSERTQ instructions are performed on the JVALU0/JVALU1 functional pipes
llvm-svn: 327794
2018-03-18 13:05:09 +00:00
Simon Pilgrim e16790b133 [X86][Btver2] Modelled float bitwise instructions as being performed on the float cluster (FPA/FPM) not the integer.
llvm-svn: 327793
2018-03-18 12:37:35 +00:00
Simon Pilgrim e409f84e7e [X86][Btver2] Correctly distinguish between scheduling pipe and functional unit for JWriteResFpuPair defs
Jaguar's FPU has 2 scheduler pipes (JFPU0/JFPU1) which forward to multiple functional sub-units each. We need to model that an micro-op will both consume the scheduler pipe and a functional unit.

This patch just handles the ops defined through JWriteResFpuPair, I'll go through the custom cases later.

llvm-svn: 327791
2018-03-18 12:09:17 +00:00
Simon Pilgrim 0ba4a0f3a6 [X86][Btver2] Add llvm-mca tests to show pipe resource usage of most vector instructions
Hopefully these tests can be easily reused should any other subtarget get in depth llvm-mca coverage (we can either copy the tests or move them into a common dir and run it with multiple prefixes).

llvm-svn: 327788
2018-03-18 09:32:38 +00:00
Simon Pilgrim 9c4157bb70 [X86][Btver2] Tweak pipes test to remove register dependencies
It gives us a better view of pipe usage in the timeline which is what the test is trying to show.

llvm-svn: 327685
2018-03-15 23:15:11 +00:00
Simon Pilgrim 3894809997 [X86][Btver2] Fix ymm div/sqrt to use fmul unit
YMM FDiv/FSqrt are dispatched on pipe JFPU1 but should be performed on the JFPM unit - that is where most of the cycles are spent.

This matches the pipes for WriteFSqrt/WriteFDiv definitions.

llvm-svn: 327682
2018-03-15 23:00:47 +00:00
Simon Pilgrim 49a56faee2 [X86][Btver2] Add test to show timeline of fpu instructions on different pipes/units
Try to demonstrate the scheduling from fpu0/fpu1 pipes to the valu0/vimul/fpa or valu1/stc/fpm functional units

llvm-svn: 327676
2018-03-15 22:34:24 +00:00
Andrea Di Biagio 7948738673 [llvm-mca] BackendStatistics: early exit from method printSchedulerUsage if the
no scheduler resources were consumed.

llvm-svn: 327215
2018-03-10 17:40:25 +00:00
Andrea Di Biagio 373c38a2db [llvm-mca] Fix handling of zero-latency instructions.
This patch fixes a problem found when testing zero latency instructions on
target AArch64 -mcpu=exynos-m3 / -mcpu=exynos-m1.

On Exynos-m3/m1, direct branches are zero-latency instructions that don't consume
any processor resources.  The DispatchUnit marks zero-latency instructions as
"executed", so that no scheduling is required.  The event of instruction
executed is then notified to all the listeners, and the reorder buffer (managed
by the RetireControlUnit) is updated. In particular, the entry associated to the
zero-latency instruction in the reorder buffer is marked as executed.

Before this patch, the DispatchUnit forgot to assign a retire control unit token
(RCUToken) to the zero-latency instruction. As a consequence, the RCUToken was
used uninitialized. This was causing a crash in the RetireControlUnit logic.

Fixes PR36650.

llvm-svn: 327056
2018-03-08 20:21:55 +00:00
Andrea Di Biagio 7bbac07f22 [llvm-mca] Emit the 'Instruction Info' table before the resource pressure view.
In future, both the summary information and the 'instruction info' table should
be moved into a separate "Summary" view.

llvm-svn: 327010
2018-03-08 15:34:38 +00:00
Andrea Di Biagio 3a6b092017 [llvm-mca] LLVM Machine Code Analyzer.
llvm-mca is an LLVM based performance analysis tool that can be used to
statically measure the performance of code, and to help triage potential
problems with target scheduling models.

llvm-mca uses information which is already available in LLVM (e.g. scheduling
models) to statically measure the performance of machine code in a specific cpu.
Performance is measured in terms of throughput as well as processor resource
consumption. The tool currently works for processors with an out-of-order
backend, for which there is a scheduling model available in LLVM.

The main goal of this tool is not just to predict the performance of the code
when run on the target, but also help with diagnosing potential performance
issues.

Given an assembly code sequence, llvm-mca estimates the IPC (instructions per
cycle), as well as hardware resources pressure. The analysis and reporting style
were mostly inspired by the IACA tool from Intel.

This patch is related to the RFC on llvm-dev visible at this link:
http://lists.llvm.org/pipermail/llvm-dev/2018-March/121490.html

Differential Revision: https://reviews.llvm.org/D43951

llvm-svn: 326998
2018-03-08 13:05:02 +00:00