llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	061096d2c2	[llvm-mca][x86] Remove addsubpd from SSE2 tests llvm-svn: 331678	2018-05-07 21:10:48 +00:00
Simon Pilgrim	1233e1234a	[X86] Split WriteFAdd/WriteFCmp/WriteFMul schedule classes Split to support single/double for scalar, XMM and YMM/ZMM instructions - removing InstrRW overrides for these instructions. Fixes Atom ADDSUBPD instruction and reclassifies VFPCLASS as WriteFCmp which is closer in behaviour. llvm-svn: 331672	2018-05-07 20:52:53 +00:00
Simon Pilgrim	e480ed0b9f	[X86][AVX2] Tag VPMOVSX/VPMOVZX ymm instructions as WriteShuffle256 These are more like cross-lane shuffles than regular shuffles - we already do this for AVX512 equivalents. Differential Revision: https://reviews.llvm.org/D46229 llvm-svn: 331659	2018-05-07 18:25:19 +00:00
Simon Pilgrim	763bf12085	[X86][Znver1] Remove WriteFMul/WriteFRcp InstRW overrides/aliases. Fixes x87 schedules to more closely match Agner - AMD doesn't tend to "special case" x87 instructions as much as Intel. llvm-svn: 331645	2018-05-07 16:34:26 +00:00
Simon Pilgrim	ac5d0a31ef	[X86] Split WriteFDiv schedule classes to support single/double scalar, XMM and YMM/ZMM instructions. This removes all InstrRW overrides for these instructions - some x87 overrides remain but most use default (and realistic) values. llvm-svn: 331643	2018-05-07 16:15:46 +00:00
Simon Pilgrim	f3ae50fca2	[X86] Split WriteFRcp/WriteFRsqrt/WriteFSqrt schedule classes WriteFRcp/WriteFRsqrt are split to support scalar, XMM and YMM/ZMM instructions. WriteFSqrt is split into single/double/long-double sizes and scalar, XMM, YMM and ZMM instructions. This removes all InstrRW overrides for these instructions. NOTE: There were a couple of typos in the Znver1 model - notably a 1cy throughput for SQRT that is highly unlikely and doesn't tally with Agner. NOTE: I had to add Agner's numbers for several targets for WriteFSqrt80. llvm-svn: 331629	2018-05-07 11:50:44 +00:00
Simon Pilgrim	0e51a125ea	[X86] Add WriteEMMS scheduler class Filled in the missing values from Btver2 SoG or Agner llvm-svn: 331546	2018-05-04 18:16:13 +00:00
Simon Pilgrim	be51b20127	[X86] Add SchedWriteFRnd fp rounding scheduler classes Split off from SchedWriteFAdd for fp rounding/bit-manipulation instructions. Fixes an issue on btver2 which only had the ymm version using the JSTC pipe instead of JFPA. llvm-svn: 331515	2018-05-04 12:59:24 +00:00
Simon Pilgrim	0aed731516	[X86][Znver1] Use SchedAlias to tag microcoded scheduler classes Avoids extra entries in the class tables. Found a typo that missed the MMX_PHSUBSW instruction. llvm-svn: 331488	2018-05-03 22:12:23 +00:00
Simon Pilgrim	350c22c587	[X86][SNB] Fix scheduling of MMX integer multiply instructions. The entries were being bound to the wrong class. llvm-svn: 331388	2018-05-02 19:26:14 +00:00
Clement Courbet	a1a3095d88	[X86] Fix scheduling info for (V?)SQRTPDm on silvermont. https://reviews.llvm.org/D46356 llvm-svn: 331356	2018-05-02 13:46:14 +00:00
Andrea Di Biagio	e047d3529b	[llvm-mca] Correctly handle zero-latency stores that consume pipeline resources. This fixes PR37293. We can have scheduling classes with no write latency entries, that still consume processor resources. We don't want to treat those instructions as zero-latency instructions; they still have to be issued to the underlying pipelines, so they still consume resource cycles. This is likely to be a regression which I have accidentally introduced at revision 330807. Now, if an instruction has a non-empty set of write processor resources, we conservatively treat it as a normal (i.e. non zero-latency) instruction. llvm-svn: 331193	2018-04-30 15:55:04 +00:00
Andrea Di Biagio	77bd1c748a	[llvm-mca] Regenerate test Atom/resources-sse3.s. NFC Before this change, it wrongly specified -mcpu=slm instead of -mcpu=atom. llvm-svn: 331170	2018-04-30 12:13:04 +00:00
Andrea Di Biagio	e9384eb13b	[llvm-mca] Support for in-order CPU for -instruction-tables testing. Added Intel Atom tests to verify that the tool correctly generates instruction tables even if the CPU is in-order. Fixes PR37282. llvm-svn: 331169	2018-04-30 12:05:34 +00:00
Simon Pilgrim	8962c344f9	[llvm-mca][X86] Add BT resource tests to all models llvm-svn: 331144	2018-04-29 15:45:31 +00:00
Simon Pilgrim	2d569361fc	[llvm-mca][X86] Add add/adc + sub/sbb resource tests to all models llvm-svn: 331140	2018-04-29 11:03:25 +00:00
Simon Pilgrim	318e9d39ab	[llvm-mca][X86] Add double shift resource tests to all relevant models llvm-svn: 331109	2018-04-28 15:18:49 +00:00
Simon Pilgrim	4d0187c893	[llvm-mca][X86] Add shift/rotate resource tests to all relevant models I intend to add further instruction tests to the resources-x86_64.s test file as required, but this initial commit is to help remove a load of unnecessary InstRW overrides in a future patch llvm-svn: 331108	2018-04-28 14:56:18 +00:00
Simon Pilgrim	7574ffd7bc	[llvm-mca][X86] Updated fma3 tests after rL330820 llvm-svn: 330822	2018-04-25 13:19:04 +00:00
Andrea Di Biagio	93c49d5e58	[llvm-mca] Default to the native host cpu if flag -mcpu is not specified. llvm-svn: 330809	2018-04-25 10:18:25 +00:00
Simon Pilgrim	27bc83e228	[X86] Split off PHMINPOSUW to their own schedule class This also fixes Jaguar's schedule which was treating it as the WriteVecIMul default. llvm-svn: 330756	2018-04-24 18:49:25 +00:00
Simon Pilgrim	f0945aa0e0	[X86][F16C] Add WriteCvtF2FSt scheduling class Fixes the classification of VCVTPS2PHmr/VCVTPS2PHYmr which were tagged as WriteCvtF2FLd_WriteRMW (PR36887) llvm-svn: 330737	2018-04-24 16:43:07 +00:00
Simon Pilgrim	828ef9e013	[X86][BtVer2] Fix VCVTPS2PHmr/VCVTPS2PHYmr latencies These are stores, not loads, so don't need to account for load latency. llvm-svn: 330735	2018-04-24 16:26:51 +00:00
Simon Pilgrim	f35b8ac196	[X86][IVB] Add F16C resource tests. Note this is IvyBridge (which shares the model) NOT SandyBridge. llvm-svn: 330734	2018-04-24 16:22:59 +00:00
Andrea Di Biagio	0626864fa4	[llvm-mca] Default the output asm dialect used by the instruction printer to the input asm dialect. The instruction printer used by llvm-mca to generate the performance report now defaults the output assembly format to the format used for the input assembly file. On x86, the asm format can be either AT&T or Intel, depending on the presence/absence of directive `.intel_syntax`. Users can still specify a different assembly dialect with the command line flag -output-asm-variant=<uint>. llvm-svn: 330733	2018-04-24 16:19:08 +00:00
Simon Pilgrim	16299273d0	[X86] Remove unnecessary FMA reg-mem InstRW scheduler overrides. llvm-svn: 330720	2018-04-24 14:47:11 +00:00
Simon Pilgrim	f7d2a93d5f	[X86] Add vector element insertion/extraction scheduler classes Split off pinsr/pextr and extractps instructions. (Mostly) fixes PR36887. Note: It might be worth adding a WriteFInsertLd class as well in the future. Differential Revision: https://reviews.llvm.org/D45929 llvm-svn: 330714	2018-04-24 13:21:41 +00:00
Simon Pilgrim	87ba905fe9	[llvm-mca][X86] Add BMI/LZCNT/POPCNT resource tests to all relevant models The SandyBridge BMI tests are actually run on IvyBridge as that's the first CPU that actually supports the ISAs (but still use the SandyBridge model). llvm-svn: 330556	2018-04-22 20:42:24 +00:00
Simon Pilgrim	96855ec39e	[X86] Remove unnecessary WriteFVarBlend/WriteVarBlend InstRW overrides. This also fixes some of the ReadAfterLd issues due to InstRW. llvm-svn: 330544	2018-04-22 14:43:12 +00:00
Simon Pilgrim	5e9f1da0cd	[llvm-mca][X86] Add POPCNT resource test llvm-svn: 330540	2018-04-22 09:58:00 +00:00
Simon Pilgrim	e25aa02bc4	[llvm-mca][X86] Add AVX2 resource tests llvm-svn: 330512	2018-04-21 16:12:42 +00:00
Simon Pilgrim	d73bd154d9	[llvm-mca][X86] Add SSE resource tests to all models llvm-svn: 330506	2018-04-21 14:16:57 +00:00
Simon Pilgrim	26178d4336	[llvm-mca][X86] Add MMX resource tests llvm-svn: 330502	2018-04-21 11:28:59 +00:00
Simon Pilgrim	1264066cd7	[llvm-mca][X86] Add X87 resource tests llvm-svn: 330499	2018-04-21 10:36:19 +00:00
Simon Pilgrim	1803bfb75f	[llvm-mca][X86] Add MMX/SSE/AES/CLMUL resource SandyBridge tests llvm-svn: 330486	2018-04-20 22:04:11 +00:00
Simon Pilgrim	0a6bfb1843	[llvm-mca][X86] Add prefetch instruction resource tests llvm-svn: 330371	2018-04-19 22:11:58 +00:00
Simon Pilgrim	7209117868	[llvm-mca][FMA] Add FMA resource tests llvm-svn: 330366	2018-04-19 21:32:22 +00:00
Simon Pilgrim	4a486c13fa	[llvm-mca][X86] Add resource test for every out-of-order scheduler model I've copied and regenerated a resource file from btver2 to every x86 scheduler model supported by llvm-mca so we have at least some basic coverage. For most, this has been the avx1 tests, but for silvermont I've used sse42 as that's the latest it supports. More will be added later. llvm-svn: 330352	2018-04-19 18:08:10 +00:00
Simon Pilgrim	f209321d61	[llvm-mca][X86] Add mmx instruction to btver2 resource tests Useful to see scheduler class deltas against xmm equivalents llvm-svn: 330335	2018-04-19 15:09:46 +00:00
Simon Pilgrim	c310bfa193	[llvm-mca][X86] Add mmx versions of SSSE3 instructions Move PABS instructions incorrectly tested under SSE2 llvm-svn: 330295	2018-04-18 20:47:48 +00:00
Greg Bedwell	90d141a295	[UpdateTestChecks] Add update_mca_test_checks.py script This script can be used to regenerate tests in the test/tools/llvm-mca directory (PR36904). Regenerated a number of tests using the pattern: test/tools/llvm-mca///*.s Differential Revision: https://reviews.llvm.org/D45369 llvm-svn: 330246	2018-04-18 10:27:45 +00:00
Craig Topper	e56a2fc5e7	[X86] Add separate scheduling class for PSADBW instruction. llvm-svn: 330204	2018-04-17 19:35:19 +00:00
Andrea Di Biagio	c752616f30	[llvm-mca] Ensure that instructions with a schedule read-advance are always issued in the right order. Normally, the Scheduler prioritizes older instructions over younger instructions during the instruction issue stage. In one particular case where a dependent instruction had a schedule read-advance associated to one of the input operands, this rule was not correctly applied. This patch fixes the issue and adds a test to verify that we don't regress that particular case. llvm-svn: 330032	2018-04-13 15:19:07 +00:00
Andrea Di Biagio	f41ad5c59e	[llvm-mca] Renamed BackendStatistics to RetireControlUnitStatistics. Also, removed flag -verbose in favor of flag -retire-stats. llvm-svn: 329794	2018-04-11 12:12:53 +00:00
Andrea Di Biagio	1cc29c045e	[llvm-mca] Move the logic that prints scheduler statistics from BackendStatistics to its own view. Added flag -scheduler-stats to print scheduler related statistics. llvm-svn: 329792	2018-04-11 11:37:46 +00:00
Andrea Di Biagio	821f650bba	[llvm-mca] Move the logic that prints dispatch unit statistics from BackendStatistics to its own view. This patch moves the logic that collects and analyzes dispatch events to the DispatchStatistics view. Added flag -dispatch-stats to print statistics related to the dispatch logic. llvm-svn: 329708	2018-04-10 14:55:14 +00:00
Andrea Di Biagio	074cef3dfb	[llvm-mca] Increase the default number of iterations to 100. llvm-svn: 329694	2018-04-10 12:50:03 +00:00
Andrea Di Biagio	c9f409eb6f	Reapply "[llvm-mca] Do not separate iterations with a newline in the timeline view." This reapplies r329403 with a fix for the floating point rounding issue. llvm-svn: 329680	2018-04-10 09:55:33 +00:00
Andrea Di Biagio	c65901282b	[llvm-mca] Add the ability to mark regions of code for analysis (PR36875) This patch teaches llvm-mca how to parse code comments in search of special "markers" used to select regions of code. Example: # LLVM-MCA-BEGIN My Code Region .... # LLVM-MCA-END The MCAsmLexer now delegates to an object of class MCACommentParser (i.e. an AsmCommentConsumer) the parsing of code comments to search for begin/end code region markers. A comment starting with substring "LLVM-MCA-BEGIN" marks the beginning of a new region of code. A comment starting with substring "LLVM-MCA-END" marks the end of the last region. This implementation doesn't allow regions to overlap. Each region can have an optional description; internally, each region is identified by a range of source code locations (SMLoc). MCInst objects are added to a region R only if the source location for the MCInst is in the range of locations specified by R. By default, the tool allocates an implicit "Default" code region which contains every source location. See new tests llvm-mca-marker-*.s for a few examples. A new Backend object is created for every region. So, the analysis is conducted on every parsed code region. The final report is the union of the reports generated for every code region. Note that empty regions are skipped. Special "[#] Code Region - ..." strings are used in the report to mark the portion which is specific to a code region only. For example, see llvm-mca-markers-5.s. Differential Revision: https://reviews.llvm.org/D45433 llvm-svn: 329590	2018-04-09 16:39:52 +00:00
Hans Wennborg	6400c03e6a	Revert r329403 "[llvm-mca] Do not separate iterations with a newline in the timeline view." This made AArch64/CortexA57/direct-branch.s fail on Windows, e.g. http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/builds/11251 > Also, update a few tests to minimize the diff in D45369. > No functional change intended. llvm-svn: 329569	2018-04-09 13:53:41 +00:00
Simon Pilgrim	86588fc809	[X86][Btver2] Add vector extract costs llvm-svn: 329524	2018-04-08 11:26:26 +00:00
Andrea Di Biagio	85b8138bc6	[llvm-mca] Do not separate iterations with a newline in the timeline view. Also, update a few tests to minimize the diff in D45369. No functional change intended. llvm-svn: 329403	2018-04-06 15:30:02 +00:00
Andrea Di Biagio	c74ad502ce	[MC][Tablegen] Allow models to describe the retire control unit for llvm-mca. This patch adds the ability to describe properties of the hardware retire control unit. Tablegen class RetireControlUnit has been added for this purpose (see TargetSchedule.td). A RetireControlUnit specifies the size of the reorder buffer, as well as the maximum number of opcodes that can be retired every cycle. A zero (or negative) value for the reorder buffer size means: "the size is unknown". If the size is unknown, then llvm-mca defaults it to the value of field SchedMachineModel::MicroOpBufferSize. A zero or negative number of opcodes retired per cycle means: "there is no restriction on the number of instructions that can be retired every cycle". Models can optionally specify an instance of RetireControlUnit. There can be at most one RetireControlUnit definition per scheduling model. Information related to the RCU (RetireControlUnit) is stored in (two new fields of) MCExtraProcessorInfo. llvm-mca loads that information when it initializes the DispatchUnit / RetireControlUnit (see Dispatch.h/Dispatch.cpp). This patch fixes PR36661. Differential Revision: https://reviews.llvm.org/D45259 llvm-svn: 329304	2018-04-05 15:41:41 +00:00
Simon Pilgrim	8139a88cb6	[X86][Btver2] Strip unnecessary check prefixes from resources tests llvm-svn: 329192	2018-04-04 13:25:45 +00:00
Andrea Di Biagio	8dabf4f145	[llvm-mca] Move the logic that prints register file statistics to its own view. NFCI Before this patch, the "BackendStatistics" view was responsible for printing the register file usage (as well as many other statistics). Now users can enable register file usage statistics using the command line flag `-register-file-stats`. By default, the tool doesn't print register file statistics. llvm-svn: 329083	2018-04-03 16:46:23 +00:00
Andrea Di Biagio	9da4d6db33	[MC][Tablegen] Allow the definition of processor register files in the scheduling model for llvm-mca This patch allows the description of register files in processor scheduling models. This addresses PR36662. A new tablegen class named 'RegisterFile' has been added to TargetSchedule.td. Targets can optionally describe register files for their processors using that class. In particular, class RegisterFile allows specifying: - The total number of physical registers. - Which target registers are accessible through the register file. - The cost of allocating a register at register renaming stage. Example (from this patch - see file X86/X86ScheduleBtVer2.td) def FpuPRF : RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]> Here, FpuPRF describes a register file for MMX/XMM/YMM registers. On Jaguar (btver2), a YMM register definition consumes 2 physical registers, while MMX/XMM register definitions only cost 1 physical register. The syntax allows specifying an empty set of register classes. An empty set of register classes means: this register file models all the registers specified by the Target. For each register class, users can specify an optional register cost. Register costs default to 1. A value of 0 for the number of physical registers means: "this register file has an unbounded number of physical registers". This patch is structured in two parts. * Part 1 - MC/Tablegen * A first part adds the tablegen definition of RegisterFile, and teaches the SubtargetEmitter how to emit information related to register files. Information about register files is accessible through an instance of MCExtraProcessorInfo. The idea behind this design is to logically partition the processor description which is only used by external tools (like llvm-mca) from the processor information used by the llvm machine schedulers. I think that this design would make it easier for targets to get rid of the extra processor information if they don't want it. * Part 2 - llvm-mca related * The second part of this patch is related to changes to llvm-mca. The main differences are: 1) class RegisterFile now needs to take into account the "cost of a register" when allocating physical registers at register renaming stage. 2) Point 1. triggered a minor refactoring which led to the removal of the "maximum 32 register files" restriction. 3) The BackendStatistics view has been updated so that we can print out extra details related to each register file implemented by the processor. The effect of point 3. is also visible in tests register-files-[1..5].s. Differential Revision: https://reviews.llvm.org/D44980 llvm-svn: 329067	2018-04-03 13:36:24 +00:00
Andrea Di Biagio	6fd62feff8	[llvm-mca] Do not assume that implicit reads cannot be associated with ReadAdvance entries. Before, the instruction builder incorrectly assumed that only explicit reads could have been associated with ReadAdvance entries. This patch fixes the issue and adds a test to verify it. llvm-svn: 328972	2018-04-02 13:46:49 +00:00
Craig Topper	13a0f83a05	[X86] Add SchedRW for PMULLD Summary: It seems many CPUs don't implement this instruction as well as the other vector multiplies. Often using a multi uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput. This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet. I also left a FIXME in the SandyBridge model because it being used for the "generic" model is too optimistic for the 256/512-bit versions since those are multiple uops on all known CPUs. Reviewers: RKSimon, GGanesh, courbet Reviewed By: RKSimon Subscribers: gchatelet, gbedwell, andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D44972 llvm-svn: 328914	2018-03-31 04:54:32 +00:00
Andrea Di Biagio	dc97172b2f	[X86][BtVer2] Fixed the number of micro opcodes for AVX vector converts and VSQRT instructions. There were still a few AVX instructions with an incorrect number of opcodes. These should be fixed now. llvm-svn: 328892	2018-03-30 18:53:47 +00:00
Andrea Di Biagio	3eaa26bb64	[X86][BtVer2] Fix the number of uOps for horizontal operations. llvm-svn: 328886	2018-03-30 18:15:30 +00:00
Andrea Di Biagio	073a9d74ca	[X86][BtVer2] Add missing ReadAfterLd to RM variants of AVX horizontal adds and most vector logic instructions. Fixed a few InstRW that forgot to specify a ReadAfterLd for the register input operand. llvm-svn: 328867	2018-03-30 14:48:08 +00:00
Andrea Di Biagio	42d8ea22c0	[X86][BtVer2] Add tests that show how ReadAfterLd is missing for some instructions. In the Btver2 model, there are a few InstRW overrides that don't specify a ReadAfterLd for the register input operand. As a result, a few AVX variants of horizontal operations and most vector logic operations with a folded memory operand don't have a ReadAdvance info associated to their input register operands. llvm-svn: 328865	2018-03-30 14:29:33 +00:00
Andrea Di Biagio	01043625cf	[X86] Add llvm-mca tests for r328834. Verify that the ReadAfterLd is correctly applied to FMA and 4-ops variable blend instructions. As Craig pointed out in D44726, some Intel models still have to be fixed. llvm-svn: 328861	2018-03-30 13:38:37 +00:00
Andrea Di Biagio	0823090843	[X86] Add tests to verify the presence of "ReadAfterLd" after r328823. This change adds a couple of tests to verify the change introduced by revision 328823 ([X86] Correct the placement of ReadAfterLd in BEXTR and BZHI). llvm-svn: 328859	2018-03-30 11:44:48 +00:00
Andrea Di Biagio	0a837ef6b1	[llvm-mca] Correctly set the ReadAdvance information for register use operands. The tool was passing the wrong operand index to method MCSubtargetInfo::getReadAdvanceCycles(). That method requires a "UseIdx", and not the operand index. This was found when testing X86 code where instructions had a memory folded operand. This patch fixes the issue and adds test read-advance-1.s to ensure that the ReadAfterLd (a ReadAdvance of 3cy) information is correctly used. llvm-svn: 328790	2018-03-29 14:26:56 +00:00
Andrea Di Biagio	5076b98fb9	[X86][BtVer2] Fix the number of micro opcodes for AES[ENC|DEC] and other YMM instructions. Similar to r328694. The number of micro opcodes should be 2 for those instructions. This was found when testing AVX code for BtVer2 using llvm-mca. llvm-svn: 328698	2018-03-28 12:12:04 +00:00
Andrea Di Biagio	010924e35c	[X86][BtVer2] Fix the number of micro opcodes for a bunch of YMM instructions. The Jaguar backend natively supports 128-bit data types. Operations on YMM registers are split into two COPs (complex operations). Each COP consumes a slot in the dispatch group, and in the reorder buffer. The scheduling model for Jaguar should mark those instructions as `let NumMicroOps = 2`. This was found when testing AVX code for BtVer2 using llvm-mca. llvm-svn: 328694	2018-03-28 10:49:33 +00:00
Andrea Di Biagio	9ecb4011ca	[llvm-mca] pass the correct set of used registers in checkRAT. We were incorrectly initializing the array of used registers in method checkRAT. As a consequence, the number of register file stalls was misreported. Added a test to cover this case. llvm-svn: 328629	2018-03-27 15:23:41 +00:00
Simon Pilgrim	fcf49df21c	[X86][Btver2] Add (U)COMISD/(U)COMISD scheduler costs Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write) llvm-svn: 328573	2018-03-26 19:01:06 +00:00
Simon Pilgrim	86ea53123d	[X86][Btver2] Add CVTSI2SD/CVTSI2SS scheduler costs We still need to account for how Jaguar passes data from GPR -> XMM, which isn't as clean as XMM -> GPR..... llvm-svn: 328551	2018-03-26 17:02:02 +00:00
Simon Pilgrim	8815105cd5	[X86][Btver2] Add CVTSD2SS/CVTSS2SD scheduler costs llvm-svn: 328541	2018-03-26 16:24:13 +00:00
Simon Pilgrim	aa40148cae	[X86][Btver2] Account for the "+i" integer pipe transfer costs (1cy use of JALU0 for GPR PRF write) llvm-svn: 328536	2018-03-26 16:10:08 +00:00
Simon Pilgrim	0b73b29388	[X86][Btver2] Add CVTSD2SI/CVTSS2SI scheduler costs Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write) This also adds missing vcvttss2si tests llvm-svn: 328505	2018-03-26 15:30:47 +00:00
Simon Pilgrim	3aa9344605	[X86][Btver2] Fix YMM BLENDPD/BLENDPS + UNPCKPD/UNPCKP instructions costs These should match the YMM MOVDUP/ PERMILPD/PERMILPS + SHUFPD/SHUFPS shuffles instead of using the WriteFShuffle defaults. llvm-svn: 328501	2018-03-26 14:44:24 +00:00
Andrea Di Biagio	5ffd2c3cfc	[llvm-mca] Fix how views are added to the InstructionTables. This should fix the stack-use-after-scope reported by the asan buildbots after revision 328493. llvm-svn: 328499	2018-03-26 14:25:52 +00:00
Simon Pilgrim	67df1cf597	[X86][Btver2] Add (V)SQRTPD/(V)SQRTSD costs The xmm sd/pd versions were using the WriteFSQRT default which is modelled on sqrtss/sqrtps llvm-svn: 328497	2018-03-26 14:03:40 +00:00
Andrea Di Biagio	ff9c1092b7	[llvm-mca] Add a flag -instruction-info to enable/disable the instruction info view. llvm-svn: 328493	2018-03-26 13:44:54 +00:00
Simon Pilgrim	caa203aed5	[X86][Btver2] Double the AGU and schedule pipe resources for YMM Both the AGUs and schedule pipes are double pumped for 256-bit instructions as well as the functional units which we already model. llvm-svn: 328491	2018-03-26 13:15:20 +00:00
Andrea Di Biagio	d1569290ef	[llvm-mca] Add flag -instruction-tables to print the theoretical resource pressure distribution for instructions (PR36874) The goal of this patch is to address most of PR36874. To fully fix PR36874 we need to split the "InstructionInfo" view from the "SummaryView". That would make it easy to check the latency and rthroughput as well. The patch reuses all the logic from ResourcePressureView to print out the "instruction tables". We have an entry for every instruction in the input sequence. Each entry reports the theoretical resource pressure distribution. Resource pressure is uniformly distributed across all the processor resource units of a group. At the moment, the backend pipeline is not configurable, so the only way to fix this is by creating a different driver that simply sends instruction events to the resource pressure view. That means we don't use the Backend interface. Instead, it is simpler to just have a different code-path for when flag -instruction-tables is specified. Once Clement addresses bug 36663, then we can port the "instruction tables" logic into a stage of our configurable pipeline. Updated the BtVer2 test cases (thanks Simon for the help). Now we pass flag -instruction-tables to each modified test. Differential Revision: https://reviews.llvm.org/D44839 llvm-svn: 328487	2018-03-26 12:04:53 +00:00
Simon Pilgrim	6c63e6c222	[X86][Btver2] Cleanup TEST instructions to use JFPA (+JFPX on ymms) function unit llvm-svn: 328343	2018-03-23 17:59:22 +00:00
Simon Pilgrim	e5c0a041ff	[X86][Btver2] Cleanup MOVMSK instructions to use JFPA function unit Add missing non-VEX and (V)PMOVMSKB instructions to the pattern llvm-svn: 328338	2018-03-23 17:38:59 +00:00
Simon Pilgrim	256f149bf0	[X86][Btver2] Vector permutes use a JFPU01 scheduler pipe and JFPX/JVALU function unit llvm-svn: 328331	2018-03-23 16:17:56 +00:00
Simon Pilgrim	ee282b3160	[X86][Btver2] Vector store instructions use a JFPU1 scheduler pipe and JSAGU/JSTC function units llvm-svn: 328328	2018-03-23 15:35:13 +00:00
Simon Pilgrim	1335b9c0ca	[X86][Btver2] Cleanup DPPS/DPPD instructions to use JFPA/JFPM function units llvm-svn: 328324	2018-03-23 15:17:50 +00:00
Simon Pilgrim	5792e10ffb	[X86][Btver2] Fix MicroOps counts for DPPS/YMM memory folded instructions This was due to a misunderstanding: what llvm calls a micro-op (retirement unit) is actually called a macro-op on the AMD/Jaguar target. Folded loads don't affect the number of macro-ops. llvm-svn: 328320	2018-03-23 14:45:03 +00:00
Simon Pilgrim	8619962c73	[X86][Btver2] Cleanup SSE42 PCMPISTR/PCMPESTR string instructions to correctly use JFPU1 scheduler pipe followed by JLAGU/JSAGU/JFPA/JVALU function units Fixes throughput to match Agner/Fam16h-SoG as well. llvm-svn: 328318	2018-03-23 14:27:26 +00:00
Simon Pilgrim	a1e3ea01ef	[X86][Btver2] Vector move/load/store instructions use a JFPU01 scheduler pipe and JFPX/JVALU function unit as well as the AGUs llvm-svn: 328304	2018-03-23 11:27:31 +00:00
Craig Topper	659c66dfc1	[X86] Match vpblendvb/vblendvps/vblendvpd itineraries to the SSE equivalent. Change pblendvb/blendvps/blendvpd to use WriteFVarBlend llvm-svn: 328294	2018-03-23 06:41:41 +00:00
Craig Topper	7580a7997d	[X86] Change VPSADBW itinerary to SSE_INTALU_ITINS_P to match the SSE version. llvm-svn: 328293	2018-03-23 06:41:40 +00:00
Simon Pilgrim	bcb86bb927	[X86][Btver2] Conversion, MaskedLoad/MaskedStore and NTStores all are scheduled through the JFPU1 pipe llvm-svn: 328226	2018-03-22 18:29:16 +00:00
Simon Pilgrim	0e031afa95	[X86][Btver2] FCMP (inc FMAX/FMIN) instructions use the JFPA functional pipe The ymm instructions are double pumped as well. llvm-svn: 328222	2018-03-22 17:43:12 +00:00
Simon Pilgrim	e5b51f6786	[X86][Btver2] FMUL ymm instructions are double pumped on the JFPM functional pipe llvm-svn: 328217	2018-03-22 17:25:38 +00:00
Andrea Di Biagio	12ef5260ea	[llvm-mca] Move the logic that computes the register file usage to the BackendStatistics view. With this patch, the "instruction dispatched" event now provides information related to the number of microarchitectural registers used in each register file. Similarly, the "instruction retired" event is now able to tell how many registers are freed in each register file. Currently, the BackendStatistics view is the only consumer of register usage/pressure information. BackendStatistics uses that info to print out a few general statistics (i.e. max number of mappings used; total mappings created). Before this patch, the BackendStatistics was forced to query the Backend to obtain the register pressure information. This patch removes that dependency. Now views are completely independent from the Backend. As a consequence, it should be easier to address PR36663 and further modularize the pipeline. Added a couple of test cases in the BtVer2 specific directory. llvm-svn: 328129	2018-03-21 18:11:05 +00:00
Simon Pilgrim	203876f104	[X86][Btver2] Fix crc32 schedule costs The default is currently FAdd for some reason llvm-svn: 327807	2018-03-18 19:54:42 +00:00
Simon Pilgrim	13cd3b0961	[X86][Btver2] Add crc32 resource tests llvm-svn: 327805	2018-03-18 18:55:34 +00:00
Simon Pilgrim	c3db8c7cda	[X86][Btver2] FADD/FHADD ymm instructions are double pumped on the JFPA functional pipe llvm-svn: 327804	2018-03-18 18:45:57 +00:00
Simon Pilgrim	036cc82622	[X86][Btver2] Float bitwise ymm instructions are double pumped on the JFPX (JFPA/JFPM) functional pipes llvm-svn: 327803	2018-03-18 17:10:12 +00:00
Simon Pilgrim	87d2f7463f	[X86][Btver2] F16C instructions are performed on the JSTC functional pipe llvm-svn: 327801	2018-03-18 15:59:51 +00:00
Simon Pilgrim	40f6d6ad0b	[X86][Btver2] SSE4A EXTRQ/INSERTQ instructions are performed on the JVALU0/JVALU1 functional pipes llvm-svn: 327794	2018-03-18 13:05:09 +00:00
Simon Pilgrim	e16790b133	[X86][Btver2] Modelled float bitwise instructions as being performed on the float cluster (FPA/FPM) not the integer. llvm-svn: 327793	2018-03-18 12:37:35 +00:00
Simon Pilgrim	e409f84e7e	[X86][Btver2] Correctly distinguish between scheduling pipe and functional unit for JWriteResFpuPair defs Jaguar's FPU has 2 scheduler pipes (JFPU0/JFPU1) which forward to multiple functional sub-units each. We need to model that a micro-op will consume both the scheduler pipe and a functional unit. This patch just handles the ops defined through JWriteResFpuPair; I'll go through the custom cases later. llvm-svn: 327791	2018-03-18 12:09:17 +00:00
Simon Pilgrim	0ba4a0f3a6	[X86][Btver2] Add llvm-mca tests to show pipe resource usage of most vector instructions Hopefully these tests can be easily reused should any other subtarget get in depth llvm-mca coverage (we can either copy the tests or move them into a common dir and run it with multiple prefixes). llvm-svn: 327788	2018-03-18 09:32:38 +00:00
Simon Pilgrim	9c4157bb70	[X86][Btver2] Tweak pipes test to remove register dependencies It gives us a better view of pipe usage in the timeline which is what the test is trying to show. llvm-svn: 327685	2018-03-15 23:15:11 +00:00
Simon Pilgrim	3894809997	[X86][Btver2] Fix ymm div/sqrt to use fmul unit YMM FDiv/FSqrt are dispatched on pipe JFPU1 but should be performed on the JFPM unit - that is where most of the cycles are spent. This matches the pipes for WriteFSqrt/WriteFDiv definitions. llvm-svn: 327682	2018-03-15 23:00:47 +00:00
Simon Pilgrim	49a56faee2	[X86][Btver2] Add test to show timeline of fpu instructions on different pipes/units Try to demonstrate the scheduling from fpu0/fpu1 pipes to the valu0/vimul/fpa or valu1/stc/fpm functional units llvm-svn: 327676	2018-03-15 22:34:24 +00:00
Andrea Di Biagio	7948738673	[llvm-mca] BackendStatistics: early exit from method printSchedulerUsage if no scheduler resources were consumed. llvm-svn: 327215	2018-03-10 17:40:25 +00:00
Andrea Di Biagio	373c38a2db	[llvm-mca] Fix handling of zero-latency instructions. This patch fixes a problem found when testing zero latency instructions on target AArch64 -mcpu=exynos-m3 / -mcpu=exynos-m1. On Exynos-m3/m1, direct branches are zero-latency instructions that don't consume any processor resources. The DispatchUnit marks zero-latency instructions as "executed", so that no scheduling is required. The event of instruction executed is then notified to all the listeners, and the reorder buffer (managed by the RetireControlUnit) is updated. In particular, the entry associated to the zero-latency instruction in the reorder buffer is marked as executed. Before this patch, the DispatchUnit forgot to assign a retire control unit token (RCUToken) to the zero-latency instruction. As a consequence, the RCUToken was used uninitialized. This was causing a crash in the RetireControlUnit logic. Fixes PR36650. llvm-svn: 327056	2018-03-08 20:21:55 +00:00
Andrea Di Biagio	7bbac07f22	[llvm-mca] Emit the 'Instruction Info' table before the resource pressure view. In future, both the summary information and the 'instruction info' table should be moved into a separate "Summary" view. llvm-svn: 327010	2018-03-08 15:34:38 +00:00
Andrea Di Biagio	3a6b092017	[llvm-mca] LLVM Machine Code Analyzer. llvm-mca is an LLVM based performance analysis tool that can be used to statically measure the performance of code, and to help triage potential problems with target scheduling models. llvm-mca uses information which is already available in LLVM (e.g. scheduling models) to statically measure the performance of machine code in a specific cpu. Performance is measured in terms of throughput as well as processor resource consumption. The tool currently works for processors with an out-of-order backend, for which there is a scheduling model available in LLVM. The main goal of this tool is not just to predict the performance of the code when run on the target, but also help with diagnosing potential performance issues. Given an assembly code sequence, llvm-mca estimates the IPC (instructions per cycle), as well as hardware resources pressure. The analysis and reporting style were mostly inspired by the IACA tool from Intel. This patch is related to the RFC on llvm-dev visible at this link: http://lists.llvm.org/pipermail/llvm-dev/2018-March/121490.html Differential Revision: https://reviews.llvm.org/D43951 llvm-svn: 326998	2018-03-08 13:05:02 +00:00


209 Commits