Commit Graph

44 Commits

Author SHA1 Message Date
Simon Pilgrim e409f84e7e [X86][Btver2] Correctly distinguish between scheduling pipe and functional unit for JWriteResFpuPair defs
Jaguar's FPU has 2 scheduler pipes (JFPU0/JFPU1) which forward to multiple functional sub-units each. We need to model that an micro-op will both consume the scheduler pipe and a functional unit.

This patch just handles the ops defined through JWriteResFpuPair, I'll go through the custom cases later.

llvm-svn: 327791
2018-03-18 12:09:17 +00:00
Simon Pilgrim f86d48b3ae [X86][Btver2] Merge equivalent VBLENDVY + VPERMILY schedule groups
Thanks to Craig Topper for noticing this.

llvm-svn: 327789
2018-03-18 10:22:35 +00:00
Craig Topper 2d451e73f9 [X86] Fix a bunch of overlapping regular expressions in the scheduler models.
llvm-svn: 327787
2018-03-18 08:38:06 +00:00
Simon Pilgrim 23578e7d3c [X86][Btver2] Add correct mul/imul schedule costs
Integer multiply is performed on the JMul function unit and i64 requires double pumping

llvm-svn: 327707
2018-03-16 14:01:01 +00:00
Simon Pilgrim 8d28ae6aec [X86][Btver2] Add correct lzcnt/tzcnt/popcnt schedule costs
Don't use WriteIMul defaults

llvm-svn: 327706
2018-03-16 13:43:55 +00:00
Simon Pilgrim 14e5a1b05b [X86][Btver2] Add support for multiple pipelines stages for x86 scalar schedules. NFCI.
This allows us to use JWriteResIntPair for complex schedule classes (like WriteIDiv) as well as single pipe instructions.

llvm-svn: 327686
2018-03-15 23:46:12 +00:00
Simon Pilgrim 3894809997 [X86][Btver2] Fix ymm div/sqrt to use fmul unit
YMM FDiv/FSqrt are dispatched on pipe JFPU1 but should be performed on the JFPM unit - that is where most of the cycles are spent.

This matches the pipes for WriteFSqrt/WriteFDiv definitions.

llvm-svn: 327682
2018-03-15 23:00:47 +00:00
Simon Pilgrim 48b758e8ad [X86][Btver2] Attach AES/CLMUL instructions to a scheduler pipe
llvm-svn: 327650
2018-03-15 17:45:10 +00:00
Simon Pilgrim d30df5769e [X86][Btver2] Remove JAny resource, and map system/microcoded instructions to JALU pipes
Simplifies throughput to the issue width (1/2) instead of permitting any pipe (1/6)

llvm-svn: 327632
2018-03-15 15:12:12 +00:00
Simon Pilgrim fb7aa57bf1 [X86][SSE] Introduce Float/Vector WriteMove, WriteLoad and Writetore scheduler classes
As discussed on D44428 and PR36726, this patch splits off WriteFMove/WriteVecMove, WriteFLoad/WriteVecLoad and WriteFStore/WriteVecStore scheduler classes to permit vectors to be handled separately from gpr/scalar types.

I've minimised the diff here by only moving various basic SSE/AVX vector instructions across - we can fix the rest when called for. This does fix the MOVDQA vs MOVAPS/MOVAPD discrepancies mentioned on D44428.

Differential Revision: https://reviews.llvm.org/D44471

llvm-svn: 327630
2018-03-15 14:45:30 +00:00
Simon Pilgrim 48fbf0c69a [X86][Btver2] Add support for multiple pipelines stages for fpu schedules. NFCI.
This allows us to use JWriteResFpuPair for complex schedule classes as well as single pipe instructions.

llvm-svn: 327588
2018-03-14 23:12:09 +00:00
Simon Pilgrim dfeebdbed7 [X86][Btver2] Add ResourceCycles and NumMicroOps overrides to scalar instructions. NFCI.
Currently still use default values - this is setup for a future patch.

llvm-svn: 327582
2018-03-14 21:55:54 +00:00
Simon Pilgrim d594942928 [X86][Btver2] Fix YMM shuffle, permute and permutevar scheduler costs
Account for ymm double pumping and add proper pshufb/permutevar support

llvm-svn: 327510
2018-03-14 14:05:19 +00:00
Simon Pilgrim 3d4c86d399 [X86][Btver2] Split i8/i16/i32/i64 div/idiv costs
We were assuming a mixture of 32/64 division costs.

llvm-svn: 327407
2018-03-13 15:22:24 +00:00
Simon Pilgrim 7f1b9196cb [X86][Btver2] Clean up formatting/comments in scheduler model. NFCI.
Moved 'special cases' to be closer to other system classes.

llvm-svn: 327332
2018-03-12 21:35:12 +00:00
Simon Pilgrim f0a9b25394 [X86][Btver2] FSqrt/FDiv reg-reg instructions don't use the AGU.
I love you llvm-mca.

llvm-svn: 327306
2018-03-12 18:12:46 +00:00
Simon Pilgrim deface9c73 [X86][Btver2] Prefix all scheduler defs. NFCI.
These are all global, so prefix with 'J' to help prevent accidental name clashes with other models.

llvm-svn: 327296
2018-03-12 17:07:08 +00:00
Simon Pilgrim 6f01e654b4 [X86][Btver2] Extend JWriteResFpuPair to accept resource/uop counts. NFCI.
This allows the single resource classes (VarBlend, MPSAD, VarVecShift) to use the JWriteResFpuPair macro.

llvm-svn: 327289
2018-03-12 16:02:56 +00:00
Simon Pilgrim bc216b440f [X86][Btver2] Use JWriteResFpuPair wrapper for AES/CLMUL/HADD scheduler cases. NFCI.
These are single pipe and have the default resource/uop counts like JWriteResFpuPair so there's no need to handle them separately.

llvm-svn: 327283
2018-03-12 15:29:00 +00:00
Simon Pilgrim 8cbc1d232b [X86][BTVER2] Fix throughput of YMM bitwise instructions
These instructions are double-pumped, split into 2 128-bit ops and then passing through either FPU pipe.

Found while testing llvm-mca (D43951)

llvm-svn: 326597
2018-03-02 18:20:35 +00:00
Simon Pilgrim 8c87a2e7bd [X86][BTVER2] Reduce instregex usage (PR35955)
Most are just replaced with instrs lists, but a few regexps have been further generalized to match more instructions with a single pattern.

llvm-svn: 322734
2018-01-17 19:12:48 +00:00
Simon Pilgrim a8e6b885bd [X86][BTVER2] Fix scheduling of VCMPSD/VCMPSS instructions
For some reason they don't have a trailing i like the packed equivalents.

llvm-svn: 322600
2018-01-16 22:15:41 +00:00
Simon Pilgrim 3c66e2c541 [X86][BTVER2] Use instrs instead of instregex for low match counts (PR35955)
llvm-svn: 322598
2018-01-16 22:08:43 +00:00
Simon Pilgrim e9a2832f32 [X86][BTVER2] Use instrs instead of instregex for single use matches (PR35955)
llvm-svn: 322597
2018-01-16 21:44:48 +00:00
Simon Pilgrim 79add5f155 [X86] Fix typos in WriteVMOVNTDQSt and WriteVMOVNTPYSt pattern names. NFCI.
llvm-svn: 322495
2018-01-15 17:55:21 +00:00
Andrew V. Tischenko e58c0c96b2 Update BTVER2 sched numbers for some AVX instructions (xmm version).
Differential Revision: https://reviews.llvm.org/D40067

llvm-svn: 322485
2018-01-15 14:21:11 +00:00
Simon Pilgrim 68f9accf51 [X86] Remove CompleteModel tags from CPU targets until we have better error checking (PR35636)
The checks we have for complete models are not great and miss many cases - e.g. in PR35636 it failed to recognise that only the first output (of 2) was actually tagged by the InstRW

Raised PR35639 and PR35643 as examples

llvm-svn: 320492
2017-12-12 16:12:53 +00:00
Simon Pilgrim cd58171110 [X86] Flag BTVER2 scheduler model as complete
We just have to locally tag COPY as WriteMove

llvm-svn: 320300
2017-12-10 11:51:29 +00:00
Andrew V. Tischenko 44cfc51415 Add proper BTVER2 sched support for MOV instr.
Differential Revision: https://reviews.llvm.org/D40345

llvm-svn: 320034
2017-12-07 11:19:49 +00:00
Simon Pilgrim 97160be53d [X86][FMA] Tag all FMA/FMA4 instructions with WriteFMA schedule class
As mentioned on PR17367, many instructions are missing scheduling tags preventing us from setting 'CompleteModel = 1' for better instruction analysis. This patch deals with FMA/FMA4 which is one of the bigger offenders (along with AVX512 in general).

Annoyingly all scheduler models need to define WriteFMA (now that its actually used), even for older targets without FMA/FMA4 support, but that is an existing problem shared by other schedule classes.

Differential Revision: https://reviews.llvm.org/D40351

llvm-svn: 319016
2017-11-27 10:41:32 +00:00
Andrew V. Tischenko 26dde7719b Update BTVER2 sched numbers for SSE42 string instructions.
Differential Revision: https://reviews.llvm.org/D39846

llvm-svn: 319013
2017-11-27 09:58:00 +00:00
Andrew V. Tischenko 198720d38e Add BTVER2 sched support for SHLD/SHRD.
Differential Revision: https://reviews.llvm.org/D40124

llvm-svn: 318977
2017-11-25 10:46:53 +00:00
Andrew V. Tischenko f8c75b8794 Sched model improving on btver2: JFPU01 resource, vtestp* for xmm.
Differential Revision: https://reviews.llvm.org/D39802

llvm-svn: 317785
2017-11-09 14:19:59 +00:00
Andrew V. Tischenko 3c8bf5ec37 The patch updates sched numbers for YMM AVX instrs such as VMOVx, VORx, VXOR, VPERMILx, VBROADCASTx, etc.
PR32857 should be closed.
Differential Revision: https://reviews.llvm.org/D39227

llvm-svn: 317196
2017-11-02 10:33:41 +00:00
Andrew V. Tischenko 3d971e39f8 Update VCVTx, VMOVNTPx and VROUNDYPx instructions scheduling on btver2.
Differential Revision: https://reviews.llvm.org/D39059

llvm-svn: 317101
2017-11-01 16:10:20 +00:00
Simon Pilgrim 5e3808afa2 [X86][F16C] Fix btver2 AGU pipe scheduling
Use the store AGU for stores, and the load AGU needs to be the first pipe for loads

llvm-svn: 316771
2017-10-27 16:34:58 +00:00
Andrew V. Tischenko f4fbe4a51b Update f16c instruction scheduling on btver2.
Differential Revision: https://reviews.llvm.org/D39051

llvm-svn: 316435
2017-10-24 13:38:30 +00:00
Andrew V. Tischenko 777308b548 Update DPPD/DPPS instruction scheduling on btver2.
Differential Revision: https://reviews.llvm.org/D39046

llvm-svn: 316334
2017-10-23 15:53:30 +00:00
Andrew V. Tischenko e255526d0b Added cost of ZEROALL and ZEROUPPER instrs in btver2 cpu.
Differential Revision https://reviews.llvm.org/D35834

llvm-svn: 309269
2017-07-27 13:12:08 +00:00
Simon Pilgrim 73ef87978f [X86][SSE4A] Add EXTRQ/INSERTQ values to BTVER2 scheduling model
llvm-svn: 308132
2017-07-16 12:06:06 +00:00
Andrew V. Tischenko ae9d6db769 [X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler Part-1 (PR28573).
The new version of the model is definitely faster.

Differential Revision:
https://reviews.llvm.org/D35198

llvm-svn: 307552
2017-07-10 16:36:03 +00:00
Andrew V. Tischenko 8cb1d0931f Add scheduler classes to integer/float horizontal operations.
This patch will close PR32801.
Differential Revision: https://reviews.llvm.org/D33203

llvm-svn: 304986
2017-06-08 16:44:13 +00:00
Andrea Di Biagio 196e873cdc [X86][SchedModel] SSE reciprocal square root instruction latencies.
The SSE rsqrt instruction (a fast reciprocal square root estimate) was
grouped in the same scheduling IIC_SSE_SQRT* class as the accurate (but very
slow) SSE sqrt instruction. For code which uses rsqrt (possibly with
newton-raphson iterations) this poor scheduling was affecting performances.

This patch splits off the rsqrt instruction from the sqrt instruction scheduling
classes and creates new IIC_SSE_RSQER* classes with latency values based on
Agner's table.

Differential Revision: http://reviews.llvm.org/D5370

Patch by Simon Pilgrim.

llvm-svn: 218517
2014-09-26 12:56:44 +00:00
Sanjay Patel 1191adf4df Add a scheduling model for AMD 16H Jaguar (btver2).
This is a first pass at a scheduling model for Jaguar.
It's structured largely on the existing SandyBridge and SLM sched models.

Using this model, in addition to turning on the PostRA scheduler, results in 
some perf wins on internal and 3rd party benchmarks. There's not much difference 
in LLVM's test-suite benchmarking subset of tests.

Differential Revision: http://reviews.llvm.org/D5229

llvm-svn: 217457
2014-09-09 20:07:07 +00:00