Simon Pilgrim
3aa9344605
[X86][Btver2] Fix YMM BLENDPD/BLENDPS + UNPCKPD/UNPCKPS instruction costs
...
These should match the YMM MOVDUP/PERMILPD/PERMILPS + SHUFPD/SHUFPS shuffles instead of using the WriteFShuffle defaults.
llvm-svn: 328501
2018-03-26 14:44:24 +00:00
Simon Pilgrim
67df1cf597
[X86][Btver2] Add (V)SQRTPD/(V)SQRTSD costs
...
The xmm sd/pd versions were using the WriteFSQRT default which is modelled on sqrtss/sqrtps
llvm-svn: 328497
2018-03-26 14:03:40 +00:00
Simon Pilgrim
caa203aed5
[X86][Btver2] Double the AGU and schedule pipe resources for YMM
...
For 256-bit instructions, the AGUs and scheduler pipes are double pumped, as are the functional units, which we already model.
llvm-svn: 328491
2018-03-26 13:15:20 +00:00
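The double pumping described in caa203aed5 can be illustrated with a small sketch. This is a toy Python model, not LLVM TableGen and not llvm-mca. The function name `resource_cycles` and the resource list are hypothetical, and the 128-bit pass width is taken from the "2 128-bit ops" wording used elsewhere in this log.

```python
# Toy model of YMM double pumping; not LLVM TableGen and not llvm-mca.
DATAPATH_BITS = 128  # assumed: a 256-bit op is split into two 128-bit passes


def resource_cycles(resources, vector_bits, base_cycles=1):
    """Return the cycles charged to each resource for one instruction."""
    passes = max(1, vector_bits // DATAPATH_BITS)
    return {name: base_cycles * passes for name in resources}


# Hypothetical resource list: a load AGU, a scheduler pipe, a functional unit.
resources = ["JLAGU", "JFPU0", "JFPA"]
xmm = resource_cycles(resources, 128)  # every resource is charged once
ymm = resource_cycles(resources, 256)  # every resource is charged twice
print(xmm)
print(ymm)
```

The point of the sketch is that the doubling applies to every resource the instruction touches, not only to the functional unit.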
Simon Pilgrim
6c63e6c222
[X86][Btver2] Cleanup TEST instructions to use JFPA (+JFPX on ymms) function unit
...
llvm-svn: 328343
2018-03-23 17:59:22 +00:00
Simon Pilgrim
e5c0a041ff
[X86][Btver2] Cleanup MOVMSK instructions to use JFPA function unit
...
Add missing non-VEX and (V)PMOVMSKB instructions to the pattern
llvm-svn: 328338
2018-03-23 17:38:59 +00:00
Simon Pilgrim
256f149bf0
[X86][Btver2] Vector permutes use a JFPU01 scheduler pipe and JFPX/JVALU function unit
...
llvm-svn: 328331
2018-03-23 16:17:56 +00:00
Simon Pilgrim
ee282b3160
[X86][Btver2] Vector store instructions use a JFPU1 scheduler pipe and JSAGU/JSTC function units
...
llvm-svn: 328328
2018-03-23 15:35:13 +00:00
Simon Pilgrim
1335b9c0ca
[X86][Btver2] Cleanup DPPS/DPPD instructions to use JFPA/JFPM function units
...
llvm-svn: 328324
2018-03-23 15:17:50 +00:00
Simon Pilgrim
5792e10ffb
[X86][Btver2] Fix MicroOps counts for DPPS/YMM memory folded instructions
...
This was due to a misunderstanding: what llvm calls a micro-op (retirement unit) is actually called a macro-op on the AMD/Jaguar target. Folded loads don't affect the number of macro-ops.
llvm-svn: 328320
2018-03-23 14:45:03 +00:00
Simon Pilgrim
8619962c73
[X86][Btver2] Cleanup SSE42 PCMPISTR/PCMPESTR string instructions to correctly use JFPU1 scheduler pipe followed by JLAGU/JSAGU/JFPA/JVALU function units
...
Fixes throughput to match Agner/Fam16h-SoG as well.
llvm-svn: 328318
2018-03-23 14:27:26 +00:00
Simon Pilgrim
a1e3ea01ef
[X86][Btver2] Vector move/load/store instructions use a JFPU01 scheduler pipe and JFPX/JVALU function unit as well as the AGUs
...
llvm-svn: 328304
2018-03-23 11:27:31 +00:00
Craig Topper
40d3b32e12
[X86] Rename VROUNDYPS* and VROUNDYPD* instructions to VROUNDPSY* and VROUNDPDY*. Fix itinerary mistake on all memory forms of VROUNDPD
...
This makes the Y position consistent with other instructions.
This should have been NFC, but while refactoring the multiclass I noticed that VROUNDPD memory forms were using the register itinerary.
llvm-svn: 328254
2018-03-22 21:55:20 +00:00
Simon Pilgrim
bcb86bb927
[X86][Btver2] Conversion, MaskedLoad/MaskedStore and NTStores all are scheduled through the JFPU1 pipe
...
llvm-svn: 328226
2018-03-22 18:29:16 +00:00
Simon Pilgrim
0e031afa95
[X86][Btver2] FCMP (inc FMAX/FMIN) instructions use the JFPA functional pipe
...
The ymm instructions are double pumped as well.
llvm-svn: 328222
2018-03-22 17:43:12 +00:00
Simon Pilgrim
e5b51f6786
[X86][Btver2] FMUL ymm instructions are double pumped on the JFPM functional pipe
...
llvm-svn: 328217
2018-03-22 17:25:38 +00:00
Sanjay Patel
05daae75ad
[x86] put nops into the WriteNop class and customize for Jaguar
...
1. Given that we already have a classification bucket with 'nop' in the name,
that's where 'nop' belongs. Right now, it's only used for prefix bytes and 'pause'.
2. Make the latency of this class '1' for Jaguar to tell the scheduler (and presumably
llvm-mca) how to model the resource requirements better even though a nop has no
dependencies.
Differential Revision: https://reviews.llvm.org/D44608
llvm-svn: 327853
2018-03-19 14:26:50 +00:00
Simon Pilgrim
203876f104
[X86][Btver2] Fix crc32 schedule costs
...
The default is currently FAdd for some reason
llvm-svn: 327807
2018-03-18 19:54:42 +00:00
Simon Pilgrim
c3db8c7cda
[X86][Btver2] FADD/FHADD ymm instructions are double pumped on the JFPA functional pipe
...
llvm-svn: 327804
2018-03-18 18:45:57 +00:00
Simon Pilgrim
036cc82622
[X86][Btver2] Float bitwise ymm instructions are double pumped on the JFPX (JFPA/JFPM) functional pipes
...
llvm-svn: 327803
2018-03-18 17:10:12 +00:00
Simon Pilgrim
87d2f7463f
[X86][Btver2] F16C instructions are performed on the JSTC functional pipe
...
llvm-svn: 327801
2018-03-18 15:59:51 +00:00
Simon Pilgrim
541992203d
[X86][Btver2] Strip default latency/resource values. NFCI.
...
llvm-svn: 327795
2018-03-18 13:16:11 +00:00
Simon Pilgrim
40f6d6ad0b
[X86][Btver2] SSE4A EXTRQ/INSERTQ instructions are performed on the JVALU0/JVALU1 functional pipes
...
llvm-svn: 327794
2018-03-18 13:05:09 +00:00
Simon Pilgrim
e16790b133
[X86][Btver2] Modelled float bitwise instructions as being performed on the float cluster (FPA/FPM) not the integer.
...
llvm-svn: 327793
2018-03-18 12:37:35 +00:00
Simon Pilgrim
e409f84e7e
[X86][Btver2] Correctly distinguish between scheduling pipe and functional unit for JWriteResFpuPair defs
...
Jaguar's FPU has 2 scheduler pipes (JFPU0/JFPU1), each of which forwards to multiple functional sub-units. We need to model that a micro-op consumes both the scheduler pipe and a functional unit.
This patch just handles the ops defined through JWriteResFpuPair, I'll go through the custom cases later.
llvm-svn: 327791
2018-03-18 12:09:17 +00:00
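The pipe/unit distinction introduced in e409f84e7e can be sketched as follows. This is a minimal Python illustration, not the TableGen model. The helper `issue` is hypothetical, and placing both JFPM and JSTC behind JFPU1 is an assumption made for the example.

```python
from collections import Counter


def issue(busy, pipe, unit, cycles=1):
    """Charge one micro-op to both its scheduler pipe and its functional unit."""
    busy[pipe] += cycles
    busy[unit] += cycles


busy = Counter()
# Assumed for illustration: both functional units sit behind scheduler pipe JFPU1.
issue(busy, "JFPU1", "JFPM")
issue(busy, "JFPU1", "JSTC")

# The shared scheduler pipe is the most heavily used resource.
bottleneck = max(busy, key=busy.get)
print(dict(busy), bottleneck)
```

Two micro-ops that use different functional units still contend for the scheduler pipe they share, which a model that tracks only the functional unit would miss.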
Simon Pilgrim
f86d48b3ae
[X86][Btver2] Merge equivalent VBLENDVY + VPERMILY schedule groups
...
Thanks to Craig Topper for noticing this.
llvm-svn: 327789
2018-03-18 10:22:35 +00:00
Craig Topper
2d451e73f9
[X86] Fix a bunch of overlapping regular expressions in the scheduler models.
...
llvm-svn: 327787
2018-03-18 08:38:06 +00:00
Simon Pilgrim
23578e7d3c
[X86][Btver2] Add correct mul/imul schedule costs
...
Integer multiply is performed on the JMul function unit and i64 requires double pumping
llvm-svn: 327707
2018-03-16 14:01:01 +00:00
Simon Pilgrim
8d28ae6aec
[X86][Btver2] Add correct lzcnt/tzcnt/popcnt schedule costs
...
Don't use WriteIMul defaults
llvm-svn: 327706
2018-03-16 13:43:55 +00:00
Simon Pilgrim
14e5a1b05b
[X86][Btver2] Add support for multiple pipeline stages for x86 scalar schedules. NFCI.
...
This allows us to use JWriteResIntPair for complex schedule classes (like WriteIDiv) as well as single pipe instructions.
llvm-svn: 327686
2018-03-15 23:46:12 +00:00
Simon Pilgrim
3894809997
[X86][Btver2] Fix ymm div/sqrt to use fmul unit
...
YMM FDiv/FSqrt are dispatched on pipe JFPU1 but should be performed on the JFPM unit - that is where most of the cycles are spent.
This matches the pipes for WriteFSqrt/WriteFDiv definitions.
llvm-svn: 327682
2018-03-15 23:00:47 +00:00
Simon Pilgrim
48b758e8ad
[X86][Btver2] Attach AES/CLMUL instructions to a scheduler pipe
...
llvm-svn: 327650
2018-03-15 17:45:10 +00:00
Simon Pilgrim
d30df5769e
[X86][Btver2] Remove JAny resource, and map system/microcoded instructions to JALU pipes
...
Simplifies throughput to the issue width (1/2) instead of permitting any pipe (1/6)
llvm-svn: 327632
2018-03-15 15:12:12 +00:00
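The throughput figures quoted in d30df5769e (1/2 versus 1/6) follow from dividing an instruction's resource cycles by the number of pipes that can accept it. A rough Python sketch, assuming a single resource cycle per instruction; the function name is hypothetical, and the pipe counts of six and two are inferred from the fractions quoted in the commit message:

```python
from fractions import Fraction


def reciprocal_throughput(cycles, num_pipes):
    """Average cycles per instruction when work spreads evenly over num_pipes."""
    return Fraction(cycles, num_pipes)


any_pipe = reciprocal_throughput(1, 6)  # old JAny group: any of six pipes
jalu = reciprocal_throughput(1, 2)      # JALU pipes: matches the issue width
print(any_pipe, jalu)
```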
Simon Pilgrim
fb7aa57bf1
[X86][SSE] Introduce Float/Vector WriteMove, WriteLoad and WriteStore scheduler classes
...
As discussed on D44428 and PR36726, this patch splits off WriteFMove/WriteVecMove, WriteFLoad/WriteVecLoad and WriteFStore/WriteVecStore scheduler classes to permit vectors to be handled separately from gpr/scalar types.
I've minimised the diff here by only moving various basic SSE/AVX vector instructions across - we can fix the rest when called for. This does fix the MOVDQA vs MOVAPS/MOVAPD discrepancies mentioned on D44428.
Differential Revision: https://reviews.llvm.org/D44471
llvm-svn: 327630
2018-03-15 14:45:30 +00:00
Simon Pilgrim
48fbf0c69a
[X86][Btver2] Add support for multiple pipeline stages for fpu schedules. NFCI.
...
This allows us to use JWriteResFpuPair for complex schedule classes as well as single pipe instructions.
llvm-svn: 327588
2018-03-14 23:12:09 +00:00
Simon Pilgrim
dfeebdbed7
[X86][Btver2] Add ResourceCycles and NumMicroOps overrides to scalar instructions. NFCI.
...
These currently still use the default values - this is setup for a future patch.
llvm-svn: 327582
2018-03-14 21:55:54 +00:00
Simon Pilgrim
d594942928
[X86][Btver2] Fix YMM shuffle, permute and permutevar scheduler costs
...
Account for ymm double pumping and add proper pshufb/permutevar support
llvm-svn: 327510
2018-03-14 14:05:19 +00:00
Simon Pilgrim
3d4c86d399
[X86][Btver2] Split i8/i16/i32/i64 div/idiv costs
...
We were assuming a mixture of 32/64 division costs.
llvm-svn: 327407
2018-03-13 15:22:24 +00:00
Simon Pilgrim
7f1b9196cb
[X86][Btver2] Clean up formatting/comments in scheduler model. NFCI.
...
Moved 'special cases' to be closer to other system classes.
llvm-svn: 327332
2018-03-12 21:35:12 +00:00
Simon Pilgrim
f0a9b25394
[X86][Btver2] FSqrt/FDiv reg-reg instructions don't use the AGU.
...
I love you llvm-mca.
llvm-svn: 327306
2018-03-12 18:12:46 +00:00
Simon Pilgrim
deface9c73
[X86][Btver2] Prefix all scheduler defs. NFCI.
...
These are all global, so prefix with 'J' to help prevent accidental name clashes with other models.
llvm-svn: 327296
2018-03-12 17:07:08 +00:00
Simon Pilgrim
6f01e654b4
[X86][Btver2] Extend JWriteResFpuPair to accept resource/uop counts. NFCI.
...
This allows the single resource classes (VarBlend, MPSAD, VarVecShift) to use the JWriteResFpuPair macro.
llvm-svn: 327289
2018-03-12 16:02:56 +00:00
Simon Pilgrim
bc216b440f
[X86][Btver2] Use JWriteResFpuPair wrapper for AES/CLMUL/HADD scheduler cases. NFCI.
...
These are single pipe and have the default resource/uop counts like JWriteResFpuPair so there's no need to handle them separately.
llvm-svn: 327283
2018-03-12 15:29:00 +00:00
Simon Pilgrim
8cbc1d232b
[X86][BTVER2] Fix throughput of YMM bitwise instructions
...
These instructions are double-pumped: they are split into 2 128-bit ops, which can then pass through either FPU pipe.
Found while testing llvm-mca (D43951)
llvm-svn: 326597
2018-03-02 18:20:35 +00:00
Simon Pilgrim
8c87a2e7bd
[X86][BTVER2] Reduce instregex usage (PR35955)
...
Most are just replaced with instrs lists, but a few regexps have been further generalized to match more instructions with a single pattern.
llvm-svn: 322734
2018-01-17 19:12:48 +00:00
Simon Pilgrim
a8e6b885bd
[X86][BTVER2] Fix scheduling of VCMPSD/VCMPSS instructions
...
For some reason they don't have a trailing i like the packed equivalents.
llvm-svn: 322600
2018-01-16 22:15:41 +00:00
Simon Pilgrim
3c66e2c541
[X86][BTVER2] Use instrs instead of instregex for low match counts (PR35955)
...
llvm-svn: 322598
2018-01-16 22:08:43 +00:00
Simon Pilgrim
e9a2832f32
[X86][BTVER2] Use instrs instead of instregex for single use matches (PR35955)
...
llvm-svn: 322597
2018-01-16 21:44:48 +00:00
Simon Pilgrim
79add5f155
[X86] Fix typos in WriteVMOVNTDQSt and WriteVMOVNTPYSt pattern names. NFCI.
...
llvm-svn: 322495
2018-01-15 17:55:21 +00:00
Andrew V. Tischenko
e58c0c96b2
Update BTVER2 sched numbers for some AVX instructions (xmm version).
...
Differential Revision: https://reviews.llvm.org/D40067
llvm-svn: 322485
2018-01-15 14:21:11 +00:00
Simon Pilgrim
68f9accf51
[X86] Remove CompleteModel tags from CPU targets until we have better error checking (PR35636)
...
The checks we have for complete models are not great and miss many cases - e.g. in PR35636 it failed to recognise that only the first output (of 2) was actually tagged by the InstRW.
Raised PR35639 and PR35643 as examples.
llvm-svn: 320492
2017-12-12 16:12:53 +00:00
Simon Pilgrim
cd58171110
[X86] Flag BTVER2 scheduler model as complete
...
We just have to locally tag COPY as WriteMove
llvm-svn: 320300
2017-12-10 11:51:29 +00:00
Andrew V. Tischenko
44cfc51415
Add proper BTVER2 sched support for MOV instr.
...
Differential Revision: https://reviews.llvm.org/D40345
llvm-svn: 320034
2017-12-07 11:19:49 +00:00
Simon Pilgrim
97160be53d
[X86][FMA] Tag all FMA/FMA4 instructions with WriteFMA schedule class
...
As mentioned on PR17367, many instructions are missing scheduling tags preventing us from setting 'CompleteModel = 1' for better instruction analysis. This patch deals with FMA/FMA4 which is one of the bigger offenders (along with AVX512 in general).
Annoyingly, all scheduler models need to define WriteFMA (now that it's actually used), even for older targets without FMA/FMA4 support, but that is an existing problem shared by other schedule classes.
Differential Revision: https://reviews.llvm.org/D40351
llvm-svn: 319016
2017-11-27 10:41:32 +00:00
Andrew V. Tischenko
26dde7719b
Update BTVER2 sched numbers for SSE42 string instructions.
...
Differential Revision: https://reviews.llvm.org/D39846
llvm-svn: 319013
2017-11-27 09:58:00 +00:00
Andrew V. Tischenko
198720d38e
Add BTVER2 sched support for SHLD/SHRD.
...
Differential Revision: https://reviews.llvm.org/D40124
llvm-svn: 318977
2017-11-25 10:46:53 +00:00
Andrew V. Tischenko
f8c75b8794
Sched model improving on btver2: JFPU01 resource, vtestp* for xmm.
...
Differential Revision: https://reviews.llvm.org/D39802
llvm-svn: 317785
2017-11-09 14:19:59 +00:00
Andrew V. Tischenko
3c8bf5ec37
The patch updates sched numbers for YMM AVX instrs such as VMOVx, VORx, VXOR, VPERMILx, VBROADCASTx, etc.
...
PR32857 should be closed.
Differential Revision: https://reviews.llvm.org/D39227
llvm-svn: 317196
2017-11-02 10:33:41 +00:00
Andrew V. Tischenko
3d971e39f8
Update VCVTx, VMOVNTPx and VROUNDYPx instructions scheduling on btver2.
...
Differential Revision: https://reviews.llvm.org/D39059
llvm-svn: 317101
2017-11-01 16:10:20 +00:00
Simon Pilgrim
5e3808afa2
[X86][F16C] Fix btver2 AGU pipe scheduling
...
Use the store AGU for stores, and the load AGU needs to be the first pipe for loads
llvm-svn: 316771
2017-10-27 16:34:58 +00:00
Andrew V. Tischenko
f4fbe4a51b
Update f16c instruction scheduling on btver2.
...
Differential Revision: https://reviews.llvm.org/D39051
llvm-svn: 316435
2017-10-24 13:38:30 +00:00
Andrew V. Tischenko
777308b548
Update DPPD/DPPS instruction scheduling on btver2.
...
Differential Revision: https://reviews.llvm.org/D39046
llvm-svn: 316334
2017-10-23 15:53:30 +00:00
Andrew V. Tischenko
e255526d0b
Added cost of ZEROALL and ZEROUPPER instrs in btver2 cpu.
...
Differential Revision: https://reviews.llvm.org/D35834
llvm-svn: 309269
2017-07-27 13:12:08 +00:00
Simon Pilgrim
73ef87978f
[X86][SSE4A] Add EXTRQ/INSERTQ values to BTVER2 scheduling model
...
llvm-svn: 308132
2017-07-16 12:06:06 +00:00
Andrew V. Tischenko
ae9d6db769
[X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler Part-1 (PR28573).
...
The new version of the model is definitely faster.
Differential Revision: https://reviews.llvm.org/D35198
llvm-svn: 307552
2017-07-10 16:36:03 +00:00
Andrew V. Tischenko
8cb1d0931f
Add scheduler classes to integer/float horizontal operations.
...
This patch will close PR32801.
Differential Revision: https://reviews.llvm.org/D33203
llvm-svn: 304986
2017-06-08 16:44:13 +00:00
Andrea Di Biagio
196e873cdc
[X86][SchedModel] SSE reciprocal square root instruction latencies.
...
The SSE rsqrt instruction (a fast reciprocal square root estimate) was
grouped in the same scheduling IIC_SSE_SQRT* class as the accurate (but very
slow) SSE sqrt instruction. For code which uses rsqrt (possibly with
Newton-Raphson iterations) this poor scheduling was affecting performance.
This patch splits off the rsqrt instruction from the sqrt instruction scheduling
classes and creates new IIC_SSE_RSQRT* classes with latency values based on
Agner's table.
Differential Revision: http://reviews.llvm.org/D5370
Patch by Simon Pilgrim.
llvm-svn: 218517
2014-09-26 12:56:44 +00:00
Sanjay Patel
1191adf4df
Add a scheduling model for AMD 16H Jaguar (btver2).
...
This is a first pass at a scheduling model for Jaguar.
It's structured largely on the existing SandyBridge and SLM sched models.
Using this model, in addition to turning on the PostRA scheduler, results in
some perf wins on internal and 3rd party benchmarks. There's not much difference
in LLVM's test-suite benchmarking subset of tests.
Differential Revision: http://reviews.llvm.org/D5229
llvm-svn: 217457
2014-09-09 20:07:07 +00:00