The regexs are treated as a prefix match already so the checking for optional text at the end provides no value. Instead it prevents the binary search optimization in tablegen from kicking in due to the top level question mark.
llvm-svn: 323351
We had a lot of separate 32 and 64 instructions that had the same scheduling data. This merges them into the same regular expression. This is pretty consistent with a lot of other instructions.
llvm-svn: 320924
I've hopefully sidestepped the MSVC issue that caused it to be reverted. We no longer include the Sched enum from X86GenInstrInfo.inc on the X86 target. So hopefully MSVC's preprocessor will skip over it and nothing will notice the 11000 character enum name.
Original commit message:
When the scheduler tables are generated by tablegen, the instructions are divided up into groups based on their default scheduling information and how they are referenced by groups for each processor. For any set of instructions that are matched by a specific InstRW line, that group of instructions is guaranteed to not be in a group with any other instructions. So in general, the more InstRW class definitions are created, the more groups we end up with in the generated files. Particularly if a lot of the InstRW lines only match to single instructions, which is true of a large number of the Intel scheduler models.
This change alone reduces the number of instructions groups from ~6000 to ~5500. And there's lots more we could do.
llvm-svn: 320655
[X86] Use regular expressions more aggressively to reduce the number of scheduler entries needed for FMA3 instructions.
When the scheduler tables are generated by tablegen, the instructions are divided up into groups based on their default scheduling information and how they are referenced by groups for each processor. For any set of instructions that are matched by a specific InstRW line, that group of instructions is guaranteed to not be in a group with any other instructions. So in general, the more InstRW class definitions are created, the more groups we end up with in the generated files. Particularly if a lot of the InstRW lines only match to single instructions, which is true of a large number of the Intel scheduler models.
This change alone reduces the number of instructions groups from ~6000 to ~5500. And there's lots more we could do.
llvm-svn: 320470
When the scheduler tables are generated by tablegen, the instructions are divided up into groups based on their default scheduling information and how they are referenced by groups for each processor. For any set of instructions that are matched by a specific InstRW line, that group of instructions is guaranteed to not be in a group with any other instructions. So in general, the more InstRW class definitions are created, the more groups we end up with in the generated files. Particularly if a lot of the InstRW lines only match to single instructions, which is true of a large number of the Intel scheduler models.
This change alone reduces the number of instructions groups from ~6000 to ~5500. And there's lots more we could do.
llvm-svn: 320461
This matches AVX512 version and is more consistent overall. And improves our scheduler models.
In some cases this adds _Int to instructions that didn't have any Int_ before. It's a side effect of the adjustments made to some of the multiclasses.
llvm-svn: 320325
If the question mark is inside the parentheses it only applies to the single character proceeding it.
I had to make a few additional cleanups to fix some duplicate warnings that were exposed by fixing this.
llvm-svn: 320279
Updated the scheduling information for the Haswell subtarget with the following changes:
Regrouped the instructions after adding appropriate load + store latencies.
Added scheduling for missing instructions such as the GATHER instrs.
The changes were made after revisiting the latencies impact of all memory uOps.
Reviewers: RKSimon, zvi, craig.topper, apilipenko
Differential Revision: https://reviews.llvm.org/D40021
Change-Id: Iaf6c1f5169add1552845a8a566af4e5a359217a7
llvm-svn: 320137
As mentioned on PR17367, many instructions are missing scheduling tags preventing us from setting 'CompleteModel = 1' for better instruction analysis. This patch deals with FMA/FMA4 which is one of the bigger offenders (along with AVX512 in general).
Annoyingly all scheduler models need to define WriteFMA (now that its actually used), even for older targets without FMA/FMA4 support, but that is an existing problem shared by other schedule classes.
Differential Revision: https://reviews.llvm.org/D40351
llvm-svn: 319016
Summary:
Intel documentation shows the memory operand as the first operand. But we currently treat it as the second operand. Conceptually the order doesn't matter since it doesn't write memory. We have aliases to parse with the operands in either order and the isel matching is commutable.
For the register®ister form order does matter for the assembly parser. PR22995 was previously filed and fixed by changing the register®ister form from MRMSrcReg to MRMDestReg to match gas. Ideally the memory form should match by using MRMDestMem.
I believe this supercedes D38025 which was trying to switch the register®ister form back to pre-PR22995.
Reviewers: aymanmus, RKSimon, zvi
Reviewed By: aymanmus
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38120
llvm-svn: 314639
This patch completely replaces the instruction scheduling information for the Haswell architecture target by modifying the file X86SchedHaswell.td located under the X86 Target.
We used the scheduling information retrieved from the Haswell architects in order to replace and modify the existing scheduling.
The patch continues the scheduling replacement effort started with the SNB target in r307529 and r310792.
Information includes latency, number of micro-Ops and used ports by each HSW instruction.
Please expect some performance fluctuations due to code alignment effects.
Reviewers: RKSimon, zvi, aymanmus, craig.topper, m_zuckerman, igorb, dim, chandlerc, aaboud
Differential Revision: https://reviews.llvm.org/D36663
llvm-svn: 311879
•static latency
•number of uOps from which the instructions consists
•all ports used by the instruction
Reviewers:
RKSimon
zvi
aymanmus
m_zuckerman
Differential Revision: https://reviews.llvm.org/D33897
llvm-svn: 306414
The latency for the WriteMULm class was set to 4, which is actually lower than the latency for WriteMULr (5).
A better estimate would be 4 added to WriteMULr, that is, 9.
llvm-svn: 230634
The SSE rsqrt instruction (a fast reciprocal square root estimate) was
grouped in the same scheduling IIC_SSE_SQRT* class as the accurate (but very
slow) SSE sqrt instruction. For code which uses rsqrt (possibly with
newton-raphson iterations) this poor scheduling was affecting performances.
This patch splits off the rsqrt instruction from the sqrt instruction scheduling
classes and creates new IIC_SSE_RSQER* classes with latency values based on
Agner's table.
Differential Revision: http://reviews.llvm.org/D5370
Patch by Simon Pilgrim.
llvm-svn: 218517
The old method used by X86TTI to determine partial-unrolling thresholds was
messy (because it worked by testing target features), and also would not
correctly identify the target CPU if certain target features were disabled.
After some discussions on IRC with Chandler et al., it was decided that the
processor scheduling models were the right containers for this information
(because it is often tied to special uop dispatch-buffer sizes).
This does represent a small functionality change:
- For generic x86-64 (which uses the SB model and, thus, will get some
unrolling).
- For AMD cores (because they still currently use the SB scheduling model)
- For Haswell (based on benchmarking by Louis Gerbarg, it was decided to bump
the default threshold to 50; we're working on a test case for this).
Otherwise, nothing has changed for any other targets. The logic, however, has
been moved into BasicTTI, so other targets may now also opt-in to this
functionality simply by setting LoopMicroOpBufferSize in their processor
model definitions.
llvm-svn: 208289