[documentation][llvm-mca] Update the documentation.

Scheduling models can now describe processor register files and retire control
units. This updates the existing documentation and the README file.

llvm-svn: 329311
Andrea Di Biagio 2018-04-05 16:42:32 +00:00
parent 6ecdb03f16
commit efc3f39f02
2 changed files with 19 additions and 40 deletions

@@ -65,15 +65,14 @@ option specifies "``-``", then the output will also be sent to standard output.
 .. option:: -dispatch=<width>
 Specify a different dispatch width for the processor. The dispatch width
-defaults to the 'IssueWidth' specified by the processor scheduling model.
-If width is zero, then the default dispatch width is used.
+defaults to field 'IssueWidth' in the processor scheduling model. If width is
+zero, then the default dispatch width is used.
 .. option:: -register-file-size=<size>
-Specify the size of the register file. When specified, this flag limits
-how many temporary registers are available for register renaming purposes. By
-default, the number of temporary registers is unlimited. A value of zero for
-this flag means "unlimited number of temporary registers".
+Specify the size of the register file. When specified, this flag limits how
+many temporary registers are available for register renaming purposes. A value
+of zero for this flag means "unlimited number of temporary registers".
 .. option:: -iterations=<number of iterations>

@@ -34,9 +34,7 @@ the purpose of scheduling instructions (and therefore not described by the
 scheduling model), but are very important for this tool.
 A few examples of details that are missing in scheduling models are:
-- Maximum number of instructions retired per cycle.
 - Actual dispatch width (it often differs from the issue width).
-- Number of temporary registers available for renaming.
 - Number of read/write ports in the register file(s).
 - Length of the load/store queue in the LSUnit.
@@ -387,17 +385,17 @@ An instruction can be dispatched if:
 - There are enough temporary registers to do register renaming
 - Schedulers are not full.
-Scheduling models don't describe register files, and therefore the tool doesn't
-know if there is more than one register file, and how many temporaries are
-available for register renaming.
+Since r329067, scheduling models can now optionally specify which register files
+are available on the processor. Class DispatchUnit(see Dispatch.h) would use
+that information to initialize register file descriptors.
-By default, the tool (optimistically) assumes a single register file with an
-unbounded number of temporary registers. Users can limit the number of
-temporary registers available for register renaming using flag
-`-register-file-size=<N>`, where N is the number of temporaries. A value of
-zero for N means 'unbounded'. Knowing how many temporaries are available for
-register renaming, the tool can predict dispatch stalls caused by the lack of
-temporaries.
+By default, if the model doesn't describe register files, the tool
+(optimistically) assumes a single register file with an unbounded number of
+temporary registers. Users can limit the number of temporary registers that are
+globally available for register renaming using flag `-register-file-size=<N>`,
+where N is the number of temporaries. A value of zero for N means 'unbounded'.
+Knowing how many temporaries are available for register renaming, the tool can
+predict dispatch stalls caused by the lack of temporaries.
 The number of reorder buffer entries consumed by an instruction depends on the
 number of micro-opcodes it specifies in the target scheduling model (see field
@@ -667,25 +665,6 @@ instructions are not evaluated, and therefore control flow is not affected.
 However, the tool still queries the processor scheduling model to obtain latency
 information for instructions that affect the control flow.
-Possible extensions to the scheduling model
--------------------------------------------
-Section "Instruction Dispatch" explained how the tool doesn't know about the
-register files, and temporaries available in each register file for register
-renaming purposes.
-The LLVM scheduling model could be extended to better describe register files.
-Ideally, scheduling model should be able to define:
-- The size of each register file
-- How many temporary registers are available for register renaming
-- How register classes map to register files
-The scheduling model doesn't specify the retire throughput (i.e. how many
-instructions can be retired every cycle). Users can specify flag
-`-max-retire-per-cycle=<uint>` to limit how many instructions the retire control
-unit can retire every cycle. Ideally, every processor should be able to specify
-the retire throughput (for example, by adding an extra field to the scheduling
-model tablegen class).
 Known limitations on X86 processors
 -----------------------------------
@@ -867,8 +846,6 @@ analysis.
 Future work
 -----------
 * Address limitations (described in section "Known limitations").
-* Integrate extra description in the processor models, and make it opt-in for
-the targets (see section "Possible extensions to the scheduling model").
 * Let processors specify the selection strategy for processor resource groups
 and resources with multiple units. The tool currently uses a round-robin
 selector to pick the next resource to use.
@@ -877,8 +854,11 @@ Future work
 * Address design issues identified in section "Known design problems".
 * Define a standard interface for "Views". This would let users customize the
 performance report generated by the tool.
* Simplify the Backend interface.
 When interfaces are mature/stable:
 * Move the logic into a library. This will enable a number of other
 interesting use cases.
+Work is currently tracked on https://bugs.llvm.org. llvm-mca bugs are tagged
+with prefix [llvm-mca]. You can easily find the full list of open bugs if you
+search for that tag.