[llvm-mca][docs] Always use `llvm-mca` in place of `MCA`.

llvm-svn: 338394
Andrea Di Biagio 2018-07-31 15:29:10 +00:00
parent 0b8fdd2847
commit bdcf6ad60d
1 changed files with 46 additions and 49 deletions

@ -207,23 +207,23 @@ EXIT STATUS
:program:`llvm-mca` returns 0 on success. Otherwise, an error message is printed
to standard error, and the tool returns 1.
HOW MCA WORKS
-------------
HOW LLVM-MCA WORKS
------------------
MCA takes assembly code as input. The assembly code is parsed into a sequence
of MCInst with the help of the existing LLVM target assembly parsers. The
parsed sequence of MCInst is then analyzed by a ``Pipeline`` module to generate
a performance report.
:program:`llvm-mca` takes assembly code as input. The assembly code is parsed
into a sequence of MCInst with the help of the existing LLVM target assembly
parsers. The parsed sequence of MCInst is then analyzed by a ``Pipeline`` module
to generate a performance report.
The Pipeline module simulates the execution of the machine code sequence in a
loop of iterations (default is 100). During this process, the pipeline collects
a number of execution related statistics. At the end of this process, the
pipeline generates and prints a report from the collected statistics.
Here is an example of a performance report generated by MCA for a dot-product
of two packed float vectors of four elements. The analysis is conducted for
target x86, cpu btver2. The following result can be produced via the following
command using the example located at
Here is an example of a performance report generated by the tool for a
dot-product of two packed float vectors of four elements. The analysis is
conducted for target x86, cpu btver2. The following result can be produced via
the following command using the example located at
``test/tools/llvm-mca/X86/BtVer2/dot-product.s``:
.. code-block:: bash
@ -316,7 +316,7 @@ pressure should be uniformly distributed between multiple resources.
Timeline View
^^^^^^^^^^^^^
MCA's timeline view produces a detailed report of each instruction's state
The timeline view produces a detailed report of each instruction's state
transitions through an instruction pipeline. This view is enabled by the
command line option ``-timeline``. As instructions transition through the
various stages of the pipeline, their states are depicted in the view report.
@ -331,7 +331,7 @@ These states are represented by the following characters:
Below is the timeline view for a subset of the dot-product example located in
``test/tools/llvm-mca/X86/BtVer2/dot-product.s`` and processed by
MCA using the following command:
:program:`llvm-mca` using the following command:
.. code-block:: bash
@ -366,7 +366,7 @@ MCA using the following command:
2. 3 5.7 0.0 0.0 vhaddps %xmm3, %xmm3, %xmm4
The timeline view is interesting because it shows instruction state changes
during execution. It also gives an idea of how MCA processes instructions
during execution. It also gives an idea of how the tool processes instructions
executed on the target, and how their timing information might be calculated.
The timeline view is structured in two tables. The first table shows
@ -415,8 +415,8 @@ and therefore consuming temporary registers).
Table *Average Wait times* helps diagnose performance issues that are caused by
the presence of long latency instructions and potentially long data dependencies
which may limit the ILP. Note that MCA, by default, assumes at least 1cy
between the dispatch event and the issue event.
which may limit the ILP. Note that :program:`llvm-mca`, by default, assumes at
least 1cy between the dispatch event and the issue event.
When the performance is limited by data dependencies and/or long latency
instructions, the number of cycles spent while in the *ready* state is expected
@ -602,9 +602,9 @@ entries in the reorder buffer defaults to the `MicroOpBufferSize` provided by
the target scheduling model.
Instructions that are dispatched to the schedulers consume scheduler buffer
entries. MCA queries the scheduling model to determine the set of
buffered resources consumed by an instruction. Buffered resources are treated
like scheduler resources.
entries. :program:`llvm-mca` queries the scheduling model to determine the set
of buffered resources consumed by an instruction. Buffered resources are
treated like scheduler resources.
Instruction Issue
"""""""""""""""""
@ -612,22 +612,21 @@ Each processor scheduler implements a buffer of instructions. An instruction
has to wait in the scheduler's buffer until input register operands become
available. Only at that point does the instruction become eligible for
execution and may be issued (potentially out-of-order) for execution.
Instruction latencies are computed by MCA with the help of the scheduling
model.
Instruction latencies are computed by :program:`llvm-mca` with the help of the
scheduling model.
MCA's scheduler is designed to simulate multiple processor schedulers. The
scheduler is responsible for tracking data dependencies, and dynamically
selecting which processor resources are consumed by instructions.
The scheduler delegates the management of processor resource units and resource
groups to a resource manager. The resource manager is responsible for
selecting resource units that are consumed by instructions. For example, if an
instruction consumes 1cy of a resource group, the resource manager selects one
of the available units from the group; by default, the resource manager uses a
:program:`llvm-mca`'s scheduler is designed to simulate multiple processor
schedulers. The scheduler is responsible for tracking data dependencies, and
dynamically selecting which processor resources are consumed by instructions.
It delegates the management of processor resource units and resource groups to a
resource manager. The resource manager is responsible for selecting resource
units that are consumed by instructions. For example, if an instruction
consumes 1cy of a resource group, the resource manager selects one of the
available units from the group; by default, the resource manager uses a
round-robin selector to guarantee that resource usage is uniformly distributed
between all units of a group.
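
The round-robin selection described above can be illustrated with a minimal
sketch. This is not the tool's actual resource manager: the class
``ResourceGroup``, its methods, and the unit names ``ALU0``/``ALU1`` are
hypothetical, chosen only to show how rotating the starting point of the search
spreads usage across the units of a group.

```python
class ResourceGroup:
    """Hypothetical model of a resource group made of several units."""

    def __init__(self, unit_names):
        self.units = list(unit_names)
        self.busy = {name: False for name in self.units}
        self.next_index = 0  # where the next round-robin search starts

    def select_unit(self):
        """Return the next free unit in round-robin order, or None if all are busy."""
        count = len(self.units)
        for offset in range(count):
            index = (self.next_index + offset) % count
            name = self.units[index]
            if not self.busy[name]:
                self.busy[name] = True
                # Start the next search just after the unit we picked.
                self.next_index = (index + 1) % count
                return name
        return None

    def release(self, name):
        """Mark a unit as available again."""
        self.busy[name] = False


group = ResourceGroup(["ALU0", "ALU1"])
first = group.select_unit()   # picks ALU0
group.release(first)          # ALU0 is free again...
second = group.select_unit()  # ...but the selector still moves on to ALU1
```

Even though ``ALU0`` is free again when the second request arrives, the selector
resumes after the last unit it picked, so consecutive requests do not pile up on
the first unit of the group.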
MCA's scheduler implements three instruction queues:
:program:`llvm-mca`'s scheduler implements three instruction queues:
* WaitQueue: a queue of instructions whose operands are not ready.
* ReadyQueue: a queue of instructions ready to execute.
@ -638,8 +637,8 @@ scheduler are either placed into the WaitQueue or into the ReadyQueue.
Every cycle, the scheduler checks if instructions can be moved from the
WaitQueue to the ReadyQueue, and if instructions from the ReadyQueue can be
issued. The algorithm prioritizes older instructions over younger
instructions.
issued to the underlying pipelines. The algorithm prioritizes older instructions
over younger instructions.
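
As a rough sketch of the per-cycle check described above, the following models
only the WaitQueue and the ReadyQueue. The names ``Instr``, ``step``, and the
``issue_width`` parameter are assumptions made for illustration and do not
correspond to the tool's implementation; the third queue is not shown in this
excerpt and is left out.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    index: int             # program order; a lower index means an older instruction
    pending_operands: int  # number of input operands that are not yet available


def step(wait_queue, ready_queue, issue_width):
    """Simulate one cycle: promote ready instructions, then issue oldest first."""
    still_waiting = []
    for instr in wait_queue:
        if instr.pending_operands == 0:
            ready_queue.append(instr)    # operands are ready: WaitQueue -> ReadyQueue
        else:
            still_waiting.append(instr)
    wait_queue[:] = still_waiting

    ready_queue.sort(key=lambda i: i.index)  # older instructions get priority
    issued = ready_queue[:issue_width]
    del ready_queue[:issue_width]
    return issued


wait = [Instr(2, 0), Instr(0, 0), Instr(1, 1)]
ready = []
issued = step(wait, ready, issue_width=1)
```

In this example, instructions 0 and 2 become ready in the same cycle, but only
the older one (index 0) is issued; instruction 2 stays in the ReadyQueue and
instruction 1 remains in the WaitQueue until its operand is available.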
Write-Back and Retire Stage
"""""""""""""""""""""""""""
@ -656,15 +655,13 @@ for the instruction during the register renaming stage.
Load/Store Unit and Memory Consistency Model
""""""""""""""""""""""""""""""""""""""""""""
To simulate an out-of-order execution of memory operations, MCA utilizes a
simulated load/store unit (LSUnit) to simulate the speculative execution of
loads and stores.
To simulate an out-of-order execution of memory operations, :program:`llvm-mca`
utilizes a simulated load/store unit (LSUnit) to simulate the speculative
execution of loads and stores.
Each load (or store) consumes an entry in the load (or store) queue. The
number of slots in the load/store queues is unknown by MCA, since there is no
mention of it in the scheduling model. In practice, users can specify flags
``-lqueue`` and ``-squeue`` to limit the number of entries in the load and
store queues respectively. The queues are unbounded by default.
Each load (or store) consumes an entry in the load (or store) queue. Users can
specify flags ``-lqueue`` and ``-squeue`` to limit the number of entries in the
load and store queues respectively. The queues are unbounded by default.
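
The effect of limiting the queues can be sketched as follows. The class
``MemoryQueue`` is hypothetical, and using ``None`` to mean "unbounded" is an
assumption of this sketch rather than a description of how the flags are
encoded by the tool.

```python
class MemoryQueue:
    """Hypothetical load (or store) queue with an optional entry limit."""

    def __init__(self, capacity=None):
        self.capacity = capacity  # None models the unbounded default
        self.entries = 0

    def try_reserve(self):
        """Consume one entry; return False if the queue is full."""
        if self.capacity is not None and self.entries >= self.capacity:
            return False
        self.entries += 1
        return True

    def release(self):
        """Free one entry, for example once the memory operation completes."""
        if self.entries > 0:
            self.entries -= 1


unbounded = MemoryQueue()           # default: never rejects a reservation
limited = MemoryQueue(capacity=2)   # comparable in spirit to limiting a queue to 2 entries
results = [limited.try_reserve() for _ in range(3)]
```

With a limit of two entries, the third reservation fails until an earlier entry
is released, whereas the unbounded queue accepts every request.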
The LSUnit implements a relaxed consistency model for memory loads and stores.
The rules are:
@ -701,15 +698,15 @@ cache. It only knows if an instruction "MayLoad" and/or "MayStore." For
loads, the scheduling model provides an "optimistic" load-to-use latency (which
usually matches the load-to-use latency for when there is a hit in the L1D).
MCA does not know about serializing operations or memory-barrier like
instructions. The LSUnit conservatively assumes that an instruction which has
both "MayLoad" and unmodeled side effects behaves like a "soft" load-barrier.
That means, it serializes loads without forcing a flush of the load queue.
Similarly, instructions that "MayStore" and have unmodeled side effects are
treated like store barriers. A full memory barrier is a "MayLoad" and
"MayStore" instruction with unmodeled side effects. This is inaccurate, but it
is the best that we can do at the moment with the current information available
in LLVM.
:program:`llvm-mca` does not know about serializing operations or
memory-barrier-like instructions. The LSUnit conservatively assumes that an instruction which
has both "MayLoad" and unmodeled side effects behaves like a "soft"
load-barrier. That means, it serializes loads without forcing a flush of the
load queue. Similarly, instructions that "MayStore" and have unmodeled side
effects are treated like store barriers. A full memory barrier is a "MayLoad"
and "MayStore" instruction with unmodeled side effects. This is inaccurate, but
it is the best that we can do at the moment with the current information
available in LLVM.
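
The conservative classification above can be summarized with a small sketch.
The function ``classify_barrier`` and its string results are hypothetical; only
the three input properties ("MayLoad", "MayStore", and unmodeled side effects)
come from the description above.

```python
def classify_barrier(may_load, may_store, has_unmodeled_side_effects):
    """Classify an instruction the way the text above describes (illustrative only)."""
    if not has_unmodeled_side_effects:
        return "none"           # ordinary load/store, or no memory access at all
    if may_load and may_store:
        return "full barrier"   # orders both loads and stores
    if may_load:
        return "load barrier"   # "soft" barrier: serializes loads
    if may_store:
        return "store barrier"  # serializes stores
    return "none"               # side effects, but no memory access


kinds = [
    classify_barrier(True, False, True),
    classify_barrier(False, True, True),
    classify_barrier(True, True, True),
    classify_barrier(True, False, False),
]
```

Note that a plain load without unmodeled side effects is never treated as a
barrier in this sketch, which matches the rule that both properties must hold.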
A load/store barrier consumes one entry of the load/store queue. A load/store
barrier enforces ordering of loads/stores. A younger load cannot pass a load