llvm-mca - LLVM Machine Code Analyzer
=====================================

SYNOPSIS
--------

:program:`llvm-mca` [*options*] [input]

DESCRIPTION
-----------

:program:`llvm-mca` is a performance analysis tool that uses information
available in LLVM (e.g. scheduling models) to statically measure the performance
of machine code on a specific CPU.

Performance is measured in terms of throughput as well as processor resource
consumption. The tool currently works for processors with an out-of-order
backend, for which there is a scheduling model available in LLVM.

The main goal of this tool is not just to predict the performance of the code
when run on the target, but also to help diagnose potential performance
issues.

Given an assembly code sequence, :program:`llvm-mca` estimates the IPC
(Instructions Per Cycle), as well as hardware resource pressure. The analysis
and reporting style were inspired by the IACA tool from Intel.

:program:`llvm-mca` allows the usage of special code comments to mark regions of
the assembly code to be analyzed. A comment starting with the substring
``LLVM-MCA-BEGIN`` marks the beginning of a code region. A comment starting with
the substring ``LLVM-MCA-END`` marks the end of a code region. For example:

.. code-block:: none

  # LLVM-MCA-BEGIN My Code Region
    ...
  # LLVM-MCA-END

Multiple regions can be specified provided that they do not overlap. A code
region can have an optional description. If no user-defined region is specified,
then :program:`llvm-mca` assumes a default region which contains every
instruction in the input file. Every region is analyzed in isolation, and the
final performance report is the union of all the reports generated for every
code region.

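As a minimal sketch of the region rules above, the following commands write a
small x86 assembly file with two named, non-overlapping regions and, if
:program:`llvm-mca` is found on ``PATH``, analyze it. The file name
``regions.s``, the region descriptions, and the instructions are illustrative
assumptions, not part of the tool.

.. code-block:: bash

  # Write an illustrative input file with two named, non-overlapping regions.
  printf '%s\n' \
    '# LLVM-MCA-BEGIN adds' \
    'addl %esi, %edi' \
    'addl %edi, %eax' \
    '# LLVM-MCA-END' \
    '# LLVM-MCA-BEGIN multiply' \
    'imull %esi, %eax' \
    '# LLVM-MCA-END' > regions.s

  # Analyze the file only if llvm-mca is installed.
  if command -v llvm-mca >/dev/null 2>&1; then
    llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 regions.s
  fi

Because every region is analyzed in isolation, the output should contain one
report per region.
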
Inline assembly directives may be used from source code to annotate the
assembly text:

.. code-block:: c++

  int foo(int a, int b) {
    __asm volatile("# LLVM-MCA-BEGIN foo");
    a += 42;
    __asm volatile("# LLVM-MCA-END");
    a *= b;
    return a;
  }

For example, you can compile code with clang, emit assembly, and pipe it
directly into :program:`llvm-mca` for analysis:

.. code-block:: bash

  $ clang foo.c -O2 -target x86_64-unknown-unknown -S -o - | llvm-mca -mcpu=btver2

OPTIONS
-------

If ``input`` is "``-``" or omitted, :program:`llvm-mca` reads from standard
input. Otherwise, it will read from the specified filename.

If the :option:`-o` option is omitted, then :program:`llvm-mca` will send its
output to standard output if the input is from standard input. If the
:option:`-o` option specifies "``-``", then the output will also be sent to
standard output.

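The two input modes can be exercised as in the sketch below. The names
``input.s`` and ``report.txt`` are hypothetical, the single instruction is
only placeholder content, and the commands run only if :program:`llvm-mca` is
installed.

.. code-block:: bash

  # Create a one-instruction input file (illustrative content).
  printf '%s\n' 'addl %esi, %edi' > input.s

  if command -v llvm-mca >/dev/null 2>&1; then
    # Read from a named file and write the report to report.txt.
    llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -o report.txt input.s
    # Read from standard input ("-"); the report goes to standard output.
    llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 - < input.s
  fi
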
.. option:: -help

 Print a summary of command line options.

.. option:: -mtriple=<target triple>

 Specify a target triple string.

.. option:: -march=<arch>

 Specify the architecture for which to analyze the code. It defaults to the
 host default target.

.. option:: -mcpu=<cpuname>

 Specify the processor for which to analyze the code. By default, the CPU name
 is autodetected from the host.

.. option:: -output-asm-variant=<variant id>

 Specify the output assembly variant for the report generated by the tool.
 On x86, the possible values are 0 and 1. A value of 0 selects the AT&T
 assembly format, and a value of 1 selects the Intel assembly format, for the
 code printed out by the tool in the analysis report.

.. option:: -dispatch=<width>

 Specify a different dispatch width for the processor. The dispatch width
 defaults to the ``IssueWidth`` field of the processor scheduling model. If
 width is zero, then the default dispatch width is used.

.. option:: -register-file-size=<size>

 Specify the size of the register file. When specified, this flag limits how
 many temporary registers are available for register renaming purposes. A value
 of zero for this flag means "unlimited number of temporary registers".

.. option:: -iterations=<number of iterations>

 Specify the number of iterations to run. If this flag is set to 0, then the
 tool sets the number of iterations to a default value (i.e. 100).

.. option:: -noalias=<bool>

 If set, the tool assumes that loads and stores don't alias. This is the
 default behavior.

.. option:: -lqueue=<load queue size>

 Specify the size of the load queue in the load/store unit emulated by the tool.
 By default, the tool assumes an unbounded number of entries in the load queue.
 A value of zero for this flag is ignored, and the default load queue size is
 used instead.

.. option:: -squeue=<store queue size>

 Specify the size of the store queue in the load/store unit emulated by the
 tool. By default, the tool assumes an unbounded number of entries in the store
 queue. A value of zero for this flag is ignored, and the default store queue
 size is used instead.

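The simulation parameters described above can be combined on one command line.
The sketch below is only an illustration: the numeric values are arbitrary and
do not describe any real processor, and ``kernel.s`` is a hypothetical input
file.

.. code-block:: bash

  # Create a two-instruction input file (illustrative content).
  printf '%s\n' 'addl %esi, %edi' 'imull %edi, %eax' > kernel.s

  if command -v llvm-mca >/dev/null 2>&1; then
    # Simulate 500 iterations with a 2-wide dispatch, 64 temporary registers,
    # a 16-entry load queue, and an 8-entry store queue.
    llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 \
      -iterations=500 -dispatch=2 -register-file-size=64 \
      -lqueue=16 -squeue=8 kernel.s
  fi

Comparing this report against a run with default settings is one way to see
whether a given resource limits the estimated throughput.
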
.. option:: -timeline

 Enable the timeline view.

.. option:: -timeline-max-iterations=<iterations>

 Limit the number of iterations to print in the timeline view. By default, the
 timeline view prints information for up to 10 iterations.

.. option:: -timeline-max-cycles=<cycles>

 Limit the number of cycles in the timeline view. By default, the number of
 cycles is set to 80.

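A possible invocation that enables the timeline view and narrows it below the
defaults is sketched here. The limits of 3 iterations and 40 cycles are
arbitrary, and ``loop.s`` is a hypothetical input file.

.. code-block:: bash

  # Create a two-instruction input file (illustrative content).
  printf '%s\n' 'addl %esi, %edi' 'imull %edi, %eax' > loop.s

  if command -v llvm-mca >/dev/null 2>&1; then
    # Print the timeline for at most 3 iterations and at most 40 cycles.
    llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 \
      -timeline -timeline-max-iterations=3 -timeline-max-cycles=40 loop.s
  fi
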
.. option:: -resource-pressure

 Enable the resource pressure view. This is enabled by default.

.. option:: -register-file-stats

 Enable register file usage statistics.

.. option:: -dispatch-stats

 Enable extra dispatch statistics. This view collects and analyzes instruction
 dispatch events, as well as static/dynamic dispatch stall events. This view
 is disabled by default.

.. option:: -scheduler-stats

 Enable extra scheduler statistics. This view collects and analyzes instruction
 issue events. This view is disabled by default.

.. option:: -retire-stats

 Enable extra retire control unit statistics. This view is disabled by default.

.. option:: -instruction-info

 Enable the instruction info view. This is enabled by default.

.. option:: -instruction-tables

 Print resource pressure information based on the static information
 available from the processor model. This differs from the resource pressure
 view because it doesn't require that the code is simulated. It instead prints
 the theoretical uniform distribution of resource pressure for every
 instruction in sequence.

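The optional statistics views can be requested together, as in the sketch
below. ``stats.s`` is a hypothetical input file, and the second command shows
the static :option:`-instruction-tables` output for the same code.

.. code-block:: bash

  # Create a two-instruction input file (illustrative content).
  printf '%s\n' 'addl %esi, %edi' 'imull %edi, %eax' > stats.s

  if command -v llvm-mca >/dev/null 2>&1; then
    # Enable the views that are disabled by default.
    llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 \
      -dispatch-stats -scheduler-stats -retire-stats -register-file-stats \
      stats.s
    # Print static resource pressure information without simulating the code.
    llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 \
      -instruction-tables stats.s
  fi
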
EXIT STATUS
-----------

:program:`llvm-mca` returns 0 on success. Otherwise, an error message is printed
to standard error, and the tool returns 1.

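A script can branch on this exit status, for instance as in the following
sketch. The file names ``check.s`` and ``report.txt`` and the messages are
illustrative assumptions.

.. code-block:: bash

  # Create a one-instruction input file (illustrative content).
  printf '%s\n' 'addl %esi, %edi' > check.s

  if command -v llvm-mca >/dev/null 2>&1; then
    if llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 check.s > report.txt; then
      echo "analysis succeeded"
    else
      # llvm-mca has already printed its own error message to standard error.
      echo "llvm-mca failed with exit status $?" >&2
    fi
  fi
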