llvm-mca - LLVM Machine Code Analyzer
=====================================

SYNOPSIS
--------

:program:`llvm-mca` [*options*] [input]

DESCRIPTION
-----------

:program:`llvm-mca` is a performance analysis tool that uses information
available in LLVM (e.g. scheduling models) to statically measure the performance
of machine code on a specific CPU.

Performance is measured in terms of throughput as well as processor resource
consumption. The tool currently works for processors with an out-of-order
backend, for which there is a scheduling model available in LLVM.

The main goal of this tool is not just to predict the performance of the code
when run on the target, but also to help with diagnosing potential performance
issues.

Given an assembly code sequence, llvm-mca estimates the IPC (Instructions Per
Cycle), as well as hardware resource pressure. The analysis and reporting style
were inspired by the IACA tool from Intel.

OPTIONS
-------

If ``input`` is "``-``" or omitted, :program:`llvm-mca` reads from standard
input. Otherwise, it will read from the specified filename.

If the :option:`-o` option is omitted and the input is read from standard
input, :program:`llvm-mca` sends its output to standard output. If the
:option:`-o` option specifies "``-``", the output is also sent to standard
output.
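
As a minimal sketch of the input handling described above, the commands below
analyze the same snippet once from a file and once from standard input. The
file name ``dot-product.s``, the three-instruction snippet, and the choice of
``btver2`` as the target processor are assumptions made for illustration only.

.. code-block:: bash

   # Write a small x86 assembly snippet to a file (hypothetical name).
   cat > dot-product.s <<'EOF'
   vmulps  %xmm0, %xmm1, %xmm2
   vhaddps %xmm2, %xmm2, %xmm3
   vhaddps %xmm3, %xmm3, %xmm4
   EOF

   # Read the input from the named file.
   llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 dot-product.s

   # Read the input from standard input by passing "-" as the input.
   llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 - < dot-product.s

Both invocations should produce the same report on standard output, since only
the source of the input differs.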


.. option:: -help

 Print a summary of command line options.

.. option:: -mtriple=<target triple>

 Specify a target triple string.

.. option:: -march=<arch>

 Specify the architecture for which to analyze the code. It defaults to the
 host default target.

.. option:: -mcpu=<cpuname>

 Specify the processor for which to run the analysis.
 By default, a "generic" processor is used. The processor is not autodetected
 from the host.

.. option:: -output-asm-variant=<variant id>

 Specify the output assembly variant for the report generated by the tool.
 On x86, possible values are [0, 1]. A value of 0 selects the AT&T assembly
 format, and a value of 1 selects the Intel assembly format, for the code
 printed out by the tool in the analysis report.

.. option:: -dispatch=<width>

 Specify a different dispatch width for the processor. The dispatch width
 defaults to the 'IssueWidth' specified by the processor scheduling model.
 If width is zero, then the default dispatch width is used.

.. option:: -max-retire-per-cycle=<retire throughput>

 Specify the retire throughput (i.e. how many instructions can be retired by the
 retire control unit every cycle).

.. option:: -register-file-size=<size>

 Specify the size of the register file. When specified, this flag limits
 how many temporary registers are available for register renaming purposes. By
 default, the number of temporary registers is unlimited. A value of zero for
 this flag means "unlimited number of temporary registers".

.. option:: -iterations=<number of iterations>

 Specify the number of iterations to run. If this flag is set to 0, then the
 tool sets the number of iterations to a default value (i.e. 70).

.. option:: -noalias=<bool>

  If set, the tool assumes that loads and stores don't alias. This is the
  default behavior.

.. option:: -lqueue=<load queue size>

  Specify the size of the load queue in the load/store unit emulated by the tool.
  By default, the tool assumes an unbounded number of entries in the load queue.
  A value of zero for this flag is ignored, and the default load queue size is
  used instead.

.. option:: -squeue=<store queue size>

  Specify the size of the store queue in the load/store unit emulated by the
  tool. By default, the tool assumes an unbounded number of entries in the store
  queue. A value of zero for this flag is ignored, and the default store queue
  size is used instead.

.. option:: -verbose

  Enable verbose output. In particular, this flag enables a number of extra
  statistics and performance counters for the dispatch logic, the reorder
  buffer, the retire control unit and the register file.

.. option:: -timeline

  Enable the timeline view.

.. option:: -timeline-max-iterations=<iterations>

  Limit the number of iterations to print in the timeline view. By default, the
  timeline view prints information for up to 10 iterations.

.. option:: -timeline-max-cycles=<cycles>

  Limit the number of cycles in the timeline view. By default, the number of
  cycles is set to 80.

.. option:: -resource-pressure

  Enable the resource pressure view. This is enabled by default.

.. option:: -instruction-tables

  Print resource pressure information based on the static information
  available from the processor model. This differs from the resource pressure
  view because it does not require the code to be simulated. Instead, it prints
  the theoretical uniform distribution of resource pressure for every
  instruction in the sequence.
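
To illustrate how the view options above combine, the following sketch runs a
short simulation with the timeline view enabled, then prints the instruction
tables for the same input without simulating it. The file name
``dot-product.s``, its contents, and the ``btver2`` processor are assumptions
chosen for the example; the iteration counts are arbitrary.

.. code-block:: bash

   # Create a small input file (hypothetical name and contents).
   cat > dot-product.s <<'EOF'
   vmulps  %xmm0, %xmm1, %xmm2
   vhaddps %xmm2, %xmm2, %xmm3
   vhaddps %xmm3, %xmm3, %xmm4
   EOF

   # Simulate 3 iterations and show the timeline for the first 2 of them.
   llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 \
            -iterations=3 -timeline -timeline-max-iterations=2 dot-product.s

   # Print the static, per-instruction resource pressure tables instead.
   llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 \
            -instruction-tables dot-product.s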


EXIT STATUS
-----------

:program:`llvm-mca` returns 0 on success. Otherwise, an error message is printed
to standard error, and the tool returns 1.
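
A script can branch on this exit status. The sketch below is one possible way
to do so; the file names ``dot-product.s`` and ``report.txt`` and the
``btver2`` processor are assumptions made for illustration.

.. code-block:: bash

   # Create a small input file (hypothetical name and contents).
   cat > dot-product.s <<'EOF'
   vmulps  %xmm0, %xmm1, %xmm2
   EOF

   # Save the report to a file and check whether the analysis succeeded.
   if llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 \
               dot-product.s > report.txt; then
     echo "analysis succeeded; report written to report.txt"
   else
     # $? still holds the exit status of the failed llvm-mca command here.
     echo "llvm-mca failed with exit status $?" >&2
   fi

The error message itself goes to standard error, so it is not captured in
``report.txt`` by the redirection used here.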