llvm-project/llvm/test/tools/llvm-mca/X86/BtVer2
Andrea Di Biagio aa9b6468bd [MCA][Bottleneck Analysis] Teach how to compute a critical sequence of instructions based on the simulation.
This patch teaches the bottleneck analysis how to identify and print the most
expensive sequence of instructions according to the simulation. Fixes PR37494.

The goal is to help users identify the sequence of instructions that is most
critical for performance.

A dependency graph is internally used by the bottleneck analysis to describe
data dependencies and processor resource interferences between instructions.

There is one node in the graph for every instruction in the input assembly
sequence. The number of nodes in the graph is independent of the number of
iterations simulated by the tool. This means that a single node of the graph
represents all the instances of the same instruction contributed by the
simulated iterations.

Edges are dynamically "discovered" by the bottleneck analysis by observing
instruction state transitions and "backend pressure increase" events generated
by the Execute stage. Information from the events is used to identify critical
dependencies, and materialize edges in the graph. A dependency edge is uniquely
identified by a pair of node identifiers plus an instance of struct
DependencyEdge::Dependency (which provides more details about the actual
dependency kind).

The bottleneck analysis internally ranks dependency edges based on their impact
on the runtime (see field DependencyEdge::Dependency::Cost). To this end, each
edge of the graph has an associated cost. By default, the cost of an edge is a
function of its latency (in cycles). In practice, the cost of an edge is also a
function of the number of cycles where the dependency has been seen as
'contributing to backend pressure increases'. The idea is that the higher the
cost of an edge, the greater the impact of the dependency on performance. In
other words, the cost of an edge is a measure of criticality for performance.

Note that the same edge may be found in multiple iterations of the simulated
loop. The logic that adds new edges to the graph checks if an equivalent
dependency already exists (duplicate edges are not allowed). If an equivalent
dependency edge is found, field DependencyEdge::Frequency of that edge is
incremented by one, and the new cost is cumulatively added to the existing edge
cost.
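
The edge de-duplication described above can be illustrated with a minimal
Python sketch. It is not the LLVM implementation (which is C++): the names
Edge, Graph, add_dependency, DepKind and resource_or_reg are hypothetical, and
the sketch assumes that a newly created edge starts with a frequency of one.

```python
from dataclasses import dataclass, field
from enum import Enum


class DepKind(Enum):
    # Hypothetical dependency kinds; the real kinds live in
    # DependencyEdge::Dependency.
    REGISTER = "register"
    MEMORY = "memory"
    RESOURCE = "resource"


@dataclass
class Edge:
    from_iid: int          # source node (instruction index)
    to_iid: int            # destination node (instruction index)
    kind: DepKind
    resource_or_reg: int   # identifies the register or resource involved
    cost: int = 0          # accumulated cost across iterations
    frequency: int = 0     # number of times the dependency was observed


@dataclass
class Graph:
    edges: dict = field(default_factory=dict)

    def add_dependency(self, from_iid, to_iid, kind, resource_or_reg, cost):
        # An edge is identified by its two node ids plus the dependency
        # details, so an equivalent dependency maps to the same key.
        key = (from_iid, to_iid, kind, resource_or_reg)
        edge = self.edges.get(key)
        if edge is None:
            edge = Edge(from_iid, to_iid, kind, resource_or_reg)
            self.edges[key] = edge
        # New or existing: count the observation and accumulate the cost.
        edge.frequency += 1
        edge.cost += cost
        return edge
```

Adding the same register dependency from two different iterations leaves a
single edge whose frequency is 2 and whose cost is the sum of both costs.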

At the end of simulation, costs are propagated to nodes through the edges of the
graph. The goal is to identify a critical sequence from a node of the root-set
(composed of the nodes of the graph with no predecessors) to a 'sink node' with
no successors. Note that the graph is intentionally kept acyclic to minimize the
complexity of the critical sequence computation algorithm (complexity is
currently linear in the number of nodes in the graph).
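
As a rough illustration of how costs can be propagated through an acyclic
graph, the following Python sketch visits nodes in topological order and then
walks back from the most expensive sink. It is a generic longest-path
computation on a DAG written under stated assumptions, not the algorithm from
the patch; the function name critical_sequence and the (src, dst, cost) edge
tuples are hypothetical.

```python
from collections import defaultdict, deque


def critical_sequence(num_nodes, edges):
    """Return the most expensive root-to-sink chain of edges in a DAG.

    edges is a list of (src, dst, cost) tuples; nodes are 0..num_nodes-1.
    """
    if num_nodes == 0:
        return []

    succs = defaultdict(list)
    indegree = [0] * num_nodes
    for src, dst, cost in edges:
        succs[src].append((dst, cost))
        indegree[dst] += 1

    total = [0] * num_nodes          # best accumulated cost reaching a node
    best_pred = [None] * num_nodes   # (src, cost) of the edge giving total

    # Start from the root-set: nodes with no predecessors.
    ready = deque(n for n in range(num_nodes) if indegree[n] == 0)
    while ready:
        node = ready.popleft()
        for dst, cost in succs[node]:
            if total[node] + cost > total[dst]:
                total[dst] = total[node] + cost
                best_pred[dst] = (node, cost)
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)

    # Pick the sink (no successors) with the highest accumulated cost.
    sinks = [n for n in range(num_nodes) if not succs[n]]
    node = max(sinks, key=lambda n: total[n])

    # Walk back along the recorded predecessors to rebuild the sequence.
    sequence = []
    while best_pred[node] is not None:
        src, cost = best_pred[node]
        sequence.append((src, node, cost))
        node = src
    sequence.reverse()
    return sequence
```

Each node and each edge is visited once, which is only possible because the
graph has no cycles.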

The critical sequence is finally computed as a sequence of dependency edges. For
edges describing processor resource interferences, the view also prints a
so-called "interference probability" value (by dividing field
DependencyEdge::Frequency by the total number of iterations).
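
The division described above amounts to the following one-line computation,
shown here as a Python sketch. The function name interference_probability is
hypothetical, and the way the view rounds or formats the value is not covered.

```python
def interference_probability(frequency, iterations):
    """Fraction of simulated iterations in which the interference was seen."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    return frequency / iterations
```

For example, an edge observed in 25 out of 100 simulated iterations has an
interference probability of 0.25.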

Examples of critical sequence computations can be found in tests added/modified
by this patch.

On output streams that support colored output, instructions from the critical
sequence are rendered with a different color.

Strictly speaking, the analysis conducted by the bottleneck analysis view is not
a critical path analysis. The cost of an edge doesn't only depend on the
dependency latency. More importantly, the cost of the same edge may be computed
differently by different iterations.

Dependencies are discovered dynamically based on the events generated by the
simulator, so their number is not fixed. This is especially true for edges that
model processor resource interferences; an interference may not occur in every
iteration. For that reason, it makes sense to also print out a "probability of
interference".

By construction, the accuracy of this analysis (as always) is strongly dependent
on the simulation (and therefore the quality of the information available in the
scheduling model).

That being said, the critical sequence effectively identifies a performance
criticality. Instructions from that sequence are expected to have a very big
impact on performance. So, users can take advantage of this information to focus
their attention on specific interactions between instructions.
In my experience, it works quite well in practice, and produces useful
output (in a reasonable amount of time).

Differential Revision: https://reviews.llvm.org/D63543

llvm-svn: 364045
2019-06-21 13:32:54 +00:00
add-sequence.s [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView. 2018-08-29 17:56:39 +00:00
bottleneck-hints-1.s [MCA][Bottleneck Analysis] Teach how to compute a critical sequence of instructions based on the simulation. 2019-06-21 13:32:54 +00:00
bottleneck-hints-2.s [MCA][Bottleneck Analysis] Teach how to compute a critical sequence of instructions based on the simulation. 2019-06-21 13:32:54 +00:00
bottleneck-hints-3.s [MCA][Bottleneck Analysis] Teach how to compute a critical sequence of instructions based on the simulation. 2019-06-21 13:32:54 +00:00
bottleneck-hints-4.s [MCA][Bottleneck Analysis] Teach how to compute a critical sequence of instructions based on the simulation. 2019-06-21 13:32:54 +00:00
bottleneck-hints-none.s [llvm-mca] Enable bottleneck analysis when flag -all-views is specified. 2019-06-10 16:56:25 +00:00
clear-super-register-1.s [X86][Btver2] Fix BSF/BSR schedule 2018-09-28 10:26:48 +00:00
clear-super-register-2.s [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView. 2018-08-29 17:56:39 +00:00
dependency-breaking-cmp.s [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView. 2018-08-29 17:56:39 +00:00
dependency-breaking-pcmpeq.s [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView. 2018-08-29 17:56:39 +00:00
dependency-breaking-pcmpgt.s [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView. 2018-08-29 17:56:39 +00:00
dependency-breaking-sbb-1.s [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView. 2018-08-29 17:56:39 +00:00
dependency-breaking-sbb-2.s [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView. 2018-08-29 17:56:39 +00:00
dependent-pmuld-paddd.s [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView. 2018-08-29 17:56:39 +00:00
dot-product.s [X86][BtVer2] Update latency of horizontal operations. 2019-01-16 18:18:01 +00:00
hadd-read-after-ld-1.s [X86][BtVer2] Update latency of horizontal operations. 2019-01-16 18:18:01 +00:00
hadd-read-after-ld-2.s [X86][BtVer2] Update latency of horizontal operations. 2019-01-16 18:18:01 +00:00
instruction-info-view.s [X86][BtVer2] Update latency of horizontal operations. 2019-01-16 18:18:01 +00:00
int-to-fpu-forwarding-1.s [MC][X86] Correctly model additional operand latency caused by transfer delays from the integer to the floating point unit. 2019-01-23 16:35:07 +00:00
int-to-fpu-forwarding-2.s [X86] Remove the suffix on vcvt[u]si2ss/sd register variants in assembly printing. 2019-05-06 21:39:51 +00:00
int-to-fpu-forwarding-3.s [MC][X86] Correctly model additional operand latency caused by transfer delays from the integer to the floating point unit. 2019-01-23 16:35:07 +00:00
load-store-alias.s [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView. 2018-08-29 17:56:39 +00:00
memcpy-like-test.s [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView. 2018-08-29 17:56:39 +00:00
one-idioms.s [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView. 2018-08-29 17:56:39 +00:00
partial-reg-update-2.s [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView. 2018-08-29 17:56:39 +00:00
partial-reg-update-3.s [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView. 2018-08-29 17:56:39 +00:00
partial-reg-update-4.s [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView. 2018-08-29 17:56:39 +00:00
partial-reg-update-5.s [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView. 2018-08-29 17:56:39 +00:00
partial-reg-update-6.s [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView. 2018-08-29 17:56:39 +00:00
partial-reg-update-7.s [llvm-mca] Fix an invalid memory read introduced by r346487. 2018-11-22 12:48:57 +00:00
partial-reg-update.s [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView. 2018-08-29 17:56:39 +00:00
pipes-fpu.s [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView. 2018-08-29 17:56:39 +00:00
pr37790.s [X86] Add missing properties on llvm.x86.sse.{st,ld}mxcsr 2019-06-19 08:44:31 +00:00
rank.s [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView. 2018-08-29 17:56:39 +00:00
rcu-statistics.s [llvm-mca][View] Improved Retire Control Unit Statistics. 2018-11-23 12:12:57 +00:00
read-advance-1.s [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView. 2018-08-29 17:56:39 +00:00
read-advance-2.s [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView. 2018-08-29 17:56:39 +00:00
read-advance-3.s [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView. 2018-08-29 17:56:39 +00:00
reg-move-elimination-1.s [llvm-mca] Add extra counters for move elimination in view RegisterFileStatistics. 2018-11-01 18:04:39 +00:00
reg-move-elimination-2.s [llvm-mca] Add extra counters for move elimination in view RegisterFileStatistics. 2018-11-01 18:04:39 +00:00
reg-move-elimination-3.s [llvm-mca] Add extra counters for move elimination in view RegisterFileStatistics. 2018-11-01 18:04:39 +00:00
reg-move-elimination-4.s [llvm-mca] Add extra counters for move elimination in view RegisterFileStatistics. 2018-11-01 18:04:39 +00:00
reg-move-elimination-5.s [llvm-mca] Add extra counters for move elimination in view RegisterFileStatistics. 2018-11-01 18:04:39 +00:00
reg-move-elimination-6.s [MCA] Correctly update register definitions in the PRF after move elimination. 2019-02-18 14:15:25 +00:00
register-files-1.s [llvm-mca] Report the number of dispatched micro opcodes in the DispatchStatistics view. 2018-08-30 10:50:20 +00:00
register-files-2.s [llvm-mca] Report the number of dispatched micro opcodes in the DispatchStatistics view. 2018-08-30 10:50:20 +00:00
register-files-3.s [llvm-mca] Report the number of dispatched micro opcodes in the DispatchStatistics view. 2018-08-30 10:50:20 +00:00
register-files-4.s [MCA] Always check if scheduler resources are unavailable when reporting dispatch stalls. 2019-02-26 14:19:00 +00:00
register-files-5.s [llvm-mca] Report the number of dispatched micro opcodes in the DispatchStatistics view. 2018-08-30 10:50:20 +00:00
resources-aes.s [X86][Btver2] Fix BLENDV and AESDEC schedules 2018-10-02 15:13:18 +00:00
resources-avx1.s [X86] Add missing properties on llvm.x86.sse.{st,ld}mxcsr 2019-06-19 08:44:31 +00:00
resources-bmi1.s [llvm-mca][X86] Add missing tzcntw tests 2019-01-22 14:53:52 +00:00
resources-cmov.s [llvm-mca] Use a different character to flag instructions with side-effects in the Instruction Info View. NFC 2018-07-11 12:44:44 +00:00
resources-cmpxchg.s [llvm-mca][x86] Add CMPXCHG instruction resource tests 2018-08-01 17:25:11 +00:00
resources-f16c.s [llvm-mca] Use a different character to flag instructions with side-effects in the Instruction Info View. NFC 2018-07-11 12:44:44 +00:00
resources-lea.s [X86][BtVer2] correctly model the latency/throughput of LEA instructions. 2018-07-19 16:42:15 +00:00
resources-lzcnt.s [llvm-mca] Use a different character to flag instructions with side-effects in the Instruction Info View. NFC 2018-07-11 12:44:44 +00:00
resources-mmx.s [llvm-mca] Use a different character to flag instructions with side-effects in the Instruction Info View. NFC 2018-07-11 12:44:44 +00:00
resources-movbe.s [llvm-mca][x86] Add MOVBE resource tests to all supporting targets 2018-07-17 17:41:45 +00:00
resources-pclmul.s [llvm-mca][x86] Add PCLMUL instruction resource tests 2018-08-01 16:25:50 +00:00
resources-popcnt.s [llvm-mca] Use a different character to flag instructions with side-effects in the Instruction Info View. NFC 2018-07-11 12:44:44 +00:00
resources-prefetchw.s [X86][BtVer2] Update the WriteLoad latency. 2019-01-21 12:04:10 +00:00
resources-sse1.s [X86] Add missing properties on llvm.x86.sse.{st,ld}mxcsr 2019-06-19 08:44:31 +00:00
resources-sse2.s [X86] Remove the suffix on vcvt[u]si2ss/sd register variants in assembly printing. 2019-05-06 21:39:51 +00:00
resources-sse3.s [llvm-mca][X86] Add missing monitor/mwait tests 2019-01-22 15:48:16 +00:00
resources-sse4a.s [llvm-mca] Use a different character to flag instructions with side-effects in the Instruction Info View. NFC 2018-07-11 12:44:44 +00:00
resources-sse41.s [X86][Btver2] Fix BLENDV and AESDEC schedules 2018-10-02 15:13:18 +00:00
resources-sse42.s [X86][Btver2] Fix PCmpIStrI/PCmpIStrM schedules 2018-09-30 16:38:38 +00:00
resources-ssse3.s [X86][BtVer2] Update latency of mmx horizontal operations 2019-01-21 18:04:25 +00:00
resources-x86_32.s [llvm-mca][x86] Add 32-bit instruction resource tests 2018-07-31 17:33:08 +00:00
resources-x86_64.s [llvm-mca][X86] Add ADC/SBB with zero test cases 2019-03-06 12:51:16 +00:00
resources-x87.s [X86] Print all register forms of x87 fadd/fsub/fdiv/fmul as having two arguments where on is %st. 2019-02-04 17:28:18 +00:00
scheduler-queue-usage.s [llvm-mca][scheduler-stats] Print issued micro opcodes per cycle. NFCI 2019-04-08 16:05:54 +00:00
simple-test.s [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView. 2018-08-29 17:56:39 +00:00
unsupported-instruction.s [llvm-mca] report an error if the assembly sequence contains an unsupported instruction. 2018-07-09 12:30:55 +00:00
vbroadcast-operand-latency.s [X86][BtVer2] Remove wrong ReadAdvance from AVX vbroadcast(ss|sd|f128) instructions. 2018-08-31 16:05:48 +00:00
vec-logic-read-after-ld-1.s [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView. 2018-08-29 17:56:39 +00:00
vec-logic-read-after-ld-2.s [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView. 2018-08-29 17:56:39 +00:00
zero-idioms-avx-256.s [X86][BtVer2] Teach how to identify zero-idiom VPERM2F128rr instructions. 2018-10-01 10:35:13 +00:00
zero-idioms.s [X86][Btver2] PSUBS/PSUBUS instructions are zero-idioms 2018-09-28 14:20:42 +00:00