llvm-project/llvm/tools/llvm-mca/Views/SummaryView.h

//===--------------------- SummaryView.h ------------------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
/// \file
///
/// This file declares the summary view.
///
/// The goal of the summary view is to give a very quick overview of the
/// performance throughput. Below is an example of a summary view:
///
///
/// Iterations:        300
/// Instructions:      900
/// Total Cycles:      610
/// Dispatch Width:    2
/// IPC:               1.48
/// Block RThroughput: 2.0
///
/// The summary view collects a few performance numbers. The two main
/// performance indicators are 'Total Cycles' and IPC (Instructions Per Cycle).
///
//===----------------------------------------------------------------------===//

#ifndef LLVM_TOOLS_LLVM_MCA_SUMMARYVIEW_H
#define LLVM_TOOLS_LLVM_MCA_SUMMARYVIEW_H

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/MC/MCSchedule.h"
#include "llvm/MCA/View.h"
#include "llvm/Support/raw_ostream.h"

namespace llvm {
namespace mca {

/// A view that collects and prints a few performance numbers.
class SummaryView : public View {
  const llvm::MCSchedModel &SM;
  llvm::ArrayRef<llvm::MCInst> Source;
  const unsigned DispatchWidth;
  unsigned LastInstructionIdx;
  unsigned TotalCycles;
  // The total number of micro opcodes contributed by a block of instructions.
  unsigned NumMicroOps;

  struct DisplayValues {
    unsigned Instructions;
    unsigned Iterations;
    unsigned TotalInstructions;
    unsigned TotalCycles;
    unsigned DispatchWidth;
    unsigned TotalUOps;
    double IPC;
    double UOpsPerCycle;
    double BlockRThroughput;
  };

  // For each processor resource, this vector stores the cumulative number of
  // resource cycles consumed by the analyzed code block.
  llvm::SmallVector<unsigned, 8> ProcResourceUsage;

  // Each processor resource is associated with a so-called processor resource
  // mask. This vector correlates processor resource IDs with processor
  // resource masks. There is exactly one element per processor resource
  // declared by the scheduling model.
  llvm::SmallVector<uint64_t, 8> ProcResourceMasks;

  // Used to map resource indices to actual processor resource IDs.
  llvm::SmallVector<unsigned, 8> ResIdx2ProcResID;

  /// Compute the values to print and store them in DV.
  void collectData(DisplayValues &DV) const;

public:
  SummaryView(const llvm::MCSchedModel &Model, llvm::ArrayRef<llvm::MCInst> S,
              unsigned Width);

  void onCycleEnd() override { ++TotalCycles; }
  void onEvent(const HWInstructionEvent &Event) override;
  void printView(llvm::raw_ostream &OS) const override;
  StringRef getNameAsString() const override { return "SummaryView"; }
  json::Value toJSON() const override;
};
} // namespace mca
} // namespace llvm

#endif