18 KiB
Static scheduling infrastructure
Scheduling is a common concern in hardware design, for example in high-level
synthesis flows targeting an FSM+Datapath execution model ("static HLS"). This
document gives an overview of, and provides rationale for, the infrastructure in
the circt::scheduling
namespace. At its core, it defines an extensible
problem model that acts as an interface between clients (i.e. passes that
have a need to schedule a graph-like IR) and reusable algorithm
implementations.
This infrastructure aims to provide:
- a library of ready-to-use problem definitions and schedulers for clients to hook into.
- an API to make algorithm implementations comparable and reusable.
- a mechanism to extend problem definitions to model additional concerns and constraints.
Getting started
Let's walk through a simple example. Assume we want to schedule the
computation in the entry block of a function such as @foo(...)
in the listing
below. This means we want to assign integer start times to each of the
operations in this untimed IR.
func @foo(%a1 : i32, %a2 : i32, %a3 : i32, %a4 : i32) -> i32 {
%0 = arith.addi %a1, %a2 : i32
%1 = arith.addi %0, %a3 : i32
%2:3 = "more.results"(%0, %1) : (i32, i32) -> (i32, i32, i32)
%3 = arith.addi %a4, %2#1 : i32
%4 = arith.addi %2#0, %2#2 : i32
%5 = arith.addi %3, %3 : i32
%6 = "more.operands"(%3, %4, %5) : (i32, i32, i32) -> i32
return %6 : i32
}
Our only constraint is that an operation can start after its operands have
been computed. The operations in our source IR are unaware of time, so we need
to associate them with a suitable operator type. Operator types are an
abstraction of the target architecture onto which we want to schedule the source
IR. Here, the only property we need to model is their latency. Let's assume
that additions take 1 time step, the operations in the dummy more.
dialect
take 3 time steps. As the return operation just passes control back to the
caller, we assume a latency of 0 time steps for it.
Boilerplate
The scheduling infrastructure currently has three toplevel header files.
//...
#include "circt/Scheduling/Problems.h"
#include "circt/Scheduling/Algorithms.h"
#include "circt/Scheduling/Utilities.h"
//...
using namespace circt::scheduling;
Constructing a problem instance
Our stated goal requires solving an acyclic scheduling problem without resource
constraints, represented by the Problem
class in the scheduling
infrastructure. We need to construct an instance of the problem, which serves
as a container for the problem components as well as their properties. The
MLIR operation passed as an argument to the get(...)
method is used to emit
diagnostics.
auto prob = Problem::get(func);
Then, we set up the operator types with the latencies as discussed in the introduction. Operator types are identified by string handles.
auto retOpr = prob.getOrInsertOperatorType("return");
prob.setLatency(retOpr, 0);
auto addOpr = prob.getOrInsertOperatorType("add");
prob.setLatency(addOpr, 1);
auto mcOpr = prob.getOrInsertOperatorType("multicycle");
prob.setLatency(mcOpr, 3);
Next, we register all operations that we want to consider in the problem instance, and link them to one of the operator types.
auto &block = func.getBlocks().front();
for (auto &op : block) {
prob.insertOperation(&op);
if (isa<func::ReturnOp>(op))
prob.setLinkedOperatorType(&op, retOpr);
else if (isa<arith::AddIOp>(op))
prob.setLinkedOperatorType(&op, addOpr);
else
prob.setLinkedOperatorType(&op, mcOpr);
}
Note that we do not have to tell the instance about the dependences between
the operations in this simple example because the problem model automatically
includes the SSA def-use-edges maintained by MLIR. However, we often have to
consider additional dependences that are not represented by value flow, such as
memory dependences. For these situations, so-called auxiliary
dependences between operations are inserted explicitly into the problem:
prob.insertDependence(srcOp, destOp)
.
Scheduling
Before we attempt to schedule, we invoke the check()
method, which ensures
that the constructed instance is complete and valid. For example, the check
would capture if we had forgot to set an operator type's latency. We dump the
instance to visualize the dependence graph.
auto checkRes = prob.check();
assert(succeeded(checkRes));
dumpAsDOT(prob, "sched-problem.dot");
We use a simple list scheduler, available via the Algorithms.h
header, to
compute a solution for the instance.
auto schedRes = scheduleASAP(prob);
assert(succeeded(schedRes));
Working with the solution
The solution is now stored in the instance, and we invoke the problem's
verify()
method to ensure that the computed start times adhere to the
precedence constraint we stated earlier, i.e. operations start after their
operands have computed their results. We can also convince ourselves of that by
dumping the instance and inspecting the solution.
auto verifRes = prob.verify();
assert(succeeded(verifRes));
dumpAsDOT(prob, "sched-solution.dot");
To inspect the solution programmatically, we can query the instance in the
following way. Note that by convention, all getters in the problem classes
return Optional<T>
values, but as we have already verified that the start
times for registered operations are set, we can directly dereference the values.
for (auto &op : prob.getOperations())
llvm::dbgs() << *prob.getStartTime(&op) << "\n";
And that's it! For a more practical example, have a look at the
AffineToPipeline
pass.
Extensible problem model
Theory and terminology
Scheduling problems come in many flavors and variants in the context of hardware design. In order to make the scheduling infrastructure as modular and flexible as CIRCT itself, it is build on the following idea of an extensible problem model:
An instance is comprised of components called operations, dependences and operator types. Operations and dependences form a graph structure and correspond to the source IR to be scheduled. Operator types encode the characteristics of the target IR. The components as well as the instance can be annotated with properties. Properties are either input or solution properties, based on whether they are supplied by the client, or computed by the algorithm. The values of these properties are subject to the input constraints and solution constraints, which are a first-class concern in the model and are intended to be strictly enforced before respectively after scheduling.
Concrete problem definitions derived from this model share the same representation of the components, but differ in their sets of properties (and potentially distinction of input and solution properties) and input and solution constraints. Hence, we tie together properties and constraints to model a specific scheduling problem. Extending one (or more!) parent problems means inheriting or adding properties, and redefining the constraints (as these don't always compose automatically).
A key benefit of this approach is that these problem definitions provide a reliable contract between the clients and algorithms, making it clear which information needs to be provided, and what kind of solution is to be expected. Clients can therefore choose a problem definition that fits their needs, and algorithms can opt-in to accepting a specific subset of problems, which they can solve efficiently. Extensibility is ensured because new problem definitions can be added to the infrastructure (or inside a specific lowering flow, or even out-of-tree) without adapting any existing users.
Implementation
See Problems.h / Problems.cpp.
Problem definitions
The Problem
class is currently the base of the problem hierarchy. Several
extended problems are currently defined via
virtual multiple inheritance. Upon construction, a containingOp
is passed to
instances. This MLIR operation is currently only used to emit diagnostics, and
has no semantic meaning beyond that.
Components
The infrastructure uses the following representation of the problem components.
Operations are just mlir::Operation *
s.
We distinguish two kinds of dependences, def-use and auxiliary. Def-use
dependences are part of the SSA graph maintained by MLIR, and can distinguish
specific result and operand numbers. As we expect any relevant graph-like input
IR to use this MLIR facility, instances automatically consider these edges
between registered operations. Auxiliary dependences, in contrast, only specify
a source and destination operation, and have to be explicitly added to the
instance by the client, e.g. for control or memory dependences. The
detail::Dependence
class abstracts the differences between both kinds, in
order to offer a uniform API to iterate over dependences and query their
properties.
Lastly, operator types are identified by mlir::StringAttr
s, in order to give
clients maximum flexibility in modeling their operator library. This may change
in the future, when a CIRCT-wide concept to model physical properties of
hardware emerges.
Properties
Properties can involve arbitrary data types, as long as these can be stored in
maps. Problem classes offer public getter and setter methods to access a given
components properties. Getters return optional values, in order to indicate if a
property is unset. For example, the signature of the method the queries the
computed start time is Optional<unsigned> getStartTime(Operation *op)
.
Constraints
Clients call the virtual Problem::check()
method to test any input
constraints, and Problem::verify()
to test the solution constraints. Problem
classes are expected to override them as needed. There are no further
restrictions of how these methods are implemented, but it is recommended to
introduce helper methods that test a specific aspect and can be reused in
extended problems. In addition, it makes sense to check/verify the properties in
an order that avoids redundant tests for the presence of a particular property
as well as redundant iteration over the problem components.
Available problem definitions
See the linked Doxygen docs for more details.
- Problem: A basic, acyclic problem at the root of the problem hierarchy. Operations are linked to operator types, which have integer latencies. The solution comprises integer start times adhering to the precedence constraints implied by the dependences.
- CyclicProblem:
Cyclic extension of
Problem
. Its solution solution can be used to construct a pipelined datapath with a fixed, integer initiation interval, in which the execution of multiple iterations/samples/etc. may overlap. Operator types are assumed to be fully pipelined. - SharedOperatorsProblem: A resource-constrained scheduling problem that corresponds to multiplexing multiple operations onto a pre-allocated number of fully pipelined operator instances.
- ModuloProblem: Models an HLS classic: Pipeline scheduling with limited resources.
- ChainingProblem:
Extends
Problem
to consider the accumulation of physical propagation delays on combinational paths along SSA dependences. - ChainingCyclicProblem:
Extends
ChainingProblem
andCyclicProblem
to consider the accumulation of physical propagation delays on combinational paths along SSA dependences on a cyclic scheduling problem. Note that the problem does not model propagation delays along inter-iteration dependences. These are commonly represented as auxiliary dependences, which are already excluded in the parent ChainingProblem. In addition, the ChainingCyclicProblem explicitly prohibits the use of def-use dependences with a non-zero distance.
NB: The classes listed above each model a trait-like aspect of scheduling.
These can be used as-is, but are also intended for mixing and matching, even
though we currently do not provide definitions for all possible combinations in
order not to pollute the infrastructure. For example, the ChainingProblem
may
be of limited use standalone, but can serve as a parent class for a future
chaining-enabled modulo scheduling problem.
Available schedulers
- ASAP list scheduler
(
ASAPScheduler.cpp
): Solves the basicProblem
with a worklist algorithm. This is mostly a problem-API demo from the viewpoint of an algorithm implementation. - Linear programming-based schedulers
(
SimplexSchedulers.cpp
): SolvesProblem
,CyclicProblem
andChainingProblem
optimally, andSharedOperatorsProblem
/ModuloProblem
with simple (not state-of-the-art!) heuristics. This family of schedulers shares a tailored implementation of the simplex algorithm, as proposed by de Dinechin. See the sources for more details and literature references. - Integer linear programming-based scheduler
(
LPSchedulers.cpp
): Demo implementation for using an ILP solver via the OR-Tools integration.
Utilities
See
Utilities.h
:
- Topological graph traversal
- DFA to compute combinational path delays
- DOT dump
Adding a new problem
See e.g. #2233, which added the
ChainingProblem
.
- Decide where to add it. Guideline: If it is trait-like and similar to the
existing problem mentioned above, add it to
Problems.h
. If the model is specific to your use-case, it is best to start out in locally in your dialect/pass. - Declare the new problem class and inherit virtually from the relevant
superclasses (at least
Problem
). - Define additional properties (private), and the corresponding public
getters/setters. Getters return
Optional<T>
values, to indicate an unset state.- Note that dependence properties are somewhat expensive to store, making it
desirable that clients and algorithms expect and handle the unset state.
This should be clearly documented. Example:
distance
property inCyclicProblem
.
- Note that dependence properties are somewhat expensive to store, making it
desirable that clients and algorithms expect and handle the unset state.
This should be clearly documented. Example:
- Redefine the
getProperties(*)
methods to get dumping support. These should consider any properties the new class adds, plus properties defined in the superclass(es). - Redefine
check()
(input constraints) andverify()
(solution constraints). If possible, follow the design used in the existing problem classes.
Testing
Please extend the SSP dialect to enable testing of the new problem definition.
- If the problem defines any new properties, add them to
SSPAttributes.td
. - Instantiate the
Default<ProblemT>
template for the new problem. - Handle the problem class in the
-ssp-roundtrip
pass. - Write a couple of "positive" testcases, as well as at least one error test for
each input/solution constraint, as validated by
check()
/verify()
. See the existing test cases for inspiration.
Adding a new scheduler
See e.g. #2650, which added a
scheduler for the CyclicProblem
.
- Schedulers should opt-in to specific problems by providing entry points for
the problem subclasses they support. Example:
LogicalResult awesomeScheduler(Problem &prob); LogicalResult awesomeScheduler(CyclicProblem &prob);
- Schedulers can expect that the input invariants were enforced by a
check()
-call in the client, and must compute a solution that complies with the solution constraints when the client calls the problem'sverify()
method. - Schedulers can live anywhere. If a new algorithm is not entirely
dialect/pass-specific and supports problems defined in
Problems.h
, it should offer entry points inAlgorithms.h
. - Objectives are not part of the problem signature. Therefore, if an algorithm supports optimizing for different objectives, clients should be able to select one via the entry point(s).
Testing
- To enable testing, add the new scheduler to the
-ssp-schedule
pass, and invoke it from the test cases for the supported problems (example). - If the algorithm may fail in certain situations (e.g., "linear program is infeasible"), add suitable error tests as well.