Bufferization

[TOC]

Overview

Bufferization in MLIR is the process of converting the tensor type to the memref type. MLIR provides a composable system that allows dialects to systematically bufferize a program. This system is a simple application of MLIR's dialect conversion infrastructure. The bulk of the code related to bufferization is a set of ordinary ConversionPatterns that dialect authors write to convert ops that operate on tensors into ops that operate on memrefs. A set of conventions and best practices allows these patterns to be run across multiple independent passes (rather than requiring a single huge atomic conversion pass), which makes compilation pipelines scalable, robust, and easy to debug.

This document is aimed at people who want to use MLIR's bufferization functionality, as well as people who want to extend it to cover their own ops.

NOTE: Before reading this document, please watch the talk "Type Conversions the Not-So-Hard-Way: MLIR's New Bufferization Infrastructure" (slides, recording). That talk gives a high-level overview of the bufferization infrastructure and important conceptual details related to using the MLIR dialect conversion infrastructure.

Bufferization's place in a compilation pipeline

Bufferization itself does not free any of the buffers that have been allocated, nor does it do anything particularly intelligent with the placement of buffers w.r.t. control flow. Thus, a realistic compilation pipeline will usually consist of:

  1. Bufferization
  2. Buffer optimizations such as buffer-hoisting, buffer-loop-hoisting, and promote-buffers-to-stack, which perform optimizations that are only exposed after bufferization.
  3. Buffer deallocation, performed by running the buffer deallocation pass.

After buffer deallocation has been completed, the program will be quite difficult to transform due to the presence of the deallocation ops. Thus, other optimizations such as linalg fusion on memrefs should be done before that stage.

General structure of the bufferization process

Bufferization consists of running multiple partial bufferization passes, followed by one finalizing bufferization pass.

There is typically one partial bufferization pass per dialect (though other subdivisions are possible). For example, for a dialect X there will typically be a pass X-bufferize that knows how to bufferize all the ops in that dialect. By running pass X-bufferize for each dialect X in the program, all the ops in the program are incrementally bufferized.

Partial bufferization passes create programs where only some ops have been bufferized. These passes create materializations (also sometimes called "casts") that convert between the tensor and memref types, which bridges ops that have been bufferized and ops that have not yet been bufferized.

Finalizing bufferizations complete the bufferization process, and guarantee that there are no tensors remaining in the program. This involves eliminating the materializations. The pass finalizing-bufferize provides a minimal pass that only eliminates materializations and issues an error if any unbufferized ops exist in the program.

However, it is possible for a finalizing bufferization to do more than just eliminate materializations. By adding patterns (just as a partial bufferization would), it is possible for a finalizing bufferization pass to simultaneously bufferize ops and eliminate materializations. This has a number of disadvantages discussed in the talk and should generally be avoided.
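The incremental structure described above can be sketched with a toy model in plain C++. This is only an illustration under simplifying assumptions: `ToyOp`, `ToyProgram`, `toyPartialBufferize`, and `toyUnbufferizedOps` are hypothetical names, not MLIR classes or functions, and an op is reduced to a dialect name plus a "bufferized" flag.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: these are NOT MLIR classes, just stand-ins for illustration.
struct ToyOp {
  std::string dialect;     // e.g. "scf"
  std::string name;        // e.g. "scf.for"
  bool bufferized = false; // true once the op works on memrefs
};

using ToyProgram = std::vector<ToyOp>;

// Models a partial bufferization pass "X-bufferize": it converts the ops of
// one dialect and leaves all other ops untouched. Returns the number of ops
// it converted.
inline int toyPartialBufferize(ToyProgram &program,
                               const std::string &dialect) {
  assert(!dialect.empty() && "expected a dialect name");
  int converted = 0;
  for (ToyOp &op : program) {
    if (op.dialect == dialect && !op.bufferized) {
      op.bufferized = true;
      ++converted;
    }
  }
  return converted;
}

// Lists the ops that still operate on tensors. A finalizing bufferization
// can only succeed once this list is empty.
inline std::vector<std::string> toyUnbufferizedOps(const ToyProgram &program) {
  std::vector<std::string> remaining;
  for (const ToyOp &op : program)
    if (!op.bufferized)
      remaining.push_back(op.name);
  return remaining;
}
```

Running `toyPartialBufferize` once per dialect shrinks the list returned by `toyUnbufferizedOps` step by step, which mirrors how each X-bufferize pass leaves the program a little more bufferized than it found it.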

Example

As a concrete example, we will look at the bufferization pipeline from the mlir-npcomp reference backend (code). The code, slightly simplified and annotated, is reproduced here:

  // Partial bufferization passes.
  pm.addPass(createTensorConstantBufferizePass());
  pm.addNestedPass<FuncOp>(createTCPBufferizePass()); // Bufferizes the downstream `tcp` dialect.
  pm.addNestedPass<FuncOp>(createSCFBufferizePass());
  pm.addNestedPass<FuncOp>(createLinalgBufferizePass());
  pm.addNestedPass<FuncOp>(createStdBufferizePass());
  pm.addNestedPass<FuncOp>(createTensorBufferizePass());
  pm.addPass(createFuncBufferizePass());

  // Finalizing bufferization pass.
  pm.addNestedPass<FuncOp>(createFinalizingBufferizePass());

Looking first at the partial bufferization passes, we see that there is a sequence of FuncOp passes (which run in parallel on functions). These function passes are bracketed by tensor-constant-bufferize and func-bufferize, which are module passes (and thus serialize the parallel compilation process). These two passes must be module passes because they make changes to the top-level module.

The bulk of the bufferization work is done by the function passes. Most of these passes are provided as part of the upstream MLIR distribution and bufferize their respective dialects (e.g. scf-bufferize bufferizes the scf dialect). The tcp-bufferize pass is an exception -- it is a partial bufferization pass used to bufferize the downstream tcp dialect, and fits in perfectly with all the other passes provided upstream.

The last pass is the finalizing bufferization pass. The mlir-npcomp reference backend has arranged that all ops are bufferized by partial bufferizations, so that the upstream finalizing-bufferize pass can be used as the finalizing bufferization pass. This gives excellent diagnostics when something goes wrong with the bufferization process, for example when an op was not handled by any pattern.

How to write a partial bufferization pass

The contract of a partial bufferization pass is that a subset of ops (or kinds of ops, customizable by a ConversionTarget) get bufferized.

A partial bufferization pass is just a pass that uses the dialect conversion framework to apply ConversionPatterns with a tensor to memref type conversion.

To describe how to write such a pass, we will walk through an example, the tensor-bufferize pass (code, test) that bufferizes the tensor dialect.

The bulk of the code in the pass is a set of conversion patterns. A simple example is BufferizeCastOp:

class BufferizeCastOp : public OpConversionPattern<tensor::CastOp> {
public:
  using OpConversionPattern::OpConversionPattern;
  LogicalResult
  matchAndRewrite(tensor::CastOp op, ArrayRef<Value> operands,
                  ConversionPatternRewriter &rewriter) const override {
    auto resultType = getTypeConverter()->convertType(op.getType());
    rewriter.replaceOpWithNewOp<MemRefCastOp>(op, resultType, operands[0]);
    return success();
  }
};

See the talk for more details on how to write these patterns.

The pass itself is very small, and follows the basic pattern of any dialect conversion pass.

void mlir::populateTensorBufferizePatterns(
    MLIRContext *context, BufferizeTypeConverter &typeConverter,
    OwningRewritePatternList &patterns) {
  patterns.insert<BufferizeCastOp, BufferizeExtractOp>(typeConverter, context);
}

struct TensorBufferizePass : public TensorBufferizeBase<TensorBufferizePass> {
  void runOnFunction() override {
    auto *context = &getContext();
    BufferizeTypeConverter typeConverter;
    OwningRewritePatternList patterns;
    ConversionTarget target(*context);

    populateTensorBufferizePatterns(context, typeConverter, patterns);
    target.addIllegalOp<tensor::CastOp, tensor::ExtractOp>();
    target.addLegalDialect<StandardOpsDialect>();

    if (failed(
            applyPartialConversion(getFunction(), target, std::move(patterns))))
      signalPassFailure();
  }
};

The pass has all the hallmarks of a dialect conversion pass that does type conversions: a TypeConverter, an OwningRewritePatternList, a ConversionTarget, and a call to applyPartialConversion. Note that the function populateTensorBufferizePatterns is kept separate so that power users can use the patterns independently if necessary (for example, to combine multiple sets of conversion patterns into a single conversion call for performance).

One convenient utility provided by the MLIR bufferization infrastructure is the BufferizeTypeConverter, which comes pre-loaded with the necessary conversions and materializations between tensor and memref.

In this case, the StandardOpsDialect is marked as legal, so the tensor_load and tensor_to_memref ops, which are inserted automatically by the dialect conversion framework as materializations, are legal. There is a helper populateBufferizeMaterializationLegality (code) which helps with this in general.
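To make the role of the legality setup concrete, here is a minimal toy sketch in plain C++ of how a partial conversion treats ops. It is not the MLIR implementation: `ToyTarget`, `ToyPatterns`, and `toyApplyPartialConversion` are hypothetical names, ops are reduced to "dialect.op" strings, and the replacement names used with it (such as "std.load" for tensor.extract) are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a ConversionTarget (NOT the MLIR class): ops are plain
// strings of the form "dialect.op".
struct ToyTarget {
  std::set<std::string> illegalOps;    // plays the role of addIllegalOp
  std::set<std::string> legalDialects; // plays the role of addLegalDialect

  bool isIllegal(const std::string &op) const {
    return illegalOps.count(op) != 0;
  }
  bool isLegal(const std::string &op) const {
    std::string dialect = op.substr(0, op.find('.'));
    return !isIllegal(op) && legalDialects.count(dialect) != 0;
  }
};

// Toy patterns: map an op name to the name of its replacement.
using ToyPatterns = std::map<std::string, std::string>;

// Toy partial conversion: legal ops are kept, ops with a pattern are
// rewritten, ops of unknown legality are tolerated, and an explicitly
// illegal op without a pattern fails the conversion (ops stays unchanged).
inline bool toyApplyPartialConversion(std::vector<std::string> &ops,
                                      const ToyTarget &target,
                                      const ToyPatterns &patterns) {
  std::vector<std::string> result;
  for (const std::string &op : ops) {
    auto it = patterns.find(op);
    if (target.isLegal(op)) {
      result.push_back(op);
    } else if (it != patterns.end()) {
      assert(!target.isIllegal(it->second) && "pattern produced an illegal op");
      result.push_back(it->second);
    } else if (target.isIllegal(op)) {
      return false;
    } else {
      result.push_back(op);
    }
  }
  ops = std::move(result);
  return true;
}
```

In this sketch, marking the "std" dialect legal is what lets ops in that dialect, including the materialization ops, pass through untouched, while an op from an unrelated dialect such as "scf.for" is simply left for another partial bufferization pass to handle.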

Other partial bufferization examples

  • linalg-bufferize (code, test)

    • Bufferizes the linalg dialect.
    • This is an example of how to simultaneously bufferize all the ops that satisfy a certain OpInterface with a single pattern. Specifically, BufferizeAnyLinalgOp (code) bufferizes any op that implements the LinalgOp interface.
  • scf-bufferize (code, test)

    • Bufferizes ops from the scf dialect.
    • This is an example of how to bufferize ops that implement RegionBranchOpInterface (that is, they use regions to represent control flow).
    • The bulk of the work is done by lib/Dialect/SCF/Transforms/StructuralTypeConversions.cpp (code), which is well-commented and covers how to correctly convert ops that contain regions.
  • func-bufferize (code, test)

    • Bufferizes func, call, and BranchOpInterface ops.
    • This is an example of how to bufferize ops that have multi-block regions.
    • This is an example of a pass that is not split along dialect subdivisions.
  • tensor-constant-bufferize (code, test)

    • Bufferizes only std.constant ops of tensor type.
    • This is an example of setting up the legality so that only a subset of std.constant ops get bufferized.
    • This is an example of a pass that is not split along dialect subdivisions.
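The idea of bufferizing only a subset of std.constant ops can be sketched as a per-op legality rule. The following toy C++ is an assumption-laden illustration, not the code of tensor-constant-bufferize: `ToyConstant`, `ToyLegalityRule`, `toyTensorConstantRule`, and `toyNeedsBufferization` are hypothetical names, and result types are modeled as plain strings.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Toy stand-in for an op with a single result type (NOT an MLIR class).
struct ToyConstant {
  std::string resultType; // e.g. "tensor<4xf32>" or "i32"
};

// A dynamic legality rule decides legality per op instance rather than per
// op kind.
using ToyLegalityRule = std::function<bool(const ToyConstant &)>;

// Hypothetical rule mirroring the description above: a constant is legal
// (left alone) unless its result type is a tensor.
inline ToyLegalityRule toyTensorConstantRule() {
  return [](const ToyConstant &op) {
    // rfind(prefix, 0) == 0 exactly when the string starts with the prefix.
    return op.resultType.rfind("tensor<", 0) != 0;
  };
}

// An op needs bufferization exactly when the rule reports it as not legal.
inline bool toyNeedsBufferization(const ToyConstant &op,
                                  const ToyLegalityRule &isLegal) {
  assert(isLegal && "expected a legality rule");
  return !isLegal(op);
}
```

With such a rule, scalar constants stay legal and are never touched, so only tensor-typed constants are handed to the bufferization patterns.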

How to write a finalizing bufferization pass

The contract of a finalizing bufferization pass is that all tensors are gone from the program.

The easiest way to write a finalizing bufferize pass is to not write one at all! MLIR provides a pass finalizing-bufferize which eliminates the tensor_load / tensor_to_memref materialization ops inserted by partial bufferization passes and emits an error if that is not sufficient to remove all tensors from the program.

This pass is sufficient when partial bufferization passes have bufferized all the ops in the program, leaving behind only the materializations. When possible, it is recommended to structure your pass pipeline this way. The significant advantage is that if an op does not get bufferized (due to a missing pattern, a bug in the code, etc.), finalizing-bufferize will emit a nice clean error, and the IR seen by finalizing-bufferize will contain only one unbufferized op.

However, before the current bufferization infrastructure was put in place, bufferization could only be done as a single finalizing bufferization mega-pass that used the populate*BufferizePatterns functions from multiple dialects to bufferize everything at once. Thus, one might see code in downstream projects structured this way. This structure is not recommended in new code. A helper, populateEliminateBufferizeMaterializationsPatterns (code), is available for such passes to provide patterns that eliminate tensor_load and tensor_to_memref.
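The contract of a minimal finalizing pass (eliminate materializations, report an error for anything else that still uses tensors) can be sketched as a toy model in plain C++. This is not how finalizing-bufferize is implemented: `ToyTypedOp`, `ToyFinalizeResult`, `toyIsMaterialization`, and `toyFinalizingBufferize` are hypothetical names, and a program is reduced to a flat list of ops tagged with whether they still use tensor types.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy op (NOT an MLIR class): a name plus whether it still uses tensor types.
struct ToyTypedOp {
  std::string name;
  bool usesTensor;
};

struct ToyFinalizeResult {
  bool succeeded;
  std::vector<std::string> ops; // remaining op names on success
  std::string offendingOp;      // first unbufferized op on failure
};

inline bool toyIsMaterialization(const std::string &name) {
  return name == "tensor_load" || name == "tensor_to_memref";
}

// Models the contract of a minimal finalizing pass: drop the materialization
// ops, and fail on the first other op that still uses tensors.
inline ToyFinalizeResult
toyFinalizingBufferize(const std::vector<ToyTypedOp> &program) {
  ToyFinalizeResult result{true, {}, ""};
  for (const ToyTypedOp &op : program) {
    assert(!op.name.empty() && "expected a named op");
    if (toyIsMaterialization(op.name))
      continue; // both sides are memrefs now, so the cast is redundant
    if (op.usesTensor) {
      result.succeeded = false;
      result.ops.clear();
      result.offendingOp = op.name;
      return result;
    }
    result.ops.push_back(op.name);
  }
  return result;
}
```

Because the toy pass adds no bufferization patterns of its own, a failure always points at a single op that an earlier partial pass missed, which is the diagnostic benefit described above.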

Changes since the talk

  • func-bufferize was changed to be a partial conversion pass, and there is a new finalizing-bufferize which serves as a general finalizing bufferization pass.