This patch is the third in a series of patches fixing markdown links and references inside the mlir documentation.
This patch addresses all broken references to other markdown files and sections inside the Tutorials folder.
Differential Revision: https://reviews.llvm.org/D103017
Update the paragraph on generic / indexed_generic to reflect the unification of these operations.
Differential Revision: https://reviews.llvm.org/D102775
Removed some of the older raw "MLIRized" versions that are
no longer needed now that the sparse runtime support library
can focus on the proper sparse tensor types rather than the
opague pointer approach of the past. This avoids legacy...
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D102960
A test in ir.c makes use of casting a void* to an integer type to print it's address. This cast is currently done with the datatype `long` however, which is only guaranteed to be equal to the pointer width on LP64 system. Other platforms may use a length not equal to the pointer width. 64bit Windows as an example uses 32 bit for `long` which does not match the 64 bit pointers.
This also results in clang warning due to `-Wvoid-pointer-to-int-cast`.
Technically speaking, since the test only passes the value 42, it does not cause any issues, but it'd be nice to fix the warning at least.
Differential Revision: https://reviews.llvm.org/D103085
This patch is the first in a series of patches fixing markdown links and references inside the mlir documentation. I chose to split it in a few reviews to be able to iterate quicker and to ease review.
This patch addresses all broken references to other markdown files and sections inside the Dialects folder.
One change that was also done was to insert '/' between the markdown files and section:
Example:
Builtin.md#integertype
was changed to:
Builtin.md/#integertype
After compilation, hugo then translates the later to jump directly to the integer type section, but not the former. Not inserting the slash would simply jump to just the Builtin page, instead of the integertype section. I therefore changed occurrences of the former version to the later as well.
Differential Revision: https://reviews.llvm.org/D103011
This patch is the second in a series of patches fixing markdown links and references inside the mlir documentation.
This patch addresses all broken references to other markdown files and sections inside the Rationale folder.
In addition to fixing the links and references like in the previous patch, I also changed references which are URLs to the mlir.llvm.org/docs website, to proper relative markdown references instead.
Differential Revision: https://reviews.llvm.org/D103013
Disallow transfer ops that change the element type of the transfer. Such transfers could be supported in the future, if needed.
Differential Revision: https://reviews.llvm.org/D102746
Prevent users of `iter_args` of an affine for loop from being hoisted
out of it. Otherwise, LICM leads to a violation of the SSA dominance
(as demonstrated in the added test case).
Fixes: https://bugs.llvm.org/show_bug.cgi?id=50103
Reviewed By: bondhugula, ayzhuang
Differential Revision: https://reviews.llvm.org/D102984
This previously handled memref::SubviewOp, but this can be extended to
all ops implementing the interface.
Differential Revision: https://reviews.llvm.org/D103076
Lower a 1D vector transfer op to LLVM if the last dim stride is 1. Also fixes a bug in the original unit stride computation.
Differential Revision: https://reviews.llvm.org/D102897
We are currently explicitly setting the flag solely based on the value of `-verify`, which ends up ignoring the situation where the user explicitly disabled this option from the command line.
Differential Revision: https://reviews.llvm.org/D102952
This exposes the iterations and top-down processing as flags, and also
allows controlling whether region simplification is desirable for a client.
This allows deleting some duplicated entrypoints to
applyPatternsAndFoldGreedily.
This also deletes the Constant Preprocessing pass, which isn't worth it
on balance.
All defaults are all kept the same, so no one should see a behavior change.
Differential Revision: https://reviews.llvm.org/D102988
This is the fourth and final patch in a series of patches fixing markdown links and references inside the mlir documentation. This patch combined with the other three should fix almost every broken link on mlir.llvm.org as far as I can tell.
This patch in particular addresses all Markdown files in the top level docs directory.
Differential Revision: https://reviews.llvm.org/D103032
Deconstrains several TOSA operators to align with the current TOSA spec, including all the elementwise ops.
Note: some more ops are under consideration for further cleanup; they will follow once the spec has been updated.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D102958
Drop old cmake variable names that were kept around so that zorg
buildbot could be migrated, which has now happened (D102977). D102976
had fixed the inconsistent names.
Differential Revision: https://reviews.llvm.org/D102997
Fix inconsistent MLIR CMake variable names. Consistently name them as
MLIR_ENABLE_<feature>.
Eg: MLIR_CUDA_RUNNER_ENABLED -> MLIR_ENABLE_CUDA_RUNNER
MLIR follows (or has mostly followed) the convention of naming
cmake enabling variables in the from MLIR_ENABLE_... etc. Using a
convention here is easy and also important for convenience. A counter
pattern was started with variables named MLIR_..._ENABLED. This led to a
sequence of related counter patterns: MLIR_CUDA_RUNNER_ENABLED,
MLIR_ROCM_RUNNER_ENABLED, etc.. From a naming standpoint, the imperative
form is more meaningful. Additional discussion at:
https://llvm.discourse.group/t/mlir-cmake-enable-variable-naming-convention/3520
Switch all inconsistent ones to the ENABLE form. Keep the couple of old
mappings needed until buildbot config is migrated.
Differential Revision: https://reviews.llvm.org/D102976
Steps for normalizing dynamic memrefs for tiled layout map
1. Check if original map is tiled layout. Only tiled layout is supported.
2. Create normalized memrefType. Dimensions that include dynamic dimensions
in the map output will be dynamic dimensions.
3. Create new maps to calculate each dimension size of new memref.
In tiled layout, the dimension size can be calculated by replacing
"floordiv <tile size>" with "ceildiv <tile size>" and
"mod <tile size>" with "<tile size>".
4. Create AffineApplyOp to apply the new maps. The output of AffineApplyOp is
dynamicSizes for new AllocOp.
5. Add the new dynamic sizes in new AllocOp.
This patch also set MemRefsNormalizable trant in CastOp and DimOp since
they used with dynamic memrefs.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D97655
This makes it possible for targets to define their own MCObjectFileInfo.
This MCObjectFileInfo is then used to determine things like section alignment.
This is a follow up to D101462 and prepares for the RISCV backend defining the
text section alignment depending on the enabled extensions.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D101921
This adds the ability to specify a location when creating BlockArguments.
Notably Value::getLoc() will return this correctly, which makes diagnostics
more precise (e.g. the example in test-legalize-type-conversion.mlir).
This is currently optional to avoid breaking any existing code - if
absent, the BlockArgument defaults to using the location of its enclosing
operation (preserving existing behavior).
The bulk of this change is plumbing location tracking through the parser
and printer to make sure it can round trip (in -mlir-print-debuginfo
mode). This is complete for generic operations, but requires manual
adoption for custom ops.
I added support for function-like ops to round trip their argument
locations - they print correctly, but when parsing the locations are
dropped on the floor. I intend to fix this, but it will require more
invasive plumbing through "function_like_impl" stuff so I think it
best to split it out to its own patch.
This is a reapply of the patch here: https://reviews.llvm.org/D102567
with an additional change: we now never defer block argument locations,
guaranteeing that we can round trip correctly.
This isn't required in all cases, but allows us to hill climb here and
works around unrelated bugs like https://bugs.llvm.org/show_bug.cgi?id=50451
Differential Revision: https://reviews.llvm.org/D102991
All lines after the first are currently indented by one char further to the left than the first line. This leads to the first character of each sentence being cut from the resulting Markdown file after compilation. The text also contains 3 references to sections of other markdown files. One was missing the file, while the other two had outdated files, leading to 404 errors in the documentation.
Differential Revision: https://reviews.llvm.org/D102983
Add a test case to test the complete execution of WMMA ops on a Nvidia
GPU with tensor cores. These tests are enabled under
MLIR_RUN_CUDA_TENSOR_CORE_TESTS.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D95334
These pass documents belong on the main pass page, and not generated as
top level pages.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D102947
This revision completes the "dimension ordering" feature
of sparse tensor types that enables the programmer to
define a preferred order on dimension access (other than
the default left-to-right order). This enables e.g. selection
of column-major over row-major storage for sparse matrices,
but generalized to any rank, as in:
dimOrdering = affine_map<(i,j,k,l,m,n,o,p) -> (p,o,j,k,i,l,m,n)>
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D102856
The previous implementation did not handle casting behavior properly and
did not consider aliases.
Differential Revision: https://reviews.llvm.org/D102785
This pattern inlines operands to a linalg.generic operation that use a constant
index and hence are loop-invariant scalars. This reduces the number of
linalg.generic operands and unlocks some canonicalizations that rely on seeing
an explicit tensor.extract.
Differential Revision: https://reviews.llvm.org/D102682
I noticed while packaging mlir that most mlir library names start
with `libMLIR`. The only different libary was `libMlirLspServerLib.a`.
That's why I changed the library to be similarly named to the others.
Differential Revision: https://reviews.llvm.org/D102881
This adds the straightforward conversion for EqualOp
(two complex numbers are equal if both the real and the imaginary part are equal).
Differential Revision: https://reviews.llvm.org/D102840
The current implementation has several key limitations and weirdness, e.g local reproducers don't support dynamic pass pipelines, error messages don't include the passes that failed, etc. This revision refactors the implementation to support more use cases, and also be much cleaner.
The main change in this revision, aside from moving the implementation out of Pass.cpp and into its own file, is the addition of a crash recovery pass instrumentation. For local reproducers, this instrumentation handles setting up the recovery context before executing each pass. For global reproducers, the instrumentation is used to provide a more detailed error message, containing information about which passes are running and on which operations.
Example of new message:
```
error: Failures have been detected while processing an MLIR pass pipeline
note: Pipeline failed while executing [`TestCrashRecoveryPass` on 'module' operation: @foo]: reproducer generated at `crash-recovery.mlir.tmp`
```
Differential Revision: https://reviews.llvm.org/D101854
This flag will print the IR after a pass only in the case where the pass failed. This can be useful to more easily view the invalid IR, without needing to print after every pass in the pipeline.
Differential Revision: https://reviews.llvm.org/D101853
Skip the sparsification pass for Linalg ops without annotated tensors
(or cases that are not properly handled yet).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D102787
Also, fix a small typo where the "unsigned" splat variants were not
being created with an unsigned type.
Differential Revision: https://reviews.llvm.org/D102797
Reland Note: This was accidentally reverted in 80d981eda6, but is an important improvement even outside of the driving motivator in D102567.
We currently use SourceMgr::getLineAndColumn to get the line and column for an SMLoc, but this includes a call to StringRef::find_last_of that ends up dominating compile time. In D102567, we start creating locations from the input file for block arguments which resulted in an extreme performance regression for modules with very large amounts of block arguments. This revision switches to just using a pointer offset from the beginning of the line to calculate the column(all MLIR files are simple ascii), resulting in a compile time reduction from 4700 seconds (1 hour and 18 minutes) to 8 seconds.
vector.transfer_read and vector.transfer_write operations are converted
to llvm intrinsics with specific alignment information, however there
doesn't seem to be a way in llvm to take information from llvm.assume
intrinsics and change this alignment information. In any
event, due the to the structure of the llvm.assume instrinsic, applying
this information at the llvm level is more cumbersome. Instead, let's
generate the masked vector load and store instrinsic with the right
alignment information from MLIR in the first place. Since
we're bothering to do this, lets just emit the proper alignment for
loads, stores, scatter, and gather ops too.
Differential Revision: https://reviews.llvm.org/D100444
The patch extends the yaml code generation to support the following new OpDSL constructs:
- captures
- constants
- iteration index accesses
- predefined types
These changes have been introduced by revision
https://reviews.llvm.org/D101364.
Differential Revision: https://reviews.llvm.org/D102075
VectorTransferPermutationMapLoweringPatterns can be enabled via a pass option. These additional patterns lower permutation maps to minor identity maps with broadcasting, if possible, allowing for more efficient vector load/stores. The option is deactivated by default.
Differential Revision: https://reviews.llvm.org/D102593
LinalgOps that are all parallel do not use the value of `outs`
tensor. The semantics is that the `outs` tensor is fully
overwritten. Using anything other than `init_tensor` can add false
dependencies between operations, when the use is just for the shape of
the tensor. Adding a canonicalization to always use `init_tensor` in
such cases, breaks this dependence.
Differential Revision: https://reviews.llvm.org/D102561
Original interfaces are not safe to be called during dialect conversion.
This is because some ops (e.g. `dynamic_reshape(input, target_shape)`)
depend on the values of their operands to calculate the output shape.
However the operands may be out of reach during dialect conversion (e.g.
converting from tensor world to buffer world). This patch provides a new
kind of interface which accpets user-provided operands to solve this
problem.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D102317
"[mlir] Speed up Lexer::getEncodedSourceLocation"
This reverts commit 3043be9d2d and commit
861d69a525.
This change resulted in printing textual MLIR that can't be parsed; see
review thread https://reviews.llvm.org/D102567 for details.
At present, a lot of code contains main function bodies like "return failed(mlir::MlirOptMain(...);". This is unfortunate for two reasons: a) it uses ADL, which is maybe not what the free "failed" function was designed for; and b) it is a bit awkward to read, requring the reader to both understand the boolean nature of the value and the semantics of main's return value. (And it's also not portable, since 1 is not a portable success value.)
The replacement code, `return mlir::AsMainReturnCode(mlir::MlirOptMain(...))` is a bit more self-explanatory.
The change applies the new function to a few internal uses of MlirOptMain, too.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D102641
We currently use SourceMgr::getLineAndColumn to get the line and column for an SMLoc, but this includes a call to StringRef::find_last_of that ends up dominating compile time. In D102567, we start creating locations from the input file for block arguments which resulted in an extreme performance regression for modules with very large amounts of block arguments. This revision switches to just using a pointer offset from the beginning of the line to calculate the column(all MLIR files are simple ascii), resulting in a compile time reduction from 4700 seconds (1 hour and 18 minutes) to 8 seconds.
Differential Revision: https://reviews.llvm.org/D102734
This is a hook that allows for providing custom initialization of the pattern, e.g. if it has bounded recursion, setting the debug name, etc., without needing to define a custom constructor. A non-virtual hook was chosen to avoid polluting the vtable with code that we really just want to be inlined when constructing the pattern. The alternative to this would be to just define a constructor for each pattern, this unfortunately creates a lot of otherwise unnecessary boiler plate for a lot of patterns and a hook provides a much simpler/cleaner interface for the very common case.
Differential Revision: https://reviews.llvm.org/D102440
We currently do not document how the pattern rewriter infra treats recursion when it gets detected. This revision adds a blurb on recursion in patterns, and how patterns can signal that they are equipped to handle it.
Differential Revision: https://reviews.llvm.org/D102439
The version is used by LSP clients to ignore stale diagnostics, and can be used in a followup to help verify incremental changes.
Differential Revision: https://reviews.llvm.org/D102644
The FIRRTL dialect in CIRCT uses inherently signful types, and APSInt
is the best way to model that. Add a couple of helpers that make it
easier to work with an IntegerAttr that carries a sign.
This follows the example of getZExt() and getSExt() which assert when
the underlying type of the attribute is unexpected. In this case
we assert fail when the underlying type of the attribute is signless.
This is strictly additive, so it is NFC. It is tested in the CIRCT
repo.
Differential Revision: https://reviews.llvm.org/D102701
This adds the ability to specify a location when creating BlockArguments.
Notably Value::getLoc() will return this correctly, which makes diagnostics
more precise (e.g. the example in test-legalize-type-conversion.mlir).
This is currently optional to avoid breaking any existing code - if
absent, the BlockArgument defaults to using the location of its enclosing
operation (preserving existing behavior).
The bulk of this change is plumbing location tracking through the parser
and printer to make sure it can round trip (in -mlir-print-debuginfo
mode). This is complete for generic operations, but requires manual
adoption for custom ops.
I added support for function-like ops to round trip their argument
locations - they print correctly, but when parsing the locations are
dropped on the floor. I intend to fix this, but it will require more
invasive plumbing through "function_like_impl" stuff so I think it
best to split it out to its own patch.
Differential Revision: https://reviews.llvm.org/D102567
During affine loop fusion, create private memrefs for escaping memrefs
too under the conditions that:
-- the source is not removed after fusion, and
-- the destination does not write to the memref.
This creates more fusion opportunities as illustrated in the test case.
Reviewed By: bondhugula, ayzhuang
Differential Revision: https://reviews.llvm.org/D102604
Comment was poorly written. Changed to bail on contradictory information in
the double round.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D102651
- Enables inferring return type for ConstShape, takes into account valid return types;
- The compatible return type function could be reused, leaving that for next use refactoring;
Differential Revision: https://reviews.llvm.org/D102182
The experimental flag for "inplace" bufferization in the sparse
compiler can be replaced with the new inplace attribute. This gives
a uniform way of expressing the more efficient way of bufferization.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D102538
This change makes the conversion of an mlir::OpState to bool `explicit`. Idiomatic boolean uses continue to work as before, but questionable implicit uses (e.g. accumulating over a range of OpStates to count "true" states) become ill-formed. This makes the class interface a lilttle less error-prone.
I tested this change on our internal (fairly large) codebase, and only one fix was needed, which was ultimately an improvement of the affected code.
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D101989
Initial version of pooling assumed normalization was accross all elements
equally. TOSA actually requires the noramalization is perform by how
many elements were summed (edges are not artifically dimmer). Updated
the lowering to reflect this change with corresponding tests.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D102540
Translate ExitDataOp with delete and copyout operands to runtime call.
This is done in a similar way as D101504.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D102381
Broadcast dimensions of vector transfer ops are always in-bounds. This is consistent with the fact that the starting position of a transfer is always in-bounds.
Differential Revision: https://reviews.llvm.org/D102566
This brings it in line with the bultin unrealized_conversion_cast,
which memref.buffer_cast is a specialized version of.
Differential Revision: https://reviews.llvm.org/D102608
At the moment `MlirModule`s can be converted to `MlirOperation`s, but not
the other way around (at least not without going around the C API). This
makes it impossible to e.g. run passes over a `ModuleOp` created through
`mlirOperationCreate`.
Reviewed By: nicolasvasilache, mehdi_amini
Differential Revision: https://reviews.llvm.org/D102497
Splitting the memref dialect lead to an introduction of several dependencies
to avoid compilation issues. The canonicalize pass also depends on the
memref dialect, but it shouldn't. This patch resolves the dependencies
and the unintuitive includes are removed. However, the dependency moves
to the constructor of the std dialect.
Differential Revision: https://reviews.llvm.org/D102060
Replace the templated linalgLowerOpToLoops method by three specialized methods linalgOpToLoops, LinalgOpToParallelLoops, and linalgOpToAffineLoops.
Differential Revision: https://reviews.llvm.org/D102324
Add TransferWritePermutationLowering, which replaces permutation maps of TransferWriteOps with vector.transpose.
Differential Revision: https://reviews.llvm.org/D102548
Provide an option to specify optimization level when creating an
ExecutionEngine via the MLIR JIT Python binding. Not only is the
specified optimization level used for code generation, but all LLVM
optimization passes at the optimization level are also run prior to
machine code generation (akin to the mlir-cpu-runner tool).
Default opt level continues to remain at level two (-O2).
Contributions in part from Prashant Kumar <prashantk@polymagelabs.com>
as well.
Differential Revision: https://reviews.llvm.org/D102551
We are moving from just dense/compressed to more general dim level
types, so we need more than just an "i1" array for annotations.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D102520
This change allows the SRC and DST of dma_start operations to be located in the
same memory space. This applies to both the Affine dialect and Memref dialect
versions of these Ops. The documention has been updated to reflect this by
explicitly stating overlapping memory locations are not supported (undefined
behavior).
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D102274
test/lib/Transforms/ has bitrot and become somewhat of a dumping grounds for testing pretty much any part of the project. This revision cleans this up, and moves the files within to a directory that reflects what is actually being tested.
Differential Revision: https://reviews.llvm.org/D102456
Group functions/structs in namespaces for better code readability.
Depends On D102123
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D102124
Make "target rank" a pass option of VectorToSCF.
Depends On D102101
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D102123
Lowering div elementwise op to the linalg dialect. Since tosa only supports integer division, that is the only version that is currently implemented.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D102430
This covers the extremely common case of replacing all uses of a Value
with a new op that is itself a user of the original Value.
This should also be a little bit more efficient than the
`SmallPtrSet<Operation *, 1>{op}` idiom that was being used before.
Differential Revision: https://reviews.llvm.org/D102373
Support OpImageQuerySize in spirv dialect
co-authored-by: Alan Liu <alanliu.yf@gmail.com>
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D102029
Create a copy of vector-to-loops.mlir and adapt the test for
ProgressiveVectorToSCF. Fix a small bug in getExtractOp() triggered by
this test.
Differential Revision: https://reviews.llvm.org/D102388
Do not rely on pass labels to detect if the pattern was already applied in the past (which allows for more some extra optimizations to avoid extra InsertOps and ExtractOps). Instead, check if these optimizations can be applied on-the-fly.
This also fixes a bug, where vector.insert and vector.extract ops sometimes disappeared in the middle of the pass because they get folded away, but the next application of the pattern expected them to be there.
Differential Revision: https://reviews.llvm.org/D102206
Rounding to integers requires rounding (for floating points) and clipping
to the min/max values of the destination range. Added this behavior and
updated tests appropriately.
Reviewed By: sjarus, silvas
Differential Revision: https://reviews.llvm.org/D102375
Instead of an SCF for loop, these pattern generate fully unrolled loops with no temporary buffer allocations.
Differential Revision: https://reviews.llvm.org/D101981
Broadcast dimensions of a vector transfer op have no corresponding dimension in the mask vector. E.g., a 2-D TransferReadOp, where one dimension is a broadcast, can have a 1-D `mask` attribute.
This commit also adds a few additional transfer op integration tests for various combinations of broadcasts, masking, dim transposes, etc.
Differential Revision: https://reviews.llvm.org/D101745
Broadcast dimensions of a vector transfer op have no corresponding dimension in the mask vector. E.g., a 2-D TransferReadOp, where one dimension is a broadcast, can have a 1-D `mask` attribute.
This commit also adds a few additional transfer op integration tests for various combinations of broadcasts, masking, dim transposes, etc.
Differential Revision: https://reviews.llvm.org/D101745
First set of "boilerplate" to get sparse tensor
passes available through CAPI and Python.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D102362
This allows for diagnostics emitted during parsing/verification to be surfaced to the user by the language client, as opposed to just being emitted to the logs like they are now.
Differential Revision: https://reviews.llvm.org/D102293
LLVM's build system contains support for configuring a distribution, but
it can often be useful to be able to configure multiple distributions
(e.g. if you want separate distributions for the tools and the
libraries). Add this support to the build system, along with
documentation and usage examples.
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D89177
This patch begins to translate acc.enter_data operation to call to tgt runtime call.
It currently only translate create/copyin operands of memref type. This acts as a basis to add support
for FIR types in the Flang/OpenACC support. It follows more or less a similar path than clang
with `omp target enter data map` directives.
This patch is taking a different approach than D100678 and perform a translation to LLVM IR
and make use of the OpenMPIRBuilder instead of doing a conversion to the LLVMIR dialect.
OpenACC support in Flang will rely on the current OpenMP runtime where 1:1 lowering can be
applied. Some extension will be added where features are not available yet.
Big part of this code will be shared for other standalone data operations in the OpenACC
dialect such as acc.exit_data and acc.update.
It is likely that parts of the lowering can also be shared later with the ops for
standalone data directives in the OpenMP dialect when they are introduced.
This is an initial translation and it probably needs more work.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D101504
The current static checker for linalg does not work on the decreasing
index cases well. So, this is to Update the current static bound checker
for linalg to cover decreasing index cases.
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D102302
This factors out the pass timing code into a separate `TimingManager`
that can be plugged into the `PassManager` from the outside. Users are
able to provide their own implementation of this manager, and use it to
time additional code paths outside of the pass manager. Also allows for
multiple `PassManager`s to run and contribute to a single timing report.
More specifically, moves most of the existing infrastructure in
`Pass/PassTiming.cpp` into a new `Support/Timing.cpp` file and adds a
public interface in `Support/Timing.h`. The `PassTiming` instrumentation
becomes a wrapper around the new timing infrastructure which adapts the
instrumentation callbacks to the new timers.
Reviewed By: rriddle, lattner
Differential Revision: https://reviews.llvm.org/D100647
Add a conversion pass to convert higher-level type before translation.
This conversion extract meangingful information and pack it into a struct that
the translation (D101504) will be able to understand.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D102170
DialectAsmParser already allows converting an llvm::SMLoc location to a
mlir::Location location. This commit adds the same functionality to OpAsmParser.
Implementation is copied from DialectAsmParser.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D102165
First step in adding alignment as an attribute to MLIR global definitions. Alignment can be specified for global objects in LLVM IR. It can also be specified as a named attribute in the LLVMIR dialect of MLIR. However, this attribute has no standing and is discarded during translation from MLIR to LLVM IR. This patch does two things: First, it adds the attribute to the syntax of the llvm.mlir.global operation, and by doing this it also adds accessors and verifications. The syntax is "align=XX" (with XX being an integer), placed right after the value of the operation. Second, it allows transforming this operation to and from LLVM IR. It is checked whether the value is an integer power of 2.
Reviewed By: ftynse, mehdi_amini
Differential Revision: https://reviews.llvm.org/D101492
Updated tests to include broadcast of left and right. Includes
bypass if in-type and out-type match shape (no broadcasting).
Differential Revision: https://reviews.llvm.org/D102276
Diagnostics are intended to be read by users, and in most cases displayed in a terminal. When not eliding huge element attributes, in some cases we end up dumping hundreds of megabytes(gigabytes) to the terminal (or logs), completely obfuscating the main diagnostic being shown.
Differential Revision: https://reviews.llvm.org/D102272
This is actually necessary for correctness, as memref.reinterpret_cast
doesn't verify if the output shape doesn't match the static sizes.
Differential Revision: https://reviews.llvm.org/D102232
VectorTransfer split previously only split read xfer ops. This adds
the same logic to write ops. The resulting code involves 2
conditionals for write ops while read ops only needed 1, but the created
ops are built upon the same patterns, so pattern matching/expectations
are all consistent other than in regards to the if/else ops.
Differential Revision: https://reviews.llvm.org/D102157
OpAsmParser (and DialectAsmParser) supports a pair of
parseInteger/parseOptionalInteger methods, which allow parsing a bare
integer into a C type of your choice (e.g. int8_t) using templates. It
was implemented in terms of a virtual method call that is hard coded to
int64_t because "that should be big enough".
Change the virtual method hook to return an APInt instead. This allows
asmparsers for custom ops to parse large integers if they want to, without
changing any of the clients of the fixed size C API.
Differential Revision: https://reviews.llvm.org/D102120
All glue and clutter in the linalg ops has been replaced by proper
sparse tensor type encoding. This code is no longer needed. Thanks
to ntv@ for giving us a temporary home in linalg.
So long, and thanks for all the fish.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D102098
A very elaborate, but also very fun revision because all
puzzle pieces are finally "falling in place".
1. replaces lingalg annotations + flags with proper sparse tensor types
2. add rigorous verification on sparse tensor type and sparse primitives
3. removes glue and clutter on opaque pointers in favor of sparse tensor types
4. migrates all tests to use sparse tensor types
NOTE: next CL will remove *all* obsoleted sparse code in Linalg
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D102095
According to the API contract, LinalgLoopDistributionOptions
expects to work on parallel iterators. When getting processor
information, only loop ranges for parallel dimensions should
be fed in. But right now after generating scf.for loop nests,
we feed in *all* loops, including the ones materialized for
reduction iterators. This can cause unexpected distribution
of reduction dimensions. This commit fixes it.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D102079
* The PybindAdaptors.h file has been evolving across different sub-projects (npcomp, circt) and has been successfully used for out of tree python API interop/extensions and defining custom types.
* Since sparse_tensor.encoding is the first in-tree custom attribute we are supporting, it seemed like the right time to upstream this header and use it to define the attribute in a way that we can support for both in-tree and out-of-tree use (prior, I had not wanted to upstream dead code which was not used in-tree).
* Adapted the circt version of `mlir_type_subclass`, also providing an `mlir_attribute_subclass`. As we get a bit of mileage on this, I would like to transition the builtin types/attributes to this mechanism and delete the old in-tree only `PyConcreteType` and `PyConcreteAttribute` template helpers (which cannot work reliably out of tree as they depend on internals).
* Added support for defaulting the MlirContext if none is passed so that we can support the same idioms as in-tree versions.
There is quite a bit going on here and I can split it up if needed, but would prefer to keep the first use and the header together so sending out in one patch.
Differential Revision: https://reviews.llvm.org/D102144
* Adds dialect registration, hand coded 'encoding' attribute and test.
* An MLIR CAPI tablegen backend for attributes does not exist, and this is a relatively complicated case. I opted to hand code it in a canonical way for now, which will provide a reasonable blueprint for building out the tablegen version in the future.
* Also added a (local) CMake function for declaring new CAPI tests, since it was getting repetitive/buggy.
Differential Revision: https://reviews.llvm.org/D102141
When using parallel loop construct, the OpenMP specification allows for
guided, auto and runtime as scheduling variants (as well as static and
dynamic which are already supported).
This adds the translation from MLIR to LLVM-IR for these scheduling
variants.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D101435
In the buffer deallocation pass, unranked memref types are not properly supported.
After investigating this issue, it turns out that the Clone and Dealloc operation
does not support unranked memref types in the current implementation.
This patch adds the missing feature and enables the transformation of any memref
type.
This patch solves this bug: https://bugs.llvm.org/show_bug.cgi?id=48385
Differential Revision: https://reviews.llvm.org/D101760
Previously, the OpenMP to LLVM IR conversion was setting the alloca insertion
point to the same position as the main compuation when converting OpenMP
`parallel` operations. This is problematic if, for example, the `parallel`
operation is placed inside a loop and would keep allocating on stack on each
iteration leading to stack overflow.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D101307
Inside a templated function, other class members need to be called with
this->.
Otherwise we get: explicit qualification required to use member
'setDebugName' from dependent base class.
We are able to bind the result from native function while rewriting
pattern. In matching pattern, if we want to get some values back, we can
do that by passing parameter as return value placeholder. Besides, add
the semantic of '$_self' in NativeCodeCall while matching, it'll be the
operation that defines certain operand.
Differential Revision: https://reviews.llvm.org/D100746
For `AffineLoopFusion` pass, add `memref` dialect as a dependent
dialect. Since the fusion pass can create `memref::AllocOp`s, the
dialect must be registered in its dependent dialects.
The missing dependency was not discovered until now because the above
said op creation happes only when the input already has
`memref::AllocOp`s in it, and all dialects in the input are
automatically added to the context.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D102104
Motivation: we have passes with lot of rewrites and when one one them segfaults or asserts, it is very hard to find waht exactly pattern failed without debug info.
Differential Revision: https://reviews.llvm.org/D101443
The current design uses a unique entry for each argument/result attribute, with the name of the entry being something like "arg0". This provides for a somewhat sparse design, but ends up being much more expensive (from a runtime perspective) in-practice. The design requires building a string every time we lookup the dictionary for a specific arg/result, and also requires N attribute lookups when collecting all of the arg/result attribute dictionaries.
This revision restructures the design to instead have an ArrayAttr that contains all of the attribute dictionaries for arguments and another for results. This design reduces the number of attribute name lookups to 1, and allows for O(1) lookup for individual element dictionaries. The major downside is that we can end up with larger memory usage, as the ArrayAttr contains an entry for each element even if that element has no attributes. If the memory usage becomes too problematic, we can experiment with a more sparse structure that still provides a lot of the wins in this revision.
This dropped the compilation time of a somewhat large TensorFlow model from ~650 seconds to ~400 seconds.
Differential Revision: https://reviews.llvm.org/D102035
This provides information when the user hovers over a part of the source .mlir file. This revision adds the following hover behavior:
* Operation:
- Shows the generic form.
* Operation Result:
- Shows the parent operation name, result number(s), and type(s).
* Block:
- Shows the parent operation name, block number, predecessors, and successors.
* Block Argument:
- Shows the parent operation name, parent block, argument number, and type.
Differential Revision: https://reviews.llvm.org/D101113
This it to make more clear the difference between this and
an AliasAnalysis.
For example, given a sequence of subviews that create values
A -> B -> C -> d:
BufferViewFlowAnalysis::resolve(B) => {B, C, D}
AliasAnalysis::resolve(B) => {A, B, C, D}
Differential Revision: https://reviews.llvm.org/D100838
Implements proper (de-)serialization logic for BranchConditionalOp when
such ops have true/false target operands.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D101602
Replace all `linalg.indexed_generic` ops by `linalg.generic` ops that access the iteration indices using the `linalg.index` op.
Differential Revision: https://reviews.llvm.org/D101612
The pattern to convert subtensor ops to their rank-reduced versions
(by dropping unit-dims in the result) can also convert to a zero-rank
tensor. Handle that case.
This also fixes a OOB access bug in the existing pattern for such
cases.
Differential Revision: https://reviews.llvm.org/D101949
Nearly complete alignment to spec v0.22
- Adds Div op
- Concat inputs now variadic
- Removes Placeholder op
Note: TF side PR https://github.com/tensorflow/tensorflow/pull/48921 deletes Concat legalizations to avoid breaking TensorFlow CI. This must be merged only after the TF PR has merged.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D101958
It is currently stored in the high bits, which is disallowed on certain
platforms (e.g. android). This revision switches the representation to use
the low bits instead, fixing crashes/breakages on those platforms.
Differential Revision: https://reviews.llvm.org/D101969
This expose a lambda control instead of just a boolean to control unit
dimension folding.
This however gives more control to user to pick a good heuristic.
Folding reshapes helps fusion opportunities but may generate sub-optimal
generic ops.
Differential Revision: https://reviews.llvm.org/D101917
Fixing a minor bug which lead to element type of the output being
modified when folding reshapes with generic op.
Differential Revision: https://reviews.llvm.org/D101942
Implements support for undialated depthwise convolution using the existing
depthwise convolution operation. Once convolutions migrate to yaml defined
versions we can rewrite for cleaner implementation.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D101579
This untangles the MCContext and the MCObjectFileInfo. There is a circular
dependency between MCContext and MCObjectFileInfo. Currently this dependency
also exists during construction: You can't contruct a MOFI without a MCContext
without constructing the MCContext with a dummy version of that MOFI first.
This removes this dependency during construction. In a perfect world,
MCObjectFileInfo wouldn't depend on MCContext at all, but only be stored in the
MCContext, like other MC information. This is future work.
This also shifts/adds more information to the MCContext making it more
available to the different targets. Namely:
- TargetTriple
- ObjectFileType
- SubtargetInfo
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D101462
These instructions map to SVE-specific instrinsics that accept a
predicate operand to support control flow in vector code.
Differential Revision: https://reviews.llvm.org/D100982
This patch adds support for vectorizing loops with 'iter_args'
implementing known reductions along the vector dimension. Comparing to
the non-vector-dimension case, two additional things are done during
vectorization of such loops:
- The resulting vector returned from the loop is reduced to a scalar
using `vector.reduce`.
- In some cases a mask is applied to the vector yielded at the end of
the loop to prevent garbage values from being written to the
accumulator.
Vectorization of reduction loops is disabled by default. To enable it, a
map from loops to array of reduction descriptors should be explicitly passed to
`vectorizeAffineLoops`, or `vectorize-reductions=true` should be passed
to the SuperVectorize pass.
Current limitations:
- Loops with a non-unit step size are not supported.
- n-D vectorization with n > 1 is not supported.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D100694
The old index op handling let the new index operations point back to the
producer block. As a result, after fusion some index operations in the
fused block had back references to the old producer block resulting in
illegal IR. The patch now relies on a block and value mapping to avoid
such back references.
Differential Revision: https://reviews.llvm.org/D101887
While we figure out how to best add Standard support for scalable
vectors, these instructions provide a workaround for basic arithmetic
between scalable vectors.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D100837
This revision migrates more code from Linalg into the new permanent home of
SparseTensor. It replaces the test passes with proper compiler passes.
NOTE: the actual removal of the last glue and clutter in Linalg will follow
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D101811
We weren't properly visiting region successors when the terminator wasn't return like, which could create incorrect results in the analysis. This revision ensures that we properly visit region successors, to avoid optimistically assuming a value is constant when it isn't.
Differential Revision: https://reviews.llvm.org/D101783
All linalg.init operations must be fed into a linalg operation before
subtensor. The inserted linalg.fill guarantees it executes correctly.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D101848
TransferReadOps that are a scalar read + broadcast are handled by TransferReadToVectorLoadLowering.
Differential Revision: https://reviews.llvm.org/D101808
Lowerings equal and arithmetic_right_shift for elementwise ops to linalg dialect using linalg.generic
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D101804
Given the source and destination shapes, if they are static, or if the
expanded/collapsed dimensions are unit-extent, it is possible to
compute the reassociation maps that can be used to reshape one type
into another. Add a utility method to return the reassociation maps
when possible.
This utility function can be used to fuse a sequence of reshape ops,
given the type of the source of the producer and the final result
type. This pattern supercedes a more constrained folding pattern added
to DropUnitDims pass.
Differential Revision: https://reviews.llvm.org/D101343
Convert subtensor and subtensor_insert operations to use their
rank-reduced versions to drop unit dimensions.
Differential Revision: https://reviews.llvm.org/D101495
The current implementation had a bug as it was relying on the target vector
dimension sizes to calculate where to insert broadcast. If several dimensions
have the same size we may insert the broadcast on the wrong dimension. The
correct broadcast cannot be inferred from the type of the source and
destination vector.
Instead when we want to extend transfer ops we calculate an "inverse" map to the
projected permutation and insert broadcast in place of the projected dimensions.
Differential Revision: https://reviews.llvm.org/D101738
* NFC but has some fixes for CMake glitches discovered along the way (things not cleaning properly, co-mingled depends).
* Includes previously unsubmitted fix in D98681 and a TODO to fix it more appropriately in a smaller followup.
Differential Revision: https://reviews.llvm.org/D101493
Move TransposeOp lowering in its own populate function as in some cases
it is better to keep it during ContractOp lowering to better
canonicalize it rather than emiting scalar insert/extract.
Differential Revision: https://reviews.llvm.org/D101647
Add missing check in -test-affine-data-copy without which a test case
that has no affine.loads at all would crash this test pass. Fix two
clang-tidy warnings in the file while at this. (Not adding a test case
given the triviality.)
Differential Revision: https://reviews.llvm.org/D101719
* This makes them consistent with custom types/attributes, whose constructors will do a type checked conversion. Of course, the base classes can represent everything so never error.
* More importantly, this makes it possible to subclass Type and Attribute out of tree in sensible ways.
Differential Revision: https://reviews.llvm.org/D101734
Added canonicalization for vector_load and vector_store. An existing
pattern SimplifyAffineOp can be reused to compose maps that supplies
result into them. Added AffineVectorStoreOp and AffineVectorLoadOp
into static_assert of SimplifyAffineOp to allow operation to use it.
This fixes the bug filed: https://bugs.llvm.org/show_bug.cgi?id=50058
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D101691
(1) migrates the encoding from TensorDialect into the new SparseTensorDialect
(2) replaces dictionary-based storage and builders with struct-like data
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D101669
Stop using the compatibility spellings of `OF_{None,Text,Append}`
left behind by 1f67a3cba9. A follow-up
will remove them.
Differential Revision: https://reviews.llvm.org/D101650
Three patterns are added to convert into vector.multi_reduction into a
sequence of vector.reduction as the following:
- Transpose the inputs so inner most dimensions are always reduction.
- Reduce rank of vector.multi_reduction into 2d with inner most
reduction dim (get the 2d canical form)
- 2D canonical form is converted into a sequence of vector.reduction.
There are two things we might worth in a follow up diff:
- An scf.for (maybe optionally) around vector.reduction instead of unrolling it.
- Breakdown the vector.reduction into a sequence of vector.reduction
(e.g tree-based reduction) instead of relying on how downstream dialects
handle it.
Note: this will requires passing target-vector-length
Differential Revision: https://reviews.llvm.org/D101570
This is the very first step toward removing the glue and clutter from linalg and
replace it with proper sparse tensor types. This revision migrates the LinalgSparseOps
into SparseTensorOps of a sparse tensor dialect. This also provides a new home for
sparse tensor related transformation.
NOTE: the actual replacement with sparse tensor types (and removal of linalg glue/clutter)
will follow but I am trying to keep the amount of changes per revision manageable.
Differential Revision: https://reviews.llvm.org/D101573
Constant-0 dim expr values should be avoided for linalg as it can prevent
fusion. This includes adding support for rank-0 reshapes.
Differential Revision: https://reviews.llvm.org/D101418
This is the very first step toward removing the glue and clutter from linalg and
replace it with proper sparse tensor types. This revision migrates the LinalgSparseOps
into SparseTensorOps of a sparse tensor dialect. This also provides a new home for
sparse tensor related transformation.
NOTE: the actual replacement with sparse tensor types (and removal of linalg glue/clutter)
will follow but I am trying to keep the amount of changes per revision manageable.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D101488
This enables to express more complex parallel loops in the affine framework,
for example, in cases of tiling by sizes not dividing loop trip counts perfectly
or inner wavefront parallelism, among others. One can't use affine.max/min
and supply values to the nested loop bounds since the results of such
affine.max/min operations aren't valid symbols. Making them valid symbols
isn't an option since they would introduce selection trees into memref
subscript arithmetic as an unintended and undesired consequence. Also
add support for converting such loops to SCF. Drop some API that isn't used in
the core repo from AffineParallelOp since its semantics becomes ambiguous in
presence of max/min bounds. Loop normalization is currently unavailable for
such loops.
Depends On D101171
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D101172
Introduce a basic support for parallelizing affine loops with reductions
expressed using iteration arguments. Affine parallelism detector now has a flag
to assume such reductions are parallel. The transformation handles a subset of
parallel reductions that are can be expressed using affine.parallel:
integer/float addition and multiplication. This requires to detect the
reduction operation since affine.parallel only supports a fixed set of
reduction operators.
Reviewed By: chelini, kumasento, bondhugula
Differential Revision: https://reviews.llvm.org/D101171
FillOp allows complex ops, and filling a properly sized buffer with
a default zero complex number is implemented.
Differential Revision: https://reviews.llvm.org/D99939
This will allow the bindings to be built as a library and reused in out-of-tree
projects that want to provide bindings on top of MLIR bindings.
Reviewed By: stellaraccident, mikeurbach
Differential Revision: https://reviews.llvm.org/D101075
This revision adds support for vectorizing more general linalg operations with projected permutation maps.
This is achieved by eagerly broadcasting the intermediate vector to the common size
of the iteration domain of the linalg op. This allows a much more natural expression of
generalized vectorization but may introduce additional computations until all the
proper canonicalizations are implemented.
This generalization modifies the vector.transfer_read/write permutation logic and
exposes the fact that the logic employed in vector.contract was too ad-hoc.
As a consequence, changes occur in the permutation / transposition logic for contraction. In turn this prompts supporting more cases in the lowering of contract
to matrix intrinsics, which is required to make the corresponding tests pass.
Differential revision: https://reviews.llvm.org/D101165
The patch extends the OpDSL with support for:
- Constant values
- Capture scalar parameters
- Access the iteration indices using the index operation
- Provide predefined floating point and integer types.
Up to now the patch only supports emitting the new nodes. The C++/yaml path is not fully implemented. The fill_rng_2d operation defined in emit_structured_generic.py makes use of the new DSL constructs.
Differential Revision: https://reviews.llvm.org/D101364
This adds a method to directly invoke `mlirOperationDestroy` on the
MlirOperation wrapped by a PyOperation.
Reviewed By: stellaraccident, mehdi_amini
Differential Revision: https://reviews.llvm.org/D101422
Previously, this API would return the PyObjectRef, rather than the
underlying PyOperation.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D101416
Canonicalizations for subtensor operations defaulted to use the
rank-reduced version of the operation, but the cast inserted to get
back the original type would be illegal if the rank was actually
reduced. Instead make the canonicalization not reduce the rank of the
operation.
Differential Revision: https://reviews.llvm.org/D101258
The current canonicalization did not remap operation results correctly
and attempted to erase tiledLoop, which is incorrect if not all tensor
results are folded.
Tensor inputs, if not used in the body of TiledLoopOp, can be removed.
memref::CastOp can be folded into TiledLoopOp as well.
Differential Revision: https://reviews.llvm.org/D101445
Both, `shape.broadcast` and `shape.cstr_broadcastable` accept dynamic and static
extent tensors. If their operands are casted, we can use the original value
instead.
Differential Revision: https://reviews.llvm.org/D101376
Add a section attribute to LLVM_GlobalOp, during module translation attribute value is propagated to llvm
Reviewed By: sgrechanik, ftynse, mehdi_amini
Differential Revision: https://reviews.llvm.org/D100947
This adds `mlirOperationSetOperand` to the IR C API, similar to the
function to get an operand.
In the Python API, this adds `operands[index] = value` syntax, similar
to the syntax to get an operand with `operands[index]`.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D101398
Add the `getCapsule()` and `createFromCapsule()` methods to the
PyValue class, as well as the necessary interoperability.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D101090
MatMul and FullyConnected have transposed dimensions for the weights.
Also, removed uneeded tensor reshape for bias.
Differential Revision: https://reviews.llvm.org/D101220
Quantized negation can be performed using higher bits operations.
Minimal bits are picked to perform the operation.
Differential Revision: https://reviews.llvm.org/D101225
Explicitly check for uninitialized to prevent crashes in edge cases where the derived analysis creates a lattice element for a value that hasn't been visited yet.
Like `print-ir-after-all` and `-before-all`, this allows to inspect IR for
debug purposes. While the former allow to inspect only between passes, this
change allows to follow the rewrites that happen within passes.
Differential Revision: https://reviews.llvm.org/D100940
Empty extent tensor operands were only removed when they were defined as a
constant. Additionally, we can remove them if they are known to be empty by
their type `tensor<0xindex>`.
Differential Revision: https://reviews.llvm.org/D101351