llvm-project

Commit Graph

Author	SHA1	Message	Date
Sean Silva	7f2ebde735	[mlir] Split BufferUtils.h out of Bufferize.h These utilities are more closely associated with the buffer optimizations and buffer deallocation than with the dialect conversion stuff in Bufferize.h. So move them out. This makes Bufferize.h very easy to understand and completely focused on dialect conversion. Differential Revision: https://reviews.llvm.org/D91563	2020-11-19 12:56:36 -08:00
Sean Silva	b866574246	[mlir] Add BufferResultsToOutParams pass. This pass allows removing getResultConversionKind from BufferizeTypeConverter. This pass replaces the AppendToArgumentsList functionality. As far as I could tell, the only use of this functionality is to perform the transformation that is implemented in this pass. Future patches will remove the getResultConversionKind machinery from BufferizeTypeConverter, but sending this patch for individual review for clarity. Differential Revision: https://reviews.llvm.org/D90071	2020-10-30 14:06:14 -07:00
River Riddle	b6eb26fd0e	[mlir][NFC] Move around the code related to PatternRewriting to improve layering There are several pieces of pattern rewriting infra in IR/ that really shouldn't be there. This revision moves those pieces to a better location such that they are easier to evolve in the future (e.g. with PDL). More concretely this revision does the following: * Create a Transforms/GreedyPatternRewriteDriver.h and move the apply*andFold methods there. The definitions for these methods are already in Transforms/ so it doesn't make sense for the declarations to be in IR. * Create a new lib/Rewrite library and move PatternApplicator there. This new library will be focused on applying rewrites, and will also include compiling rewrites with PDL. Differential Revision: https://reviews.llvm.org/D89103	2020-10-26 18:01:06 -07:00
Marcel Koester	1b1c61ff47	[mlir] Refactored BufferPlacement transformation. The current BufferPlacement transformation contains several concepts for hoisting allocations. However, more advanced hoisting techniques should not be integrated into the BufferPlacement transformation. Hence, this CL refactors the current BufferPlacement pass into three separate pieces: BufferDeallocation and BufferAllocation(Loop)Hoisting. Moreover, it extends the hoisting functionality by allowing to move allocations out of loops. Differential Revision: https://reviews.llvm.org/D87756	2020-10-19 12:52:16 +02:00
Sean Silva	1cca0f323e	[mlir] Refactor code out of BufferPlacement.cpp Now BufferPlacement.cpp doesn't depend on Bufferize.h. Part of the refactor discussed in: https://llvm.discourse.group/t/what-is-the-strategy-for-tensor-memref-conversion-bufferization/1938/17 Differential Revision: https://reviews.llvm.org/D89268	2020-10-14 12:39:16 -07:00
Geoffrey Martin-Noble	d4e889f1f5	Remove `Ops` suffix from dialect library names Dialects include more than just ops, so this suffix is outdated. Follows discussion in https://llvm.discourse.group/t/rfc-canonical-file-paths-to-dialects/621 Reviewed By: stellaraccident Differential Revision: https://reviews.llvm.org/D88530	2020-09-30 18:00:44 -07:00
Abhishek Varma	76d07503f0	[MLIR] Introduce inter-procedural memref layout normalization -- Introduces a pass that normalizes the affine layout maps to the identity layout map both within and across functions by rewriting function arguments and call operands where necessary. -- Memref normalization is now implemented entirely in the module pass '-normalize-memrefs' and the limited intra-procedural version has been removed from '-simplify-affine-structures'. -- Run using -normalize-memrefs. -- Return ops are not handled and would be handled in the subsequent revisions. Signed-off-by: Abhishek Varma <abhishek.varma@polymagelabs.com> Differential Revision: https://reviews.llvm.org/D84490	2020-07-30 18:12:56 +05:30
Ehsan Toosi	0f03b2bfda	[mlir] Add redundant copy removal transform This pass removes redundant dialect-independent Copy operations in different situations like the following: %from = ... %to = ... ... (no user/alias for %to) copy(%from, %to) ... (no user/alias for %from) dealloc %from use(%to) Differential Revision: https://reviews.llvm.org/D82757	2020-07-03 15:36:25 +02:00
Marcel Koester	33879aa0bf	[mlir] Fixed GCC compile issues and linking problems using SHARED_LIBS. Differential Revision: https://reviews.llvm.org/D81839	2020-06-15 15:46:21 +02:00
Alex Zinenko	c25b20c0f6	[mlir] NFC: Rename LoopOps dialect to SCF (Structured Control Flow) This dialect contains various structured control flow operations, not only loops; reflect this in the name. Drop the Ops suffix for consistency with other dialects. Note that this only moves the files and changes the C++ namespace from 'loop' to 'scf'. The visible IR prefix remains the same and will be updated separately. The conversions will also be updated separately. Differential Revision: https://reviews.llvm.org/D79578	2020-05-11 15:04:27 +02:00
Stephen Neuendorffer	5469f434bb	[MLIR] Reapply: Adjust libMLIR building to more closely follow libClang This reverts commit `ab1ca6e60f`.	2020-05-04 20:47:57 -07:00
Stephen Neuendorffer	ab1ca6e60f	Revert "[MLIR] Adjust libMLIR building to more closely follow libClang" This reverts commit `4f0f436749`. This seems to show some compile dependence problems, and also breaks flang.	2020-05-04 12:40:12 -07:00
Valentin Churavy	4f0f436749	[MLIR] Adjust libMLIR building to more closely follow libClang - Exports MLIR targets to be used out-of-tree. - mimics `add_clang_library` and `add_flang_library`. - Fixes libMLIR.so After https://reviews.llvm.org/D77515 libMLIR.so was no longer containing any object files. We originally had a kludge there that made it work with the static initializers and when switching away from that to the way the clang shlib does it, I noticed that MLIR doesn't create a `obj.{name}` target, and doesn't export its targets to `lib/cmake/mlir`. This is due to MLIR using `add_llvm_library` under the hood, which adds the target to `llvmexports`. Differential Revision: https://reviews.llvm.org/D78773 [MLIR] Fix libMLIR.so and LLVM_LINK_LLVM_DYLIB Primarily, this patch moves all mlir references to LLVM libraries into either LLVM_LINK_COMPONENTS or LINK_COMPONENTS. This enables magic in the llvm cmake files to automatically replace reference to LLVM components with references to libLLVM.so when necessary. Among other things, this completes fixing libMLIR.so, which has been broken for some configurations since D77515. Unlike previously, the pattern is now that mlir libraries should almost always use add_mlir_library. Previously, some libraries still used add_llvm_library. However, this confuses the export of targets for use out of tree because libraries specified with add_llvm_library are exported by LLVM. Instead users which don't need/can't be linked into libMLIR.so can specify EXCLUDE_FROM_LIBMLIR A common error mode is linking with LLVM libraries outside of LINK_COMPONENTS. This almost always results in symbol confusion or multiply defined options in LLVM when the same object file is included as a static library and as part of libLLVM.so. To catch these errors more directly, there's now mlir_check_all_link_libraries. 
To simplify usage of add_mlir_library, we assume that all mlir libraries depend on LLVMSupport, so it's not necessary to separately specify it. tested with: BUILD_SHARED_LIBS=on, BUILD_SHARED_LIBS=off + LLVM_BUILD_LLVM_DYLIB, BUILD_SHARED_LIBS=off + LLVM_BUILD_LLVM_DYLIB + LLVM_LINK_LLVM_DYLIB. By: Stephen Neuendorffer <stephen.neuendorffer@xilinx.com> Differential Revision: https://reviews.llvm.org/D79067 [MLIR] Move from using target_link_libraries to LINK_LIBS This allows us to correctly generate dependencies for derived targets, such as targets which are created for object libraries. By: Stephen Neuendorffer <stephen.neuendorffer@xilinx.com> Differential Revision: https://reviews.llvm.org/D79243 Three commits have been squashed to avoid intermediate build breakage.	2020-05-04 11:40:46 -07:00
Ehsan Toosi	5c352e69e7	Providing buffer assignment for MLIR We have provided a generic buffer assignment transformation ported from TensorFlow. This generic transformation pass automatically analyzes the values and their aliases (also in other blocks) and returns the valid positions for Alloc and Dealloc operations. To find these positions, the algorithm uses the block Dominator and Post-Dominator analyses. In our proposed algorithm, we have considered aliasing, liveness, nested regions, branches, conditional branches, critical edges, and independency to custom block terminators. This implementation doesn't support block loops. However, we have considered this in our design. For this purpose, it is only required to have a loop analysis to insert Alloc and Dealloc operations outside of these loops in some special cases. Differential Revision: https://reviews.llvm.org/D78484	2020-04-28 10:17:59 +02:00
River Riddle	152d29cc74	[mlir][Transforms] Add pass to perform sparse conditional constant propagation This revision adds the initial pass for performing SCCP generically in MLIR. SCCP is an algorithm for propagating constants across control flow, and optimistically assumes all values to be constant unless proven otherwise. It currently supports branching control, with support for regions and inter-procedural propagation being added in followups. Differential Revision: https://reviews.llvm.org/D78397	2020-04-21 02:59:25 -07:00
Stephen Neuendorffer	f061295732	[MLIR] Complete refactoring of Affine dialect into sub-libraries. There were some unused CMakeFiles for Affine/IR and Affine/EDSC. This change builds separate MLIRAffineOps and MLIRAffineEDSC libraries using those CMakeFiles. This combination replaces the old MLIRAffine library. Differential Revision: https://reviews.llvm.org/D78317	2020-04-16 13:41:17 -07:00
River Riddle	8155e41ac6	[mlir][Pass] Add a tablegen backend for defining Pass information This will greatly simplify a number of things related to passes: * Enables generation of pass registration * Enables generation of boiler plate pass utilities * Enables generation of pass documentation This revision focuses on adding the basic structure and adds support for generating the registration for passes in the Transforms/ directory. Future revisions will add more support and move more passes over. Differential Revision: https://reviews.llvm.org/D76656	2020-04-01 02:10:46 -07:00
Tres Popp	27c201aa1d	[MLIR] Add parallel loop collapsing. This allows conversion of a ParallelLoop from N induction variables to some number of induction variables less than N. The first intended use of this is for the GPUDialect to convert ParallelLoops to iterate over 3 dimensions so they can be launched as GPU Kernels. To implement this: - Normalize each iteration space of the ParallelLoop - Use the same induction variable in a new ParallelLoop for multiple original iterations. - Split the new induction variable back into the original set of values inside the body of the ParallelLoop. Subscribers: mgorny, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, Joonsoo, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76363	2020-03-26 09:32:52 +01:00
Uday Bondhugula	b873761496	[MLIR][NFC] Move some of the affine transforms / tests to dialect dirs Move some of the affine transforms and their test cases to their respective dialect directory. This patch does not complete the move, but takes care of a good part. Renames: prefix 'affine' to affine loop tiling cl options, vectorize -> super-vectorize Signed-off-by: Uday Bondhugula <uday@polymagelabs.com> Differential Revision: https://reviews.llvm.org/D76565	2020-03-23 08:25:07 +05:30
Rob Suderman	e708471395	[mlir][NFC] Cleanup AffineOps directory structure Summary: Change AffineOps Dialect structure to better group both IR and Transforms. This included extracting transforms directly related to AffineOps. Also move AffineOps to Affine. Differential Revision: https://reviews.llvm.org/D76161	2020-03-20 14:23:43 -07:00
Rob Suderman	4d60f47b08	[mlir][NFC] Renamed VectorOps to Vector Summary: Renamed VectorOps to Vector to avoid the redundant Ops suffix. Differential Revision: https://reviews.llvm.org/D76317	2020-03-17 15:28:08 -07:00
River Riddle	43959a2592	[mlir][NFC] Move the LoopLike interface out of Transforms/ and into Interfaces/ Differential Revision: https://reviews.llvm.org/D76155	2020-03-14 13:37:56 -07:00
Valentin Churavy	7c64f6bf52	[MLIR] Add support for libMLIR.so Putting this up mainly for discussion on how this should be done. I am interested in MLIR from the Julia side and we currently have a strong preference to dynamically linking against the LLVM shared library, and would like to have a MLIR shared library. This patch adds a new cmake function add_mlir_library() which accumulates a list of targets to be compiled into libMLIR.so. Note that not all libraries make sense to be compiled into libMLIR.so. In particular, we want to avoid libraries which primarily exist to support certain tools (such as mlir-opt and mlir-cpu-runner). Note that the resulting libMLIR.so depends on LLVM, but does not contain any LLVM components. As a result, it is necessary to link with libLLVM.so to avoid linkage errors. So, libMLIR.so requires LLVM_BUILD_LLVM_DYLIB=on FYI, Currently it appears that LLVM_LINK_LLVM_DYLIB is broken because mlir-tblgen is linked against libLLVM.so and independent LLVM components. Previous version of this patch broke dependencies on TableGen targets. This appears to be because it compiled all libraries to OBJECT libraries (probably because cmake is generating different target names). Avoiding object libraries results in correct dependencies. (updated by Stephen Neuendorffer) Differential Revision: https://reviews.llvm.org/D73130	2020-03-06 13:25:18 -08:00
Stephen Neuendorffer	4594d0e943	[MLIR] Move from add_dependencies() to DEPENDS add_llvm_library and add_llvm_executable may need to create new targets with appropriate dependencies. As a result, it is not sufficient in some configurations (namely LLVM_BUILD_LLVM_DYLIB=on) to only call add_dependencies(). Instead, the explicit TableGen dependencies must be passed to add_llvm_library() or add_llvm_executable() using the DEPENDS keyword. Differential Revision: https://reviews.llvm.org/D74930	2020-03-06 13:25:17 -08:00
Stephen Neuendorffer	1c82dd39f9	[MLIR] Ensure that target_link_libraries() always has a keyword. CMake allows calling target_link_libraries() without a keyword, but this usage is not preferred when also called with a keyword, and has surprising behavior. This patch explicitly specifies a keyword when using target_link_libraries(). Differential Revision: https://reviews.llvm.org/D75725	2020-03-06 09:14:01 -08:00
Stephen Neuendorffer	798e661567	Revert "[MLIR] Move from using target_link_libraries to LINK_LIBS for llvm libraries." This reverts commit `7a6c689771`. This breaks the build with cmake 3.13.4, but succeeds with cmake 3.15.3	2020-02-29 11:52:08 -08:00
Stephen Neuendorffer	d675df0379	Revert "[MLIR] Move from add_dependencies() to DEPENDS" This reverts commit `31e07d716a`.	2020-02-29 11:52:08 -08:00
Stephen Neuendorffer	dd046c9612	Revert "[MLIR] Add support for libMLIR.so" This reverts commit `e17d9c11d4`. It breaks the build.	2020-02-29 11:09:21 -08:00
Valentin Churavy	e17d9c11d4	[MLIR] Add support for libMLIR.so Putting this up mainly for discussion on how this should be done. I am interested in MLIR from the Julia side and we currently have a strong preference to dynamically linking against the LLVM shared library, and would like to have a MLIR shared library. This patch adds a new cmake function add_mlir_library() which accumulates a list of targets to be compiled into libMLIR.so. Note that not all libraries make sense to be compiled into libMLIR.so. In particular, we want to avoid libraries which primarily exist to support certain tools (such as mlir-opt and mlir-cpu-runner). Note that the resulting libMLIR.so depends on LLVM, but does not contain any LLVM components. As a result, it is necessary to link with libLLVM.so to avoid linkage errors. So, libMLIR.so requires LLVM_BUILD_LLVM_DYLIB=on FYI, Currently it appears that LLVM_LINK_LLVM_DYLIB is broken because mlir-tblgen is linked against libLLVM.so and independent LLVM components. Previous version of this patch broke dependencies on TableGen targets. This appears to be because it compiled all libraries to OBJECT libraries (probably because cmake is generating different target names). Avoiding object libraries results in correct dependencies. (updated by Stephen Neuendorffer) Differential Revision: https://reviews.llvm.org/D73130	2020-02-29 10:47:27 -08:00
Stephen Neuendorffer	31e07d716a	[MLIR] Move from add_dependencies() to DEPENDS add_llvm_library and add_llvm_executable may need to create new targets with appropriate dependencies. As a result, it is not sufficient in some configurations (namely LLVM_BUILD_LLVM_DYLIB=on) to only call add_dependencies(). Instead, the explicit TableGen dependencies must be passed to add_llvm_library() or add_llvm_executable() using the DEPENDS keyword. Differential Revision: https://reviews.llvm.org/D74930	2020-02-29 10:47:27 -08:00
Stephen Neuendorffer	7a6c689771	[MLIR] Move from using target_link_libraries to LINK_LIBS for llvm libraries. When compiling libLLVM.so, add_llvm_library() manipulates the link libraries being used. This means that when using add_llvm_library(), we need to pass the list of libraries to be linked (using the LINK_LIBS keyword) instead of using the standard target_link_libraries call. This is preparation for properly dealing with creating libMLIR.so as well. Differential Revision: https://reviews.llvm.org/D74864	2020-02-29 10:47:26 -08:00
Stephen Neuendorffer	dc1056a3f1	Revert "[MLIR] Move from using target_link_libraries to LINK_LIBS for llvm libraries." This reverts commit `2f265e3528`.	2020-02-28 14:13:30 -08:00
Stephen Neuendorffer	67f2a43cf8	Revert "[MLIR] Move from add_dependencies() to DEPENDS" This reverts commit `8a2b86b2c2`.	2020-02-28 12:17:40 -08:00
Stephen Neuendorffer	c6f3fc4999	Revert "[MLIR] Add support for libMLIR.so" This reverts commit `1246e86716`.	2020-02-28 12:17:39 -08:00
Valentin Churavy	1246e86716	[MLIR] Add support for libMLIR.so Putting this up mainly for discussion on how this should be done. I am interested in MLIR from the Julia side and we currently have a strong preference to dynamically linking against the LLVM shared library, and would like to have a MLIR shared library. This patch adds a new cmake function add_mlir_library() which accumulates a list of targets to be compiled into libMLIR.so. Note that not all libraries make sense to be compiled into libMLIR.so. In particular, we want to avoid libraries which primarily exist to support certain tools (such as mlir-opt and mlir-cpu-runner). Note that the resulting libMLIR.so depends on LLVM, but does not contain any LLVM components. As a result, it is necessary to link with libLLVM.so to avoid linkage errors. So, libMLIR.so requires LLVM_BUILD_LLVM_DYLIB=on FYI, Currently it appears that LLVM_LINK_LLVM_DYLIB is broken because mlir-tblgen is linked against libLLVM.so and independent LLVM components (updated by Stephen Neuendorffer) Differential Revision: https://reviews.llvm.org/D73130	2020-02-28 11:35:19 -08:00
Stephen Neuendorffer	8a2b86b2c2	[MLIR] Move from add_dependencies() to DEPENDS add_llvm_library and add_llvm_executable may need to create new targets with appropriate dependencies. As a result, it is not sufficient in some configurations (namely LLVM_BUILD_LLVM_DYLIB=on) to only call add_dependencies(). Instead, the explicit TableGen dependencies must be passed to add_llvm_library() or add_llvm_executable() using the DEPENDS keyword. Differential Revision: https://reviews.llvm.org/D74930	2020-02-28 11:35:18 -08:00
Stephen Neuendorffer	2f265e3528	[MLIR] Move from using target_link_libraries to LINK_LIBS for llvm libraries. When compiling libLLVM.so, add_llvm_library() manipulates the link libraries being used. This means that when using add_llvm_library(), we need to pass the list of libraries to be linked (using the LINK_LIBS keyword) instead of using the standard target_link_libraries call. This is preparation for properly dealing with creating libMLIR.so as well. Differential Revision: https://reviews.llvm.org/D74864	2020-02-28 11:35:17 -08:00
River Riddle	abe3e5babd	[mlir] Add support for generating debug locations from intermediate levels of the IR. Summary: This revision adds a utility to generate debug locations from the IR during compilation, by snapshotting to an output stream and using the locations that operations were dumped in that stream. The new locations may either: * Replace the original location of the operation. old: loc("original_source.cpp":1:1) new: loc("snapshot_source.mlir":10:10) * Fuse with the original locations as NamedLocs with a specific tag. old: loc("original_source.cpp":1:1) new: loc(fused["original_source.cpp":1:1, "snapshot"("snapshot_source.mlir":10:10)]) This feature may be used by a debugger to display the code at various different levels of the IR. It would also be able to show the different levels of IR attached to a specific source line in the original source file. This feature may also be used to generate locations for operations generated during compilation, that don't necessarily have a user source location to attach to. This requires changes in the printer to track the locations of operations emitted in the stream. Moving forward we need to properly (and efficiently) track the number of newlines emitted to the stream during printing. Differential Revision: https://reviews.llvm.org/D74019	2020-02-08 15:11:29 -08:00
Stephen Neuendorffer	b3dd31711a	[MLIR] Move test passes out of lib/Analysis Summary: This breaks a cyclic library dependency where MLIRPass used the verifier in MLIRAnalysis, but MLIRAnalysis also contained passes used for testing. The presence of the test passes here is archaeology, predating test/lib/Transform. Reviewers: rriddle Reviewed By: rriddle Subscribers: merge_guards_bot, mgorny, mehdi_amini, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74067	2020-02-05 11:26:49 -08:00
River Riddle	b276dec5b6	[mlir] Add a DCE pass for dead symbols. Summary: This pass deletes all symbols that are found to be unreachable. This is done by computing the set of operations that are known to be live, propagating that liveness to other symbols, and then deleting all symbols that are not within this live set. Differential Revision: https://reviews.llvm.org/D72482	2020-01-27 23:29:30 -08:00
Nicolas Vasilache	edfaf925cf	Drop MaterializeVectorTransfers in favor of simpler declarative unrolling Now that we have unrolling as a declarative pattern, we can drop a full pass that has gone stale. In the future we may want to add specific unrolling patterns for VectorTransferReadOp. PiperOrigin-RevId: 283806880	2019-12-04 12:11:42 -08:00
River Riddle	fafb708b9a	Merge DCE and unreachable block elimination into a new utility 'simplifyRegions'. This moves the different canonicalizations of regions into one place and invokes them in the fixed-point iteration of the canonicalizer. PiperOrigin-RevId: 281617072	2019-11-20 15:53:19 -08:00
Sean Silva	e4f83c6c26	Add multi-level DCE pass. This is a simple multi-level DCE pass that operates pretty generically on the IR. Its key feature compared to the existing peephole dead op folding that happens during canonicalization is being able to delete recursively dead cycles of the use-def graph, including block arguments. PiperOrigin-RevId: 281568202	2019-11-20 12:55:10 -08:00
Nicolas Vasilache	0b271b7dfe	Refactor the LowerVectorTransfers pass to use the RewritePattern infra - NFC This is step 1/n in refactoring infrastructure along the Vector dialect to make it ready for retargetability and composable progressive lowering. PiperOrigin-RevId: 280529784	2019-11-14 15:40:07 -08:00
Alex Zinenko	971b8dd4d8	Move Affine to Standard conversion to lib/Conversion This is essentially a dialect conversion and conceptually belongs to conversions. PiperOrigin-RevId: 280460034	2019-11-14 10:35:21 -08:00
Mehdi Amini	f1f9e3b8d1	Fix CMake configuration after introduction of LICM and LoopLikeInterface `b843cc5d5a` introduced a new op LICM transformation and a LoopLike interface, but missed the CMake aspects of it. This should fix the build. PiperOrigin-RevId: 275038533	2019-10-16 08:37:39 -07:00
Jacques Pienaar	2660623a88	Add pass to generate, per block in a function, a GraphViz Dot graph with ops as nodes * Add GraphTraits that treat a block as a graph, Operation* as node and use-relationship for edges; - Just basic graph output; * Add use iterator to iterate over all uses of an Operation; * Add testing pass to generate op graph; This does not support arbitrary operations other than function nor nested regions yet. PiperOrigin-RevId: 268121782	2019-09-09 18:12:41 -07:00
River Riddle	0ba0087887	Add the initial inlining infrastructure. This defines a set of initial utilities for inlining a region (or a FuncOp), and defines a simple inliner pass for testing purposes. A new dialect interface is defined, DialectInlinerInterface, that allows for dialects to override hooks controlling inlining legality. The interface currently provides the following hooks, but these are just preliminary and should be changed/added to/modified as necessary: * isLegalToInline - Determine if a region can be inlined into one of this dialect, or if an operation of this dialect can be inlined into a given region. * shouldAnalyzeRecursively - Determine if an operation with regions should be analyzed recursively for legality. This allows for child operations to be closed off from the legality checks for operations like lambdas. * handleTerminator - Process a terminator that has been inlined. This cl adds support for inlining StandardOps, but other dialects will be added in followups as necessary. PiperOrigin-RevId: 267426759	2019-09-05 12:24:13 -07:00
Uday Bondhugula	18b8d4352b	Introduce explicit copying optimization by generalizing the DMA generation pass Explicit copying to contiguous buffers is a standard technique to avoid conflict misses and TLB misses, and improve hardware prefetching performance. When done in conjunction with cache tiling, it nearly eliminates all cache conflict and TLB misses, and a single hardware prefetch stream is needed per data tile. - generalize/extend DMA generation pass (renamed data copying pass) to perform either point-wise explicit copies to fast memory buffers or DMAs (depending on a cmd line option). All logic is the same as erstwhile -dma-generate. - -affine-dma-generate is now renamed -affine-data-copy; when -dma flag is provided, DMAs are generated, or else explicit copy loops are generated (point-wise) by default. - point-wise copying could be used for CPUs (or GPUs); some indicative performance numbers with a "C" version of the MLIR when compiled with and without this optimization (about 2x improvement here). With a matmul on 4096^2 matrices on a single core of an Intel Core i7 Skylake i7-8700K with clang 8.0.0: clang -O3: 518s clang -O3 with MLIR tiling (128x128): 24.5s clang -O3 with MLIR tiling + data copying 12.4s (code equivalent to test/Transforms/data-copy.mlir func @matmul) - fix some misleading comments. - change default fast-mem space to 0 (more intuitive now with the default copy generation using point-wise copies instead of DMAs) On a simple 3-d matmul loop nest, code generated with -affine-data-copy: ``` affine.for %arg3 = 0 to 4096 step 128 { affine.for %arg4 = 0 to 4096 step 128 { %0 = affine.apply #map0(%arg3, %arg4) %1 = affine.apply #map1(%arg3, %arg4) %2 = alloc() : memref<128x128xf32, 2> // Copy-in Out matrix. 
affine.for %arg5 = 0 to 128 { %5 = affine.apply #map2(%arg3, %arg5) affine.for %arg6 = 0 to 128 { %6 = affine.apply #map2(%arg4, %arg6) %7 = load %arg2[%5, %6] : memref<4096x4096xf32> affine.store %7, %2[%arg5, %arg6] : memref<128x128xf32, 2> } } affine.for %arg5 = 0 to 4096 step 128 { %5 = affine.apply #map0(%arg3, %arg5) %6 = affine.apply #map1(%arg3, %arg5) %7 = alloc() : memref<128x128xf32, 2> // Copy-in LHS. affine.for %arg6 = 0 to 128 { %11 = affine.apply #map2(%arg3, %arg6) affine.for %arg7 = 0 to 128 { %12 = affine.apply #map2(%arg5, %arg7) %13 = load %arg0[%11, %12] : memref<4096x4096xf32> affine.store %13, %7[%arg6, %arg7] : memref<128x128xf32, 2> } } %8 = affine.apply #map0(%arg5, %arg4) %9 = affine.apply #map1(%arg5, %arg4) %10 = alloc() : memref<128x128xf32, 2> // Copy-in RHS. affine.for %arg6 = 0 to 128 { %11 = affine.apply #map2(%arg5, %arg6) affine.for %arg7 = 0 to 128 { %12 = affine.apply #map2(%arg4, %arg7) %13 = load %arg1[%11, %12] : memref<4096x4096xf32> affine.store %13, %10[%arg6, %arg7] : memref<128x128xf32, 2> } } // Compute. affine.for %arg6 = #map7(%arg3) to #map8(%arg3) { affine.for %arg7 = #map7(%arg4) to #map8(%arg4) { affine.for %arg8 = #map7(%arg5) to #map8(%arg5) { %11 = affine.load %7[-%arg3 + %arg6, -%arg5 + %arg8] : memref<128x128xf32, 2> %12 = affine.load %10[-%arg5 + %arg8, -%arg4 + %arg7] : memref<128x128xf32, 2> %13 = affine.load %2[-%arg3 + %arg6, -%arg4 + %arg7] : memref<128x128xf32, 2> %14 = mulf %11, %12 : f32 %15 = addf %13, %14 : f32 affine.store %15, %2[-%arg3 + %arg6, -%arg4 + %arg7] : memref<128x128xf32, 2> } } } dealloc %10 : memref<128x128xf32, 2> dealloc %7 : memref<128x128xf32, 2> } %3 = affine.apply #map0(%arg3, %arg4) %4 = affine.apply #map1(%arg3, %arg4) // Copy out result matrix. 
affine.for %arg5 = 0 to 128 { %5 = affine.apply #map2(%arg3, %arg5) affine.for %arg6 = 0 to 128 { %6 = affine.apply #map2(%arg4, %arg6) %7 = affine.load %2[%arg5, %arg6] : memref<128x128xf32, 2> store %7, %arg2[%5, %6] : memref<4096x4096xf32> } } dealloc %2 : memref<128x128xf32, 2> } } ``` With -affine-data-copy -dma: ``` affine.for %arg3 = 0 to 4096 step 128 { %0 = affine.apply #map3(%arg3) %1 = alloc() : memref<128xf32, 2> %2 = alloc() : memref<1xi32> affine.dma_start %arg2[%arg3], %1[%c0], %2[%c0], %c128_0 : memref<4096xf32>, memref<128xf32, 2>, memref<1xi32> affine.dma_wait %2[%c0], %c128_0 : memref<1xi32> %3 = alloc() : memref<1xi32> affine.for %arg4 = 0 to 4096 step 128 { %5 = affine.apply #map0(%arg3, %arg4) %6 = affine.apply #map1(%arg3, %arg4) %7 = alloc() : memref<128x128xf32, 2> %8 = alloc() : memref<1xi32> affine.dma_start %arg0[%arg3, %arg4], %7[%c0, %c0], %8[%c0], %c16384, %c4096, %c128_2 : memref<4096x4096xf32>, memref<128x128xf32, 2>, memref<1xi32> affine.dma_wait %8[%c0], %c16384 : memref<1xi32> %9 = affine.apply #map3(%arg4) %10 = alloc() : memref<128xf32, 2> %11 = alloc() : memref<1xi32> affine.dma_start %arg1[%arg4], %10[%c0], %11[%c0], %c128_1 : memref<4096xf32>, memref<128xf32, 2>, memref<1xi32> affine.dma_wait %11[%c0], %c128_1 : memref<1xi32> affine.for %arg5 = #map3(%arg3) to #map5(%arg3) { affine.for %arg6 = #map3(%arg4) to #map5(%arg4) { %12 = affine.load %7[-%arg3 + %arg5, -%arg4 + %arg6] : memref<128x128xf32, 2> %13 = affine.load %10[-%arg4 + %arg6] : memref<128xf32, 2> %14 = affine.load %1[-%arg3 + %arg5] : memref<128xf32, 2> %15 = mulf %12, %13 : f32 %16 = addf %14, %15 : f32 affine.store %16, %1[-%arg3 + %arg5] : memref<128xf32, 2> } } dealloc %11 : memref<1xi32> dealloc %10 : memref<128xf32, 2> dealloc %8 : memref<1xi32> dealloc %7 : memref<128x128xf32, 2> } %4 = affine.apply #map3(%arg3) affine.dma_start %1[%c0], %arg2[%arg3], %3[%c0], %c128 : memref<128xf32, 2>, memref<4096xf32>, memref<1xi32> affine.dma_wait %3[%c0], %c128 : 
memref<1xi32> dealloc %3 : memref<1xi32> dealloc %2 : memref<1xi32> dealloc %1 : memref<128xf32, 2> } ``` Signed-off-by: Uday Bondhugula <uday@polymagelabs.com> Closes tensorflow/mlir#50 PiperOrigin-RevId: 261221903	2019-08-01 16:31:58 -07:00
Nicolas Vasilache	48a1baeb8a	Refactor LoopParametricTiling as a test pass - NFC This CL moves LoopParametricTiling into test/lib as a pass for purely testing purposes. PiperOrigin-RevId: 259300264	2019-07-22 04:31:17 -07:00

66 Commits