llvm-project

Commit Graph

Author	SHA1	Message	Date
lorenzo chelini	0a74a7161b	[mlir] scf::ForOp: Drop iter arguments (and corresponding result) with no use 'ForOpIterArgsFolder' can now remove iterator arguments (and corresponding results) with no use. Example: ``` %cst = constant 32 : i32 %0:2 = scf.for %arg1 = %lb to %ub step %step iter_args(%arg2 = %arg0, %arg3 = %cst) -> (i32, i32) { %1 = addu %arg2, %cst : i32 scf.yield %1, %1 : i32, i32 } use(%0#0) ``` %arg3 is not used in the block, and its corresponding result `%0#1` has no use, thus remove the iter argument. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D98711	2021-03-17 12:06:17 +00:00
Stephan Herhut	5837fdc4cc	[mlir][llvm] Pass struct results as parameter in c wrapper Returning structs directly in LLVM does not necessarily align with the C ABI of the platform. This might happen to work on Linux but for small structs this breaks on Windows. With this change, the wrappers work platform independently. Differential Revision: https://reviews.llvm.org/D98725	2021-03-17 12:58:52 +01:00
Gaurav Shukla	8e3075c2b0	[MLIR] Fix lowering of Affine IfOp in the presence of yield values. This commit fixes the lowering of `Affine.IfOp` to `SCF.IfOp` in the presence of yield values. These changes have been made as a part of `-lower-affine` pass. Differential Revision: https://reviews.llvm.org/D98760	2021-03-17 16:33:32 +05:30
River Riddle	caa7038a89	[mlir][IR] Move the remaining builtin attributes to ODS. With this revision, all builtin attributes and types will have been moved to the ODS generator. Differential Revision: https://reviews.llvm.org/D98474	2021-03-16 16:31:53 -07:00
River Riddle	425e11eea1	[mlir][AttrTypeDefGen] Add support for custom parameter comparators Some parameters to attributes and types rely on special comparison routines other than operator== to ensure equality. This revision adds support for those parameters by allowing them to specify a `comparator` code block that determines if `$_lhs` and `$_rhs` are equal. An example of one of these parameters is APFloat, which requires `bitwiseIsEqual` for bitwise comparison (which we want for attribute equality). Differential Revision: https://reviews.llvm.org/D98473	2021-03-16 16:31:53 -07:00
River Riddle	1f13963ec1	[mlir][pdl] Cast the OperationPosition to Position to fix MSVC miscompile If we don't cast, MSVC picks an overload that hasn't been defined yet(not sure why) and miscompiles.	2021-03-16 16:11:14 -07:00
Eugene Zhulenev	74f6138bd9	[mlir] Add lowering from math::Log1p to LLVM Reviewed By: cota Differential Revision: https://reviews.llvm.org/D98662	2021-03-16 15:59:09 -07:00
River Riddle	85ab413b53	[mlir][PDL] Add support for variadic operands and results in the PDL byte code Supporting ranges in the byte code requires additional complexity, given that a range can't be easily represented as an opaque void *, as is possible with the existing bytecode value types (Attribute, Type, Value, etc.). To enable representing a range with void *, an auxiliary storage is used for the actual range itself, with the pointer being passed around in the normal byte code memory. For type ranges, a TypeRange is stored. For value ranges, a ValueRange is stored. The above problem represents a majority of the complexity involved in this revision, the rest is adapting/adding byte code operations to support the changes made to the PDL interpreter in the parent revision. After this revision, PDL will have initial end-to-end support for variadic operands/results. Differential Revision: https://reviews.llvm.org/D95723	2021-03-16 13:20:19 -07:00
River Riddle	3a833a0e0e	[mlir][PDL] Add support for variadic operands and results in the PDL Interpreter This revision extends the PDL Interpreter dialect to add support for variadic operands and results, with ranges of these values represented via the recently added !pdl.range type. To support this extension, five new operations have been added that closely match the single variant: * pdl_interp.check_types : Compare a range of types with a known range. * pdl_interp.create_types : Create a constant range of types. * pdl_interp.get_operands : Get a range of operands from an operation. * pdl_interp.get_results : Get a range of results from an operation. * pdl_interp.switch_types : Switch on a range of types. This revision handles adding support in the interpreter dialect and the conversion from PDL to PDLInterp. Support for variadic operands and results in the bytecode will be added in a followup revision. Differential Revision: https://reviews.llvm.org/D95722	2021-03-16 13:20:19 -07:00
River Riddle	1eb6994d6a	[mlir][PDL] Add support for variadic operands and results in PDL This revision extends the PDL dialect to add support for variadic operands and results, with ranges of these values represented via the recently added !pdl.range type. To support this extension, three new operations have been added that closely match the single variant: * pdl.operands : Define a range of input operands. * pdl.results : Extract a result group from an operation. * pdl.types : Define a handle to a range of types. Support for these in the pdl interpreter dialect and byte code will be added in followup revisions. Differential Revision: https://reviews.llvm.org/D95721	2021-03-16 13:20:18 -07:00
River Riddle	02c4c0d5b2	[mlir][pdl] Remove CreateNativeOp in favor of a more general ApplyNativeRewriteOp. This has numerous benefits, given the overly clunky nature of CreateNativeOp: * Users can now call into arbitrary rewrite functions from inside of PDL, allowing for more natural interleaving of PDL/C++ and enabling more of the pattern to be in PDL. * Removes the need for an additional set of C++ functions/registry/etc. The new ApplyNativeRewriteOp will use the same PDLRewriteFunction as the existing RewriteOp. This reduces the API surface area exposed to users. This revision also introduces a new PDLResultList class. This class is used to provide results of native rewrite functions back to PDL. We introduce a new class instead of using a SmallVector to simplify the work necessary for variadics, given that ranges will require some changes to the structure of PDLValue. Differential Revision: https://reviews.llvm.org/D95720	2021-03-16 13:20:18 -07:00
River Riddle	242762c9a3	[mlir][pdl] Restructure how results are represented. Up until now, results have been represented as additional results to a pdl.operation. This is fairly clunky, as it mismatches the representation of the rest of the IR constructs(e.g. pdl.operand) and also isn't a viable representation for operations returned by pdl.create_native. This representation also creates much more difficult problems when factoring in support for variadic result groups, optional results, etc. To resolve some of these problems, and simplify adding support for variable length results, this revision extracts the representation for results out of pdl.operation in the form of a new `pdl.result` operation. This operation returns the result of an operation at a given index, e.g.: ``` %root = pdl.operation ... %result = pdl.result 0 of %root ``` Differential Revision: https://reviews.llvm.org/D95719	2021-03-16 13:20:18 -07:00
Nicolas Vasilache	b661788b77	[mlir] NFC - Expose GlobalCreator so it can be reused.	2021-03-16 12:29:04 +00:00
Adrian Kuegel	2995e161b0	[mlir]: Add canonicalization for dim of 1D alloc of size rank. Differential Revision: https://reviews.llvm.org/D97542	2021-03-16 10:38:57 +01:00
Lorenzo Chelini	fd7eee64c5	scf::ForOp: Fold away iterator arguments with no use and for which the corresponding input is yielded Enhance 'ForOpIterArgsFolder' to remove unused iteration arguments in a scf::ForOp. If the block argument corresponding to the given iterator has no use and the yielded value equals the input, we fold it away. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D98503	2021-03-16 07:01:25 +00:00
Aart Bik	6ad7b97e20	[mlir][amx] Add Intel AMX dialect (architectural-specific vector dialect) The Intel Advanced Matrix Extensions (AMX) provides a tile matrix multiply unit (TMUL), a tile control register (TILECFG), and eight tile registers TMM0 through TMM7 (TILEDATA). This new MLIR dialect provides a bridge between MLIR concepts like vectors and memrefs and the lower level LLVM IR details of AMX. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D98470	2021-03-15 17:59:05 -07:00
Alex Zinenko	e82a30bdce	[mlir] enable Python bindings for the MemRef dialect A previous commit moved multiple ops from Standard to MemRef dialect. Some of these ops are exercised in Python bindings. Enable bindings for the newly created MemRef dialect and update a test accordingly.	2021-03-15 14:07:51 +01:00
Alex Zinenko	0fb4a201c0	[mlir] fix shared-lib build fallout of `e2310704d8` The patch in question broke the build with shared libraries due to missing dependencies, one of which would have been circular between MLIRStandard and MLIRMemRef if added. Fix this by moving more code around and swapping the dependency direction. MLIRMemRef now depends on MLIRStandard, but MLIRStandard does _not_ depend on MLIRMemRef. Arguably, this is the right direction anyway since numerous libraries depend on MLIRStandard and don't necessarily need to depend on MLIRMemRef. Other notable changes include: - some EDSC code is moved inline to MemRef/EDSC/Intrinsics.h because it creates MemRef dialect operations; - a utility function related to shape moved to BuiltinTypes.h/cpp because it only relates to shaped types and not any particular dialect (standard dialect is erroneously believed to contain MemRefType); - a Python test for the standard dialect is disabled completely because the ops it tests moved to the new MemRef dialect, but it is not exposed to Python bindings, and the change for that is non-trivial.	2021-03-15 13:41:38 +01:00
Julian Gross	e2310704d8	[MLIR] Create memref dialect and move dialect-specific ops from std. Create the memref dialect and move dialect-specific ops from std dialect to this dialect. Moved ops: AllocOp -> MemRef_AllocOp AllocaOp -> MemRef_AllocaOp AssumeAlignmentOp -> MemRef_AssumeAlignmentOp DeallocOp -> MemRef_DeallocOp DimOp -> MemRef_DimOp MemRefCastOp -> MemRef_CastOp MemRefReinterpretCastOp -> MemRef_ReinterpretCastOp GetGlobalMemRefOp -> MemRef_GetGlobalOp GlobalMemRefOp -> MemRef_GlobalOp LoadOp -> MemRef_LoadOp PrefetchOp -> MemRef_PrefetchOp ReshapeOp -> MemRef_ReshapeOp StoreOp -> MemRef_StoreOp SubViewOp -> MemRef_SubViewOp TransposeOp -> MemRef_TransposeOp TensorLoadOp -> MemRef_TensorLoadOp TensorStoreOp -> MemRef_TensorStoreOp TensorToMemRefOp -> MemRef_BufferCastOp ViewOp -> MemRef_ViewOp The roadmap to split the memref dialect from std is discussed here: https://llvm.discourse.group/t/rfc-split-the-memref-dialect-from-std/2667 Differential Revision: https://reviews.llvm.org/D98041	2021-03-15 11:14:09 +01:00
Alex Zinenko	40d8e4d3f9	Revert "[Canonicalizer] Process regions top-down instead of bottom up & reuse existing constants." This reverts commit `b5d9a3c923`. The commit introduced a memory error in canonicalization/operation walking that is exposed when compiled with ASAN. It leads to crashes in some "release" configurations.	2021-03-15 10:27:55 +01:00
Frederik Gossen	b55f424ffc	[MLIR] Add canonicalization for `shape.broadcast` Remove redundant operands and fold if only one left. Differential Revision: https://reviews.llvm.org/D98402	2021-03-15 10:11:28 +01:00
Frederik Gossen	2a71f95767	[MLIR] Allow compatible shapes in `Elementwise` operations Differential Revision: https://reviews.llvm.org/D98186	2021-03-15 09:56:20 +01:00
Chris Lattner	91a6ad5ad8	[m_Constant] Check #operands/results before hasTrait() We know that all ConstantLike operations have one result and no operands, so check this first before doing the trait check. This change speeds up Canonicalize on a CIRCT testcase by ~5%. Differential Revision: https://reviews.llvm.org/D98615	2021-03-14 20:14:19 -07:00
Chris Lattner	b5d9a3c923	[Canonicalizer] Process regions top-down instead of bottom up & reuse existing constants. Two changes: 1) Change the canonicalizer to walk the function in top-down order instead of bottom-up order. This composes well with the "top down" nature of constant folding and simplification, reducing iterations and re-evaluation of ops in simple cases. 2) Explicitly enter existing constants into the OperationFolder table before canonicalizing. Previously we would "constant fold" them and rematerialize them, wastefully recreating a bunch of constants, which led to pointless memory traffic. Both changes together provide a 33% speedup for canonicalize on some mid-size CIRCT examples. One artifact of this change is that the constants generated in normal pattern application get inserted at the top of the function as the patterns are applied. Because of this, we get "inverted" constants more often, which is an aesthetic change to the IR but does permute some testcases. Differential Revision: https://reviews.llvm.org/D98609	2021-03-14 18:21:42 -07:00
Aart Bik	e7ee4eaaf7	[mlir][sparse] disable nonunit stride dense vectorization This is a temporary work-around to get our all-annotations-all-flags stress testing effort run clean. In the long run, we want to provide efficient implementations of strided loads and stores though. Reviewed By: bixia Differential Revision: https://reviews.llvm.org/D98563	2021-03-12 16:49:32 -08:00
Eugene Zhulenev	39b2cd4009	[mlir] Annotate functions used only in debug mode with LLVM_ATTRIBUTE_UNUSED Functions used only in `assert` cause warnings in release mode Reviewed By: mehdi_amini, dcaballe, ftynse Differential Revision: https://reviews.llvm.org/D98476	2021-03-12 11:25:46 -08:00
Alex Zinenko	4affd0c40e	[mlir] fix a memory leak in NestedPattern NestedPattern uses a BumpPtrAllocator to store child (nested) pattern objects to decrease the overhead of dynamic allocation. This assumes all allocations happen inside the allocator that will be freed as a whole. However, NestedPattern contains `std::function` as a member, which allocates internally using `new`, unaware of the BumpPtrAllocator. Since NestedPattern only holds pointers to the nested patterns allocated in the BumpPtrAllocator, it never calls their destructors, so the destructors of the `std::function`s they contain are never called either, leaking the allocated memory. Make NestedPattern explicitly call destructors of nested patterns. This additionally requires actually copying the nested patterns in copy-construction and copy-assignment instead of just sharing the pointer to the arena-allocated list of children to avoid double-free. An alternative solution would be to add reference counting to the arena-allocated list of children. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D98485	2021-03-12 18:52:14 +01:00
Alex Zinenko	be5b844a35	[mlir] fix memory leak on failure path in parser Forward references to blocks lead to `Block`s being allocated in the parser, but they are not necessarily included into a region if parsing fails, leading to a leak. Clean them up in parser destructor. Reviewed By: rriddle, mehdi_amini Differential Revision: https://reviews.llvm.org/D98403	2021-03-12 09:24:08 +01:00
Marius Brehler	849f8183fb	[mlir] Fix ConstantOp verifier This restricts the attributes to integers for constants of type IndexType. So far an attribute like StringAttr as in %c1 = constant "" : index is valid. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D98216	2021-03-12 08:49:25 +01:00
Sergei Grechanik	fd2b08969b	[mlir][Vector] Lowering of transfer_read/write to vector.load/store This patch introduces progressive lowering patterns for rewriting vector.transfer_read/write to vector.load/store and vector.broadcast in certain supported cases. Reviewed By: dcaballe, nicolasvasilache Differential Revision: https://reviews.llvm.org/D97822	2021-03-11 18:17:51 -08:00
Mehdi Amini	e1364f1068	Replace use of OperationState with builder::create in GPU Kernel Outlining (NFC) OperationState is a low level API that is rarely indicated; the convenient builder API wrapper is preferred when possible.	2021-03-12 00:14:02 +00:00
Diego Caballero	0fd0fb5329	Reland: [mlir][Affine][Vector] Add initial support for 'iter_args' to Affine vectorizer. This patch adds support for vectorizing loops with 'iter_args' when those loops are not a vector dimension. This allows vectorizing outer loops with an inner 'iter_args' loop (e.g., reductions). Vectorizing scenarios where 'iter_args' loops are vector dimensions would require more work (e.g., analysis, generating horizontal reduction, etc.) not included in this patch. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D97892	2021-03-12 01:08:28 +02:00
Diego Caballero	96891f0418	Reland: [mlir][Vector][Affine] Improve affine vectorizer algorithm This patch replaces the root-terminal vectorization approach implemented in the Affine vectorizer with a topological order approach that vectorizes all the operations within the target loop nest. These are the most important changes introduced by the new algorithm: * Removed tracking of root and terminal ops. Existing vectorization functionality is preserved and extended so that loop nests without root-terminal chains can be vectorized. * Vectorizing a loop nest now only requires a single topological traversal. * A new vector loop nest is incrementally built along the vectorization process. The original scalar loop is kept intact. No cloning guard is needed to recover the scalar loop if vectorization fails. This approach also simplifies the challenging task of replacing a loop operation amid the vectorization process without invalidating the analysis information that depends on the original loop. * Vectorization of specific operations has been implemented as independent, preparing them to be moved to a potential vectorization interface. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D97442	2021-03-12 00:19:50 +02:00
River Riddle	31bb8efd69	[mlir][StorageUniquer] Properly call the destructor on non-trivially destructible storage instances This allows for storage instances to store data that isn't uniqued in the context, or contain otherwise non-trivial logic, in the rare situations that they occur. Storage instances with trivial destructors will still have their destructor skipped. A consequence of this is that the storage instance definition must be visible from the place that registers the type. Differential Revision: https://reviews.llvm.org/D98311	2021-03-11 11:35:32 -08:00
Diego Caballero	ed193bce9d	[mlir][Vector][Affine] Fix heap-use-after-free in vectorizer This patch fixes a heap-use-after-free introduced by the recent changes in the vectorizer: https://reviews.llvm.org/rG95db7b4aeaad590f37720898e339a6d54313422f The problem is due to the way candidate loops are visited. All candidate loops are pattern-matched beforehand using the 'NestedMatch' utility. These matches may intersect with each other so it may happen that we try to vectorize a loop that was previously vectorized. The new vectorization algorithm replaces the original loops that are vectorized with new loops and, therefore, any reference to the original loops in the pre-computed matches becomes invalid. This patch fixes the problem by classifying the candidate matches into buckets before vectorization. Each bucket contains all the matches that intersect. The vectorizer uses these buckets to make sure that we only vectorize one match from each bucket, at most. Differential Revision: https://reviews.llvm.org/D98382	2021-03-11 20:44:07 +02:00
Nikita Popov	f3f0c6cd47	[mlir] Remove uses of type-less CreateLoad() APIs (NFC) For the use in LLVMOps.td I used the getPointerElementType() escape hatch, as it's not obvious to me how the load type should be properly obtained here.	2021-03-11 18:39:20 +01:00
Alex Zinenko	3ba14fa0ce	[mlir] Introduce data layout modeling subsystem Data layout information allows to answer questions about the size and alignment properties of a type. It enables, among others, the generation of various linear memory addressing schemes for containers of abstract types and deeper reasoning about vectors. This introduces the subsystem for modeling data layouts in MLIR. The data layout subsystem is designed to scale to MLIR's open type and operation system. At the top level, it consists of attribute interfaces that can be implemented by concrete data layout specifications; type interfaces that should be implemented by types subject to data layout; operation interfaces that must be implemented by operations that can serve as data layout scopes (e.g., modules); and dialect interfaces for data layout properties unrelated to specific types. Built-in types are handled specially to decrease the overall query cost. A concrete default implementation of these interfaces is provided in the new Target dialect. Defaults for built-in types that match the current behavior are also provided. Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D97067	2021-03-11 16:54:47 +01:00
Arpith C. Jacob	b4a516cc43	[mlir] Add LLVM loop codegen options to control software pipelining Support specifying the II and disabling pipelining. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D98420	2021-03-11 16:46:44 +01:00
Tres Popp	25a20b8aa6	[mlir] Correct verifyCompatibleShapes verifyCompatibleShapes is not transitive. Create an n-ary version and update SameOperandShapes and SameOperandAndResultShapes traits to use it. Differential Revision: https://reviews.llvm.org/D98331	2021-03-11 13:04:10 +01:00
Julian Gross	2aef202981	[mlir] Fix invalid hoisting of dependent allocs in buffer hoisting pass. Buffer hoisting moves allocs upwards although it has dependency within its nested region. This patch fixes this issue. https://bugs.llvm.org/show_bug.cgi?id=49142 Differential Revision: https://reviews.llvm.org/D98248	2021-03-11 11:46:16 +01:00
Frederik Gossen	b975e3b5aa	[MLIR] Add canonicalization for `shape.is_broadcastable` Canonicalize `is_broadcastable` to constant true if fewer than 2 unique shape operands. Eliminate redundant operands, otherwise. Differential Revision: https://reviews.llvm.org/D98361	2021-03-11 10:10:34 +01:00
Christian Sigg	2224221fb3	[mlir] Add NVVM to CUBIN conversion to mlir-opt If MLIR_CUDA_RUNNER_ENABLED, register a 'gpu-to-cubin' conversion pass to mlir-opt. The next step is to switch CUDA integration tests from mlir-cuda-runner to mlir-opt + mlir-cpu-runner and remove mlir-cuda-runner. Depends On D98279 Reviewed By: herhut, rriddle, mehdi_amini Differential Revision: https://reviews.llvm.org/D98203	2021-03-11 10:07:11 +01:00
River Riddle	4e02eb8014	[mlir] Optimize the implementation of RegionDCE The current implementation has some inefficiencies that become noticeable when running on large modules. This revision optimizes the code, and updates some out-dated idioms with newer utilities. The main components of this optimization include: * Add an overload of Block::eraseArguments that allows for O(N) erasure of disjoint arguments. * Don't process entry block arguments given that we don't erase them at this point. * Don't track individual operation results, given that we don't erase them. We can just track the parent operation. Differential Revision: https://reviews.llvm.org/D98309	2021-03-10 16:39:50 -08:00
Emilio Cota	c0891706bc	[mlir] Add polynomial approximation for math::Log2 ``` name old cpu/op new cpu/op delta BM_mlir_Log2_f32/10 134ns ±15% 45ns ± 4% -66.39% (p=0.000 n=20+17) BM_mlir_Log2_f32/100 1.03µs ±16% 0.12µs ±10% -88.78% (p=0.000 n=20+18) BM_mlir_Log2_f32/1k 10.3µs ±16% 0.7µs ± 5% -93.24% (p=0.000 n=20+17) BM_mlir_Log2_f32/10k 104µs ±15% 7µs ±14% -93.25% (p=0.000 n=20+20) BM_eigen_s_Log2_f32/10 95.3ns ±17% 90.9ns ± 6% ~ (p=0.228 n=20+18) BM_eigen_s_Log2_f32/100 907ns ± 3% 911ns ± 6% ~ (p=0.539 n=16+20) BM_eigen_s_Log2_f32/1k 9.88µs ± 4% 9.85µs ± 3% ~ (p=0.790 n=16+17) BM_eigen_s_Log2_f32/10k 105µs ±10% 110µs ±16% ~ (p=0.459 n=16+20) BM_eigen_v_Log2_f32/10 32.5ns ±31% 33.9ns ±14% +4.31% (p=0.028 n=17+20) BM_eigen_v_Log2_f32/100 176ns ± 8% 180ns ± 7% +2.19% (p=0.045 n=16+17) BM_eigen_v_Log2_f32/1k 1.44µs ± 4% 1.50µs ± 9% +3.91% (p=0.001 n=16+17) BM_eigen_v_Log2_f32/10k 14.5µs ±10% 15.0µs ± 8% +3.92% (p=0.002 n=16+19) ``` Reviewed By: ezhulenev Differential Revision: https://reviews.llvm.org/D98282	2021-03-10 14:49:22 -08:00
Christian Sigg	6a291ed0f0	[mlir] Remove unnecessary copying of pass options I missed a comment in D98279 that you don't need to copy pass options. Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D98366	2021-03-10 21:55:28 +01:00
Alex Zinenko	79da91c59a	Revert "[mlir][Vector][Affine] Improve affine vectorizer algorithm" This reverts commit `95db7b4aea`. This breaks vectorize_2d.mlir and vectorize_3d.mlir test under ASAN (use after free).	2021-03-10 20:25:49 +01:00
Alex Zinenko	ed715536f1	Revert "[mlir][Affine][Vector] Add initial support for 'iter_args' to Affine vectorizer." This reverts commit `77a9d1549f`. Parent commit is broken.	2021-03-10 20:25:32 +01:00
Diego Caballero	77a9d1549f	[mlir][Affine][Vector] Add initial support for 'iter_args' to Affine vectorizer. This patch adds support for vectorizing loops with 'iter_args' when those loops are not a vector dimension. This allows vectorizing outer loops with an inner 'iter_args' loop (e.g., reductions). Vectorizing scenarios where 'iter_args' loops are vector dimensions would require more work (e.g., analysis, generating horizontal reduction, etc.) not included in this patch. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D97892	2021-03-10 20:40:21 +02:00
Diego Caballero	95db7b4aea	[mlir][Vector][Affine] Improve affine vectorizer algorithm This patch replaces the root-terminal vectorization approach implemented in the Affine vectorizer with a topological order approach that vectorizes all the operations within the target loop nest. These are the most important changes introduced by the new algorithm: * Removed tracking of root and terminal ops. Existing vectorization functionality is preserved and extended so that loop nests without root-terminal chains can be vectorized. * Vectorizing a loop nest now only requires a single topological traversal. * A new vector loop nest is incrementally built along the vectorization process. The original scalar loop is kept intact. No cloning guard is needed to recover the scalar loop if vectorization fails. This approach also simplifies the challenging task of replacing a loop operation amid the vectorization process without invalidating the analysis information that depends on the original loop. * Vectorization of specific operations has been implemented as independent, preparing them to be moved to a potential vectorization interface. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D97442	2021-03-10 20:29:58 +02:00
Vladislav Vinogradov	b599f464d4	[mlir][CMAKE] Fix build with BUILD_SHARED_LIBS=ON Link `MLIRStandardToLLVM` to `MLIRAVX512Transforms`, since the latter uses `LLVMTypeConverter` defined in the first one. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D98336	2021-03-10 14:52:36 +01:00

5141 Commits