Commit Graph

306 Commits

Uday Bondhugula bc2a543225 fix loop unroll and jam - operand mapping - imperfect nest case
- fix operand mapping while cloning sub-blocks to jam - it was incorrect
  for imperfect nests where a def/use crossed sub-blocks
- strengthen/generalize the first test case to cover the previously
  missed scenario
- clean up the other test cases while at it.

Previously, unroll-jamming the following nest
```
    affine.for %arg0 = 0 to 2048 {
      %0 = alloc() : memref<512x10xf32>
      affine.for %arg1 = 0 to 10 {
        %1 = affine.load %0[%arg0, %arg1] : memref<512x10xf32>
      }
      dealloc %0 : memref<512x10xf32>
    }
```

would yield

```
      %0 = alloc() : memref<512x10xf32>
      %1 = affine.apply #map0(%arg0)
      %2 = alloc() : memref<512x10xf32>
      affine.for %arg1 = 0 to 10 {
        %4 = affine.load %0[%arg0, %arg1] : memref<512x10xf32>
        %5 = affine.apply #map0(%arg0)
        %6 = affine.load %0[%5, %arg1] : memref<512x10xf32>
      }
      dealloc %0 : memref<512x10xf32>
      %3 = affine.apply #map0(%arg0)
      dealloc %0 : memref<512x10xf32>

```

instead of

```

module {
    affine.for %arg0 = 0 to 2048 step 2 {
      %0 = alloc() : memref<512x10xf32>
      %1 = affine.apply #map0(%arg0)
      %2 = alloc() : memref<512x10xf32>
      affine.for %arg1 = 0 to 10 {
        %4 = affine.load %0[%arg0, %arg1] : memref<512x10xf32>
        %5 = affine.apply #map0(%arg0)
        %6 = affine.load %2[%5, %arg1] : memref<512x10xf32>
      }
      dealloc %0 : memref<512x10xf32>
      %3 = affine.apply #map0(%arg0)
      dealloc %2 : memref<512x10xf32>
    }
}
```

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

Closes tensorflow/mlir#98

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/98 from bondhugula:ujam ddbc853f69b5608b3e8ff9b5ac1f6a5a0bb315a4
PiperOrigin-RevId: 266073460
2019-08-28 23:42:50 -07:00
Uday Bondhugula aa2cee9cf5 Refactor / improve replaceAllMemRefUsesWith
Refactor replaceAllMemRefUsesWith to split it into two methods: the new
method does the replacement on a single op, and is used by the existing
one.

- make the methods return LogicalResult instead of bool

- Earlier, when replacement failed (due to non-dereferencing uses of the
  memref), the set of ops that had already been processed would have
  been replaced, leaving the IR in an inconsistent state. Now, a pass is
  first made over all ops to check for non-dereferencing uses, and only
  then is the replacement performed. No test cases were affected because
  all clients of this method were already checking for non-dereferencing
  uses before calling it (for other reasons). This isn't true for a use
  case in another upcoming PR (scalar replacement); clients can now bail
  out with consistent IR on failure of replaceAllMemRefUsesWith (see the
  sketch after this list). Add a test case.

- multiple dereferencing uses of the same memref in a single op are
  possible (we have no such use cases/scenarios), and this has always
  remained unsupported. Add an assertion for this.

- minor fix to another test pipeline-data-transfer case.
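
A rough caller-side sketch of the failure contract described above; replaceAllMemRefUsesWith and LogicalResult are named in this message, but the two-argument call, the header path, and the helper itself are assumptions for illustration only:

```
// Hedged sketch, not the actual utility code; the header path and the exact
// signature of replaceAllMemRefUsesWith are assumed.
#include "mlir/Transforms/Utils.h"
using namespace mlir;

static LogicalResult replaceBufferUses(Value oldMemRef, Value newMemRef) {
  // Per the change above: if any use is non-dereferencing, nothing is
  // rewritten and failure is returned, so the caller can bail out with
  // consistent IR.
  if (failed(replaceAllMemRefUsesWith(oldMemRef, newMemRef)))
    return failure();
  return success();
}
```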

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

Closes tensorflow/mlir#87

PiperOrigin-RevId: 265808183
2019-08-27 17:56:56 -07:00
Andy Ly 6a501e3d1b Support folding of ops with inner ops in GreedyPatternRewriteDriver.
This fixes a bug when folding ops with inner ops while those inner ops are still being visited.

PiperOrigin-RevId: 265475780
2019-08-26 09:44:39 -07:00
River Riddle 305516fcd3 Allow isolated regions to form isolated SSA name scopes in the printer.
This will allow for naming values the same as existing SSA values for regions attached to operations that are isolated from above. This fits in with how the system already allows separate name scopes for sibling regions. This name shadowing can be enabled in the custom parser of operations by setting the 'enableNameShadowing' flag to true when calling 'parseRegion'.

```
%arg = constant 10 : i32
foo.op {
  %arg = constant 10 : i32
}
```
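
A hedged sketch of a custom parser opting into this via the 'enableNameShadowing' flag mentioned above; the op, the helper name, and the exact parseRegion overload are assumptions, not taken from this commit:

```
// Hypothetical parser for an op whose region is isolated from above; the
// parseRegion signature shown here is assumed.
static ParseResult parseFooOp(OpAsmParser &parser, OperationState &result) {
  Region *body = result.addRegion();
  // Allow SSA names inside the isolated region to shadow names defined
  // in enclosing regions.
  return parser.parseRegion(*body, /*arguments=*/{}, /*argTypes=*/{},
                            /*enableNameShadowing=*/true);
}
```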

PiperOrigin-RevId: 264255999
2019-08-19 15:27:10 -07:00
River Riddle a481032a33 Refactor ElementsAttr::getValue and DenseElementsAttr::getSplatValue.
All 'getValue' variants now require that the index is valid, queryable via 'isValidIndex'. 'getSplatValue' now requires that the attribute is a proper splat. This allows for querying these methods on DenseElementsAttr with all possible value types, e.g. float, int, APInt, etc. This also allows for removing unnecessary conversions to Attribute in places that really want the underlying value.
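
A small sketch of the contract described above; only the method names come from this message, while the templated accessors and the brace-list indices are assumptions:

```
// Hedged sketch: validate before querying, as the new contract requires.
static void inspect(DenseElementsAttr attr) {
  // 'getValue' now requires a valid index, checked via 'isValidIndex'.
  if (attr.isValidIndex({0, 0}))
    APInt element = attr.getValue<APInt>({0, 0});
  // 'getSplatValue' may only be called on a proper splat.
  if (attr.isSplat())
    float splat = attr.getSplatValue<float>();
}
```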

PiperOrigin-RevId: 263437337
2019-08-14 15:03:53 -07:00
Andy Ly 55f2e24ab3 Remove ops in regions/blocks from worklist when parent op is being removed via GreedyPatternRewriteDriver::replaceOp.
This fixes a bug where ops inside the parent op are visited even though the parent op has been removed.

PiperOrigin-RevId: 261953580
2019-08-06 11:08:54 -07:00
Uday Bondhugula 18b8d4352b Introduce explicit copying optimization by generalizing the DMA generation pass
Explicit copying to contiguous buffers is a standard technique to avoid
conflict misses and TLB misses, and improve hardware prefetching
performance. When done in conjunction with cache tiling, it nearly
eliminates all cache conflict and TLB misses, and a single hardware
prefetch stream is needed per data tile.

- generalize/extend the DMA generation pass (renamed the data copying pass)
  to perform either point-wise explicit copies to fast memory buffers or
  DMAs (depending on a command-line option). All logic is the same as the
  erstwhile -dma-generate.

- -affine-dma-generate is now renamed -affine-data-copy; when the -dma flag
  is provided, DMAs are generated; otherwise, explicit (point-wise) copy
  loops are generated by default.

- point-wise copying could be used for CPUs (or GPUs); below are some
  indicative performance numbers for a "C" version of the MLIR code
  compiled with and without this optimization (about a 2x improvement
  here).

  With a matmul on 4096^2 matrices on a single core of an Intel Core i7
  Skylake i7-8700K with clang 8.0.0:

  clang -O3:                                  518s
  clang -O3 with MLIR tiling (128x128):       24.5s
  clang -O3 with MLIR tiling + data copying:  12.4s
  (code equivalent to test/Transforms/data-copy.mlir func @matmul)

- fix some misleading comments.

- change default fast-mem space to 0 (more intuitive now with the
  default copy generation using point-wise copies instead of DMAs)

On a simple 3-d matmul loop nest, code generated with -affine-data-copy:

```
  affine.for %arg3 = 0 to 4096 step 128 {
    affine.for %arg4 = 0 to 4096 step 128 {
      %0 = affine.apply #map0(%arg3, %arg4)
      %1 = affine.apply #map1(%arg3, %arg4)
      %2 = alloc() : memref<128x128xf32, 2>
      // Copy-in Out matrix.
      affine.for %arg5 = 0 to 128 {
        %5 = affine.apply #map2(%arg3, %arg5)
        affine.for %arg6 = 0 to 128 {
          %6 = affine.apply #map2(%arg4, %arg6)
          %7 = load %arg2[%5, %6] : memref<4096x4096xf32>
          affine.store %7, %2[%arg5, %arg6] : memref<128x128xf32, 2>
        }
      }
      affine.for %arg5 = 0 to 4096 step 128 {
        %5 = affine.apply #map0(%arg3, %arg5)
        %6 = affine.apply #map1(%arg3, %arg5)
        %7 = alloc() : memref<128x128xf32, 2>
        // Copy-in LHS.
        affine.for %arg6 = 0 to 128 {
          %11 = affine.apply #map2(%arg3, %arg6)
          affine.for %arg7 = 0 to 128 {
            %12 = affine.apply #map2(%arg5, %arg7)
            %13 = load %arg0[%11, %12] : memref<4096x4096xf32>
            affine.store %13, %7[%arg6, %arg7] : memref<128x128xf32, 2>
          }
        }
        %8 = affine.apply #map0(%arg5, %arg4)
        %9 = affine.apply #map1(%arg5, %arg4)
        %10 = alloc() : memref<128x128xf32, 2>
        // Copy-in RHS.
        affine.for %arg6 = 0 to 128 {
          %11 = affine.apply #map2(%arg5, %arg6)
          affine.for %arg7 = 0 to 128 {
            %12 = affine.apply #map2(%arg4, %arg7)
            %13 = load %arg1[%11, %12] : memref<4096x4096xf32>
            affine.store %13, %10[%arg6, %arg7] : memref<128x128xf32, 2>
          }
        }
        // Compute.
        affine.for %arg6 = #map7(%arg3) to #map8(%arg3) {
          affine.for %arg7 = #map7(%arg4) to #map8(%arg4) {
            affine.for %arg8 = #map7(%arg5) to #map8(%arg5) {
              %11 = affine.load %7[-%arg3 + %arg6, -%arg5 + %arg8] : memref<128x128xf32, 2>
              %12 = affine.load %10[-%arg5 + %arg8, -%arg4 + %arg7] : memref<128x128xf32, 2>
              %13 = affine.load %2[-%arg3 + %arg6, -%arg4 + %arg7] : memref<128x128xf32, 2>
              %14 = mulf %11, %12 : f32
              %15 = addf %13, %14 : f32
              affine.store %15, %2[-%arg3 + %arg6, -%arg4 + %arg7] : memref<128x128xf32, 2>
            }
          }
        }
        dealloc %10 : memref<128x128xf32, 2>
        dealloc %7 : memref<128x128xf32, 2>
      }
      %3 = affine.apply #map0(%arg3, %arg4)
      %4 = affine.apply #map1(%arg3, %arg4)
      // Copy out result matrix.
      affine.for %arg5 = 0 to 128 {
        %5 = affine.apply #map2(%arg3, %arg5)
        affine.for %arg6 = 0 to 128 {
          %6 = affine.apply #map2(%arg4, %arg6)
          %7 = affine.load %2[%arg5, %arg6] : memref<128x128xf32, 2>
          store %7, %arg2[%5, %6] : memref<4096x4096xf32>
        }
      }
      dealloc %2 : memref<128x128xf32, 2>
    }
  }
```

With -affine-data-copy -dma:

```
  affine.for %arg3 = 0 to 4096 step 128 {
    %0 = affine.apply #map3(%arg3)
    %1 = alloc() : memref<128xf32, 2>
    %2 = alloc() : memref<1xi32>
    affine.dma_start %arg2[%arg3], %1[%c0], %2[%c0], %c128_0 : memref<4096xf32>, memref<128xf32, 2>, memref<1xi32>
    affine.dma_wait %2[%c0], %c128_0 : memref<1xi32>
    %3 = alloc() : memref<1xi32>
    affine.for %arg4 = 0 to 4096 step 128 {
      %5 = affine.apply #map0(%arg3, %arg4)
      %6 = affine.apply #map1(%arg3, %arg4)
      %7 = alloc() : memref<128x128xf32, 2>
      %8 = alloc() : memref<1xi32>
      affine.dma_start %arg0[%arg3, %arg4], %7[%c0, %c0], %8[%c0], %c16384, %c4096, %c128_2 : memref<4096x4096xf32>, memref<128x128xf32, 2>, memref<1xi32>
      affine.dma_wait %8[%c0], %c16384 : memref<1xi32>
      %9 = affine.apply #map3(%arg4)
      %10 = alloc() : memref<128xf32, 2>
      %11 = alloc() : memref<1xi32>
      affine.dma_start %arg1[%arg4], %10[%c0], %11[%c0], %c128_1 : memref<4096xf32>, memref<128xf32, 2>, memref<1xi32>
      affine.dma_wait %11[%c0], %c128_1 : memref<1xi32>
      affine.for %arg5 = #map3(%arg3) to #map5(%arg3) {
        affine.for %arg6 = #map3(%arg4) to #map5(%arg4) {
          %12 = affine.load %7[-%arg3 + %arg5, -%arg4 + %arg6] : memref<128x128xf32, 2>
          %13 = affine.load %10[-%arg4 + %arg6] : memref<128xf32, 2>
          %14 = affine.load %1[-%arg3 + %arg5] : memref<128xf32, 2>
          %15 = mulf %12, %13 : f32
          %16 = addf %14, %15 : f32
          affine.store %16, %1[-%arg3 + %arg5] : memref<128xf32, 2>
        }
      }
      dealloc %11 : memref<1xi32>
      dealloc %10 : memref<128xf32, 2>
      dealloc %8 : memref<1xi32>
      dealloc %7 : memref<128x128xf32, 2>
    }
    %4 = affine.apply #map3(%arg3)
    affine.dma_start %1[%c0], %arg2[%arg3], %3[%c0], %c128 : memref<128xf32, 2>, memref<4096xf32>, memref<1xi32>
    affine.dma_wait %3[%c0], %c128 : memref<1xi32>
    dealloc %3 : memref<1xi32>
    dealloc %2 : memref<1xi32>
    dealloc %1 : memref<128xf32, 2>
  }
```

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

Closes tensorflow/mlir#50

PiperOrigin-RevId: 261221903
2019-08-01 16:31:58 -07:00
Nicolas Vasilache 54175c240a Fix backward slice corner case
In the backward slice computation, BlockArguments coming from function arguments represent a natural boundary for the traversal and should not trigger llvm_unreachable.
This CL also improves the error message and adds a relevant test.

PiperOrigin-RevId: 260118630
2019-07-26 03:49:17 -07:00
Nicolas Vasilache fae4d94990 Use "standard" load and stores in LowerVectorTransfers
Clipping creates non-affine memory accesses, so use std_load and std_store instead of affine_load and affine_store.
In the future, we may also want a fill with the neutral element rather than clip; this would make the accesses affine if we wanted more analyses and transformations to happen post lowering to pointwise copies.

PiperOrigin-RevId: 260110503
2019-07-26 02:34:24 -07:00
River Riddle 1293708473 Add support for an analysis mode to DialectConversion.
This mode analyzes which operations are legalizable to the given target if a conversion were to be applied, i.e. no rewrites are ever performed even on success. This mode is useful for device partitioning or other utilities that may want to analyze the effect of conversion to different targets before performing it.

The analysis method currently just fills a provided set with the operations that were found to be legalizable. This can be extended in the future to capture more information as necessary.

PiperOrigin-RevId: 259987105
2019-07-25 11:31:07 -07:00
Nicolas Vasilache dd652ce9cc Fix backward slice computation to iterate through known control flow
This CL fixes an oversight in dealing with loops in slicing analysis.
The forward slice computation properly propagates through loops but not the backward slice.

Add relevant unit tests.

PiperOrigin-RevId: 259903396
2019-07-25 01:33:35 -07:00
Nicolas Vasilache 8ebb4281aa Cleanup slicing test.
Remove hardcoded SSA names and make use of CHECK-LABEL directives.

PiperOrigin-RevId: 259767803
2019-07-24 10:28:33 -07:00
Alex Zinenko 480d68f8de Affine loop parallelism detection: conservatively handle unknown ops
The loop parallelism detection utility only collects the affine.load and
affine.store operations appearing inside the loop to analyze the access
patterns for the absence of dependences. However, any operation, including
unregistered operations, can appear in the body of an affine loop. If such
an operation has side effects, the result of the parallelism analysis is
incorrect. Conservatively assume affine loops are not parallel in the
presence of operations other than affine.load, affine.store, affine.for,
and affine.terminator, since those may have side effects.

This required updating the loop-fusion unit test that relies on parallelism
analysis and was exercising loop fusion in the presence of an unregistered
operation.

PiperOrigin-RevId: 259560935
2019-07-23 10:18:46 -07:00
Uday Bondhugula b5f8a4be27 Introduce parser library method to parse list of region arguments
- introduce parseRegionArgumentList (similar to parseOperandList) to parse a
  list of region arguments with a delimiter
- allows defining custom parsers for ops with a multiple/variadic number of
  region arguments
- use this on the gpu.launch op (although the latter has a fixed number
  of region arguments)
- add a test dialect op to test region argument list parsing (with the
  no-delimiter case); see the sketch after this list
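
A hedged sketch of the custom parser usage referenced in the list above; the exact parseRegionArgumentList signature, the delimiter enum, and the op are assumptions for illustration:

```
// Hypothetical custom parser: parse a parenthesized list of region arguments
// followed by the region itself. Signatures here are assumed, not exact.
static ParseResult parseBarOp(OpAsmParser &parser, OperationState &result) {
  SmallVector<OpAsmParser::OperandType, 4> regionArgs;
  if (parser.parseRegionArgumentList(regionArgs,
                                     OpAsmParser::Delimiter::Paren))
    return failure();
  // For this sketch, give every region argument an index type.
  SmallVector<Type, 4> argTypes(regionArgs.size(),
                                parser.getBuilder().getIndexType());
  Region *body = result.addRegion();
  return parser.parseRegion(*body, regionArgs, argTypes);
}
```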

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

Closes tensorflow/mlir#40

PiperOrigin-RevId: 259442536
2019-07-22 17:42:08 -07:00
Nicolas Vasilache 48a1baeb8a Refactor LoopParametricTiling as a test pass - NFC
This CL moves LoopParametricTiling into test/lib as a pass for purely testing purposes.

PiperOrigin-RevId: 259300264
2019-07-22 04:31:17 -07:00
River Riddle 00bdc8e070 Refactor region type signature conversion to be explicit via patterns.
This cl enforces that the conversion of the type signatures for regions, and thus their entry blocks, is handled via ConversionPatterns. A new hook 'applySignatureConversion' is added to the ConversionPatternRewriter to perform the desired conversion on a region. This also means that the handling of rewriting the signature of a FuncOp is moved to a pattern. A default implementation is provided via 'mlir::populateFuncOpTypeConversionPattern'. This removes the hacky implicit 'dynamically legal' status of FuncOp that was present previously, and leaves it up to the user to decide when/how to convert the signature of a function.

PiperOrigin-RevId: 259161999
2019-07-20 19:06:07 -07:00
Nicolas Vasilache 6204acacc7 Uniformize test name - NFC
PiperOrigin-RevId: 258956693
2019-07-19 11:40:43 -07:00
Nicolas Vasilache db4cd1c8dc Utility function to map a loop on a parametric grid of virtual processors
This CL introduces a simple loop utility function which rewrites the bounds and step of a loop so that it becomes mappable onto a regular grid of processors whose identifiers are given by SSA values.

A corresponding unit test is added.

For example, using CUDA terminology, and assuming a 2-d grid with processorIds = [blockIdx.x, threadIdx.x] and numProcessors = [gridDim.x, blockDim.x], the loop:
```
   loop.for %i = %lb to %ub step %step {
     ...
   }
```
is rewritten into a version resembling the following pseudo-IR:
```
   loop.for %i = %lb + threadIdx.x + blockIdx.x * blockDim.x to %ub
      step %gridDim.x * blockDim.x {
     ...
   }
```

PiperOrigin-RevId: 258945942
2019-07-19 11:40:31 -07:00
Nicolas Vasilache 5bc344743c Uniformize the API for the mlir::tile functions on AffineForOp and loop::ForOp
This CL adapts the recently introduced parametric tiling to have an API matching the tiling
of AffineForOp. The transformation using stripmineSink is more general and produces imperfectly nested loops.

Perfect nesting invariants of the tiled version are obtained by selectively applying hoisting of ops to isolate perfectly nested bands. Such hoisting may fail to produce a perfect loop nest in cases where a ForOp transitively depends on enclosing induction variables. In such cases, the API provides a LogicalResult return value, but the SimpleParametricLoopTilingPass does not currently use this result.

A new unit test is added with a triangular loop for which the perfect nesting property does not hold. For this example, the old behavior was to produce IR that did not verify (some use was not dominated by its def).

PiperOrigin-RevId: 258928309
2019-07-19 11:40:25 -07:00
River Riddle 9e3c2650d2 Refactor the conversion of block argument types in DialectConversion.
This cl begins a large refactoring of how signature types are converted in the DialectConversion infrastructure. The signatures of blocks are now converted on-demand when an operation held by that block is being converted. This allows for handling the case where a region is created as part of a pattern, something that wasn't possible previously.

This cl also generalizes the region signature conversion used by FuncOp to work on any region of any operation. This generalization allows for removing the 'apply*Conversion' functions that were specific to FuncOp/ModuleOp. The implementation currently uses a new hook on TypeConverter, 'convertRegionSignature', but this should ideally be removed in favor of using Patterns. That depends on adding support to the PatternRewriter used by ConversionPattern to allow applying signature conversions to regions, which should be coming in a followup.

PiperOrigin-RevId: 258645733
2019-07-19 11:38:45 -07:00
River Riddle 491ef84dc4 Add support for explicitly marking dialects and operations as illegal.
This explicit tag is useful in several ways (see the sketch after this list):
*) This simplifies how to mark sub-sections of a dialect as explicitly unsupported, e.g. my target supports all operations in the foo dialect except for these select few. This is useful for partial lowerings between dialects.
*) Partial conversions will now verify that operations that were explicitly marked as illegal must be converted. This provides some guarantee that the operations that need to be lowered by a specific pass will be.
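
A minimal sketch of the explicit marking described above; ConversionTarget and its addLegal*/addIllegal* methods come from the dialect conversion framework, while the dialect and op names are hypothetical:

```
// Hedged sketch: everything in the (hypothetical) foo dialect is legal except
// a select few ops, which a partial conversion must now rewrite or report.
static void configureTarget(MLIRContext &context) {
  ConversionTarget target(context);
  target.addLegalDialect<foo::FooDialect>();            // hypothetical dialect
  target.addIllegalOp<foo::SelectOp, foo::BranchOp>();  // hypothetical ops
  target.addIllegalDialect<bar::BarDialect>();          // hypothetical dialect
}
```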

PiperOrigin-RevId: 258582879
2019-07-19 11:38:25 -07:00
Alex Zinenko fc044e8929 Introduce loop coalescing utility and a simple pass
Multiple (perfectly) nested loops with independent bounds can be combined into
a single loop and then subdivided into blocks of arbitrary size for load
balancing or more efficient parallelism exploitation.  However, MLIR wants to
preserve the multi-dimensional multi-loop structure at higher levels of
abstraction. Introduce a transformation that coalesces nested loops with
independent bounds so that they can be further subdivided by tiling.

PiperOrigin-RevId: 258151016
2019-07-16 13:43:44 -07:00
Nicolas Vasilache cca53e8527 Extract std.for std.if and std.terminator in their own dialect
These ops should not belong to the std dialect.
This CL extracts them in their own dialect and updates the corresponding conversions and tests.

PiperOrigin-RevId: 258123853
2019-07-16 13:43:18 -07:00
River Riddle a764c19d17 Fix a bug in DialectConversion when using RewritePattern.
When using a RewritePattern and replacing an operation with an existing value, that value may have already been replaced by something else. This cl ensures that only the final value is used when applying rewrites.

PiperOrigin-RevId: 258058488
2019-07-16 13:43:12 -07:00
Nicolas Vasilache cab671d166 Lower affine control flow to std control flow to LLVM dialect
This CL splits the lowering of affine to LLVM into 2 parts:
1. affine -> std
2. std -> LLVM

The conversions mostly consist of splitting concerns between the affine and non-affine worlds from existing conversions.
Short-circuiting of affine `if` conditions was never tested or exercised and is removed in the process; it can be reintroduced later if needed.

LoopParametricTiling.cpp is updated to reflect the newly added ForOp::build.

PiperOrigin-RevId: 257794436
2019-07-12 08:44:28 -07:00
Alex Zinenko 054e25c079 EDSC: use affine.load/store instead of std.load/store
Standard load and store operations are evolving to be separated from the Affine
constructs.  Special affine.load/store have been introduced to uphold the
restrictions of the Affine control flow constructs on their operands.
EDSC-produced loads and stores were originally intended to uphold those
restrictions as well so they should use affine.load/store instead of
std.load/store.

PiperOrigin-RevId: 257443307
2019-07-12 08:42:28 -07:00
River Riddle 89bc449cee Standardize the value numbering in the AsmPrinter.
Change the AsmPrinter to number values breadth-first so that values in adjacent regions can have the same name. This allows for ModuleOp to contain operations that produce results. This also standardizes the special name of region entry arguments to "arg[0-9]+" now that Functions are also operations.

PiperOrigin-RevId: 257225069
2019-07-09 10:41:00 -07:00
Alex Zinenko 9d03f5674f Implement parametric tiling on standard for loops
Parametric tiling can be used to extract outer loops with a fixed number of
iterations.  This in turn enables mapping to GPU kernels on a fixed grid
independently of the range of the original loops, which may be unknown
statically, making the kernel adaptable to different sizes.  Provide a utility
function that also computes the parametric tile size given the range of the
loop.  Exercise the utility function through a simple pass that applies it to
all top-level loop nests.  Permutability or parallelism checks must be
performed before calling this utility function in actual passes.

Note that parametric tiling cannot be implemented in a purely affine way,
although it can be encoded using semi-affine maps.  The choice to implement it
on standard loops is guided by them being the common representation between
Affine loops, Linalg and GPU kernels.

PiperOrigin-RevId: 257180251
2019-07-09 06:37:41 -07:00
River Riddle e7d594bb1c Replace the implementation of Function and Module with FuncOp and ModuleOp.
This is an important step in allowing for the top-level of the IR to be extensible. FuncOp and ModuleOp contain all of the necessary functionality, while using the existing operation infrastructure. As an interim step, many of the usages of Function and Module, including the name, will remain the same. In the future, many of these will be relaxed to allow for many different types of top-level operations to co-exist.

PiperOrigin-RevId: 256427100
2019-07-03 14:37:18 -07:00
Andy Davis 2e1187dd25 Globally change load/store/dma_start/dma_wait operations over to affine.load/store/dma_start/dma_wait.
In most places, this is just a name change (with the exception of affine.dma_start swapping the operand positions of its tag memref and num_elements operands).
Significant code changes occur here:
*) Vectorization: LoopAnalysis.cpp, Vectorize.cpp
*) Affine Transforms: Transforms/Utils/Utils.cpp

PiperOrigin-RevId: 256395088
2019-07-03 14:37:06 -07:00
River Riddle 84bd67fc4f Update the 1->N legalizer test to use "test.return" so that the conversion cast is elided properly.
PiperOrigin-RevId: 255979732
2019-07-01 11:38:47 -07:00
Alex Zinenko 5eef726bc8 TypeConversion: do not materialize conversion of the type to itself
Type conversion does not necessarily affect all types; some of them may remain
untouched.  The type conversion tool from the dialect conversion framework will
unconditionally insert a temporary cast operation from the type to itself
anyway, and will try to materialize it to a real conversion operation if there
are remaining uses.  Simply use the original value instead.

PiperOrigin-RevId: 255975450
2019-07-01 09:56:56 -07:00
Alex Zinenko a83fd0d2c7 Run FileCheck on test-legalizer.mlir
The RUN line was missing a call to FileCheck, making the test always pass.  Add
the call to FileCheck and temporarily disable one of the tests that does not
produce the expected result.

PiperOrigin-RevId: 255974805
2019-07-01 09:56:44 -07:00
Andy Davis f487d20bf0 Add affine-to-standard lowerings for affine.load/store/dma_start/dma_wait.
PiperOrigin-RevId: 255960171
2019-07-01 09:56:22 -07:00
River Riddle 7c755d06aa Refactor DialectConversion to use 'materializeConversion' when a type conversion must persist after the conversion has finished.
During conversion, if a type conversion has dangling uses, that conversion must persist after the conversion has finished in order to maintain valid IR. In these cases, we now query the TypeConverter to materialize a conversion for us. This allows for the default case of a full conversion to continue working as expected, but also handles the degenerate cases more robustly.

PiperOrigin-RevId: 255637171
2019-06-28 11:29:04 -07:00
River Riddle 679a3b4191 Change the attribute dictionary syntax to separate name and value with '='.
The current syntax separates the name and value with ':', but ':' is already overloaded by several other things (e.g. trailing types). This makes the syntax difficult to parse in some situations:

Old:
  "foo: 10 : i32"

New:
  "foo = 10 : i32"
PiperOrigin-RevId: 255097928
2019-06-25 19:06:34 -07:00
River Riddle 4842b2d42e Modify the syntax of ElementsAttrs to print the type as a colon type.
This is the standard syntax for types on operations, and is also already used by IntegerAttr and FloatAttr.

Example:
  dense<5> : tensor<i32>
  dense<[3]> : tensor<1xi32>
PiperOrigin-RevId: 255069157
2019-06-25 16:06:58 -07:00
River Riddle 66ed7d6d83 Update the OperationFolder to find a valid insertion point when materializing constants.
The OperationFolder currently just inserts into the entry block of a Function, but regions may be isolated from above, i.e. explicit capture only, and blindly inserting constants may break the invariants of these regions.

PiperOrigin-RevId: 254987796
2019-06-25 09:43:21 -07:00
Nicolas Vasilache dac75ae5ff Split test-specific passes out of mlir-opt
Instead, put their implementation in test/lib and link them into mlir-test-opt.

PiperOrigin-RevId: 254837439
2019-06-24 17:47:12 -07:00
River Riddle b67cab4c44 Update CSE to respect nested regions that are isolated from above. This cl also removes the unused 'NthRegionIsIsolatedFromAbove' trait as it was replaced with a more general 'IsIsolatedFromAbove'.
PiperOrigin-RevId: 254709704
2019-06-24 13:44:53 -07:00
River Riddle 704a7fb13e Add support for 1->N type mappings in the dialect conversion infrastructure. To support these mappings, a hook must be overridden on the type converter, 'materializeConversion', to generate a cast operation from the new types to the old type. This operation is automatically erased if all uses are removed; otherwise it remains in the IR for the user to handle.
PiperOrigin-RevId: 254411383
2019-06-22 09:16:06 -07:00
River Riddle 9764ae3f24 Refactor the TypeConverter to support more robust type conversions:
* Support for 1->0 type mappings, i.e. when the argument is being removed.
* Reordering types when converting a type signature.
* Adding new inputs when converting a type signature.

This cl also lays down the initial foundation for supporting 1->N type mappings, but full support will come in a followup.

Moving forward, function signature changes will be driven by populating a SignatureConversion instance. This class contains all of the necessary information for adding/removing/remapping function signatures; e.g. addInputs, addResults, remapInputs, etc.
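
An illustrative sketch of populating a SignatureConversion; the class and the addInputs name appear in this message, while the exact overloads and the types passed in are assumptions:

```
// Hedged sketch: remap the two original inputs and append one new input.
static void buildSignature(TypeConverter::SignatureConversion &conv,
                           Type newTy0, Type origTy1, Type extraTy) {
  // Original argument 0 is converted to newTy0; argument 1 keeps origTy1.
  conv.addInputs(0, {newTy0});
  conv.addInputs(1, {origTy1});
  // A brand-new trailing input is appended as well.
  conv.addInputs({extraTy});
}
```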

PiperOrigin-RevId: 254064665
2019-06-19 23:08:33 -07:00
Geoffrey Martin-Noble fd99b6ce97 Remove unnecessary -verify-diagnostics
These were likely added in error because of confusion about the flag when it was just called "-verify". The extra flag doesn't cause much harm, but it does make mlir-opt do more work and clutters the RUN line.

PiperOrigin-RevId: 254037016
2019-06-19 23:08:13 -07:00
Geoffrey Martin-Noble d7d69569e7 Rename -verify mlir-opt flag to -verify-expected-diagnostics
This name has caused some confusion because it suggests that it's running op verification (and that this verification isn't getting run by default).

PiperOrigin-RevId: 254035268
2019-06-19 23:08:03 -07:00
Andy Davis 898cf0e968 LoopFusion: adds support for computing forward computation slices, which will enable fusion of consumer loop nests into their producers in subsequent CLs.
PiperOrigin-RevId: 253601994
2019-06-19 23:03:42 -07:00
River Riddle 6a0555a875 Refactor SplatElementsAttr to inherit from DenseElementsAttr as opposed to being a separate Attribute type. DenseElementsAttr provides a better internal representation for splat values as well as better API for accessing elements.
PiperOrigin-RevId: 253138287
2019-06-19 23:01:52 -07:00
River Riddle 5da741f671 Add basic cost modeling to the dialect conversion infrastructure. This initial cost model favors specific patterns based upon two criteria:
1) Lowest minimum pattern stack depth when legalizing.
  - This leads the system to favor patterns that have lower legalization stacks, i.e. represent a more direct mapping to the target.

2)  Pattern benefit.
  - When considering multiple patterns with the same legalization depth, this favors patterns with a larger specified benefit.

PiperOrigin-RevId: 252713470
2019-06-19 22:59:06 -07:00
Amit Sabne 7a43da6060 Loop invariant code motion - remove reliance on getForwardSlice. Add more tests.
PiperOrigin-RevId: 250950703
2019-06-01 20:13:30 -07:00
Rasmus Munk Larsen 861c55e150 Add a rank op to MLIR. Example:
%1 = rank %0 : index

PiperOrigin-RevId: 250505411
2019-06-01 20:06:51 -07:00
Andy Davis a560f2c646 Affine Loop Fusion Utility Module (1/n).
*) Adds LoopFusionUtils which will expose a set of loop fusion utilities (e.g. dependence checks, fusion cost/storage reduction, loop fusion transformation) for use by loop fusion algorithms. Support for checking block-level fusion-preventing dependences is added in this CL (additional loop fusion utilities will be added in subsequent CLs).
*) Adds TestLoopFusion test pass for testing LoopFusionUtils at a fine granularity.
*) Adds unit test for testing dependence check for block-level fusion-preventing dependences.

PiperOrigin-RevId: 249861071
2019-06-01 20:00:23 -07:00