llvm-project

Commit Graph

Author	SHA1	Message	Date
Mahesh Ravishankar	c584771f54	Revert "[mlir][TilingInterface] Enable tile and fuse using TilingInterface." This reverts commit `ea75511319` due to build failures.	2022-06-21 16:56:59 +00:00
Mahesh Ravishankar	ea75511319	[mlir][TilingInterface] Enable tile and fuse using TilingInterface. This patch implements tile and fuse transformation for ops that implement the tiling interface. To do so, - `TilingInterface` needs a new method that generates a tiled implementation of the operation based on the tile of the result needed. - A pattern is added that replaces a `tensor.extract_slice` whose source is defined by an operation that implements the `TilingInterface` with a tiled implementation that produces the extracted slice in-place (using the method added to `TilingInterface`). - A pattern is added that takes a sequence of operations that implement the `TilingInterface` (for now `LinalgOp`s), tiles the consumer, and greedily fuses its producers iteratively. Differential Revision: https://reviews.llvm.org/D127809	2022-06-21 16:47:14 +00:00
bixia1	bdeae1f57b	[mlir][sparse][taco] Support f16. Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D128105	2022-06-21 09:08:26 -07:00
Nicolas Vasilache	f439b31971	[mlir][Linalg] Split reduction transform op This revision separates the `LinalgSplitReduction` pattern, whose application is based on attributes, from its implementation. A transform dialect op extension is added to control the application of the transformation at a finer granularity. Differential Revision: https://reviews.llvm.org/D128165	2022-06-21 05:01:26 -07:00
Nicolas Vasilache	98dbaed1e6	[mlir][SCF] Fold tensor.cast feeding into scf.foreach_thread.parallel_insert_slice Differential Revision: https://reviews.llvm.org/D128247	2022-06-21 01:19:18 -07:00
Nicolas Vasilache	a489aa745b	[mlir][SCF] Add scf::ForeachThread canonicalization. This revision adds the necessary plumbing for canonicalizing scf::ForeachThread with the `AffineOpSCFCanonicalizationPattern`. In the process the `loopMatcher` helper is updated to take OpFoldResult instead of just values. This allows composing various scenarios without the need for an artificial builder. Differential Revision: https://reviews.llvm.org/D128244	2022-06-21 00:54:46 -07:00
Kazu Hirata	6d5fc1e3d5	[mlir] Don't use Optional::getValue (NFC)	2022-06-20 23:20:25 -07:00
Shraiysh Vaishay	23fec3405b	[mlir][OpenMP] Add omp.taskgroup operation This patch adds omp.taskgroup operation according to OpenMP 5.0 2.17.6. Also added tests for the same. Reviewed By: kiranchandramohan, peixin Differential Revision: https://reviews.llvm.org/D127250	2022-06-21 10:17:24 +05:30
Kazu Hirata	064a08cd95	Don't use Optional::hasValue (NFC)	2022-06-20 20:05:16 -07:00
Mogball	d883a02a7c	[mlir][ods] Remove StructAttr Depends on D127373 Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D127375	2022-06-21 01:10:05 +00:00
lewuathe	0bae40eff6	[mlir][math] Lower cos,sin to libm Lower math.cos and math.sin to libm Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D128028	2022-06-21 08:38:07 +09:00
Kazu Hirata	5413bf1bac	Don't use Optional::hasValue (NFC)	2022-06-20 11:33:56 -07:00
Kazu Hirata	037f09959a	[mlir] Don't use Optional::hasValue (NFC)	2022-06-20 11:22:37 -07:00
Krzysztof Drewniak	8e61fdc727	[mlir][ROCDL] Define MLIR wrappers around new MFMA intrinsics In order to support newer hardware, define wrappers around MFMA intrinsics that have not previously been exposed in the ROCDL dialect. An `amdgpu.mfma` wrapper around these instructions is in development and will provide a more user-friendly interface to them. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D128079	2022-06-20 15:03:45 +00:00
Alex Zinenko	8b68da2c7d	[mlir] move SCF headers to SCF/{IR,Transforms} respectively This aligns the SCF dialect file layout with the majority of the dialects. Reviewed By: jpienaar Differential Revision: https://reviews.llvm.org/D128049	2022-06-20 10:18:01 +02:00
lewuathe	72ee11a8cf	[mlir][complex] Convert complex.conj to libm Add conversion for complex.conj to libm call Reviewed By: bixia Differential Revision: https://reviews.llvm.org/D127473	2022-06-20 09:38:50 +09:00
Kazu Hirata	30c675878c	Use value_or instead of getValueOr (NFC)	2022-06-19 10:34:41 -07:00
bixia1	e5e7e51473	[mlir][sparse][taco] Support complex types. Support complex types of float and double. See the added test for an example. Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D128076	2022-06-17 16:06:53 -07:00
Benjamin Kramer	3420cd7caf	[mlir][sparse] Add testing for bf16 and fallback for software bf16 This adds weak versions of the truncation libcalls in case the runtime environment doesn't have them. Differential Revision: https://reviews.llvm.org/D128091	2022-06-17 21:54:01 +02:00
Aart Bik	86d5d34c72	[mlir][sparse] re-enable f16 tests Sparse library ABI issues are fixed. https://github.com/llvm/llvm-project/issues/55992 Reviewed By: bkramer Differential Revision: https://reviews.llvm.org/D128086	2022-06-17 12:48:24 -07:00
bixia1	48f4407c1a	[mlir][linalg] Extend opdsl to support operations on complex types. Linalg opdsl now supports negf/add/sub/mul on complex types. Add a test. Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D128010	2022-06-17 09:34:26 -07:00
Aart Bik	aef20f59a5	[mlir][sparse] move from by-value to by-reference for data types This fixes all sorts of ABI issues due to passing by-value (using by-reference with memref's exclusively). Reviewed By: bkramer Differential Revision: https://reviews.llvm.org/D128018	2022-06-17 08:39:25 -07:00
Christopher Bate	51b925df94	[mlir][nvgpu] shared memory access optimization pass This change adds a transformation and pass to the NvGPU dialect that attempts to optimize reads/writes from a memref representing GPU shared memory in order to avoid bank conflicts. Given a value representing a shared memory memref, it traverses all reads/writes within the parent op and, subject to suitable conditions, rewrites all last dimension index values such that element locations in the final (col) dimension are given by `newColIdx = col % vecSize + perm[row](col/vecSize,row)` where `perm` is a permutation function indexed by `row` and `vecSize` is the vector access size in elements (currently assumes 128bit vectorized accesses, but this can be made a parameter). This specific transformation can help optimize typical distributed & vectorized accesses common to loading matrix multiplication operands to/from shared memory. Differential Revision: https://reviews.llvm.org/D127457	2022-06-17 09:31:05 -06:00
Phoebe Wang	655ba9c8a1	Reland "Reland "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"""" This resolves problems reported in commit `1a20252978`. 1. Promote to float lowering for nodes XINT_TO_FP 2. Bail out f16 from shuffle combine because the vector type is not legal in the version	2022-06-17 21:34:05 +08:00
Matthias Springer	b3ebe3beed	[mlir][bufferize] Bufferize after TensorCopyInsertion This change updates bufferization to use the new TensorCopyInsertion pass. One-Shot Bufferize no longer calls the One-Shot Analysis. Instead, it relies on the TensorCopyInsertion pass to make the entire IR fully inplacable. The `bufferize` implementations of all ops are simplified; they no longer have to account for out-of-place bufferization decisions. These were already materialized in the IR in the form of `bufferization.alloc_tensor` ops during the TensorCopyInsertion pass. Differential Revision: https://reviews.llvm.org/D127652	2022-06-17 13:29:52 +02:00
Alex Zinenko	610139d2d9	[mlir] replace 'emit_c_wrappers' func->llvm conversion option with a pass The 'emit_c_wrappers' option in the FuncToLLVM conversion requests C interface wrappers to be emitted for every builtin function in the module. While this has been useful to bootstrap the interface, it is problematic in the longer term as it may unintentionally affect the functions that should retain their existing interface, e.g., libm functions obtained by lowering math operations (see D126964 for an example). Since D77314, we have a finer-grain control over interface generation via an attribute that avoids the problem entirely. Remove the 'emit_c_wrappers' option. Introduce the '-llvm-request-c-wrappers' pass that can be run in any pipeline that needs blanket emission of functions to annotate all builtin functions with the attribute before performing the usual lowering that accounts for the attribute. Reviewed By: chelini Differential Revision: https://reviews.llvm.org/D127952	2022-06-17 11:10:31 +02:00
Benjamin Kramer	1a20252978	Revert "Reland "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"""" This reverts commit `04a3d5f3a1`. I see two more issues: - uitofp/sitofp from i32/i64 to half now generates __floatsihf/__floatdihf, which exists in neither compiler-rt nor libgcc - This crashes when legalizing the bitcast: ``` ; RUN: llc < %s -mcpu=skx define void @main.45(ptr nocapture readnone %retval, ptr noalias nocapture readnone %run_options, ptr noalias nocapture readnone %params, ptr noalias nocapture readonly %buffer_table, ptr noalias nocapture readnone %status, ptr noalias nocapture readnone %prof_counters) local_unnamed_addr { entry: %fusion = load ptr, ptr %buffer_table, align 8 %0 = getelementptr inbounds ptr, ptr %buffer_table, i64 1 %Arg_1.2 = load ptr, ptr %0, align 8 %1 = getelementptr inbounds ptr, ptr %buffer_table, i64 2 %Arg_0.1 = load ptr, ptr %1, align 8 %2 = load half, ptr %Arg_0.1, align 8 %3 = bitcast half %2 to i16 %4 = and i16 %3, 32767 %5 = icmp eq i16 %4, 0 %6 = and i16 %3, -32768 %broadcast.splatinsert = insertelement <4 x half> poison, half %2, i64 0 %broadcast.splat = shufflevector <4 x half> %broadcast.splatinsert, <4 x half> poison, <4 x i32> zeroinitializer %broadcast.splatinsert9 = insertelement <4 x i16> poison, i16 %4, i64 0 %broadcast.splat10 = shufflevector <4 x i16> %broadcast.splatinsert9, <4 x i16> poison, <4 x i32> zeroinitializer %broadcast.splatinsert11 = insertelement <4 x i16> poison, i16 %6, i64 0 %broadcast.splat12 = shufflevector <4 x i16> %broadcast.splatinsert11, <4 x i16> poison, <4 x i32> zeroinitializer %broadcast.splatinsert13 = insertelement <4 x i16> poison, i16 %3, i64 0 %broadcast.splat14 = shufflevector <4 x i16> %broadcast.splatinsert13, <4 x i16> poison, <4 x i32> zeroinitializer %wide.load = load <4 x half>, ptr %Arg_1.2, align 8 %7 = fcmp uno <4 x half> %broadcast.splat, %wide.load %8 = fcmp oeq <4 x half> %broadcast.splat, %wide.load %9 = bitcast <4 x half> 
%wide.load to <4 x i16> %10 = and <4 x i16> %9, <i16 32767, i16 32767, i16 32767, i16 32767> %11 = icmp eq <4 x i16> %10, zeroinitializer %12 = and <4 x i16> %9, <i16 -32768, i16 -32768, i16 -32768, i16 -32768> %13 = or <4 x i16> %12, <i16 1, i16 1, i16 1, i16 1> %14 = select <4 x i1> %11, <4 x i16> %9, <4 x i16> %13 %15 = icmp ugt <4 x i16> %broadcast.splat10, %10 %16 = icmp ne <4 x i16> %broadcast.splat12, %12 %17 = or <4 x i1> %15, %16 %18 = select <4 x i1> %17, <4 x i16> <i16 -1, i16 -1, i16 -1, i16 -1>, <4 x i16> <i16 1, i16 1, i16 1, i16 1> %19 = add <4 x i16> %18, %broadcast.splat14 %20 = select i1 %5, <4 x i16> %14, <4 x i16> %19 %21 = select <4 x i1> %8, <4 x i16> %9, <4 x i16> %20 %22 = bitcast <4 x i16> %21 to <4 x half> %23 = select <4 x i1> %7, <4 x half> <half 0xH7E00, half 0xH7E00, half 0xH7E00, half 0xH7E00>, <4 x half> %22 store <4 x half> %23, ptr %fusion, align 16 ret void } ``` llc: llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:977: void (anonymous namespace)::SelectionDAGLegalize::LegalizeOp(llvm::SDNode *): Assertion `(TLI.getTypeAction(*DAG.getContext(), Op.getValueType()) == TargetLowering::TypeLegal || Op.getOpcode() == ISD::TargetConstant || Op.getOpcode() == ISD::Register) && "Unexpected illegal type!"' failed.	2022-06-17 09:43:07 +02:00
Phoebe Wang	04a3d5f3a1	Reland "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI""" Fix the crash on lowering X86ISD::FCMP.	2022-06-17 12:12:17 +08:00
Jacques Pienaar	02b9ddb2f2	[mlir] Disable warning in test of deprecated feature (NFC) Disable warning for deprecation in test of deprecated feature. Also remove additional test of deprecated feature from TestOps.td.	2022-06-16 20:15:13 -07:00
Jacques Pienaar	d30c0221cf	[mlir] Split MLProgram global load and store to Graph variants * Split ops into X_graph variants as discussed; * Remove tokens from non-Graph region variants and rely on side-effect modelling there while removing side-effect modelling from Graph variants and relying on explicit ordering there; * Make tokens required to be produced by Graph variants - but kept explicit token type specification given previous discussion on this potentially being configurable in future; This results in duplicating some code. I considered adding helper functions but decided against adding an abstraction there early given size of duplication and creating accidental coupling. Differential Revision: https://reviews.llvm.org/D127813	2022-06-16 20:01:54 -07:00
Jacques Pienaar	287ade415e	[mlir][doc] Avoid duplication with constraints and defs Where a constraint also has a def, emit the def only to avoid duplicate output (and def has more complete info). Also move attributes and types to the end rather than some on top and some at end. Differential Revision: https://reviews.llvm.org/D127823	2022-06-16 19:42:56 -07:00
Aart Bik	2a2886160d	[mlir][sparse] improved testing and codegen for semi-ring operations The semi-ring blocks were simply "inlined" by the sparse compiler but without any filtering or patching. This revision improves the analysis (rejecting blocks that use non-invariant computations from outside their blocks, except for linalg.index) and also improves the codegen by properly patching up index computations (previous version crashed). With a regression test. Also updated the documentation now that the example code is properly working. Reviewed By: bixia Differential Revision: https://reviews.llvm.org/D128000	2022-06-16 16:13:42 -07:00
Aart Bik	36c01876d7	[mlir][sparse] fix asan issue The LinalgElementwiseOpFusion pass has become smarter, and converts the simple conversion linalg operation into a sparse dialect convert operation. However, since our current bufferization does not take the new semantics into consideration, we leak memory of the allocation. For now, this has been fixed by making the operation less trivial. Reviewed By: bixia Differential Revision: https://reviews.llvm.org/D128002	2022-06-16 14:49:02 -07:00
bixia1	bbb73ade43	[mlir][complex] Add Python bindings for complex ops. Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D127916	2022-06-16 14:19:11 -07:00
Thomas Raoux	f011d32c3a	[mlir][vector] Fix contraction op lowering with mixed types A contraction op can have mixed types; add support for this case to the pattern lowering contraction ops to outerproduct. Differential Revision: https://reviews.llvm.org/D127926	2022-06-16 16:40:56 +00:00
Thomas Raoux	046ebeb605	[mlir][linalg] Relax convolution vectorization to support mixed types Support the case where convolution does float extension of the inputs. Differential Revision: https://reviews.llvm.org/D127925	2022-06-16 16:29:46 +00:00
Lei Zhang	2320a4ae90	[mlir][spirv] Workaround driver bug in math.ctlz conversion again The previous approach does not work as the Adreno driver is clever at optimizing away the selection. So now check two inputs together. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D127930	2022-06-16 10:53:49 -04:00
Mark Browning	bccf27d934	[mlir][python] Actually set UseLocalScope printing flag The useLocalScope printing flag has been passed around between pybind methods, but doesn't actually enable the corresponding printing flag. Reviewed By: stellaraccident Differential Revision: https://reviews.llvm.org/D127907	2022-06-15 22:01:34 -07:00
Lei Zhang	f3bc0fccd6	[mlir][spirv] Define spv.ISubBorrowOp Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D127909	2022-06-15 20:38:53 -04:00
Frederik Gossen	3cd5696a33	Revert "Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI""" This reverts commit `e1c5afa47d`. This introduces crashes in the JAX backend on CPU. A reproducer in LLVM is below. Let me know if you have trouble reproducing this. ; ModuleID = '__compute_module' source_filename = "__compute_module" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-grtev4-linux-gnu" @0 = private unnamed_addr constant [4 x i8] c"\00\00\00?" @1 = private unnamed_addr constant [4 x i8] c"\1C}\908" @2 = private unnamed_addr constant [4 x i8] c"?\00\\4" @3 = private unnamed_addr constant [4 x i8] c"%ci1" @4 = private unnamed_addr constant [4 x i8] zeroinitializer @5 = private unnamed_addr constant [4 x i8] c"\00\00\00\C0" @6 = private unnamed_addr constant [4 x i8] c"\00\00\00B" @7 = private unnamed_addr constant [4 x i8] c"\94\B4\C22" @8 = private unnamed_addr constant [4 x i8] c"^\09B6" @9 = private unnamed_addr constant [4 x i8] c"\15\F3M?" 
@10 = private unnamed_addr constant [4 x i8] c"e\CC\\;" @11 = private unnamed_addr constant [4 x i8] c"d\BD/>" @12 = private unnamed_addr constant [4 x i8] c"V\F4I=" @13 = private unnamed_addr constant [4 x i8] c"\10\CB,<" @14 = private unnamed_addr constant [4 x i8] c"\AC\E3\D6:" @15 = private unnamed_addr constant [4 x i8] c"\DC\A8E9" @16 = private unnamed_addr constant [4 x i8] c"\C6\FA\897" @17 = private unnamed_addr constant [4 x i8] c"%\F9\955" @18 = private unnamed_addr constant [4 x i8] c"\B5\DB\813" @19 = private unnamed_addr constant [4 x i8] c"\B4W_\B2" @20 = private unnamed_addr constant [4 x i8] c"\1Cc\8F\B4" @21 = private unnamed_addr constant [4 x i8] c"~3\94\B6" @22 = private unnamed_addr constant [4 x i8] c"3Yq\B8" @23 = private unnamed_addr constant [4 x i8] c"\E9\17\17\BA" @24 = private unnamed_addr constant [4 x i8] c"\F1\B2\8D\BB" @25 = private unnamed_addr constant [4 x i8] c"\F8t\C2\BC" @26 = private unnamed_addr constant [4 x i8] c"\82[\C2\BD" @27 = private unnamed_addr constant [4 x i8] c"uB-?" 
@28 = private unnamed_addr constant [4 x i8] c"^\FF\9B\BE" @29 = private unnamed_addr constant [4 x i8] c"\00\00\00A" ; Function Attrs: uwtable define void @main.158(ptr %retval, ptr noalias %run_options, ptr noalias %params, ptr noalias %buffer_table, ptr noalias %status, ptr noalias %prof_counters) #0 { entry: %fusion.invar_address.dim.1 = alloca i64, align 8 %fusion.invar_address.dim.0 = alloca i64, align 8 %0 = getelementptr inbounds ptr, ptr %buffer_table, i64 1 %Arg_0.1 = load ptr, ptr %0, align 8, !invariant.load !0, !dereferenceable !1, !align !2 %1 = getelementptr inbounds ptr, ptr %buffer_table, i64 0 %fusion = load ptr, ptr %1, align 8, !invariant.load !0, !dereferenceable !1, !align !2 store i64 0, ptr %fusion.invar_address.dim.0, align 8 br label %fusion.loop_header.dim.0 return: ; preds = %fusion.loop_exit.dim.0 ret void fusion.loop_header.dim.0: ; preds = %fusion.loop_exit.dim.1, %entry %fusion.indvar.dim.0 = load i64, ptr %fusion.invar_address.dim.0, align 8 %2 = icmp uge i64 %fusion.indvar.dim.0, 3 br i1 %2, label %fusion.loop_exit.dim.0, label %fusion.loop_body.dim.0 fusion.loop_body.dim.0: ; preds = %fusion.loop_header.dim.0 store i64 0, ptr %fusion.invar_address.dim.1, align 8 br label %fusion.loop_header.dim.1 fusion.loop_header.dim.1: ; preds = %fusion.loop_body.dim.1, %fusion.loop_body.dim.0 %fusion.indvar.dim.1 = load i64, ptr %fusion.invar_address.dim.1, align 8 %3 = icmp uge i64 %fusion.indvar.dim.1, 1 br i1 %3, label %fusion.loop_exit.dim.1, label %fusion.loop_body.dim.1 fusion.loop_body.dim.1: ; preds = %fusion.loop_header.dim.1 %4 = getelementptr inbounds [3 x [1 x half]], ptr %Arg_0.1, i64 0, i64 %fusion.indvar.dim.0, i64 0 %5 = load half, ptr %4, align 2, !invariant.load !0, !noalias !3 %6 = fpext half %5 to float %7 = call float @llvm.fabs.f32(float %6) %constant.121 = load float, ptr @29, align 4 %compare.2 = fcmp ole float %7, %constant.121 %8 = zext i1 %compare.2 to i8 %constant.120 = load float, ptr @0, align 4 %multiply.95 = 
fmul float %7, %constant.120 %constant.119 = load float, ptr @5, align 4 %add.82 = fadd float %multiply.95, %constant.119 %constant.118 = load float, ptr @4, align 4 %multiply.94 = fmul float %add.82, %constant.118 %constant.117 = load float, ptr @19, align 4 %add.81 = fadd float %multiply.94, %constant.117 %multiply.92 = fmul float %add.82, %add.81 %constant.116 = load float, ptr @18, align 4 %add.79 = fadd float %multiply.92, %constant.116 %multiply.91 = fmul float %add.82, %add.79 %subtract.87 = fsub float %multiply.91, %add.81 %constant.115 = load float, ptr @20, align 4 %add.78 = fadd float %subtract.87, %constant.115 %multiply.89 = fmul float %add.82, %add.78 %subtract.86 = fsub float %multiply.89, %add.79 %constant.114 = load float, ptr @17, align 4 %add.76 = fadd float %subtract.86, %constant.114 %multiply.88 = fmul float %add.82, %add.76 %subtract.84 = fsub float %multiply.88, %add.78 %constant.113 = load float, ptr @21, align 4 %add.75 = fadd float %subtract.84, %constant.113 %multiply.86 = fmul float %add.82, %add.75 %subtract.83 = fsub float %multiply.86, %add.76 %constant.112 = load float, ptr @16, align 4 %add.73 = fadd float %subtract.83, %constant.112 %multiply.85 = fmul float %add.82, %add.73 %subtract.81 = fsub float %multiply.85, %add.75 %constant.111 = load float, ptr @22, align 4 %add.72 = fadd float %subtract.81, %constant.111 %multiply.83 = fmul float %add.82, %add.72 %subtract.80 = fsub float %multiply.83, %add.73 %constant.110 = load float, ptr @15, align 4 %add.70 = fadd float %subtract.80, %constant.110 %multiply.82 = fmul float %add.82, %add.70 %subtract.78 = fsub float %multiply.82, %add.72 %constant.109 = load float, ptr @23, align 4 %add.69 = fadd float %subtract.78, %constant.109 %multiply.80 = fmul float %add.82, %add.69 %subtract.77 = fsub float %multiply.80, %add.70 %constant.108 = load float, ptr @14, align 4 %add.68 = fadd float %subtract.77, %constant.108 %multiply.79 = fmul float %add.82, %add.68 %subtract.75 = fsub float 
%multiply.79, %add.69 %constant.107 = load float, ptr @24, align 4 %add.67 = fadd float %subtract.75, %constant.107 %multiply.77 = fmul float %add.82, %add.67 %subtract.74 = fsub float %multiply.77, %add.68 %constant.106 = load float, ptr @13, align 4 %add.66 = fadd float %subtract.74, %constant.106 %multiply.76 = fmul float %add.82, %add.66 %subtract.72 = fsub float %multiply.76, %add.67 %constant.105 = load float, ptr @25, align 4 %add.65 = fadd float %subtract.72, %constant.105 %multiply.74 = fmul float %add.82, %add.65 %subtract.71 = fsub float %multiply.74, %add.66 %constant.104 = load float, ptr @12, align 4 %add.64 = fadd float %subtract.71, %constant.104 %multiply.73 = fmul float %add.82, %add.64 %subtract.69 = fsub float %multiply.73, %add.65 %constant.103 = load float, ptr @26, align 4 %add.63 = fadd float %subtract.69, %constant.103 %multiply.71 = fmul float %add.82, %add.63 %subtract.67 = fsub float %multiply.71, %add.64 %constant.102 = load float, ptr @11, align 4 %add.62 = fadd float %subtract.67, %constant.102 %multiply.70 = fmul float %add.82, %add.62 %subtract.66 = fsub float %multiply.70, %add.63 %constant.101 = load float, ptr @28, align 4 %add.61 = fadd float %subtract.66, %constant.101 %multiply.68 = fmul float %add.82, %add.61 %subtract.65 = fsub float %multiply.68, %add.62 %constant.100 = load float, ptr @27, align 4 %add.60 = fadd float %subtract.65, %constant.100 %subtract.64 = fsub float %add.60, %add.62 %multiply.66 = fmul float %subtract.64, %constant.120 %constant.99 = load float, ptr @6, align 4 %divide.4 = fdiv float %constant.99, %7 %add.59 = fadd float %divide.4, %constant.119 %multiply.65 = fmul float %add.59, %constant.118 %constant.98 = load float, ptr @3, align 4 %add.58 = fadd float %multiply.65, %constant.98 %multiply.64 = fmul float %add.59, %add.58 %constant.97 = load float, ptr @7, align 4 %add.57 = fadd float %multiply.64, %constant.97 %multiply.63 = fmul float %add.59, %add.57 %subtract.63 = fsub float %multiply.63, 
%add.58 %constant.96 = load float, ptr @2, align 4 %add.56 = fadd float %subtract.63, %constant.96 %multiply.62 = fmul float %add.59, %add.56 %subtract.62 = fsub float %multiply.62, %add.57 %constant.95 = load float, ptr @8, align 4 %add.55 = fadd float %subtract.62, %constant.95 %multiply.61 = fmul float %add.59, %add.55 %subtract.61 = fsub float %multiply.61, %add.56 %constant.94 = load float, ptr @1, align 4 %add.54 = fadd float %subtract.61, %constant.94 %multiply.60 = fmul float %add.59, %add.54 %subtract.60 = fsub float %multiply.60, %add.55 %constant.93 = load float, ptr @10, align 4 %add.53 = fadd float %subtract.60, %constant.93 %multiply.59 = fmul float %add.59, %add.53 %subtract.59 = fsub float %multiply.59, %add.54 %constant.92 = load float, ptr @9, align 4 %add.52 = fadd float %subtract.59, %constant.92 %subtract.58 = fsub float %add.52, %add.54 %multiply.58 = fmul float %subtract.58, %constant.120 %9 = call float @llvm.sqrt.f32(float %7) %10 = fdiv float 1.000000e+00, %9 %multiply.57 = fmul float %multiply.58, %10 %11 = trunc i8 %8 to i1 %12 = select i1 %11, float %multiply.66, float %multiply.57 %13 = fptrunc float %12 to half %14 = getelementptr inbounds [3 x [1 x half]], ptr %fusion, i64 0, i64 %fusion.indvar.dim.0, i64 0 store half %13, ptr %14, align 2, !alias.scope !3 %invar.inc1 = add nuw nsw i64 %fusion.indvar.dim.1, 1 store i64 %invar.inc1, ptr %fusion.invar_address.dim.1, align 8 br label %fusion.loop_header.dim.1 fusion.loop_exit.dim.1: ; preds = %fusion.loop_header.dim.1 %invar.inc = add nuw nsw i64 %fusion.indvar.dim.0, 1 store i64 %invar.inc, ptr %fusion.invar_address.dim.0, align 8 br label %fusion.loop_header.dim.0 fusion.loop_exit.dim.0: ; preds = %fusion.loop_header.dim.0 br label %return } ; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn declare float @llvm.fabs.f32(float %0) #1 ; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn declare float @llvm.sqrt.f32(float 
%0) #1 attributes #0 = { uwtable "denormal-fp-math"="preserve-sign" "no-frame-pointer-elim"="false" } attributes #1 = { nocallback nofree nosync nounwind readnone speculatable willreturn } !0 = !{} !1 = !{i64 6} !2 = !{i64 8} !3 = !{!4} !4 = !{!"buffer: {index:0, offset:0, size:6}", !5} !5 = !{!"XLA global AA domain"}	2022-06-15 18:04:42 -04:00
Min-Yih Hsu	cd8978e19e	[mlir][LLVMIR] Ask ICmpOp to return vector<Nxi1> when needed If any of the operands for ICmpOp is a vector, return a vector<Nxi1> rather than an i1 result. Differential Revision: https://reviews.llvm.org/D127536	2022-06-15 14:33:48 -07:00
Min-Yih Hsu	719e24d39f	[mlir][LLVMIR] Use isScalableVectorType in ShuffleVectorOp::parse Instead of casting the incoming operand into VectorType to check if it's scalable or not. This is a place I missed fixing in `f088b99eac`. Differential Revision: https://reviews.llvm.org/D127535	2022-06-15 14:33:48 -07:00
Min-Yih Hsu	dcdd5d312f	[mlir][LLVMIR] Use insertelement if needed when translating ConstantAggregate When translating from a llvm::ConstantAggregate with vector type, we should lower to insertelement operations (if needed) rather than using insertvalue. Differential Revision: https://reviews.llvm.org/D127534	2022-06-15 14:33:47 -07:00
Thomas Raoux	a6f2c2291e	[mlir][GPUToNVVM] Fix bug in mma elementwise lowering The maxf implementation of wmma elementwise op was incorrect as the operands of the select to check for Nan were swapped. Differential Revision: https://reviews.llvm.org/D127879	2022-06-15 17:23:17 +00:00
Okwan Kwon	8010d7e044	[mlir] add an option to print op stats in JSON Differential Revision: https://reviews.llvm.org/D127691	2022-06-15 10:07:36 -07:00
Rob Suderman	640973f2b9	[tosa] Lower tosa.slice to tensor.slice for dynamic case The existing slice lowering only supported static shapes. Reviewed By: NatashaKnk Differential Revision: https://reviews.llvm.org/D127704	2022-06-15 09:54:36 -07:00
Alex Zinenko	1d45282aa3	[mlir] address post-commit review for D127724 - make transform.alternatives op apply only to isolated-from-above payload IR scopes; - fix potential leak; - fix several typos.	2022-06-15 18:43:05 +02:00
Thomas Raoux	6834803c3d	[mlir][vector] NFC remove dependency of VectorTransform to GPU dialect Make the reduction distribution pattern more generic and remove layering problem. The new pattern to distribute reduction is now independent of GPU and takes a lambda to decide how the distributed reduction should be generated. Differential Revision: https://reviews.llvm.org/D127867	2022-06-15 16:08:29 +00:00
Phoebe Wang	e1c5afa47d	Reland "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"" Fixed the missing SQRT promotion. Adding several missing operations too.	2022-06-15 23:00:18 +08:00
Matthias Springer	989d2b5186	[mlir][tablegen] Generate default attr values in Python bindings When specifying an op attribute with a default value (via DefaultValuedAttr), the default value is a string of C++ code. In the general case, the default value of such an attribute cannot be translated to Python when generating the bindings. However, we can hard-code default Python values for frequently-used C++ default values. This change adds a Python default value for empty ArrayAttrs. Differential Revision: https://reviews.llvm.org/D127750	2022-06-15 16:40:27 +02:00
Thomas Joerg	37455b1f71	Revert "Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI"" This reverts commit `6e02e27536`. This introduces a crash in the backend. Reproducer in MLIR's LLVM dialect follows. Let me know if you have trouble reproducing this. module { llvm.func @malloc(i64) -> !llvm.ptr<i8> llvm.func @_mlir_ciface_tf_report_error(!llvm.ptr<i8>, i32, !llvm.ptr<i8>) llvm.mlir.global internal constant @error_message_2208944672953921889("failed to allocate memory at loc(\22-\22:3:8)\00") llvm.func @_mlir_ciface_tf_alloc(!llvm.ptr<i8>, i64, i64, i32, i32, !llvm.ptr<i32>) -> !llvm.ptr<i8> llvm.func @Rsqrt_CPU_DT_HALF_DT_HALF(%arg0: !llvm.ptr<i8>, %arg1: i64, %arg2: !llvm.ptr<i8>) -> !llvm.struct<(i64, ptr<i8>)> attributes {llvm.emit_c_interface, tf_entry} { %0 = llvm.mlir.constant(8 : i32) : i32 %1 = llvm.mlir.constant(8 : index) : i64 %2 = llvm.mlir.constant(2 : index) : i64 %3 = llvm.mlir.constant(dense<0.000000e+00> : vector<4xf16>) : vector<4xf16> %4 = llvm.mlir.constant(dense<[0, 1, 2, 3]> : vector<4xi32>) : vector<4xi32> %5 = llvm.mlir.constant(dense<1.000000e+00> : vector<4xf16>) : vector<4xf16> %6 = llvm.mlir.constant(false) : i1 %7 = llvm.mlir.constant(1 : i32) : i32 %8 = llvm.mlir.constant(0 : i32) : i32 %9 = llvm.mlir.constant(4 : index) : i64 %10 = llvm.mlir.constant(0 : index) : i64 %11 = llvm.mlir.constant(1 : index) : i64 %12 = llvm.mlir.constant(-1 : index) : i64 %13 = llvm.mlir.null : !llvm.ptr<f16> %14 = llvm.getelementptr %13[%9] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16> %15 = llvm.ptrtoint %14 : !llvm.ptr<f16> to i64 %16 = llvm.alloca %15 x f16 {alignment = 32 : i64} : (i64) -> !llvm.ptr<f16> %17 = llvm.alloca %15 x f16 {alignment = 32 : i64} : (i64) -> !llvm.ptr<f16> %18 = llvm.mlir.null : !llvm.ptr<i64> %19 = llvm.getelementptr %18[%arg1] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64> %20 = llvm.ptrtoint %19 : !llvm.ptr<i64> to i64 %21 = llvm.alloca %20 x i64 : (i64) -> !llvm.ptr<i64> llvm.br ^bb1(%10 : i64) ^bb1(%22: 
i64): // 2 preds: ^bb0, ^bb2 %23 = llvm.icmp "slt" %22, %arg1 : i64 llvm.cond_br %23, ^bb2, ^bb3 ^bb2: // pred: ^bb1 %24 = llvm.bitcast %arg2 : !llvm.ptr<i8> to !llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64)>> %25 = llvm.getelementptr %24[%10, 2] : (!llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64)>>, i64) -> !llvm.ptr<i64> %26 = llvm.add %22, %11 : i64 %27 = llvm.getelementptr %25[%26] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64> %28 = llvm.load %27 : !llvm.ptr<i64> %29 = llvm.getelementptr %21[%22] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64> llvm.store %28, %29 : !llvm.ptr<i64> llvm.br ^bb1(%26 : i64) ^bb3: // pred: ^bb1 llvm.br ^bb4(%10, %11 : i64, i64) ^bb4(%30: i64, %31: i64): // 2 preds: ^bb3, ^bb5 %32 = llvm.icmp "slt" %30, %arg1 : i64 llvm.cond_br %32, ^bb5, ^bb6 ^bb5: // pred: ^bb4 %33 = llvm.bitcast %arg2 : !llvm.ptr<i8> to !llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64)>> %34 = llvm.getelementptr %33[%10, 2] : (!llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64)>>, i64) -> !llvm.ptr<i64> %35 = llvm.add %30, %11 : i64 %36 = llvm.getelementptr %34[%35] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64> %37 = llvm.load %36 : !llvm.ptr<i64> %38 = llvm.mul %37, %31 : i64 llvm.br ^bb4(%35, %38 : i64, i64) ^bb6: // pred: ^bb4 %39 = llvm.bitcast %arg2 : !llvm.ptr<i8> to !llvm.ptr<ptr<f16>> %40 = llvm.getelementptr %39[%11] : (!llvm.ptr<ptr<f16>>, i64) -> !llvm.ptr<ptr<f16>> %41 = llvm.load %40 : !llvm.ptr<ptr<f16>> %42 = llvm.getelementptr %13[%11] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16> %43 = llvm.ptrtoint %42 : !llvm.ptr<f16> to i64 %44 = llvm.alloca %7 x i32 : (i32) -> !llvm.ptr<i32> llvm.store %8, %44 : !llvm.ptr<i32> %45 = llvm.call @_mlir_ciface_tf_alloc(%arg0, %31, %43, %8, %7, %44) : (!llvm.ptr<i8>, i64, i64, i32, i32, !llvm.ptr<i32>) -> !llvm.ptr<i8> %46 = llvm.bitcast %45 : !llvm.ptr<i8> to !llvm.ptr<f16> %47 = llvm.icmp "eq" %31, %10 : i64 %48 = llvm.or %6, %47 : i1 %49 = llvm.mlir.null : !llvm.ptr<i8> %50 = llvm.icmp "ne" %45, %49 : !llvm.ptr<i8> %51 = llvm.or %50, %48 : i1 llvm.cond_br 
%51, ^bb7, ^bb13 ^bb7: // pred: ^bb6 %52 = llvm.urem %31, %9 : i64 %53 = llvm.sub %31, %52 : i64 llvm.br ^bb8(%10 : i64) ^bb8(%54: i64): // 2 preds: ^bb7, ^bb9 %55 = llvm.icmp "slt" %54, %53 : i64 llvm.cond_br %55, ^bb9, ^bb10 ^bb9: // pred: ^bb8 %56 = llvm.mul %54, %11 : i64 %57 = llvm.add %56, %10 : i64 %58 = llvm.add %57, %10 : i64 %59 = llvm.getelementptr %41[%58] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16> %60 = llvm.bitcast %59 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>> %61 = llvm.load %60 {alignment = 2 : i64} : !llvm.ptr<vector<4xf16>> %62 = "llvm.intr.sqrt"(%61) : (vector<4xf16>) -> vector<4xf16> %63 = llvm.fdiv %5, %62 : vector<4xf16> %64 = llvm.getelementptr %46[%58] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16> %65 = llvm.bitcast %64 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>> llvm.store %63, %65 {alignment = 2 : i64} : !llvm.ptr<vector<4xf16>> %66 = llvm.add %54, %9 : i64 llvm.br ^bb8(%66 : i64) ^bb10: // pred: ^bb8 %67 = llvm.icmp "ult" %53, %31 : i64 llvm.cond_br %67, ^bb11, ^bb12 ^bb11: // pred: ^bb10 %68 = llvm.mul %53, %12 : i64 %69 = llvm.add %31, %68 : i64 %70 = llvm.mul %53, %11 : i64 %71 = llvm.add %70, %10 : i64 %72 = llvm.trunc %69 : i64 to i32 %73 = llvm.mlir.undef : vector<4xi32> %74 = llvm.insertelement %72, %73[%8 : i32] : vector<4xi32> %75 = llvm.shufflevector %74, %73 [0 : i32, 0 : i32, 0 : i32, 0 : i32] : vector<4xi32>, vector<4xi32> %76 = llvm.icmp "slt" %4, %75 : vector<4xi32> %77 = llvm.add %71, %10 : i64 %78 = llvm.getelementptr %41[%77] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16> %79 = llvm.bitcast %78 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>> %80 = llvm.intr.masked.load %79, %76, %3 {alignment = 2 : i32} : (!llvm.ptr<vector<4xf16>>, vector<4xi1>, vector<4xf16>) -> vector<4xf16> %81 = llvm.bitcast %16 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>> llvm.store %80, %81 : !llvm.ptr<vector<4xf16>> %82 = llvm.load %81 {alignment = 2 : i64} : !llvm.ptr<vector<4xf16>> %83 = "llvm.intr.sqrt"(%82) : (vector<4xf16>) -> vector<4xf16> %84 = 
llvm.fdiv %5, %83 : vector<4xf16> %85 = llvm.bitcast %17 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>> llvm.store %84, %85 {alignment = 2 : i64} : !llvm.ptr<vector<4xf16>> %86 = llvm.load %85 : !llvm.ptr<vector<4xf16>> %87 = llvm.getelementptr %46[%77] : (!llvm.ptr<f16>, i64) -> !llvm.ptr<f16> %88 = llvm.bitcast %87 : !llvm.ptr<f16> to !llvm.ptr<vector<4xf16>> llvm.intr.masked.store %86, %88, %76 {alignment = 2 : i32} : vector<4xf16>, vector<4xi1> into !llvm.ptr<vector<4xf16>> llvm.br ^bb12 ^bb12: // 2 preds: ^bb10, ^bb11 %89 = llvm.mul %2, %1 : i64 %90 = llvm.mul %arg1, %2 : i64 %91 = llvm.add %90, %11 : i64 %92 = llvm.mul %91, %1 : i64 %93 = llvm.add %89, %92 : i64 %94 = llvm.alloca %93 x i8 : (i64) -> !llvm.ptr<i8> %95 = llvm.bitcast %94 : !llvm.ptr<i8> to !llvm.ptr<ptr<f16>> llvm.store %46, %95 : !llvm.ptr<ptr<f16>> %96 = llvm.getelementptr %95[%11] : (!llvm.ptr<ptr<f16>>, i64) -> !llvm.ptr<ptr<f16>> llvm.store %46, %96 : !llvm.ptr<ptr<f16>> %97 = llvm.getelementptr %95[%2] : (!llvm.ptr<ptr<f16>>, i64) -> !llvm.ptr<ptr<f16>> %98 = llvm.bitcast %97 : !llvm.ptr<ptr<f16>> to !llvm.ptr<i64> llvm.store %10, %98 : !llvm.ptr<i64> %99 = llvm.bitcast %94 : !llvm.ptr<i8> to !llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64, i64)>> %100 = llvm.getelementptr %99[%10, 3] : (!llvm.ptr<struct<(ptr<f16>, ptr<f16>, i64, i64)>>, i64) -> !llvm.ptr<i64> %101 = llvm.getelementptr %100[%arg1] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64> %102 = llvm.sub %arg1, %11 : i64 llvm.br ^bb14(%102, %11 : i64, i64) ^bb13: // pred: ^bb6 %103 = llvm.mlir.addressof @error_message_2208944672953921889 : !llvm.ptr<array<42 x i8>> %104 = llvm.getelementptr %103[%10, %10] : (!llvm.ptr<array<42 x i8>>, i64, i64) -> !llvm.ptr<i8> llvm.call @_mlir_ciface_tf_report_error(%arg0, %0, %104) : (!llvm.ptr<i8>, i32, !llvm.ptr<i8>) -> () %105 = llvm.mul %2, %1 : i64 %106 = llvm.mul %2, %10 : i64 %107 = llvm.add %106, %11 : i64 %108 = llvm.mul %107, %1 : i64 %109 = llvm.add %105, %108 : i64 %110 = llvm.alloca %109 x i8 : 
(i64) -> !llvm.ptr<i8> %111 = llvm.bitcast %110 : !llvm.ptr<i8> to !llvm.ptr<ptr<f16>> llvm.store %13, %111 : !llvm.ptr<ptr<f16>> %112 = llvm.getelementptr %111[%11] : (!llvm.ptr<ptr<f16>>, i64) -> !llvm.ptr<ptr<f16>> llvm.store %13, %112 : !llvm.ptr<ptr<f16>> %113 = llvm.getelementptr %111[%2] : (!llvm.ptr<ptr<f16>>, i64) -> !llvm.ptr<ptr<f16>> %114 = llvm.bitcast %113 : !llvm.ptr<ptr<f16>> to !llvm.ptr<i64> llvm.store %10, %114 : !llvm.ptr<i64> %115 = llvm.call @malloc(%109) : (i64) -> !llvm.ptr<i8> "llvm.intr.memcpy"(%115, %110, %109, %6) : (!llvm.ptr<i8>, !llvm.ptr<i8>, i64, i1) -> () %116 = llvm.mlir.undef : !llvm.struct<(i64, ptr<i8>)> %117 = llvm.insertvalue %10, %116[0] : !llvm.struct<(i64, ptr<i8>)> %118 = llvm.insertvalue %115, %117[1] : !llvm.struct<(i64, ptr<i8>)> llvm.return %118 : !llvm.struct<(i64, ptr<i8>)> ^bb14(%119: i64, %120: i64): // 2 preds: ^bb12, ^bb15 %121 = llvm.icmp "sge" %119, %10 : i64 llvm.cond_br %121, ^bb15, ^bb16 ^bb15: // pred: ^bb14 %122 = llvm.getelementptr %21[%119] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64> %123 = llvm.load %122 : !llvm.ptr<i64> %124 = llvm.getelementptr %100[%119] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64> llvm.store %123, %124 : !llvm.ptr<i64> %125 = llvm.getelementptr %101[%119] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64> llvm.store %120, %125 : !llvm.ptr<i64> %126 = llvm.mul %120, %123 : i64 %127 = llvm.sub %119, %11 : i64 llvm.br ^bb14(%127, %126 : i64, i64) ^bb16: // pred: ^bb14 %128 = llvm.call @malloc(%93) : (i64) -> !llvm.ptr<i8> "llvm.intr.memcpy"(%128, %94, %93, %6) : (!llvm.ptr<i8>, !llvm.ptr<i8>, i64, i1) -> () %129 = llvm.mlir.undef : !llvm.struct<(i64, ptr<i8>)> %130 = llvm.insertvalue %arg1, %129[0] : !llvm.struct<(i64, ptr<i8>)> %131 = llvm.insertvalue %128, %130[1] : !llvm.struct<(i64, ptr<i8>)> llvm.return %131 : !llvm.struct<(i64, ptr<i8>)> } llvm.func @_mlir_ciface_Rsqrt_CPU_DT_HALF_DT_HALF(%arg0: !llvm.ptr<struct<(i64, ptr<i8>)>>, %arg1: !llvm.ptr<i8>, %arg2: !llvm.ptr<struct<(i64, ptr<i8>)>>) 
attributes {llvm.emit_c_interface, tf_entry} { %0 = llvm.load %arg2 : !llvm.ptr<struct<(i64, ptr<i8>)>> %1 = llvm.extractvalue %0[0] : !llvm.struct<(i64, ptr<i8>)> %2 = llvm.extractvalue %0[1] : !llvm.struct<(i64, ptr<i8>)> %3 = llvm.call @Rsqrt_CPU_DT_HALF_DT_HALF(%arg1, %1, %2) : (!llvm.ptr<i8>, i64, !llvm.ptr<i8>) -> !llvm.struct<(i64, ptr<i8>)> llvm.store %3, %arg0 : !llvm.ptr<struct<(i64, ptr<i8>)>> llvm.return } }	2022-06-15 13:24:24 +02:00
Matthias Springer	a36c801d12	[mlir][bufferize] Better implementation of AnalysisState::isTensorYielded If `create-deallocs=0`, mark all bufferization.alloc_tensor ops as escaping. (Unless they already have an `escape` attribute.) In the absence of analysis information, check SSA use-def chains to see if the value may be yielded. Differential Revision: https://reviews.llvm.org/D127302	2022-06-15 10:15:47 +02:00
Matthias Springer	ad2e635fae	[mlir][linalg][bufferize] Remove always-aliasing-with-dest option This flag was introduced for a use case in IREE, but it is no longer needed. Differential Revision: https://reviews.llvm.org/D126965	2022-06-15 09:56:53 +02:00
Matthias Springer	d361ecbd0d	[mlir][SCF][bufferize] Implement `resolveConflicts` for SCF ops scf::ForOp and scf::WhileOp must insert buffer copies not only for out-of-place bufferizations, but also to enforce additional invariants with respect to buffer aliasing behavior. This is currently happening in the respective `bufferize` methods. With this change, the tensor copy insertion pass will also enforce these invariants by inserting copies. The `bufferize` methods can then be simplified and made independent of the `AnalysisState` data structure in a subsequent change. Differential Revision: https://reviews.llvm.org/D126822	2022-06-15 09:07:31 +02:00
Lei Zhang	06c6758a98	[mlir][spirv] Handle corner cases for math.powf conversion Per GLSL Pow extended instruction spec: "Result is undefined if x < 0. Result is undefined if x = 0 and y <= 0." So we need to handle negative `x` values specifically. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D127816	2022-06-14 23:02:44 -04:00
jacquesguan	701a282af4	[mlir][Vector] Fold consecutive bitcast. This patch folds consecutive bitcasts into a single bitcast. Differential Revision: https://reviews.llvm.org/D127723	2022-06-15 10:45:05 +08:00
Phoebe Wang	6e02e27536	Reland "[X86][RFC] Enable `_Float16` type support on X86 following the psABI" Disabled 2 mlir tests because the runtime doesn't support `_Float16`; see the issue here https://github.com/llvm/llvm-project/issues/55992	2022-06-15 09:15:31 +08:00
Lei Zhang	b4dff404f3	[mlir][spirv] Fix math.ctlz for full zero bit cases If the integer has all zero bits, GLSL FindUMsb would return -1. So theoretically (31 - FindUMsb) should still give us the correct result. However, Adreno GPUs have issues with this: https://buildkite.com/iree/iree-test-android/builds/6482#01815f05-3926-466f-822a-1e20299e5461 This looks like a driver bug. So handle the corner case explicitly to work around it. Reviewed By: mravishankar Differential Revision: https://reviews.llvm.org/D127747	2022-06-14 19:39:27 -04:00
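The corner case described in the commit above can be sketched in a few lines of Python. This is only an illustrative model, not the SPIR-V conversion itself: `find_umsb` and `ctlz32` are hypothetical helper names, and the 32-bit width is an assumption taken from the `31 - FindUMsb` expression in the message.

```python
def find_umsb(x: int) -> int:
    """Model of FindUMsb on a 32-bit unsigned value (hypothetical helper).

    Returns the index of the most significant set bit, or -1 if no bit is set.
    """
    x &= 0xFFFFFFFF
    # int.bit_length() is 0 for x == 0, so this yields -1 in that case.
    return x.bit_length() - 1


def ctlz32(x: int) -> int:
    """Count leading zeros of a 32-bit value as 31 - FindUMsb.

    The all-zero input is handled explicitly instead of relying on the
    -1 result of FindUMsb, mirroring the workaround the commit describes.
    """
    x &= 0xFFFFFFFF
    if x == 0:
        return 32
    return 31 - find_umsb(x)
```

In this model the explicit branch and the plain formula agree (`31 - (-1)` is also 32); the branch only matters on hardware where the -1 result cannot be trusted.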
Mogball	ead75d9434	(Reland)[mlir] Add a generic data-flow analysis framework Removes one element of the pointer union to make it work on 32-bit systems. This patch introduces a generic data-flow analysis framework to MLIR. The framework implements a fixed-point iteration algorithm and a dependency graph between lattice states and analysis. Lattice states and points are fully extensible to support highly-customizable analyses. Reviewed By: phisiart, rriddle Differential Revision: https://reviews.llvm.org/D126751	2022-06-14 21:33:05 +00:00
Krzysztof Drewniak	b0b0043209	[mlir][Arith] Pass to switch signed ops for equivalent unsigned ones If all the arguments to and results of an operation are known to be non-negative when interpreted as signed (which also implies that all computations producing those values did not experience signed overflow), we can replace that operation with an equivalent one that operates on unsigned values. Such a replacement, when it is possible, can provide useful hints to backends, such as by allowing LLVM to replace remainder with bitwise operations in more cases. Depends on D124022 Depends on D124023 Reviewed By: Mogball Differential Revision: https://reviews.llvm.org/D124024	2022-06-14 21:18:29 +00:00
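A small Python sketch of why the non-negativity condition in the commit above matters. It models signed remainder with truncation toward zero and compares it with the bitwise form a backend may use for an unsigned remainder by a power of two. The helper names `srem` and `urem_pow2` are hypothetical and are not part of the pass.

```python
def srem(a: int, b: int) -> int:
    """Signed remainder truncating toward zero (C-style), unlike Python's %."""
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return a - q * b


def urem_pow2(a: int, b: int) -> int:
    """Unsigned remainder by a power of two, computed as a bitwise AND."""
    assert a >= 0 and b > 0 and b & (b - 1) == 0
    return a & (b - 1)


# For non-negative operands the signed and unsigned forms agree.
same_when_non_negative = all(
    srem(a, 8) == urem_pow2(a, 8) for a in range(0, 64)
)

# For a negative operand the bitwise shortcut gives a different value,
# which is why the replacement is only valid when non-negativity is known.
differs_when_negative = srem(-7, 4) != (-7 & 3)
```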
Frederik Gossen	a6fa12ab3b	Revert "[mlir] Add a generic data-flow analysis framework" This reverts commit `9dea117283`. The PointerUnion assumes 3 available bits, which is not the case on 32-bit machines.	2022-06-14 17:14:27 -04:00
Okwan Kwon	28331c6097	Revert "[mlir] add an option to print op stats in JSON" There is a failure from the python pass manager. This reverts commit `1a19abf38c`.	2022-06-14 14:09:18 -07:00
Okwan Kwon	1a19abf38c	[mlir] add an option to print op stats in JSON Differential Revision: https://reviews.llvm.org/D127691	2022-06-14 13:06:25 -07:00
Krzysztof Drewniak	75bfc6f295	[mlir][Arith] Implement InferIntRangeInterface for arithmetic ops Depends on D124023 Reviewed By: Mogball, rriddle Differential Revision: https://reviews.llvm.org/D124022	2022-06-14 18:30:34 +00:00
Mogball	9dea117283	[mlir] Add a generic data-flow analysis framework This patch introduces a generic data-flow analysis framework to MLIR. The framework implements a fixed-point iteration algorithm and a dependency graph between lattice states and analysis. Lattice states and points are fully extensible to support highly-customizable analyses. Reviewed By: phisiart, rriddle Differential Revision: https://reviews.llvm.org/D126751	2022-06-14 16:54:15 +00:00
Benjamin Kramer	ba0222cdc6	[mlir][linalg] Add named ops for depthwise 3d convolution Also complete the set by adding a variant of depthwise 1d convolution with the multiplier != 1. Differential Revision: https://reviews.llvm.org/D127687	2022-06-14 18:22:47 +02:00
Alex Zinenko	e3890b7fd6	[mlir] Introduce transform.alternatives op Introduce a transform dialect op that allows one to attempt different transformation sequences on the same piece of payload IR until one of them succeeds. This op fundamentally expands the scope of possibilities in the transform dialect that, until now, could only propagate transformation failure, at least using in-tree operations. This requires a more detailed specification of the execution model for the transform dialect that now indicates how failure is handled and propagated. Transformations described by transform operations now have tri-state results, with some errors being fundamentally irrecoverable (e.g., generating malformed IR) and some others being recoverable by containing ops. Existing transform ops directly implementing the `apply` interface method are updated to produce this directly. Transform ops with the `TransformEachTransformOpTrait` are currently considered to produce only irrecoverable failures and will be updated separately. Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D127724	2022-06-14 17:51:30 +02:00
Chuanqi Xu	735e6c40b5	[Coroutines] Convert coroutine.presplit to enum attr This is required by @nikic in https://reviews.llvm.org/D127383 to decrease the cost to check whether a function is a coroutine and this fixes a FIXME too. Reviewed By: rjmccall, ezhulenev Differential Revision: https://reviews.llvm.org/D127471	2022-06-14 14:23:46 +08:00
Thomas Raoux	087aba4f0f	[mlir][vector] Add pattern to distribute vector reduction to GPU shuffles Add a pattern to do ad hoc lowering of vector.reduction to a sequence of warp shuffles. This allows distributing a reduction across a warp for GPU targets. Also add an execution test for warp reduction. Co-authored with @springerm Differential Revision: https://reviews.llvm.org/D127176	2022-06-14 05:49:16 +00:00
Thomas Raoux	76cf33dab2	[mlir][vector] Add patterns to propagate vector distribution Add patterns to propagate vector distribution and remove dead arguments. This handles propagation for several vector operations. Recommit after minor bug fix. Differential Revision: https://reviews.llvm.org/D127167	2022-06-14 05:26:10 +00:00
jacquesguan	5179f885d1	[mlir][Arithmetic] Fold NegF in MulF and DivF. This patch adds the following combination: mulf(negf(x), negf(y)) -> mulf(x, y) divf(negf(x), negf(y)) -> divf(x, y) Differential Revision: https://reviews.llvm.org/D126044	2022-06-14 03:15:19 +00:00
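The rewrite in the commit above, `mulf(negf(x), negf(y)) -> mulf(x, y)` and likewise for `divf`, can be sketched on a toy expression tree. This is a minimal illustration, not the MLIR folder: the `Var`, `Neg`, and `Bin` classes and the `fold_double_neg` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Neg:
    operand: "Expr"


@dataclass(frozen=True)
class Bin:
    op: str  # "mulf" or "divf" in this sketch
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Var, Neg, Bin]


def fold_double_neg(e: Expr) -> Expr:
    """Drop both negations when both operands of mulf/divf are negated."""
    if (
        isinstance(e, Bin)
        and e.op in ("mulf", "divf")
        and isinstance(e.lhs, Neg)
        and isinstance(e.rhs, Neg)
    ):
        return Bin(e.op, e.lhs.operand, e.rhs.operand)
    # Any other shape is left untouched.
    return e
```

The fold applies only when both operands are negated; an expression with a single `Neg` is returned unchanged because dropping it would flip the sign of the result.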
jacquesguan	059ee5d937	[mlir][Vector] Support vectorize to vector.reduction or/and. This patch supports vectorizing affine.for loops of ori/andi into vector.reduction or/and. Differential Revision: https://reviews.llvm.org/D127090	2022-06-14 03:11:45 +00:00
Mogball	b1b4808c3f	[mlir] Fix CMake file	2022-06-13 22:36:14 +00:00
Mogball	537f220891	[mlir] Support getSuccessorInputs from parent op Ops that implement `RegionBranchOpInterface` are allowed to indicate that they can branch back to themselves in `getSuccessorRegions`, but there is no API that allows them to specify the forwarded operands. This patch enables that by changing `getSuccessorEntryOperands` to accept `None`. Fixes #54928 Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D127239	2022-06-13 22:21:34 +00:00
Mahesh Ravishankar	cf6a7c1947	[mlir][TilingInterface] Add pattern to tile using TilingInterface and implement TilingInterface for Linalg ops. This patch adds support for tiling operations that implement the TilingInterface. - It separates the loop constructs that are used to iterate over tiles from the implementation of the tiling itself. For example, the use of destructive updates is more related to the use of scf.for for iterating over tiles that are tensors. - To test the transformation, TilingInterface is implemented for LinalgOps. The separation of the looping constructs used from the implementation of tile code generation greatly simplifies the latter. - The implementation of TilingInterface for LinalgOp is kept as an external model for now until this approach can be fully fleshed out to replace the existing tiling + fusion approaches in Linalg. Differential Revision: https://reviews.llvm.org/D127133	2022-06-13 20:37:44 +00:00
Thomas Raoux	2d32dac8bb	Revert "[mlir][vector] Add patterns to propagate vector distribution" This reverts commit `1c84800c42`. This was causing an ASan crash.	2022-06-13 17:55:31 +00:00
Lei Zhang	b5192cbe50	[mlir][spirv] Fix result type for arith.cmpi/cmpf conversion We cannot directly use the original result type; instead we need to deduce it from the converted operand type. This addresses invalid ops generated from converting single element vectors. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D127574	2022-06-13 13:15:23 -04:00
Lei Zhang	91de20c36d	[mlir][spirv] Use UnrealizedConversionCast in ArithmeticToSPIRV This avoids pulling in function conversion patterns, which are not part of what we want to test in ArithmeticToSPIRV. It also allows using ConvertArithmeticToSPIRVPass as a standalone step. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D127573	2022-06-13 13:13:57 -04:00
Lei Zhang	cc020a2236	[mlir][spirv] Convert math.ctlz to spv.GLSL.FindUMsb Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D127582	2022-06-13 13:02:37 -04:00
Thomas Raoux	1c84800c42	[mlir][vector] Add patterns to propagate vector distribution Add patterns to propagate vector distribution and remove dead arguments. This handles propagation for several vector operations. Differential Revision: https://reviews.llvm.org/D127167	2022-06-13 16:38:50 +00:00
Mogball	e16d13322b	[mlir] (NFC) Clean up bazel and CMake target names All dialect targets in bazel have been named Dialect and all dialect targets in CMake have been named MLIRDialect.	2022-06-13 16:24:15 +00:00
Lei Zhang	a4360efb2c	[mlir][spirv] Convert single element vector.splat/fma Reviewed By: ThomasRaoux, hanchung Differential Revision: https://reviews.llvm.org/D127572	2022-06-13 12:18:16 -04:00
Ulrich Weigand	7095a1ff82	Fix endian conversion of sub-byte types When convertEndianOfCharForBEmachine is called with elementBitWidth smaller than CHAR_BIT, the default case is invoked, but this does nothing at all and leaves the output array unchanged. Fix DenseIntOrFPElementsAttr::convertEndianOfArrayRefForBEmachine by not calling convertEndianOfCharForBEmachine in this case, and instead simply copying the input to the output (for sub-byte types, endian conversion is in fact a no-op). Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D125676	2022-06-12 16:08:23 +02:00
Chia-hung Duan	ba3a9f51ff	[mlir:MultiOpDriver] Add operands to worklist should be checked An operand's defining op may not be valid for adding to the worklist under strict mode. Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D127180	2022-06-11 15:56:23 +00:00
Lei Zhang	e90b56e411	[mlir][vulkan] Add missing '<>' in test IRs to fix test	2022-06-10 18:09:12 -04:00
Lei Zhang	11cf2d5f62	[mlir][spirv] Unify aliases of different bitwidth scalar types This commit extends the UnifyAliasedResourcePass to handle scalar types of different bitwidths. It requires choosing the smaller-bitwidth resource as the canonical resource so that we can avoid subcomponent load/store. Instead we load/store multiple smaller-bitwidth ones. Reviewed By: hanchung Differential Revision: https://reviews.llvm.org/D127266	2022-06-10 18:01:31 -04:00
Thomas Raoux	ed0288f7c4	[mlir][vector] Add patterns for vector distribution Add a pattern to hoist scalar code outside of the warp distribute region, as such code cannot be distributed and we want to execute it on all the lanes. Add patterns to distribute transfer_write ops. Those operations can be distributed in different ways, and the choice is controlled by the user. Differential Revision: https://reviews.llvm.org/D127152	2022-06-10 17:46:51 +00:00
Alex Zinenko	6403e1b12a	[mlir] add a dynamic user-after-parent-freed transform dialect check In the transform dialect, a transform IR handle may be pointing to a payload IR operation that is an ancestor of another payload IR operation pointed to by another handle. If such a "parent" handle is consumed by a transformation, this indicates that the associated operation is likely rewritten, which in turn means that the "child" handle may now be associated with a dangling pointer or a pointer to a different operation than originally. Add a handle invalidation mechanism to guard against such situations by reporting errors at runtime. Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D127480	2022-06-10 13:05:34 +02:00
Matthias Springer	79f115911e	[mlir][bufferize] Avoid tensor copies when the data is not read There are various shortcuts in `BufferizationState::getBuffer` that avoid a buffer copy when we just need an allocation (and no initialization). This change adds those shortcuts to the TensorCopyInsertion pass, so that `getBuffer` can be simplified in a subsequent change. Differential Revision: https://reviews.llvm.org/D126821	2022-06-10 10:26:07 +02:00
Mogball	a31ff0af9b	[mlir][spirv] Replace StructAttrs with AttrDefs Depends on D127370 Reviewed By: antiagainst Differential Revision: https://reviews.llvm.org/D127373	2022-06-09 23:16:44 +00:00
Mogball	f1182bd6d5	[mlir][tosa] Replace StructAttrs with AttrDefs Depends on D127352 Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D127370	2022-06-09 23:01:51 +00:00
Mogball	d7ef488bb6	[mlir][gpu] Move GPU headers into IR/ and Transforms/ Depends on D127350 Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D127352	2022-06-09 22:49:03 +00:00
Mogball	7bdd3722f2	[mlir][gpu] Change ParalellLoopMappingAttr to AttrDef It was a StructAttr. Also adds a FieldParser for AffineMap. Depends on D127348 Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D127350	2022-06-09 22:23:21 +00:00
Mogball	ba79bb4973	[mlir][nvvm] Change MMAShapeAttr to AttrDef MMAShapeAttr was a StructAttr Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D127348	2022-06-09 22:14:45 +00:00
Matthias Springer	87b46776c4	[mlir][bufferize] Improve resolveConflicts for ExtractSliceOp It is sometimes better to make a copy of the OpResult instead of making a copy of the OpOperand. E.g., when bufferizing tensor.extract_slice. This implementation will eventually make parts of extract_slice's `bufferize` implementation obsolete (and simplify it). It will only need to handle in-place OpOperands. Differential Revision: https://reviews.llvm.org/D126819	2022-06-09 22:19:37 +02:00
Christopher Bate	9f1221521f	Recommit "[mlir][vector] Allow unroll of contraction in arbitrary order" Fixed issue with vector.contract default unroll permutation. Adds support for vector unroll transformations to unroll in different orders. For example, the vector.contract can be unrolled into a smaller set of contractions. There is a choice of how to unroll the decomposition based on the traversal order of (dim0, dim1, dim2). The choice of traversal order can now be specified by a callback provided by the caller of the transform. For now, only the vector.contract, vector.transfer_read/transfer_write operations support the callback. Differential Revision: https://reviews.llvm.org/D127004	2022-06-09 14:01:19 -06:00
Matthias Springer	3b2004e16b	[mlir][bufferization] Add TensorCopyInsertion pass This pass runs the One-Shot Analysis to find out which tensor OpOperands must bufferize out-of-place. It then rewrites those tensor OpOperands to explicit allocations with a copy in the form of `bufferization.alloc_tensor`. The resulting IR can then be bufferized without having to care about read-after-write conflicts. This change makes it possible to connect One-Shot Analysis to other bufferizations such as the sparse compiler. Differential Revision: https://reviews.llvm.org/D126573	2022-06-09 21:55:52 +02:00
Matthias Springer	56d68e8d7a	[mlir][bufferization] Add optional `copy` operand to AllocTensorOp If `copy` is specified, the newly allocated buffer is initialized with the given contents. Also add an optional `escape` attribute to indicate whether the buffer of the tensor may be returned from the parent block (aka. "escape") after bufferization. This change is in preparation of connecting One-Shot Bufferize to the sparse compiler. Differential Revision: https://reviews.llvm.org/D126570	2022-06-09 21:37:15 +02:00
Matthias Springer	88539c5bdb	[mlir][bufferize][NFC] Decouple dropping of equivalent return values from bufferization This simplifies the bufferization itself and is in preparation of connecting with the sparse compiler. Differential Revision: https://reviews.llvm.org/D126814	2022-06-09 18:39:05 +02:00
Matthias Springer	bf58256967	[mlir][bufferize] Fix bug in module equivalence analysis CallOp results are not equivalent to an OpOperand if the OpOperand bufferizes out-of-place. Differential Revision: https://reviews.llvm.org/D126813	2022-06-09 18:32:17 +02:00
Matthias Springer	92680126bf	[mlir][bufferize] Decouple promoteBufferResultsToOutParams from One-Shot Bufferize Users should explicitly run `-buffer-results-to-out-params` instead. The purpose of this change is to remove `finalizeBuffers`, which made it difficult to extend the bufferization to custom buffer types. Differential Revision: https://reviews.llvm.org/D126253	2022-06-09 18:25:26 +02:00
Matthias Springer	058af65e78	[mlir][bufferization] Decouple buffer-deallocation from One-Shot Bufferize The buffer deallocation pass must now be run explicitly when `allow-return-alloc` is set. This results in a few extra buffer copies in unoptimized test cases. The proper way to avoid such copies is to relax the OpOperand/OpResult aliasing contract on ops such as scf.for. Some of these copies can also be avoided by improving the buffer deallocation pass. Differential Revision: https://reviews.llvm.org/D126252	2022-06-09 18:20:39 +02:00
Yuanqiang Liu	56e19717f5	[MLIR][Shape] Generalize `shape.concat` to extent tensors The operation `shape.concat` was used for the shape type only. We now enable it for extent tensors. Reviewed By: jpienaar Differential Revision: https://reviews.llvm.org/D127321	2022-06-09 08:23:26 -07:00
Matthias Springer	461dafd2a3	[mlir][bufferization] Add OneShotBufferize transform op This commit allows for One-Shot Bufferize to be used through the transform dialect. No op handle is currently returned for the bufferized IR. Differential Revision: https://reviews.llvm.org/D125098	2022-06-09 15:15:09 +02:00
Alex Zinenko	b6c58ec486	[mlir] add producer fusion to structured transform ops This relies on the existing TileAndFuse pattern for tensor-based structured ops. It complements pure tiling, from which some utilities are generalized. Depends On D127300 Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D127319	2022-06-09 14:30:45 +02:00
Benjamin Kramer	abcf1496ad	Fix complex.conj integration test - It doesn't actually print the fractional part if the result is a whole number - One of the expectations was just wrong	2022-06-09 13:11:10 +02:00
Alex Zinenko	5f0d4f208e	[mlir] Introduce Transform ops for loops Introduce transform ops for "for" loops, in particular for peeling, software pipelining and unrolling, along with a couple of "IR navigation" ops. These ops are intended to be generalized to different kinds of loops when possible and therefore use the "loop" prefix. They currently live in the SCF dialect as there is no clear place to put transform ops that may span across several dialects, this decision is postponed until the ops actually need to handle non-SCF loops. Additionally refactor some common utilities for transform ops into trait or interface methods, and change the loop pipelining to be a returning pattern. Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D127300	2022-06-09 11:41:55 +02:00
lewuathe	fff27d181c	[mlir][complex] Correctness check for complex.conj Add correctness check for complex.conj operation Reviewed By: pifon2a Differential Revision: https://reviews.llvm.org/D127377	2022-06-09 11:11:56 +02:00
bixia1	5b1c5fc53a	[mlir][sparse] Add complex number reading from files. Support complex numbers for Matrix Market Exchange Formats. Add a test case. Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D127138	2022-06-08 13:33:35 -07:00
bixia1	6c6eddb617	[mlir] Lower complex.power and complex.rsqrt to standard dialect. Add conversion tests and correctness tests. Reviewed By: pifon2a Differential Revision: https://reviews.llvm.org/D127255	2022-06-08 10:53:53 -07:00
dime10	4f55ed5a1e	Add Python bindings for the OpaqueType Implement the C-API and Python bindings for the builtin opaque type, which was previously missing. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D127303	2022-06-08 19:51:00 +02:00
Mogball	ee70039ae2	[mlir] Fix handling of some region branch terminator successors When `RegionBranchOpInterface::getSuccessorRegions` is called for anything other than the parent op, it expects the operands of the terminator of the source region to be passed, not the operands of the parent op. This was not always respected. This fixes a bug in integer range inference and ForwardDataFlowSolver and changes `scf.while` to allow narrowing of successors using constant inputs. Fixes #55873 Reviewed By: mehdi_amini, krzysz00 Differential Revision: https://reviews.llvm.org/D127261	2022-06-08 17:17:03 +00:00
bixia1	ea8ed5cbcf	[mlir][sparse] Add F16 and BF16. This is the first PR to add `F16` and `BF16` support to the sparse codegen. There are still problems in supporting these two data types; for example, `BF16` is not quite working yet. Add test cases. Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D127010	2022-06-08 09:51:05 -07:00
lorenzo chelini	a0fc94ab61	[MLIR][Math] Add round operation Introduce RoundOp in the math dialect. The operation rounds the operand to the nearest integer value in floating-point format. RoundOp lowers to the LLVM intrinsic 'llvm.intr.round' or to a libm function call (round or roundf). Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D127286	2022-06-08 13:07:39 +02:00
Matthias Springer	032be23309	[mlir][bufferize] Improve buffer writability analysis Find writability conflicts (writes to buffers that are not allowed to be written to) by checking SSA use-def chains. This is better than the current writability analysis, which is too conservative and finds false positives. Differential Revision: https://reviews.llvm.org/D127256	2022-06-08 10:11:52 +02:00
Benjamin Kramer	6eb0f8e285	[mlir][MemRef] Fix a crash when expanding a scalar shape In this case the reassociation is empty, yielding no strides for the result type. Differential Revision: https://reviews.llvm.org/D127232	2022-06-08 09:37:40 +02:00
Christopher Bate	53fe155b3f	Revert "[mlir][vector] Allow unroll of contraction in arbitrary order" Reverts commit `1469ebf838` (original commit) Reverts commit `a392a39f75` (build fix for above commit) The commit broke tests in out-of-tree projects, indicating that some logical error was made in the previous change but not covered by current tests.	2022-06-07 14:54:01 -06:00
Kiran Chandramohan	dd32bf9a77	[Flang,MLIR,OpenMP] Fix a few tests that were not converting to LLVM A few OpenMP tests were retaining the FIR operands even after running the LLVM conversion pass. To fix these tests, the legality checks for OpenMP conversion are made stricter to include operands and results. The Flush, Single and Sections operations are added to conversions or legality checks. The RegionLessOpConversion is appropriately renamed to clarify that it works only for operations with Variable operands. The operands of the flush operation are changed to match those of Variable Operands. Fix for an OpenMP issue mentioned in https://github.com/llvm/llvm-project/issues/55210. Reviewed By: shraiysh, peixin, awarzynski Differential Revision: https://reviews.llvm.org/D127092	2022-06-07 09:55:53 +00:00
Alexander Batashev	8324561e33	[mlir][spirv] Correctly deduce PhysicalStorageBuffer64 addressing model According to the SPIR-V specification[1], the PhysicalStorageBuffer storage class can be used only if the addressing model is PhysicalStorageBuffer64. [1]: https://www.khronos.org/registry/SPIR-V/specs/unified1/SPIRV.html#_addressing_model Reviewed By: antiagainst Differential Revision: https://reviews.llvm.org/D127067	2022-06-07 12:14:38 +03:00
lorenzo chelini	9b3712e0bf	[MLIR][LLVMIR] Add round intrinsic Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D126879	2022-06-07 10:27:55 +02:00
lewuathe	62a34f6a6f	[mlir][complex] Add complex.conj op Add complex.conj op to calculate the complex conjugate which is widely used for the mathematical operation on the complex space. Reviewed By: pifon2a Differential Revision: https://reviews.llvm.org/D127181	2022-06-07 09:38:35 +02:00
River Riddle	5919eab55c	[mlir:PDLL] Add support for inlay hints These allow for displaying additional inline information, such as the types of variables, names of operands/results, constraint/rewrite arguments, etc. This requires a bump in the vscode extension to a newer version, as inlay hints are a new LSP feature. Differential Revision: https://reviews.llvm.org/D126033	2022-06-06 20:20:19 -07:00
River Riddle	6187178e83	[mlir:LSP] Switch document sync mode to Incremental This is much more efficient than the full mode, as it only requires sending small chunks of files. It also works around a weird command ordering issue (full document updates are being sent after other commands like code completion) in newer versions of vscode. Differential Revision: https://reviews.llvm.org/D126032	2022-06-06 20:20:19 -07:00
Georgios Pinitas	3bcaf2eb93	[mlir][tosa] Moves constant folding operations out of the Canonicalizer Transpose operations on constant data were getting folded during the canonicalization process. This has compile time cost proportional to the constant size. Moving this to a separate pass to enable optionality and flexibility of how such scenarios can be handled. Reviewed By: rsuderman, jpienaar, stellaraccident Differential Revision: https://reviews.llvm.org/D124685	2022-06-06 22:10:22 +00:00
Christopher Bate	1469ebf838	[mlir][vector] Allow unroll of contraction in arbitrary order Adds support for vector unroll transformations to unroll in different orders. For example, the `vector.contract` can be unrolled into a smaller set of contractions. There is a choice of how to unroll the decomposition based on the traversal order of (dim0, dim1, dim2). The choice of traversal order can now be specified by a callback which is given by the caller of the transform. For now, only the `vector.contract`, `vector.transfer_read/transfer_write` operations support the callback. Differential Revision: https://reviews.llvm.org/D127004	2022-06-06 14:31:04 -06:00
Christopher Bate	cca662b849	[mlir][linalg] add conv_2d_nhwc_fhwc named op This operation should be supported as a named op because when the operands are viewed as having canonical layouts with decreasing strides, then the "reduction" dimensions of the filter (h, w, and c) are contiguous relative to each output channel. When lowered to a matrix multiplication, this layout is the simplest to deal with, and thus future transforms/vectorizations of `conv2d` may find using this named op convenient. Differential Revision: https://reviews.llvm.org/D126995	2022-06-06 13:18:08 -06:00
Christopher Bate	99069ab212	[mlir][linalg] fix crash when promoting rank-reducing memref.subviews This change adds support for promoting `linalg` operation operands that are produced by rank-reducing `memref.subview` ops. Differential Revision: https://reviews.llvm.org/D127086	2022-06-06 12:06:36 -06:00
Stella Laurenzo	768a251587	[mlir] Tunnel LLVM_USE_LINKER through to the standalone example build. When building in debug mode, the link time of the standalone sample is excessive, taking upwards of a minute if using BFD. This at least allows lld to be used if the main invocation was configured that way. On my machine, this gets a standalone test that requires a relink to run in ~13s for Debug mode. This is still a lot, but better than it was. I think we may want to do something about this test: it adds a lot of latency to a normal compile/test cycle and requires a bunch of arg fiddling to exclude. I think we may end up wanting a `check-mlir-heavy` target that can be used just prior to submit, and then make `check-mlir` just run unit/lite tests. More just thoughts for the future (none of that is done here). Reviewed By: bondhugula, mehdi_amini Differential Revision: https://reviews.llvm.org/D126585	2022-06-05 12:31:41 -07:00
Fangrui Song	d86a206f06	Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options	2022-06-05 00:31:44 -07:00
Christian Sigg	400fef081a	Recommit: "[MLIR][NVVM] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration." This change rolls `bcfc0a9051` forward (i.e., reverting `369ce54bb3`) with fixed CMakeLists.txt.	2022-06-05 09:11:43 +02:00
Jacques Pienaar	29794ab0fa	[mlir] Use context provided rather than getContext Avoids "pass state was never initialized" assertion failure.	2022-06-04 12:18:51 -07:00
Mehdi Amini	369ce54bb3	Revert "[MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration." This reverts commit `bcfc0a9051`. The build is broken with shared library enabled.	2022-06-04 08:35:45 +00:00
Christian Sigg	bcfc0a9051	[MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration. This is correct for all values, i.e. the same as promoting the division to fp32 in the NVPTX backend. But it is faster (~10% on average, sometimes more) because:
- it performs fewer Newton iterations
- it avoids the slow path for e.g. denormals
- it allows reuse of the reciprocal for multiple divisions by the same divisor

Test program:
```
#include <stdio.h>
#include "cuda_fp16.h"

// This is a variant of CUDA's own __hdiv which is faster than hdiv_promote below
// and doesn't suffer from the perf cliff of div.rn.fp32 with 'special' values.
__device__ half hdiv_newton(half a, half b) {
  float fa = __half2float(a);
  float fb = __half2float(b);

  float rcp;
  asm("{rcp.approx.ftz.f32 %0, %1;\n}" : "=f"(rcp) : "f"(fb));

  float result = fa * rcp;
  auto exponent = reinterpret_cast<const unsigned&>(result) & 0x7f800000;
  if (exponent != 0 && exponent != 0x7f800000) {
    float err = __fmaf_rn(-fb, result, fa);
    result = __fmaf_rn(rcp, err, result);
  }

  return __float2half(result);
}

// Surprisingly, this is faster than CUDA's own __hdiv.
__device__ half hdiv_promote(half a, half b) {
  return __float2half(__half2float(a) / __half2float(b));
}

// This is an approximation that is accurate up to 1 ulp.
__device__ half hdiv_approx(half a, half b) {
  float fa = __half2float(a);
  float fb = __half2float(b);

  float result;
  asm("{div.approx.ftz.f32 %0, %1, %2;\n}" : "=f"(result) : "f"(fa), "f"(fb));
  return __float2half(result);
}

__global__ void CheckCorrectness() {
  int i = threadIdx.x + blockIdx.x * blockDim.x;
  half x = reinterpret_cast<const half&>(i);
  for (int j = 0; j < 65536; ++j) {
    half y = reinterpret_cast<const half&>(j);
    half d1 = hdiv_newton(x, y);
    half d2 = hdiv_promote(x, y);
    auto s1 = reinterpret_cast<const short&>(d1);
    auto s2 = reinterpret_cast<const short&>(d2);
    if (s1 != s2) {
      printf("%f (%u) / %f (%u), got %f (%hu), expected: %f (%hu)\n",
             __half2float(x), i, __half2float(y), j, __half2float(d1), s1,
             __half2float(d2), s2);
      //__trap();
    }
  }
}

__device__ half dst;

__global__ void ProfileBuiltin(half x) {
#pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = x / x;
  }
  dst = x;
}

__global__ void ProfilePromote(half x) {
#pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_promote(x, x);
  }
  dst = x;
}

__global__ void ProfileNewton(half x) {
#pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_newton(x, x);
  }
  dst = x;
}

__global__ void ProfileApprox(half x) {
#pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_approx(x, x);
  }
  dst = x;
}

int main() {
  CheckCorrectness<<<256, 256>>>();
  half one = __float2half(1.0f);
  ProfileBuiltin<<<1, 1>>>(one);  // 1.001s
  ProfilePromote<<<1, 1>>>(one);  // 0.560s
  ProfileNewton<<<1, 1>>>(one);   // 0.508s
  ProfileApprox<<<1, 1>>>(one);   // 0.304s
  auto status = cudaDeviceSynchronize();
  printf("%s\n", cudaGetErrorString(status));
}
```
Reviewed By: herhut Differential Revision: https://reviews.llvm.org/D126158	2022-06-04 08:03:29 +02:00
wren romano	3cf03f1c56	[mlir][sparse] Adding IsSparseTensorPred and updating ops to use it Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D126994	2022-06-03 17:15:31 -07:00
Christopher Bate	9f819f4c62	[mlir][linalg] fix crash in vectorization of elementwise operations The current vectorization logic implicitly expects "elementwise" linalg ops to have projected permutations for indexing maps, but the precondition logic misses this check. This can result in a crash when executing the generic vectorization transform on an op with a non-projected permutation input indexing map. This change fixes the logic and adds a test (which crashes without this fix). Differential Revision: https://reviews.llvm.org/D127000	2022-06-03 16:38:13 -06:00
Krzysztof Drewniak	95aff23e29	Re-land "[mlir] Add integer range inference analysis" This reverts commit `4e5ce2056e`. This relands commit `1350c9887d`. Reinstates the range analysis with the build issue fixed. Differential Revision: https://reviews.llvm.org/D126926	2022-06-03 17:13:48 +00:00
lewuathe	d4141c93a8	[mlir][complex] Check the correctness of tanh in complex dialect Correctness check for tanh operation in complex dialect. Ref: https://reviews.llvm.org/D126858 Reviewed By: pifon2a Differential Revision: https://reviews.llvm.org/D126946	2022-06-03 14:04:48 +02:00
Shraiysh Vaishay	f5d29c15bf	[mlir][OpenMP] Add memory_order clause tests This patch adds tests for memory_order clause for atomic update and capture operations. This patch also adds a check for making sure that the operations inside an omp.atomic.capture region do not specify the memory_order clause. Reviewed By: kiranchandramohan, peixin Differential Revision: https://reviews.llvm.org/D126195	2022-06-03 13:41:22 +05:30
Nicolas Vasilache	72de7588cc	[mlir][SCF] Add bufferization hook for scf.foreach_thread and terminator. `scf.foreach_thread` results alias with the underlying `scf.foreach_thread.parallel_insert_slice` destination operands and they bufferize to equivalent buffers in the absence of other conflicts. `scf.foreach_thread.parallel_insert_slice` conflict detection is similar to `tensor.insert_slice` conflict detection. Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D126769	2022-06-03 07:14:05 +00:00
Thomas Raoux	271a48e029	[mlir][VectorToGPU] Fix bug generating incorrect ldmatrix ops ldmatrix transpose can only be used with types that are 16bits wide. Differential Revision: https://reviews.llvm.org/D126846	2022-06-03 04:30:22 +00:00
Thomas Raoux	205c08b54d	[mlir][scf] Add option to loop pipelining to not peel the epilogue Add an option to predicate the epilogue within the kernel instead of peeling the epilogue. This is a useful option to prevent generating a large amount of code for deep pipelines. This currently requires a user lambda to implement operation predication. Differential Revision: https://reviews.llvm.org/D126753	2022-06-03 04:20:20 +00:00
Aart Bik	f8b692dd31	[mlir][python][f16] add ctype python binding support for f16 Similar to complex128/complex64, float16 has no direct support in the ctypes implementation. This fixes the issue by using a custom F16 type to change the view in and out of MLIR code Reviewed By: wrengr Differential Revision: https://reviews.llvm.org/D126928	2022-06-02 17:21:24 -07:00
River Riddle	bf352e0b2e	[mlir:PDLL] Add better support for providing Constraint/Pattern/Rewrite documentation This commit enables providing long-form documentation more seamlessly to the LSP by revamping decl documentation. For ODS imported constructs, we now also import descriptions and attach them to decls when possible. For PDLL constructs, the LSP will now try to provide documentation by parsing the comments directly above the decls location within the source file. This commit also adds a new parser flag `enableDocumentation` that gates the import and attachment of ODS documentation, which is unnecessary in the normal build process (i.e. it should only be used/consumed by tools). Differential Revision: https://reviews.llvm.org/D124881	2022-06-02 16:31:07 -07:00
Mehdi Amini	4e5ce2056e	Revert "[mlir] Add integer range inference analysis" This reverts commit `1350c9887d`. Shared library build is broken with undefined references.	2022-06-02 21:24:06 +00:00
Krzysztof Drewniak	1350c9887d	[mlir] Add integer range inference analysis This commit defines a dataflow analysis for integer ranges, which uses a newly-added InferIntRangeInterface to compute the lower and upper bounds on the results of an operation from the bounds on the arguments. The range inference is a flow-insensitive dataflow analysis that can be used to simplify code, such as by statically identifying bounds checks that cannot fail in order to eliminate them. The InferIntRangeInterface has one method, inferResultRanges(), which takes a vector of inferred ranges for each argument to an op implementing the interface and a callback allowing the implementation to define the ranges for each result. These ranges are stored as ConstantIntRanges, which hold the lower and upper bounds for a value. Bounds are tracked separately for the signed and unsigned interpretations of a value, which ensures that the impact of arithmetic overflows is correctly tracked during the analysis. The commit also adds a -test-int-range-inference pass to test the analysis until it is integrated into SCCP or otherwise exposed. Finally, this commit fixes some bugs relating to the handling of region iteration arguments and terminators in the data flow analysis framework. Depends on D124020 Depends on D124021 Reviewed By: rriddle, Mogball Differential Revision: https://reviews.llvm.org/D124023	2022-06-02 20:24:11 +00:00
Ashay Rane	5fee1799f4	[mlir] translate memref.reshape with static shapes but dynamic dims Prior to this patch, the lowering of memref.reshape operations to the LLVM dialect failed if the shape argument had a static shape with dynamic dimensions. This patch adds the necessary support so that when the shape argument has dynamic values, the lowering probes the dimension at runtime to set the size in the `MemRefDescriptor` type. This patch also computes the stride for dynamic dimensions by deriving it from the sizes of the inner dimensions. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D126604	2022-06-02 10:00:58 -07:00
Alex Zinenko	ce2e198bc2	[mlir] add decompose and generalize to structured transform ops These ops complement the tiling/padding transformations by transforming higher-level named structured operations such as depthwise convolutions into lower-level and/or generic equivalents that are better handled by some downstream transformations. Differential Revision: https://reviews.llvm.org/D126698	2022-06-02 15:25:18 +02:00
Nicolas Vasilache	311967701a	[mlir][SCF] Add scf.foreach_thread.parallel_insert_slice canonicalization. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D126761	2022-06-02 11:53:25 +00:00
lewuathe	9f0869a61d	[mlir][complex] Lower complex.sin/cos to libm Lower sin/cos operation in complex dialect to libm as a baseline. This follows up to https://reviews.llvm.org/D125550. Reviewed By: pifon2a Differential Revision: https://reviews.llvm.org/D126755	2022-06-02 10:39:00 +02:00
lewuathe	4b13b061ae	[mlir][complex] Sanity check for tan operation in complex dialect Add a sanity check for newly added tan operation in complex dialect. It follows-up to https://reviews.llvm.org/D126685. Differential Revision: https://reviews.llvm.org/D126858	2022-06-02 10:33:40 +02:00

6286 Commits