llvm-project

Commit Graph

Author	SHA1	Message	Date
Alex Zinenko	8b68da2c7d	[mlir] move SCF headers to SCF/{IR,Transforms} respectively This aligns the SCF dialect file layout with the majority of the dialects. Reviewed By: jpienaar Differential Revision: https://reviews.llvm.org/D128049	2022-06-20 10:18:01 +02:00
NAKAMURA Takumi	0716b3ec09	[Bazel] Rename generated *_main.cpp to [tool-name]-driver.cpp which CMake uses. Fixup to llvmorg-15-init-12347-gf06abbb39380 The difference of basename affects its emitted object file. FIXME: Each rule's name is left as origin.	2022-06-18 13:35:23 +09:00
Benjamin Kramer	d5c29b23e1	[mlir][sparse] Inline the definition of LLVM_ATTRIBUTE_WEAK This library is supposed not to have a dependency on LLVM, and linking LLVMSupport into it breaks its shared library setup.	2022-06-17 22:41:10 +02:00
Benjamin Kramer	feb8313fbf	[bazel] Add missing dependency after 455679413 This is supposed to be header-only. Don't know how to express that in bazel.	2022-06-17 22:09:13 +02:00
Frederik Gossen	e694b22344	[mlir][nvgpu] Fix Bazel BUILD file Differential Revision: https://reviews.llvm.org/D128078	2022-06-17 14:35:30 -04:00
Christopher Bate	51b925df94	[mlir][nvgpu] shared memory access optimization pass This change adds a transformation and pass to the NvGPU dialect that attempts to optimize reads/writes from a memref representing GPU shared memory in order to avoid bank conflicts. Given a value representing a shared memory memref, it traverses all reads/writes within the parent op and, subject to suitable conditions, rewrites all last dimension index values such that element locations in the final (col) dimension are given by `newColIdx = col % vecSize + perm[row](col/vecSize,row)` where `perm` is a permutation function indexed by `row` and `vecSize` is the vector access size in elements (currently assumes 128bit vectorized accesses, but this can be made a parameter). This specific transformation can help optimize typical distributed & vectorized accesses common to loading matrix multiplication operands to/from shared memory. Differential Revision: https://reviews.llvm.org/D127457	2022-06-17 09:31:05 -06:00
Guillaume Chatelet	7b73f53790	[libc] Rely on __builtin_memcpy_inline for memcpy implementation This patch removes usage of `-mllvm -combiner-global-alias-analysis` and relies on compiler builtin to implement `memcpy`. Note that `-mllvm -combiner-global-alias-analysis` is actually only useful for functions where buffers can alias (namely `memcpy` and `memmove`). The other memory functions were not benefiting from the flag anyway. The upside is that the memory functions can now be compiled from source with thinlto (thinlto would not be able to carry on the flag when doing inlining). The downside is that for compilers other than clang (i.e. not providing `__builtin_memcpy_inline`) the codegen may be worse. Differential Revision: https://reviews.llvm.org/D128051	2022-06-17 14:22:26 +00:00
Alex Zinenko	610139d2d9	[mlir] replace 'emit_c_wrappers' func->llvm conversion option with a pass The 'emit_c_wrappers' option in the FuncToLLVM conversion requests C interface wrappers to be emitted for every builtin function in the module. While this has been useful to bootstrap the interface, it is problematic in the longer term as it may unintentionally affect the functions that should retain their existing interface, e.g., libm functions obtained by lowering math operations (see D126964 for an example). Since D77314, we have a finer-grain control over interface generation via an attribute that avoids the problem entirely. Remove the 'emit_c_wrappers' option. Introduce the '-llvm-request-c-wrappers' pass that can be run in any pipeline that needs blanket emission of functions to annotate all builtin functions with the attribute before performing the usual lowering that accounts for the attribute. Reviewed By: chelini Differential Revision: https://reviews.llvm.org/D127952	2022-06-17 11:10:31 +02:00
Guillaume Chatelet	c26366979b	[libc][bazel] Remove memcpy dependency in memmove	2022-06-17 09:07:24 +00:00
bixia1	bbb73ade43	[mlir][complex] Add Python bindings for complex ops. Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D127916	2022-06-16 14:19:11 -07:00
Guillaume Chatelet	4a6929f811	Revert "[libc] Apply no-builtin everywhere, remove unnecessary flags" This reverts commit `b2a9ea4420`.	2022-06-16 09:28:17 +00:00
Jacques Pienaar	02e32708bd	[mlir][bzl] Export textmate grammar file	2022-06-15 11:28:43 -07:00
Rob Suderman	640973f2b9	[tosa] Lower tosa.slice to tensor.slice for dynamic case The existing slice lowering only supported static shapes. Reviewed By: NatashaKnk Differential Revision: https://reviews.llvm.org/D127704	2022-06-15 09:54:36 -07:00
Thomas Raoux	6834803c3d	[mlir][vector] NFC remove dependency of VectorTransform to GPU dialect Make the reduction distribution pattern more generic and remove layering problem. The new pattern to distribute reduction is now independent of GPU and takes a lambda to decide how the distributed reduction should be generated. Differential Revision: https://reviews.llvm.org/D127867	2022-06-15 16:08:29 +00:00
Benjamin Kramer	55b76fb58e	[bazel] Port `b0b0043209`	2022-06-14 23:55:50 +02:00
Benjamin Kramer	8224fb7ef9	[bazel] Port `75bfc6f295`	2022-06-14 23:46:02 +02:00
Mogball	ead75d9434	(Reland)[mlir] Add a generic data-flow analysis framework Removes one element of the pointer union to make it work on 32-bit systems. This patch introduces a generic data-flow analysis framework to MLIR. The framework implements a fixed-point iteration algorithm and a dependency graph between lattice states and analysis. Lattice states and points are fully extensible to support highly-customizable analyses. Reviewed By: phisiart, rriddle Differential Revision: https://reviews.llvm.org/D126751	2022-06-14 21:33:05 +00:00
Frederik Gossen	a6fa12ab3b	Revert "[mlir] Add a generic data-flow analysis framework" This reverts commit `9dea117283`. The PointerUnion assumes 3 available bits, which is not the case on 32-bit machines.	2022-06-14 17:14:27 -04:00
Mogball	9dea117283	[mlir] Add a generic data-flow analysis framework This patch introduces a generic data-flow analysis framework to MLIR. The framework implements a fixed-point iteration algorithm and a dependency graph between lattice states and analysis. Lattice states and points are fully extensible to support highly-customizable analyses. Reviewed By: phisiart, rriddle Differential Revision: https://reviews.llvm.org/D126751	2022-06-14 16:54:15 +00:00
Thomas Raoux	087aba4f0f	[mlir][vector] Add pattern to distribute vector reduction to GPU shuffles Add a pattern to do ad hoc lowering of vector.reduction to a sequence of warp shuffles. This allows distributing reduction on a warp for GPU targets. Also add an execution test for warp reduction. co-authored with @springerm Differential Revision: https://reviews.llvm.org/D127176	2022-06-14 05:49:16 +00:00
Benjamin Kramer	b8cdff8894	[bazel] Unbreak the build after `cf6a7c1947`	2022-06-14 00:23:17 +02:00
Mahesh Ravishankar	cf6a7c1947	[mlir][TilingInterface] Add pattern to tile using TilingInterface and implement TilingInterface for Linalg ops. This patch adds support for tiling operations that implement the TilingInterface. - It separates the loop constructs that are used to iterate over tile from the implementation of the tiling itself. For example, the use of destructive updates is more related to use of scf.for for iterating over tiles that are tensors. - To test the transformation, TilingInterface is implemented for LinalgOps. The separation of the looping constructs used from the implementation of tile code generation greatly simplifies the latter. - The implementation of TilingInterface for LinalgOp is kept as an external model for now till this approach can be fully fleshed out to replace the existing tiling + fusion approaches in Linalg. Differential Revision: https://reviews.llvm.org/D127133	2022-06-13 20:37:44 +00:00
Mogball	e16d13322b	[mlir] (NFC) Clean up bazel and CMake target names All dialect targets in bazel have been named Dialect and all dialect targets in CMake have been named MLIRDialect.	2022-06-13 16:24:15 +00:00
Benjamin Kramer	914e30ca5c	[bazel] Change references to driver-template.cpp.in after `6bc8163c79`	2022-06-10 10:22:55 +02:00
Adrian Kuegel	61132005a9	Fix bazel BUILD.	2022-06-10 08:26:00 +02:00
Mogball	2af69c6751	[mlir][NFC] Rename Bazel target aliases and consolidate targets This patch completes outstanding TODOs of removing aliased bazel target names. This patch also renames and consolidates some bazel targets to be more in line with their CMake counterparts, e.g. combining `:LinalgOps` and `:LinalgInterfaces` into `:LinalgDialect`. Differential Revision: https://reviews.llvm.org/D127459	2022-06-09 23:58:07 +00:00
Mogball	a31ff0af9b	[mlir][spirv] Replace StructAttrs with AttrDefs Depends on D127370 Reviewed By: antiagainst Differential Revision: https://reviews.llvm.org/D127373	2022-06-09 23:16:44 +00:00
Mogball	f1182bd6d5	[mlir][tosa] Replace StructAttrs with AttrDefs Depends on D127352 Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D127370	2022-06-09 23:01:51 +00:00
Mogball	d7ef488bb6	[mlir][gpu] Move GPU headers into IR/ and Transforms/ Depends on D127350 Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D127352	2022-06-09 22:49:03 +00:00
Mogball	7bdd3722f2	[mlir][gpu] Change ParalellLoopMappingAttr to AttrDef It was a StructAttr. Also adds a FieldParser for AffineMap. Depends on D127348 Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D127350	2022-06-09 22:23:21 +00:00
Benjamin Kramer	3ae85bd67a	[bazel] Add missing dependency after `9f1221521f`.	2022-06-09 22:41:32 +02:00
Matthias Springer	461dafd2a3	[mlir][bufferization] Add OneShotBufferize transform op This commit allows for One-Shot Bufferize to be used through the transform dialect. No op handle is currently returned for the bufferized IR. Differential Revision: https://reviews.llvm.org/D125098	2022-06-09 15:15:09 +02:00
Alex Zinenko	5f0d4f208e	[mlir] Introduce Transform ops for loops Introduce transform ops for "for" loops, in particular for peeling, software pipelining and unrolling, along with a couple of "IR navigation" ops. These ops are intended to be generalized to different kinds of loops when possible and therefore use the "loop" prefix. They currently live in the SCF dialect as there is no clear place to put transform ops that may span across several dialects, this decision is postponed until the ops actually need to handle non-SCF loops. Additionally refactor some common utilities for transform ops into trait or interface methods, and change the loop pipelining to be a returning pattern. Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D127300	2022-06-09 11:41:55 +02:00
bixia1	ea8ed5cbcf	[mlir][sparse] Add F16 and BF16. This is the first PR to add `F16` and `BF16` support to the sparse codegen. There are still problems in supporting these two data types; for example, `BF16` is not quite working yet. Add test cases. Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D127010	2022-06-08 09:51:05 -07:00
Tue Ly	63aa853389	[libc] Add expm1f function to bazel's build overlay. Reviewed By: gchatelet Differential Revision: https://reviews.llvm.org/D127298	2022-06-08 09:49:47 -04:00
Reid Kleckner	570e76bb6c	[config] Remove vestigial LLVM_VERSION_INFO This has been superseded by the llvm/Support/VCSRevision.h header. So far as I can tell, nothing in the CMake build sets LLVM_VERSION_INFO. It was always undefined, and the ifdefs using it were dead. However, CMake is very flexible, so it's possible that I missed some ways to set this variable. One could, for example, probably pass -DLLVM_VERSION_INFO=x on the command line and get that through to configure_file, or set the variable in an obscure way (`set(${proj}_VERSION_INFO "x")`). I'm reasonably confident that isn't happening, but I'd like a second opinion. Update the Bazel and gn builds accordingly. Differential Revision: https://reviews.llvm.org/D126977	2022-06-07 11:36:26 -07:00
Reid Kleckner	b1c7889f32	[config] Remove RETSIGTYPE from config.h.cmake, NFC This doesn't need to be configurable. It was hardcoded to void in all LLVM build systems.	2022-06-07 11:35:25 -07:00
Mogball	5cac7cda95	[mlir][bazel] fix bazel build on VectorTransforms	2022-06-06 21:51:17 +00:00
Chris Bieneman	f06abbb393	LLVM Driver Multicall tool This patch adds an llvm-driver multicall tool that can combine multiple LLVM-based tools. The build infrastructure is enabled for a tool by adding the GENERATE_DRIVER option to the add_llvm_executable CMake call, and changing the tool's main function to a canonicalized tool_name_main format (i.e. llvm_ar_main, clang_main, etc...). As currently implemented llvm-driver contains dsymutil, llvm-ar, llvm-cxxfilt, llvm-objcopy, and clang (if clang is included in the build). llvm-driver can be enabled from builds by setting LLVM_TOOL_LLVM_DRIVER_BUILD=On. There are several limitations in the current implementation, which can be addressed in subsequent patches: (1) the multicall binary cannot currently properly handle multi-dispatch tools. This means symlinking llvm-ranlib to llvm-driver will not properly result in llvm-ar's main being called. (2) the multicall binary cannot be comprised of tools containing conflicting cl::opt options as the global cl::opt option list cannot contain duplicates. These limitations can be addressed in subsequent patches. Differential revision: https://reviews.llvm.org/D109977	2022-06-06 04:27:32 +00:00
Christian Sigg	400fef081a	Recommit: "[MLIR][NVVM] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration." This change rolls `bcfc0a9051` forward (i.e., reverting `369ce54bb3`) with fixed CMakeLists.txt.	2022-06-05 09:11:43 +02:00
Mehdi Amini	369ce54bb3	Revert "[MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration." This reverts commit `bcfc0a9051`. The build is broken with shared library enabled.	2022-06-04 08:35:45 +00:00
Christian Sigg	bcfc0a9051	[MLIR][GPU] Replace fdiv on fp16 with promoted (fp32) multiplication with reciprocal plus one (conditional) Newton iteration. This is correct for all values, i.e. the same as promoting the division to fp32 in the NVPTX backend. But it is faster (~10% on average, sometimes more) because:
- it performs fewer Newton iterations
- it avoids the slow path for e.g. denormals
- it allows reuse of the reciprocal for multiple divisions by the same divisor

Test program:
```
#include <stdio.h>
#include "cuda_fp16.h"

// This is a variant of CUDA's own __hdiv which is faster than hdiv_promote below
// and doesn't suffer from the perf cliff of div.rn.fp32 with 'special' values.
__device__ half hdiv_newton(half a, half b) {
  float fa = __half2float(a);
  float fb = __half2float(b);

  float rcp;
  asm("{rcp.approx.ftz.f32 %0, %1;\n}" : "=f"(rcp) : "f"(fb));

  float result = fa * rcp;
  auto exponent = reinterpret_cast<const unsigned&>(result) & 0x7f800000;
  if (exponent != 0 && exponent != 0x7f800000) {
    float err = __fmaf_rn(-fb, result, fa);
    result = __fmaf_rn(rcp, err, result);
  }

  return __float2half(result);
}

// Surprisingly, this is faster than CUDA's own __hdiv.
__device__ half hdiv_promote(half a, half b) {
  return __float2half(__half2float(a) / __half2float(b));
}

// This is an approximation that is accurate up to 1 ulp.
__device__ half hdiv_approx(half a, half b) {
  float fa = __half2float(a);
  float fb = __half2float(b);

  float result;
  asm("{div.approx.ftz.f32 %0, %1, %2;\n}" : "=f"(result) : "f"(fa), "f"(fb));
  return __float2half(result);
}

__global__ void CheckCorrectness() {
  int i = threadIdx.x + blockIdx.x * blockDim.x;
  half x = reinterpret_cast<const half&>(i);
  for (int j = 0; j < 65536; ++j) {
    half y = reinterpret_cast<const half&>(j);
    half d1 = hdiv_newton(x, y);
    half d2 = hdiv_promote(x, y);
    auto s1 = reinterpret_cast<const short&>(d1);
    auto s2 = reinterpret_cast<const short&>(d2);
    if (s1 != s2) {
      printf("%f (%u) / %f (%u), got %f (%hu), expected: %f (%hu)\n",
             __half2float(x), i, __half2float(y), j, __half2float(d1), s1,
             __half2float(d2), s2);
      //__trap();
    }
  }
}

__device__ half dst;

__global__ void ProfileBuiltin(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = x / x;
  }
  dst = x;
}

__global__ void ProfilePromote(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_promote(x, x);
  }
  dst = x;
}

__global__ void ProfileNewton(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_newton(x, x);
  }
  dst = x;
}

__global__ void ProfileApprox(half x) {
  #pragma unroll 1
  for (int i = 0; i < 10000000; ++i) {
    x = hdiv_approx(x, x);
  }
  dst = x;
}

int main() {
  CheckCorrectness<<<256, 256>>>();
  half one = __float2half(1.0f);
  ProfileBuiltin<<<1, 1>>>(one);  // 1.001s
  ProfilePromote<<<1, 1>>>(one);  // 0.560s
  ProfileNewton<<<1, 1>>>(one);   // 0.508s
  ProfileApprox<<<1, 1>>>(one);   // 0.304s
  auto status = cudaDeviceSynchronize();
  printf("%s\n", cudaGetErrorString(status));
}
```
Reviewed By: herhut Differential Revision: https://reviews.llvm.org/D126158	2022-06-04 08:03:29 +02:00
Reid Kleckner	d82b4fe50d	[bazel] Update build for config.h.cmake change	2022-06-03 12:58:04 -07:00
Benjamin Kramer	389c0b81d3	[bazel] Port `95aff23e29`	2022-06-03 20:43:23 +02:00
Benjamin Kramer	d9de52819d	[bazel] Add a missing dependency after `f3bdb56d61`	2022-06-01 22:52:06 +02:00
Stella Laurenzo	3bb7999339	[mlir] Add global_load and global_store ops to ml_program. * Adds simple, non-atomic, non-volatile, non-synchronized direct load/store ops. Differential Revision: https://reviews.llvm.org/D126230	2022-06-01 11:32:15 -07:00
Guillaume Chatelet	ffa479a452	[libc] fix typo in BUILD.bazel feature	2022-06-01 13:53:36 +00:00
Guillaume Chatelet	b2a9ea4420	[libc] Apply no-builtin everywhere, remove unnecessary flags Note, this is a re-submission of D125894 with `features = ["-header_modules"]` added to the main BUILD.bazel file. Some functions like `stpncpy` are implemented in terms of `memset` but are not currently using `-fno-builtin-memset`. This is somewhat hidden by the fact that we use `-ffreestanding` globally and that `-ffreestanding` implies `-fno-builtin` for Clang. This patch also removes `-mllvm -combiner-global-alias-analysis` that is Clang specific and that does not bring substantial gains on modern processors. Also we keep `-mllvm --tail-merge-threshold=0` for aarch64 in CMakeLists.txt but we omit it in the Bazel config. This is because Bazel consumes the source files directly and so it can use PGO to take optimal decisions locally. Differential Revision: https://reviews.llvm.org/D126773	2022-06-01 13:34:36 +00:00
Christian Sigg	f330db8b14	Fix bazel build after `59b273a166`.	2022-06-01 13:13:18 +02:00
Christian Sigg	7cb8b973fa	Fix bazel build after `59b273a166`. Reviewed By: tpopp Differential Revision: https://reviews.llvm.org/D126765	2022-06-01 12:12:04 +02:00

492 Commits