llvm-project

Commit Graph

Author	SHA1	Message	Date
Alex Zinenko	0edb262d91	[mlir] enable doc generation for the transform dialect	2022-04-21 18:52:08 +02:00
Craig Topper	98b866892d	[RISCV] Add special case to constant materialization to remove trailing zeros first. If there are fewer than 12 trailing zeros, we'll try to use an ADDI at the end of the sequence. If we strip trailing zeros and end the sequence with a SLLI we might find a shorter sequence. Differential Revision: https://reviews.llvm.org/D124148	2022-04-21 09:43:32 -07:00
Stanislav Mekhanoshin	ac94073daa	[AMDGPU] Refine 64 bit misaligned LDS ops selection Here is the performance data: ``` Using platform: AMD Accelerated Parallel Processing Using device: gfx900:xnack- ds_write_b64 aligned by 8: 3.2 sec ds_write2_b32 aligned by 8: 3.2 sec ds_write_b16 * 4 aligned by 8: 7.0 sec ds_write_b8 * 8 aligned by 8: 13.2 sec ds_write_b64 aligned by 1: 7.3 sec ds_write2_b32 aligned by 1: 7.5 sec ds_write_b16 * 4 aligned by 1: 14.0 sec ds_write_b8 * 8 aligned by 1: 13.2 sec ds_write_b64 aligned by 2: 7.3 sec ds_write2_b32 aligned by 2: 7.5 sec ds_write_b16 * 4 aligned by 2: 7.1 sec ds_write_b8 * 8 aligned by 2: 13.3 sec ds_write_b64 aligned by 4: 4.6 sec ds_write2_b32 aligned by 4: 3.2 sec ds_write_b16 * 4 aligned by 4: 7.1 sec ds_write_b8 * 8 aligned by 4: 13.3 sec ds_read_b64 aligned by 8: 2.3 sec ds_read2_b32 aligned by 8: 2.2 sec ds_read_u16 * 4 aligned by 8: 4.8 sec ds_read_u8 * 8 aligned by 8: 8.6 sec ds_read_b64 aligned by 1: 4.4 sec ds_read2_b32 aligned by 1: 7.3 sec ds_read_u16 * 4 aligned by 1: 14.0 sec ds_read_u8 * 8 aligned by 1: 8.7 sec ds_read_b64 aligned by 2: 4.4 sec ds_read2_b32 aligned by 2: 7.3 sec ds_read_u16 * 4 aligned by 2: 4.8 sec ds_read_u8 * 8 aligned by 2: 8.7 sec ds_read_b64 aligned by 4: 4.4 sec ds_read2_b32 aligned by 4: 2.3 sec ds_read_u16 * 4 aligned by 4: 4.8 sec ds_read_u8 * 8 aligned by 4: 8.7 sec Using platform: AMD Accelerated Parallel Processing Using device: gfx1030 ds_write_b64 aligned by 8: 4.4 sec ds_write2_b32 aligned by 8: 4.3 sec ds_write_b16 * 4 aligned by 8: 7.9 sec ds_write_b8 * 8 aligned by 8: 13.0 sec ds_write_b64 aligned by 1: 23.2 sec ds_write2_b32 aligned by 1: 23.1 sec ds_write_b16 * 4 aligned by 1: 44.0 sec ds_write_b8 * 8 aligned by 1: 13.0 sec ds_write_b64 aligned by 2: 23.2 sec ds_write2_b32 aligned by 2: 23.1 sec ds_write_b16 * 4 aligned by 2: 7.9 sec ds_write_b8 * 8 aligned by 2: 13.1 sec ds_write_b64 aligned by 4: 13.5 sec ds_write2_b32 aligned by 4: 4.3 sec ds_write_b16 * 4 aligned by 4: 7.9 sec ds_write_b8 * 8 aligned by 4: 13.1 sec ds_read_b64 aligned by 8: 3.5 sec ds_read2_b32 aligned by 8: 3.4 sec ds_read_u16 * 4 aligned by 8: 5.3 sec ds_read_u8 * 8 aligned by 8: 8.5 sec ds_read_b64 aligned by 1: 13.1 sec ds_read2_b32 aligned by 1: 22.7 sec ds_read_u16 * 4 aligned by 1: 43.9 sec ds_read_u8 * 8 aligned by 1: 7.9 sec ds_read_b64 aligned by 2: 13.1 sec ds_read2_b32 aligned by 2: 22.7 sec ds_read_u16 * 4 aligned by 2: 5.6 sec ds_read_u8 * 8 aligned by 2: 7.9 sec ds_read_b64 aligned by 4: 13.1 sec ds_read2_b32 aligned by 4: 3.4 sec ds_read_u16 * 4 aligned by 4: 5.6 sec ds_read_u8 * 8 aligned by 4: 7.9 sec ``` GFX10 exposes a different pattern for sub-DWORD load/store performance than GFX9. On GFX9 it is faster to issue a single unaligned load or store than a fully split b8 access, where on GFX10 even a full split is better. However, this is a theoretical only gain because splitting an access to a sub-dword level will require more registers and packing/ unpacking logic, so ignoring this option it is better to use a single 64 bit instruction on a misaligned data with the exception of 4 byte aligned data where ds_read2_b32/ds_write2_b32 is better. Differential Revision: https://reviews.llvm.org/D123956	2022-04-21 09:37:16 -07:00
chenglin.bi	b543d28df7	[InstCombine] Add one use limitation for (X * C2) << C1 --> X * (C2 << C1) Follow up D123453, add one-use limitation for (X * C2) << C1 --> X * (C2 << C1) to make consistent with lshr (mul nuw x, MulC), ShAmtC -> mul nuw x, (MulC >> ShAmtC) Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D124183	2022-04-22 00:32:36 +08:00
Jacob Lambert	afcc6baac5	[clang][HIP] Updating driver to enable archive/bitcode to bitcode linking when targeting HIPAMD toolchain Differential Revision: https://reviews.llvm.org/D124151	2022-04-21 09:24:33 -07:00
Sanjay Patel	8960ba7491	Revert "[InstCombine] C0 <<{nsw, nuw} (X - C1) --> (C0 >> C1) << X" This reverts commit `5819f4a422`. This caused bots to fail with a crash/assert during the fold, so some constraint was missed.	2022-04-21 12:15:27 -04:00
chenglin.bi	e077e3a648	[InstCombine] add baseline test for (X * C2) << C1 --> X * (C2 << C1) without one use; NFC	2022-04-22 00:12:06 +08:00
Sam McCall	af3fb07154	[Frontend] Simplify PrecompiledPreamble::PCHStorage. NFC - Remove fiddly union, preambles are heavyweight - Remove fiddly move constructors in TempPCHFile and PCHStorage, use unique_ptr - Remove unnecessary accessors on PCHStorage - Remove trivial InMemoryStorage - Move implementation details into cpp file This is a prefactoring, followup change will change the in-memory PCHStorage to avoid extra string copies while creating it. Differential Revision: https://reviews.llvm.org/D124177	2022-04-21 18:10:13 +02:00
Nico Weber	889847922d	[lld/mac] Warn that writing zippered outputs isn't implemented A "zippered" dylib contains several LC_BUILD_VERSION load commands, usually one each for "normal" macOS and one for macCatalyst. These are usually created by passing something like -shared -target arm64-apple-macos -darwin-target-variant arm64-apple-ios13.1-macabi to clang, which turns it into -platform_version macos 12.0.0 12.3 -platform_version "mac catalyst" 14.0.0 15.4 for the linker. ld64.lld can read these files fine, but it can't write them. Before this change, it would just silently use the last -platform_version flag and ignore the rest. This change adds a warning that writing zippered dylibs isn't implemented yet instead. Sadly, parts of ld64.lld's test suite relied on the previous "silently use last flag" semantics for its test suite: `%lld` always expanded to `ld64.lld -platform_version macos 10.15 11.0` and tests that wanted a different value passed a 2nd `-platform_version` flag later on. But this now produces a warning if the platform passed to `-platform_version` is not `macos`. There weren't very many cases of this, so move these to use `%no-arg-lld` and manually pass `-arch`. Differential Revision: https://reviews.llvm.org/D124106	2022-04-21 12:05:56 -04:00
Adam Czachorowski	ad46aaede6	[clangd] Add beforeExecute() callback to FeatureModules. It runs immediately before FrontendAction::Execute() with a mutable CompilerInstance, allowing FeatureModules to register callbacks, remap files, etc. Differential Revision: https://reviews.llvm.org/D124176	2022-04-21 18:03:39 +02:00
Simon Pilgrim	f8a078f20c	[X86] Add test case for Issue #54911	2022-04-21 17:02:15 +01:00
Fangrui Song	ae46b3e01f	Revert D121279 "[MLIR][GPU] Add canonicalizer for gpu.memcpy" This reverts commit `12f55cac69`. Causes miscompile. Will follow up with a reproducer.	2022-04-21 08:55:13 -07:00
Simon Pilgrim	13d59a8ee4	[M68k] Regenerate cmp.ll tests M68k is still experimental so wasn't updated in a recent DAG combine	2022-04-21 16:54:00 +01:00
Tyler Mandry	d8c1d37ba3	[fuchsia] Don't include duplicate profiling symbols for Fuchsia InstrProfilingPlatformLinux.c already provides these symbols. Linker order saved us from noticing before. Reviewed By: mcgrathr Differential Revision: https://reviews.llvm.org/D124136	2022-04-21 15:44:37 +00:00
Byoungchan Lee	8a3afc6da5	[compiler-rt][Darwin] Add arm64 to simulator platforms This patch is the reland of `a8e5ce76b4`, which includes additional SDK version checks to ensure that XCode's headers support arm64 builds. Differential Revision: https://reviews.llvm.org/D119174	2022-04-21 17:42:31 +02:00
Sanjay Patel	5819f4a422	[InstCombine] C0 <<{nsw, nuw} (X - C1) --> (C0 >> C1) << X This is similar to an existing pre-shift-of-constant fold: `8a9c70fc01` ...but in this case, we need no-wrap on the shl and a negative offset: https://alive2.llvm.org/ce/z/_RVz99 Fixes #54890	2022-04-21 11:38:27 -04:00
Sanjay Patel	782d0105ba	[InstCombine] add tests for C << (X - C1); NFC	2022-04-21 11:38:26 -04:00
Paul Robinson	f80e369f61	[PS4] Driver: use correct --shared option	2022-04-21 08:19:42 -07:00
Kirill Bobyrev	9f05b111ee	[clangd] Include Cleaner: suppress unused warnings for IWYU pragma: export Add limited support for "IWYU pragma: export" - for now it just suppresses the warning similar to "IWYU pragma: keep". Reviewed By: sammccall Differential Revision: https://reviews.llvm.org/D124170	2022-04-21 17:00:06 +02:00
Kirill Bobyrev	e1c0d2fb82	[clangd] Correctly identify self-contained headers included recursively Right now when exiting the file Headers.cpp will identify the recursive inclusion (with a new FileID) as non self-contained and will add it to the set from which it will never be removed. As a result, we get incorrect results in the IncludeStructure and Include Cleaner. This patch is a fix. Reviewed By: sammccall Differential Revision: https://reviews.llvm.org/D124166	2022-04-21 16:54:59 +02:00
gbreynoo	1f71b5a386	[llvm-ar] Fix thin archive being wrongly converted to a full archive When using the L option to quick append a full archive to a thin archive, the thin archive was being wrongly converted to a full archive. I've fixed the issue and added a check for it in thin-to-full-archive.test and expanded some tests. Differential Revision: https://reviews.llvm.org/D123142	2022-04-21 15:48:26 +01:00
Alex Zinenko	30f22429d3	[mlir] Connect Transform dialect to PDL This introduces a pair of ops to the Transform dialect that connect it to PDL patterns. Transform dialect relies on PDL for matching the Payload IR ops that are about to be transformed. For this purpose, it provides a container op for patterns, a "pdl_match" op and transform interface implementations that call into the pattern matching infrastructure. To enable the caching of compiled patterns, this also provides the extension mechanism for TransformState. Extensions allow one to store additional information in the TransformState and thus communicate it between different Transform dialect operations when they are applied. They can be added and removed when applying transform ops. An extension containing a symbol table in which the pattern names are resolved and a pattern compilation cache is introduced as the first client. Depends On D123664 Reviewed By: Mogball Differential Revision: https://reviews.llvm.org/D124007	2022-04-21 16:23:10 +02:00
Eric Astor	82ecf9a0b1	[LLVM-ML] Add standard LLVM debug flags Adds support for -debug and -debug-only= flags. Reviewed By: ayzhao Differential Revision: https://reviews.llvm.org/D123545	2022-04-21 10:14:59 -04:00
Petar Avramovic	e06290e53f	AMDGPU/GlobalISel: Fix isVCC for uniform s1 with reg class on wave32 Fix isVCC for register that was assigned register class during inst-selection. This happens when register has multiple uses. For wave32, uniform i1 to vcc copy was selected like vcc to vcc copy when uniform i1 had assigned register class. Uniform i1 register with assigned register class will have s1 LLT, be defined using G_TRUNC and class will be SReg_32RegClass. Vcc i1 register with assigned register class will have s1 LLT, class will be SReg_32RegClass for wave32 and SReg_64RegClass for wave64 and register will not be defined by G_TRUNC. Differential Revision: https://reviews.llvm.org/D124163	2022-04-21 16:12:04 +02:00
Petar Avramovic	4e0dacb2cf	AMDGPU/GlobalISel: Precommit test for D124163	2022-04-21 16:12:03 +02:00
Nikita Popov	ead231dec0	[InstCombine] Fix typo in test (NFC) This is a copy paste mistake, this variant of the test was supposed to use poison instead of undef.	2022-04-21 15:58:51 +02:00
Karl Meakin	81904454f7	[AArch64] Add `foldOverflowCheck` DAG combine Differential Revision: https://reviews.llvm.org//D123779	2022-04-21 14:56:38 +01:00
Karl Meakin	13403a70e4	[AArch64] Add lowerings for {ADD,SUB}CARRY and S{ADD,SUB}O_CARRY Differential Revision: https://reviews.llvm.org/D123322	2022-04-21 14:56:37 +01:00
Nikita Popov	46c2b41d02	[InstCombine] Remove dead code (NFC) This was a leftover condition without code.	2022-04-21 15:53:53 +02:00
Jannik Silvanus	607f8ced39	[AMDGPU]: Fix failing assertion in SIMachineScheduler This fixes the assertion failure "Loop in the Block Graph!". SIMachineScheduler groups instructions into blocks (also referred to as coloring or groups) and then performs a two-level scheduling: inter-block scheduling, and intra-block scheduling. This approach requires that the dependency graph on the blocks which is obtained by contracting the blocks in the original dependency graph is acyclic. In other words: Whenever A and B end up in the same block, all vertices on a path from A to B must be in the same block. When compiling an example consisting of an export followed by a buffer store, we see a dependency between these two. This dependency may be false, but that is a different issue. This dependency was not correctly accounted for by SIMachineScheduler. A new test case si-scheduler-exports.ll demonstrating this is also added in this commit. The problematic part of SIMachineScheduler was a post-optimization of the block assignment that tried to group all export instructions into a separate export block for better execution performance. This routine correctly checked that any paths from exports to exports did not contain any non-exports, but not vice-versa: In case of an export with a non-export successor dependency, that single export was moved to a separate block, which could then be both a successor and a predecessor block of a non-export block. As a fix, we now skip export grouping if there are exports with direct non-export successor dependencies. This fixes the issue at hand, but is slightly pessimistic: We could group all exports into a separate block that have neither direct nor indirect export successor dependencies. We will review the potential performance impact and potentially revisit with a more sophisticated implementation. Note that just grouping all exports without direct non-export successor dependencies could still lead to illegal blocks, since non-export A could depend on export B that depends on export C. In that case, export C has no non-export successor, but still may not be grouped into an export block.	2022-04-21 14:52:29 +01:00
Luo, Yuanke	fa4347261e	[X86] Add test case for SetCCMOVMSK combine. Create 2 users for MOVMSK to test if compiler would perform the combine "MOVMSK(CONCAT(X,Y)) == 0 -> MOVMSK(OR(X,Y))".	2022-04-21 21:47:40 +08:00
Nikita Popov	662f57ee21	[InstCombine] Add tests for memset with undef/poison value (NFC)	2022-04-21 15:45:54 +02:00
Nikita Popov	9001edc535	[InstCombine] Split up test for store with undef (NFC)	2022-04-21 15:41:38 +02:00
Markus Böck	850b2c6b3c	[mlir] Fix `Region`s `takeBody` method if the region is not empty The current implementation of takeBody first clears the Region, before then taking ownership of the blocks of the other regions. The issue here however, is that when clearing the region, it does not take into account references of operations to each other. In particular, blocks are deleted from front to back, and operations within a block are very likely to be deleted despite still having uses, causing an assertion to trigger [0]. This patch fixes that issue by simply calling dropAllReferences() before clearing the blocks. [0] `9a8bb4bc63/mlir/lib/IR/Operation.cpp (L154)` Differential Revision: https://reviews.llvm.org/D123913	2022-04-21 15:32:59 +02:00
Fabian Wolff	95d77383f2	[clang-tidy] Fix behavior of `modernize-use-using` with nested structs/unions Fixes https://github.com/llvm/llvm-project/issues/50334. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D113804	2022-04-21 15:18:31 +02:00
Andrew Savonichev	96e7487013	[NVPTX] Fix LIT tests with default nameTableKind Default nameTableKind results in the following DWARF section: .section .debug_pubnames { .b32 LpubNames_end0-LpubNames_start0 // Length of Public Names Info LpubNames_start0: [...] LpubNames_end0: } Without -mattr=+ptx75 ptxas complains about labels and label expressions: error : Feature 'labels1 - labels2 expression in .section' requires PTX ISA .version 7.5 or later error : Feature 'Defining labels in .section' requires PTX ISA .version 7.0 or later The patch modifies dbg-value-const-byref.ll to let it run without PTX 7.5 (available from CUDA 11.0), and adds a new test just for this case. Differential revision: https://reviews.llvm.org/D124108	2022-04-21 16:05:25 +03:00
Simon Pilgrim	ac213375d9	[InstCombine] Add nonpow2 (negative) test for D123374	2022-04-21 13:58:43 +01:00
Aaron Ballman	408226f20a	Fix Sphinx build	2022-04-21 08:52:29 -04:00
Nikita Popov	20cf4f8af8	[PhaseOrdering] Remove RUN lines for legacy PM (NFC)	2022-04-21 14:43:00 +02:00
wangpc	b1620d40d0	Revert "[RISCV] Precommit test for D122634" This reverts commit `360d44e86d`.	2022-04-21 20:32:56 +08:00
Nikolas Klauser	29c8c070a1	[libc++] Use bit field for checking if string is in long or short mode This makes the code a bit simpler and (I think) removes the undefined behaviour from the normal string layout. Reviewed By: ldionne, Mordante, #libc Spies: labath, dblaikie, JDevlieghere, krytarowski, jgorbe, jingham, saugustine, arichardson, libcxx-commits Differential Revision: https://reviews.llvm.org/D123580	2022-04-21 14:20:21 +02:00
Pavel Labath	1056c56786	[lldb] Adjust libc++ string formatter for changes in D123580 The code needs more TLC, but for now I've tried making only the changes that are necessary to get the tests passing -- postponing the more invasive changes after I create a more comprehensive test. In a couple of places I have changed the index-based element accesses to name-based ones (as these are less sensitive to code perturbations). I'm not sure why the code was using indexes in the first place, but I've (manually) tested the change with various libc++ versions, and found no issues with this approach. Differential Revision: https://reviews.llvm.org/D124113	2022-04-21 14:07:56 +02:00
Nikola Tesic	c5600aef88	[Debugify] Limit number of processed functions for original mode Debugify in OriginalDebugInfo mode does (DebugInfo) collect-before-pass & check-after-pass for each instruction, which is pretty expensive. When used to analyze DebugInfo losses in large projects (like LLVM), this raises the build time unacceptably. This patch introduces a limit for the number of processed functions per compile unit. By default, the limit is set to UINT_MAX (practically unlimited), and by using the introduced option -debugify-func-limit the limit could be set to any positive integer number. Differential revision: https://reviews.llvm.org/D115714	2022-04-21 13:58:17 +02:00
Markus Böck	a41aaf166f	[mlir] Make `Region`s `cloneInto` multithread-readable Prior to this patch, `cloneInto` would do a simple walk over the blocks and contained operations and clone and map them as it encounters them. As a finishing touch it then remaps any successor and operands it has remapped during that process. This is generally fine, but sadly leads to a lot of uses of both operations and blocks from the source region, in the cloned operations in the target region. Those uses lead to writes in the use-def list of the operations, making `cloneInto` never thread safe. This patch reimplements `cloneInto` in three steps to avoid ever creating any extra uses on elements in the source region: * It first creates the mapping of all blocks and block operands * It then clones all operations to create the mapping of all operation results, but does not yet clone any regions or set the operands * After all operation results have been mapped, it now sets the operations' operands and clones their regions. That way it is now possible to call `cloneInto` from multiple threads if the Region or Operation is isolated-from-above. This allows creating copies of functions or to use `mlir::inlineCall` with the same source region from multiple threads. In the general case, the method is thread-safe if through cloning, no new uses of `Value`s from outside the cloned Operation/Region are created. This can be ensured by mapping any outside operands via the `BlockAndValueMapping` to `Value`s owned by the caller thread. While I was at it, I also reworked the `clone` method of `Operation` a little bit and added a proper options class to avoid having a `cloneWithoutRegionsAndOperands` method, and be more extensible in the future. `cloneWithoutRegions` is now also a simple wrapper that calls `clone` with the proper options set. That way all the operation cloning code is now contained solely within `clone`. Differential Revision: https://reviews.llvm.org/D123917	2022-04-21 13:43:00 +02:00
Hui Xie	3d3103b733	[libcxx][ranges] add views::join adaptor object. added test coverage to join_view - added views::join adaptor object - added test for the adaptor object - fixed some join_view's tests. e.g. iter_swap test - added some negative tests for join_view to test that operations do not exist when constraints aren't met - added tests that lock down issues that were already addressed in previous change - LWG3500 `join_view::iterator::operator->()` is bogus - LWG3313 `join_view::iterator::operator--` is incorrectly constrained - LWG3517 `join_view::iterator`'s `iter_swap` is underconstrained - P2328R1 join_view should join all views of ranges - fixed some issues in join_view and added tests - LWG3535 `join_view::iterator::iterator_category` and `::iterator_concept` lie - LWG3474 Nesting ``join_views`` is broken because of CTAD - added tests for an LWG issue that isn't resolved in the standard yet, but the previous code has workaround. - LWG3569 Inner iterator not default_initializable Reviewed By: #libc, var-const Spies: var-const, libcxx-commits Differential Revision: https://reviews.llvm.org/D123466	2022-04-21 13:10:46 +02:00
Dmitry Preobrazhensky	81af32b9a3	[AMDGPU][MC][NFC][GFX940] Corrected an error position Differential Revision: https://reviews.llvm.org/D124099	2022-04-21 14:04:46 +03:00
Uday Bondhugula	f47a38f517	Add async dependencies support for gpu.launch op Add async dependencies support for gpu.launch op: this allows specifying a list of async tokens ("streams") as dependencies for the launch. Update the GPU kernel outlining pass lowering to propagate async dependencies from gpu.launch to gpu.launch_func op. Previously, a new stream was being created and destroyed for a kernel launch. The async deps support allows the kernel launch to be serialized on an existing stream. Differential Revision: https://reviews.llvm.org/D123499	2022-04-21 16:25:59 +05:30
Alexey Moksyakov	48e894a536	[BOLT] Add R_AARCH64_PREL16/32/64 relocations support Reviewed By: yota9, rafauler Differential Revision: https://reviews.llvm.org/D122294	2022-04-21 13:52:47 +03:00
Vladislav Khmelevsky	63686af1e1	[BOLT] Fix build with GCC 7.3.0 The gcc 7.3.0 version raises "could not convert" error without std::move used explicitly. Differential Revision: https://reviews.llvm.org/D124009	2022-04-21 13:47:58 +03:00
Dmitry Preobrazhensky	b4231ac4be	[AMDGPU][GFX90A+] Disabled ds_ordered_count and exp Differential Revision: https://reviews.llvm.org/D124087	2022-04-21 13:16:44 +03:00
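
Illustration for commit 607f8ced39 (not part of the commit): the message states that the block graph obtained by contracting instruction blocks must be acyclic, and that moving a single export with a non-export successor into its own block can make that block both a predecessor and a successor of another block. The Python sketch below reproduces that situation on a three-instruction toy graph. The instruction names, block numbers, and helper functions are hypothetical and are not taken from SIMachineScheduler.

```python
from collections import defaultdict

def contract(edges, block_of):
    """Contract instructions into blocks; keep only edges between distinct blocks."""
    return {(block_of[u], block_of[v]) for u, v in edges if block_of[u] != block_of[v]}

def has_cycle(block_edges):
    """Depth-first search for a cycle in the contracted block graph."""
    succs = defaultdict(set)
    nodes = set()
    for u, v in block_edges:
        succs[u].add(v)
        nodes.update((u, v))
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}

    def visit(n):
        color[n] = GREY
        for m in succs[n]:
            if color[m] == GREY:
                return True  # back edge: the block graph has a loop
            if color[m] == WHITE and visit(m):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in nodes)

# Hypothetical dependency chain: a store, then an export, then a buffer store.
edges = [("store0", "export"), ("export", "store1")]

# All three instructions share one block: contraction leaves no edges.
before = {"store0": 0, "export": 0, "store1": 0}

# The export is moved alone into a separate export block (block 1).
after = {"store0": 0, "export": 1, "store1": 0}

print(has_cycle(contract(edges, before)))
print(has_cycle(contract(edges, after)))
```

In this toy case the regrouped assignment yields the block edges 0 -> 1 and 1 -> 0, which is the kind of loop the "Loop in the Block Graph!" assertion guards against.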


421736 Commits
