These are reductions for a missed constraint (the offset
constant must be less than the bitwidth) that caused the
first version of the patch (5819f4a422) to be reverted.
Using the legacy PM for the optimization pipeline was deprecated in 13.0.0.
Following recent changes to remove non-core features of the legacy
PM/optimization pipeline, remove GCOVProfilerLegacyPass.
I have checked many LLVM users and only llvm-hs[1] uses the legacy gcov pass.
[1]: https://github.com/llvm-hs/llvm-hs/issues/392
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D123829
The recently announced IBM z16 processor implements the architecture
already supported as "arch14" in LLVM. This patch adds support for
"z16" as an alternate architecture name for arch14.
Instead of unconditionally copying the PCHBuffer into an ostream which can be
backed either by a string or a file, just make the PCHBuffer itself the
in-memory storage.
Differential Revision: https://reviews.llvm.org/D124180
On systems where the kernel supports the PR_SCHED_CORE
interface, but there is no SMT, the prctl call will set
errno to ENODEV, which currently causes the test to fail.
Fix by accepting ENODEV in addition to EINVAL.
Using the legacy PM for the optimization pipeline was deprecated in 13.0.0.
Following recent changes to remove non-core features of the legacy
PM/optimization pipeline, remove MemorySanitizerLegacyPass.
Differential Revision: https://reviews.llvm.org/D123894
If there are fewer than 12 trailing zeros, we'll try to use an ADDI
at the end of the sequence. If we instead strip the trailing zeros and
end the sequence with a SLLI, we might find a shorter sequence.
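The trade-off can be sketched with a toy materializer (illustrative only: `materialize` is a hypothetical helper, far simpler than LLVM's actual RISCVMatInt logic): for values with trailing zeros, also try stripping them and ending with a SLLI, then keep the shorter candidate.

```python
def materialize(v):
    # Toy RV64 constant-materialization sketch (hypothetical helper,
    # not LLVM's real RISCVMatInt). Returns pseudo-instruction strings.
    if -2048 <= v <= 2047:
        return [f"addi x1, x0, {v}"]  # fits a signed 12-bit immediate
    if -0x80000000 <= v <= 0x7FFFFFFF:
        hi = ((v + 0x800) >> 12) & 0xFFFFF    # LUI loads bits 12..31
        lo = v - (((v + 0x800) >> 12) << 12)  # signed low 12 bits
        seq = [f"lui x1, {hi}"]
        if lo:
            seq.append(f"addi x1, x1, {lo}")
        return seq
    candidates = []
    tz = (v & -v).bit_length() - 1  # number of trailing zero bits
    if tz:
        # New option: strip trailing zeros, end the sequence with SLLI.
        candidates.append(materialize(v >> tz) + [f"slli x1, x1, {tz}"])
    # Old option: peel the signed low 12 bits, end with ADDI.
    lo = ((v & 0xFFF) ^ 0x800) - 0x800
    rest = materialize((v - lo) >> 12) + ["slli x1, x1, 12"]
    if lo:
        rest.append(f"addi x1, x1, {lo}")
    candidates.append(rest)
    return min(candidates, key=len)
```

For example, in this model `0x123450000000` takes three instructions via the SLLI ending (LUI + ADDI + SLLI) where the ADDI-ending recursion needs four.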
Differential Revision: https://reviews.llvm.org/D124148
Here is the performance data:
```
Using platform: AMD Accelerated Parallel Processing
Using device: gfx900:xnack-
ds_write_b64 aligned by 8: 3.2 sec
ds_write2_b32 aligned by 8: 3.2 sec
ds_write_b16 * 4 aligned by 8: 7.0 sec
ds_write_b8 * 8 aligned by 8: 13.2 sec
ds_write_b64 aligned by 1: 7.3 sec
ds_write2_b32 aligned by 1: 7.5 sec
ds_write_b16 * 4 aligned by 1: 14.0 sec
ds_write_b8 * 8 aligned by 1: 13.2 sec
ds_write_b64 aligned by 2: 7.3 sec
ds_write2_b32 aligned by 2: 7.5 sec
ds_write_b16 * 4 aligned by 2: 7.1 sec
ds_write_b8 * 8 aligned by 2: 13.3 sec
ds_write_b64 aligned by 4: 4.6 sec
ds_write2_b32 aligned by 4: 3.2 sec
ds_write_b16 * 4 aligned by 4: 7.1 sec
ds_write_b8 * 8 aligned by 4: 13.3 sec
ds_read_b64 aligned by 8: 2.3 sec
ds_read2_b32 aligned by 8: 2.2 sec
ds_read_u16 * 4 aligned by 8: 4.8 sec
ds_read_u8 * 8 aligned by 8: 8.6 sec
ds_read_b64 aligned by 1: 4.4 sec
ds_read2_b32 aligned by 1: 7.3 sec
ds_read_u16 * 4 aligned by 1: 14.0 sec
ds_read_u8 * 8 aligned by 1: 8.7 sec
ds_read_b64 aligned by 2: 4.4 sec
ds_read2_b32 aligned by 2: 7.3 sec
ds_read_u16 * 4 aligned by 2: 4.8 sec
ds_read_u8 * 8 aligned by 2: 8.7 sec
ds_read_b64 aligned by 4: 4.4 sec
ds_read2_b32 aligned by 4: 2.3 sec
ds_read_u16 * 4 aligned by 4: 4.8 sec
ds_read_u8 * 8 aligned by 4: 8.7 sec
Using platform: AMD Accelerated Parallel Processing
Using device: gfx1030
ds_write_b64 aligned by 8: 4.4 sec
ds_write2_b32 aligned by 8: 4.3 sec
ds_write_b16 * 4 aligned by 8: 7.9 sec
ds_write_b8 * 8 aligned by 8: 13.0 sec
ds_write_b64 aligned by 1: 23.2 sec
ds_write2_b32 aligned by 1: 23.1 sec
ds_write_b16 * 4 aligned by 1: 44.0 sec
ds_write_b8 * 8 aligned by 1: 13.0 sec
ds_write_b64 aligned by 2: 23.2 sec
ds_write2_b32 aligned by 2: 23.1 sec
ds_write_b16 * 4 aligned by 2: 7.9 sec
ds_write_b8 * 8 aligned by 2: 13.1 sec
ds_write_b64 aligned by 4: 13.5 sec
ds_write2_b32 aligned by 4: 4.3 sec
ds_write_b16 * 4 aligned by 4: 7.9 sec
ds_write_b8 * 8 aligned by 4: 13.1 sec
ds_read_b64 aligned by 8: 3.5 sec
ds_read2_b32 aligned by 8: 3.4 sec
ds_read_u16 * 4 aligned by 8: 5.3 sec
ds_read_u8 * 8 aligned by 8: 8.5 sec
ds_read_b64 aligned by 1: 13.1 sec
ds_read2_b32 aligned by 1: 22.7 sec
ds_read_u16 * 4 aligned by 1: 43.9 sec
ds_read_u8 * 8 aligned by 1: 7.9 sec
ds_read_b64 aligned by 2: 13.1 sec
ds_read2_b32 aligned by 2: 22.7 sec
ds_read_u16 * 4 aligned by 2: 5.6 sec
ds_read_u8 * 8 aligned by 2: 7.9 sec
ds_read_b64 aligned by 4: 13.1 sec
ds_read2_b32 aligned by 4: 3.4 sec
ds_read_u16 * 4 aligned by 4: 5.6 sec
ds_read_u8 * 8 aligned by 4: 7.9 sec
```
GFX10 exposes a different pattern for sub-DWORD load/store performance
than GFX9. On GFX9 it is faster to issue a single unaligned load or
store than a fully split b8 access, whereas on GFX10 even a full split
is better. However, this gain is only theoretical: splitting an access
down to the sub-dword level requires more registers and packing/
unpacking logic, so ignoring that option, it is better to use a single
64-bit instruction on misaligned data, with the exception of 4-byte
aligned data, where ds_read2_b32/ds_write2_b32 is better.
Differential Revision: https://reviews.llvm.org/D123956
- Remove fiddly union, preambles are heavyweight
- Remove fiddly move constructors in TempPCHFile and PCHStorage, use unique_ptr
- Remove unnecessary accessors on PCHStorage
- Remove trivial InMemoryStorage
- Move implementation details into cpp file
This is pre-factoring; a follow-up change will alter the in-memory PCHStorage to
avoid extra string copies while creating it.
Differential Revision: https://reviews.llvm.org/D124177
A "zippered" dylib contains several LC_BUILD_VERSION load commands, usually
one for "normal" macOS and one for macCatalyst.
These are usually created by passing something like
-shared -target arm64-apple-macos -darwin-target-variant arm64-apple-ios13.1-macabi
to clang, which turns it into
-platform_version macos 12.0.0 12.3 -platform_version "mac catalyst" 14.0.0 15.4
for the linker.
ld64.lld can read these files fine, but it can't write them. Before this
change, it would just silently use the last -platform_version flag and ignore
the rest.
This change adds a warning that writing zippered dylibs isn't implemented yet
instead.
Sadly, parts of ld64.lld's test suite relied on the previous
"silently use the last flag" semantics: `%lld` always expanded
to `ld64.lld -platform_version macos 10.15 11.0`, and tests that wanted a
different value passed a second `-platform_version` flag later on. But this now
produces a warning if the platform passed to `-platform_version` is not `macos`.
There weren't very many cases of this, so move these to use `%no-arg-lld` and
manually pass `-arch`.
Differential Revision: https://reviews.llvm.org/D124106
It runs immediately before FrontendAction::Execute() with a mutable
CompilerInstance, allowing FeatureModules to register callbacks, remap
files, etc.
Differential Revision: https://reviews.llvm.org/D124176
InstrProfilingPlatformLinux.c already provides these symbols. Linker order
saved us from noticing before.
Reviewed By: mcgrathr
Differential Revision: https://reviews.llvm.org/D124136
This patch is a reland of a8e5ce76b4,
which includes additional SDK version checks to ensure that
Xcode's headers support arm64 builds.
Differential Revision: https://reviews.llvm.org/D119174
Add limited support for "IWYU pragma: export" - for now it just suppresses the
warning, similar to "IWYU pragma: keep".
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D124170
Right now, when exiting the file, Headers.cpp will identify the recursive
inclusion (with a new FileID) as non-self-contained and will add it to a set
from which it is never removed. As a result, we get incorrect results in
the IncludeStructure and Include Cleaner. This patch fixes that.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D124166
When using the L option to quick-append a full archive to a thin
archive, the thin archive was wrongly converted to a full archive.
I've fixed the issue, added a check for it in
thin-to-full-archive.test, and expanded some tests.
Differential Revision: https://reviews.llvm.org/D123142
This introduces a pair of ops to the Transform dialect that connect it to PDL
patterns. The Transform dialect relies on PDL for matching the payload IR ops
that are about to be transformed. For this purpose, it provides a container op
for patterns, a "pdl_match" op, and transform interface implementations that
call into the pattern matching infrastructure.
To enable the caching of compiled patterns, this also provides the extension
mechanism for TransformState. Extensions allow one to store additional
information in the TransformState and thus communicate it between different
Transform dialect operations when they are applied. They can be added and
removed when applying transform ops. The first client is an extension that
contains a symbol table in which pattern names are resolved, together with a
pattern compilation cache.
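The extension mechanism can be pictured as a type-keyed side table on the state object. A minimal sketch, with hypothetical Python names (the real TransformState is MLIR C++ and richer):

```python
class TransformState:
    # Sketch of a state object carrying type-keyed extensions;
    # hypothetical names, not the actual MLIR API.
    def __init__(self):
        self._extensions = {}

    def add_extension(self, ext):
        # Keyed by the extension's concrete type; one instance per type.
        assert type(ext) not in self._extensions, "already attached"
        self._extensions[type(ext)] = ext

    def get_extension(self, ext_type):
        return self._extensions.get(ext_type)

    def remove_extension(self, ext_type):
        self._extensions.pop(ext_type, None)


class PatternCacheExtension:
    # Example client: resolves pattern names and caches "compiled"
    # patterns so repeated applications skip recompilation.
    def __init__(self):
        self._cache = {}

    def get_or_compile(self, name, compile_fn):
        if name not in self._cache:
            self._cache[name] = compile_fn(name)
        return self._cache[name]
```

One transform op attaches the extension; later ops retrieve it by type and reuse the cached patterns.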
Depends On D123664
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D124007
Fix isVCC for register that was assigned register class during
inst-selection. This happens when register has multiple uses.
For wave32, uniform i1 to vcc copy was selected like vcc to vcc
copy when uniform i1 had assigned register class.
Uniform i1 register with assigned register class will have s1 LLT,
be defined using G_TRUNC and class will be SReg_32RegClass.
Vcc i1 register with assigned register class will have s1 LLT,
class will be SReg_32RegClass for wave32 and SReg_64RegClass for
wave64 and register will not be defined by G_TRUNC.
Differential Revision: https://reviews.llvm.org/D124163
This fixes the assertion failure "Loop in the Block Graph!".
SIMachineScheduler groups instructions into blocks (also referred to
as coloring or groups) and then performs a two-level scheduling:
inter-block scheduling, and intra-block scheduling.
This approach requires that the dependency graph on the blocks which
is obtained by contracting the blocks in the original dependency graph
is acyclic. In other words: Whenever A and B end up in the same block,
all vertices on a path from A to B must be in the same block.
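The invariant can be checked mechanically: contract every block to a single vertex and test the resulting block graph for cycles. A small sketch (hypothetical helper, not the scheduler's actual representation):

```python
def contracted_is_acyclic(edges, block_of):
    # Contract vertices into blocks and test the block graph for
    # cycles. edges: iterable of (u, v) dependency edges;
    # block_of: maps each vertex to its block id.
    succ = {}
    for u, v in edges:
        bu, bv = block_of[u], block_of[v]
        if bu != bv:
            succ.setdefault(bu, set()).add(bv)
    # Iterative DFS cycle detection on the contracted graph.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}
    for start in set(block_of.values()):
        if color.get(start, WHITE) != WHITE:
            continue
        color[start] = GRAY
        stack = [(start, iter(succ.get(start, ())))]
        while stack:
            node, it = stack[-1]
            nxt = next(it, None)
            if nxt is None:
                color[node] = BLACK
                stack.pop()
            elif color.get(nxt, WHITE) == GRAY:
                return False  # back edge: cycle in the block graph
            elif color.get(nxt, WHITE) == WHITE:
                color[nxt] = GRAY
                stack.append((nxt, iter(succ.get(nxt, ()))))
    return True
```

In the failure mode described here, grouping two exports into one block while a non-export sits on a path between them makes the contracted graph cyclic, which this check would flag.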
When compiling an example consisting of an export followed by
a buffer store, we see a dependency between these two. This dependency
may be false, but that is a different issue.
This dependency was not correctly accounted for by SIMachineScheduler.
A new test case si-scheduler-exports.ll demonstrating this is
also added in this commit.
The problematic part of SIMachineScheduler was a post-optimization of
the block assignment that tried to group all export instructions into
a separate export block for better execution performance. This routine
correctly checked that paths from exports to exports did not
contain any non-exports, but not the converse: in case of an export with
a non-export successor dependency, that single export was moved
to a separate block, which could then be both a successor and a
predecessor block of a non-export block.
As a fix, we now skip export grouping if there are exports with direct
non-export successor dependencies. This fixes the issue at hand,
but is slightly pessimistic:
We *could* group into a separate block all exports that have neither
direct nor indirect non-export successor dependencies.
We will review the potential performance impact and potentially
revisit with a more sophisticated implementation.
Note that just grouping all exports without direct non-export successor
dependencies could still lead to illegal blocks, since non-export A
could depend on export B that depends on export C. In that case,
export C has no non-export successor, but still may not be grouped
into an export block.
The current implementation of takeBody first clears the Region before taking
ownership of the blocks of the other region. The issue, however, is that when
clearing the region, it does not take into account references between
operations. In particular, blocks are deleted from front to back, and
operations within a block are very likely to be deleted while still having
uses, causing an assertion to trigger [0].
This patch fixes that issue by simply calling dropAllReferences() before
clearing the blocks.
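The ownership problem generalizes to any IR with use lists: bulk-deleting operations front-to-back trips a "still has uses" assertion unless references are dropped first. A Python sketch of the pattern (hypothetical classes, not MLIR's C++ types):

```python
class Op:
    # Minimal op with a use list (sketch, not MLIR's Operation).
    def __init__(self, name, operands=()):
        self.name = name
        self.operands = list(operands)
        self.uses = []
        for o in self.operands:
            o.uses.append(self)

    def drop_all_references(self):
        # Detach this op from the use lists of its operands.
        for o in self.operands:
            o.uses.remove(self)
        self.operands.clear()

    def erase(self):
        # Mirrors the assertion that fires when deleting an op
        # that still has uses.
        assert not self.uses, f"{self.name} still has uses"


def clear_block(ops):
    # Drop all references first, then erase; erasing front-to-back
    # without the first loop asserts on any op with later uses.
    for op in ops:
        op.drop_all_references()
    for op in ops:
        op.erase()
```

Erasing `a` directly while `b` still uses it asserts; `clear_block` succeeds because every use is dropped before any op is erased.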
[0] 9a8bb4bc63/mlir/lib/IR/Operation.cpp (L154)
Differential Revision: https://reviews.llvm.org/D123913
Default nameTableKind results in the following DWARF section:
```
.section .debug_pubnames
{
.b32 LpubNames_end0-LpubNames_start0 // Length of Public Names Info
LpubNames_start0:
[...]
LpubNames_end0:
}
```
Without -mattr=+ptx75, ptxas complains about labels and label
expressions:
```
error : Feature 'labels1 - labels2 expression in .section' requires
PTX ISA .version 7.5 or later
error : Feature 'Defining labels in .section' requires PTX ISA
.version 7.0 or later
```
The patch modifies dbg-value-const-byref.ll to let it run without PTX
7.5 (available from CUDA 11.0), and adds a new test just for this
case.
Differential revision: https://reviews.llvm.org/D124108