Commit Graph

421848 Commits

Author SHA1 Message Date
Ulrich Weigand 9778ec057c [SystemZ] Add z16 scheduler description
Add scheduler description for the new IBM z16 processor.

Patch by Jonas Paulsson.
2022-04-21 20:38:16 +02:00
Sam McCall eadf352707 Revert "[Frontend] avoid copy of PCH data when PrecompiledPreamble stores it in memory"
This reverts commit 6e22dac2e2.

Seems to cause bot failures e.g.
https://lab.llvm.org/buildbot/#/builders/109/builds/37071
2022-04-21 20:22:47 +02:00
Sanjay Patel 49f950ae26 [InstCombine] add more tests for a planned shift fold; NFC
These are reductions for a missed constraint (the offset
constant must be less than the bitwidth) that caused the
first version of the patch ( 5819f4a422 ) to be reverted.
2022-04-21 14:08:50 -04:00
Fangrui Song 409eb5dc3e [LegacyPM] Remove GCOVProfilerLegacyPass
Using the legacy PM for the optimization pipeline was deprecated in 13.0.0.
Following recent changes to remove non-core features of the legacy
PM/optimization pipeline, remove GCOVProfilerLegacyPass.

I have checked many LLVM users and only llvm-hs[1] uses the legacy gcov pass.

[1]: https://github.com/llvm-hs/llvm-hs/issues/392

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D123829
2022-04-21 10:59:30 -07:00
Ulrich Weigand 1283ccb610 Support z16 processor name
The recently announced IBM z16 processor implements the architecture
already supported as "arch14" in LLVM.  This patch adds support for
"z16" as an alternate architecture name for arch14.
2022-04-21 19:58:22 +02:00
Sam McCall 6e22dac2e2 [Frontend] avoid copy of PCH data when PrecompiledPreamble stores it in memory
Instead of unconditionally copying the PCHBuffer into an ostream which can be
backed either by a string or a file, just make the PCHBuffer itself the
in-memory storage.

Differential Revision: https://reviews.llvm.org/D124180
2022-04-21 19:52:59 +02:00
Haojian Wu 84051d8226 [clangd] Fix a declare-constructor tweak crash on incomplete fields.
Differential Revision: https://reviews.llvm.org/D124154
2022-04-21 19:44:43 +02:00
Ulrich Weigand e4085a012c [sanitizer] Fix prctl unit test on non-SMT systems
On systems where the kernel supports the PR_SCHED_CORE
interface, but there is no SMT, the prctl call will set
errno to ENODEV, which currently causes the test to fail.

Fix by accepting ENODEV in addition to EINVAL.
2022-04-21 19:31:04 +02:00
Fangrui Song d133538b8b [LegacyPM] Remove MemorySanitizerLegacyPass
Using the legacy PM for the optimization pipeline was deprecated in 13.0.0.
Following recent changes to remove non-core features of the legacy
PM/optimization pipeline, remove MemorySanitizerLegacyPass.

Differential Revision: https://reviews.llvm.org/D123894
2022-04-21 10:21:46 -07:00
Vasileios Porpodas 889588ee97 [SLP] Refactoring isLegalBroadcastLoad() to use `ElementCount`.
Replacing `unsigned` with `ElementCount` in the argument of `isLegalBroadcastLoad()`.
This helps reduce the diff of a future SLP patch for AArch64.
2022-04-21 10:19:00 -07:00
Wael Yehia f296b4c444 [AIX] Always pass namedsects option when linking with PGO.
Differential Revision: https://reviews.llvm.org/D124046
2022-04-21 17:01:37 +00:00
chenglin.bi 25aba1abb5 Revert "[InstCombine] Add one use limitation for (X * C2) << C1 --> X * (C2 << C1)"
This reverts commit b543d28df7.
2022-04-22 00:56:20 +08:00
Alex Zinenko 0edb262d91 [mlir] enable doc generation for the transform dialect 2022-04-21 18:52:08 +02:00
Craig Topper 98b866892d [RISCV] Add special case to constant materialization to remove trailing zeros first.
If there are fewer than 12 trailing zeros, we'll try to use an ADDI
at the end of the sequence. If we strip trailing zeros and end the
sequence with a SLLI we might find a shorter sequence.

Differential Revision: https://reviews.llvm.org/D124148
2022-04-21 09:43:32 -07:00
Stanislav Mekhanoshin ac94073daa [AMDGPU] Refine 64 bit misaligned LDS ops selection
Here is the performance data:
```
Using platform: AMD Accelerated Parallel Processing
Using device: gfx900:xnack-

ds_write_b64                       aligned by  8:  3.2 sec
ds_write2_b32                      aligned by  8:  3.2 sec
ds_write_b16 * 4                   aligned by  8:  7.0 sec
ds_write_b8 * 8                    aligned by  8: 13.2 sec
ds_write_b64                       aligned by  1:  7.3 sec
ds_write2_b32                      aligned by  1:  7.5 sec
ds_write_b16 * 4                   aligned by  1: 14.0 sec
ds_write_b8 * 8                    aligned by  1: 13.2 sec
ds_write_b64                       aligned by  2:  7.3 sec
ds_write2_b32                      aligned by  2:  7.5 sec
ds_write_b16 * 4                   aligned by  2:  7.1 sec
ds_write_b8 * 8                    aligned by  2: 13.3 sec
ds_write_b64                       aligned by  4:  4.6 sec
ds_write2_b32                      aligned by  4:  3.2 sec
ds_write_b16 * 4                   aligned by  4:  7.1 sec
ds_write_b8 * 8                    aligned by  4: 13.3 sec
ds_read_b64                        aligned by  8:  2.3 sec
ds_read2_b32                       aligned by  8:  2.2 sec
ds_read_u16 * 4                    aligned by  8:  4.8 sec
ds_read_u8 * 8                     aligned by  8:  8.6 sec
ds_read_b64                        aligned by  1:  4.4 sec
ds_read2_b32                       aligned by  1:  7.3 sec
ds_read_u16 * 4                    aligned by  1: 14.0 sec
ds_read_u8 * 8                     aligned by  1:  8.7 sec
ds_read_b64                        aligned by  2:  4.4 sec
ds_read2_b32                       aligned by  2:  7.3 sec
ds_read_u16 * 4                    aligned by  2:  4.8 sec
ds_read_u8 * 8                     aligned by  2:  8.7 sec
ds_read_b64                        aligned by  4:  4.4 sec
ds_read2_b32                       aligned by  4:  2.3 sec
ds_read_u16 * 4                    aligned by  4:  4.8 sec
ds_read_u8 * 8                     aligned by  4:  8.7 sec

Using platform: AMD Accelerated Parallel Processing
Using device: gfx1030

ds_write_b64                       aligned by  8:  4.4 sec
ds_write2_b32                      aligned by  8:  4.3 sec
ds_write_b16 * 4                   aligned by  8:  7.9 sec
ds_write_b8 * 8                    aligned by  8: 13.0 sec
ds_write_b64                       aligned by  1: 23.2 sec
ds_write2_b32                      aligned by  1: 23.1 sec
ds_write_b16 * 4                   aligned by  1: 44.0 sec
ds_write_b8 * 8                    aligned by  1: 13.0 sec
ds_write_b64                       aligned by  2: 23.2 sec
ds_write2_b32                      aligned by  2: 23.1 sec
ds_write_b16 * 4                   aligned by  2:  7.9 sec
ds_write_b8 * 8                    aligned by  2: 13.1 sec
ds_write_b64                       aligned by  4: 13.5 sec
ds_write2_b32                      aligned by  4:  4.3 sec
ds_write_b16 * 4                   aligned by  4:  7.9 sec
ds_write_b8 * 8                    aligned by  4: 13.1 sec
ds_read_b64                        aligned by  8:  3.5 sec
ds_read2_b32                       aligned by  8:  3.4 sec
ds_read_u16 * 4                    aligned by  8:  5.3 sec
ds_read_u8 * 8                     aligned by  8:  8.5 sec
ds_read_b64                        aligned by  1: 13.1 sec
ds_read2_b32                       aligned by  1: 22.7 sec
ds_read_u16 * 4                    aligned by  1: 43.9 sec
ds_read_u8 * 8                     aligned by  1:  7.9 sec
ds_read_b64                        aligned by  2: 13.1 sec
ds_read2_b32                       aligned by  2: 22.7 sec
ds_read_u16 * 4                    aligned by  2:  5.6 sec
ds_read_u8 * 8                     aligned by  2:  7.9 sec
ds_read_b64                        aligned by  4: 13.1 sec
ds_read2_b32                       aligned by  4:  3.4 sec
ds_read_u16 * 4                    aligned by  4:  5.6 sec
ds_read_u8 * 8                     aligned by  4:  7.9 sec
```

GFX10 exposes a different pattern for sub-DWORD load/store performance
than GFX9. On GFX9 it is faster to issue a single unaligned load or
store than a fully split b8 access, where on GFX10 even a full split
is better. However, this is a theoretical only gain because splitting
an access to a sub-dword level will require more registers and packing/
unpacking logic, so ignoring this option it is better to use a single
64 bit instruction on a misaligned data with the exception of 4 byte
aligned data where ds_read2_b32/ds_write2_b32 is better.

Differential Revision: https://reviews.llvm.org/D123956
2022-04-21 09:37:16 -07:00
chenglin.bi b543d28df7 [InstCombine] Add one use limitation for (X * C2) << C1 --> X * (C2 << C1)
Follow up D123453, add one-use limitation for
(X * C2) << C1 --> X * (C2 << C1)
to make consistent with
lshr (mul nuw x, MulC), ShAmtC -> mul nuw x, (MulC >> ShAmtC)

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D124183
2022-04-22 00:32:36 +08:00
Jacob Lambert afcc6baac5 [clang][HIP] Updating driver to enable archive/bitcode to bitcode linking when targeting HIPAMD toolchain
Differential Revision: https://reviews.llvm.org/D124151
2022-04-21 09:24:33 -07:00
Sanjay Patel 8960ba7491 Revert "[InstCombine] C0 <<{nsw, nuw} (X - C1) --> (C0 >> C1) << X"
This reverts commit 5819f4a422.
This caused bots to fail with a crash/assert during the fold,
so some constraint was missed.
2022-04-21 12:15:27 -04:00
chenglin.bi e077e3a648 [InstCombine] add baseline test for (X * C2) << C1 --> X * (C2 << C1) without one use; NFC 2022-04-22 00:12:06 +08:00
Sam McCall af3fb07154 [Frontend] Simplify PrecompiledPreamble::PCHStorage. NFC
- Remove fiddly union, preambles are heavyweight
- Remove fiddly move constructors in TempPCHFile and PCHStorage, use unique_ptr
- Remove unneccesary accessors on PCHStorage
- Remove trivial InMemoryStorage
- Move implementation details into cpp file

This is a prefactoring, followup change will change the in-memory PCHStorage to
avoid extra string copies while creating it.

Differential Revision: https://reviews.llvm.org/D124177
2022-04-21 18:10:13 +02:00
Nico Weber 889847922d [lld/mac] Warn that writing zippered outputs isn't implemented
A "zippered" dylib contains several LC_BUILD_VERSION load commands, usually
one each for "normal" macOS and one for macCatalyst.

These are usually created by passing something like

   -shared -target arm64-apple-macos -darwin-target-variant arm64-apple-ios13.1-macabi

to clang, which turns it into

    -platform_version macos 12.0.0 12.3 -platform_version "mac catalyst" 14.0.0 15.4

for the linker.

ld64.lld can read these files fine, but it can't write them.  Before this
change, it would just silently use the last -platform_version flag and ignore
the rest.

This change adds a warning that writing zippered dylibs isn't implemented yet
instead.

Sadly, parts of ld64.lld's test suite relied on the previous
"silently use last flag" semantics for its test suite: `%lld` always expanded
to `ld64.lld -platform_version macos 10.15 11.0` and tests that wanted a
different value passed a 2nd `-platform_version` flag later on. But this now
produces a warning if the platform passed to `-platform_version` is not `macos`.

There weren't very many cases of this, so move these to use `%no-arg-lld` and
manually pass `-arch`.

Differential Revision: https://reviews.llvm.org/D124106
2022-04-21 12:05:56 -04:00
Adam Czachorowski ad46aaede6 [clangd] Add beforeExecute() callback to FeatureModules.
It runs immediatelly before FrontendAction::Execute() with a mutable
CompilerInstance, allowing FeatureModules to register callbacks, remap
files, etc.

Differential Revision: https://reviews.llvm.org/D124176
2022-04-21 18:03:39 +02:00
Simon Pilgrim f8a078f20c [X86] Add test case for Issue #54911 2022-04-21 17:02:15 +01:00
Fangrui Song ae46b3e01f Revert D121279 "[MLIR][GPU] Add canonicalizer for gpu.memcpy"
This reverts commit 12f55cac69.

Causes miscompile. Will follow up with a reproduce.
2022-04-21 08:55:13 -07:00
Simon Pilgrim 13d59a8ee4 [M68k] Regenerate cmp.ll tests
M68k is still experimental so wasn't updated in a recent DAG combine
2022-04-21 16:54:00 +01:00
Tyler Mandry d8c1d37ba3 [fuchsia] Don't include duplicate profiling symbols for Fuchsia
InstrProfilingPlatformLinux.c already provides these symbols. Linker order
saved us from noticing before.

Reviewed By: mcgrathr

Differential Revision: https://reviews.llvm.org/D124136
2022-04-21 15:44:37 +00:00
Byoungchan Lee 8a3afc6da5 [compiler-rt][Darwin] Add arm64 to simulator platforms
This patch is the reland of a8e5ce76b4,
which includes additional SDK version checks to ensure that
XCode's headers support arm64 builds.

Differential Revision: https://reviews.llvm.org/D119174
2022-04-21 17:42:31 +02:00
Sanjay Patel 5819f4a422 [InstCombine] C0 <<{nsw, nuw} (X - C1) --> (C0 >> C1) << X
This is similar to an existing pre-shift-of-constant fold:
8a9c70fc01
...but in this case, we need no-wrap on the shl and a negative
offset:
https://alive2.llvm.org/ce/z/_RVz99

Fixes #54890
2022-04-21 11:38:27 -04:00
Sanjay Patel 782d0105ba [InstCombine] add tests for C << (X - C1); NFC 2022-04-21 11:38:26 -04:00
Paul Robinson f80e369f61 [PS4] Driver: use correct --shared option 2022-04-21 08:19:42 -07:00
Kirill Bobyrev 9f05b111ee
[clangd] Include Cleaner: suppress unused warnings for IWYU pragma: export
Add limited support for "IWYU pragma: export" - for now it just supresses the
warning similar to "IWYU pragma: keep".

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D124170
2022-04-21 17:00:06 +02:00
Kirill Bobyrev e1c0d2fb82 [clangd] Correctly identify self-contained headers included rercursively
Right now when exiting the file Headers.cpp will identify the recursive
inclusion (with a new FileID) as non self-contained and will add it to the set
from which it will never be removed. As a result, we get incorrect results in
the IncludeStructure and Include Cleaner. This patch is a fix.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D124166
2022-04-21 16:54:59 +02:00
gbreynoo 1f71b5a386 [llvm-ar] Fix thin archive being wrongly converted to a full archive
When using the L option to quick append a full archive to a thin
archive, the thin archive was being wrongly converted to a full archive.
I've fixed the issue and added a check for it in
thin-to-full-archive.test and expanded some tests.

Differential Revision: https://reviews.llvm.org/D123142
2022-04-21 15:48:26 +01:00
Alex Zinenko 30f22429d3 [mlir] Connect Transform dialect to PDL
This introduces a pair of ops to the Transform dialect that connect it to PDL
patterns. Transform dialect relies on PDL for matching the Payload IR ops that
are about to be transformed. For this purpose, it provides a container op for
patterns, a "pdl_match" op and transform interface implementations that call
into the pattern matching infrastructure.

To enable the caching of compiled patterns, this also provides the extension
mechanism for TransformState. Extensions allow one to store additional
information in the TransformState and thus communicate it between different
Transform dialect operations when they are applied. They can be added and
removed when applying transform ops. An extension containing a symbol table in
which the pattern names are resolved and a pattern compilation cache is
introduced as the first client.

Depends On D123664

Reviewed By: Mogball

Differential Revision: https://reviews.llvm.org/D124007
2022-04-21 16:23:10 +02:00
Eric Astor 82ecf9a0b1 [LLVM-ML] Add standard LLVM debug flags
Adds support for -debug and -debug-only= flags.

Reviewed By: ayzhao

Differential Revision: https://reviews.llvm.org/D123545
2022-04-21 10:14:59 -04:00
Petar Avramovic e06290e53f AMDGPU/GlobalISel: Fix isVCC for uniform s1 with reg class on wave32
Fix isVCC for register that was assigned register class during
inst-selection. This happens when register has multiple uses.
For wave32, uniform i1 to vcc copy was selected like vcc to vcc
copy when uniform i1 had assigned register class.
Uniform i1 register with assigned register class will have s1 LLT,
be defined using G_TRUNC and class will be SReg_32RegClass.
Vcc i1 register with assigned register class will have s1 LLT,
class will be SReg_32RegClass for wave32 and SReg_64RegClass for
wave64 and register will not be defined by G_TRUNC.

Differential Revision: https://reviews.llvm.org/D124163
2022-04-21 16:12:04 +02:00
Petar Avramovic 4e0dacb2cf AMDGPU/GlobalISel: Precommit test for D124163 2022-04-21 16:12:03 +02:00
Nikita Popov ead231dec0 [InstCombine] Fix typo in test (NFC)
This is a copy paste mistake, this variant of the test was supposed
to use poison instead of undef.
2022-04-21 15:58:51 +02:00
Karl Meakin 81904454f7 [AArch64] Add `foldOverflowCheck` DAG combine
Differential Revision: https://reviews.llvm.org//D123779
2022-04-21 14:56:38 +01:00
Karl Meakin 13403a70e4 [AArch64] Add lowerings for {ADD,SUB}CARRY and S{ADD,SUB}O_CARRY
Differential Revision: https://reviews.llvm.org/D123322
2022-04-21 14:56:37 +01:00
Nikita Popov 46c2b41d02 [InstCombine] Remove dead code (NFC)
This was a leftover condition without code.
2022-04-21 15:53:53 +02:00
Jannik Silvanus 607f8ced39 [AMDGPU]: Fix failing assertion in SIMachineScheduler
This fixes the assertion failure "Loop in the Block Graph!".

SIMachineScheduler groups instructions into blocks (also referred to
as coloring or groups) and then performs a two-level scheduling:
inter-block scheduling, and intra-block scheduling.

This approach requires that the dependency graph on the blocks which
is obtained by contracting the blocks in the original dependency graph
is acyclic. In other words: Whenever A and B end up in the same block,
all vertices on a path from A to B must be in the same block.

When compiling an example consisting of an export followed by
a buffer store, we see a dependency between these two. This dependency
may be false, but that is a different issue.
This dependency was not correctly accounted for by SiMachineScheduler.

A new test case si-scheduler-exports.ll demonstrating this is
also added in this commit.

The problematic part of SiMachineScheduler was a post-optimization of
the block assignment that tried to group all export instructions into
a separate export block for better execution performance. This routine
correctly checked that any paths from exports to exports did not
contain any non-exports, but not vice-versa: In case of an export with
a non-export successor dependency, that single export was moved
to a separate block, which could then be both a successor and a
predecessor block of a non-export block.

As fix, we now skip export grouping if there are exports with direct
non-export successor dependencies. This fixes the issue at hand,
but is slightly pessimistic:
We *could* group all exports into a separate block that have neither
direct nor indirect export successor dependencies.
We will review the potential performance impact and potentially
revisit with a more sophisticated implementation.

Note that just grouping all exports without direct non-export successor
dependencies could still lead to illegal blocks, since non-export A
could depend on export B that depends on export C. In that case,
export C has no non-export successor, but still may not be grouped
into an export block.
2022-04-21 14:52:29 +01:00
Luo, Yuanke fa4347261e [X86] Add test case for SetCCMOVMSK combine.
Create 2 users for MOVMSK to test if compiler would perform the combine
"MOVMSK(CONCAT(X,Y)) == 0 ->  MOVMSK(OR(X,Y))".
2022-04-21 21:47:40 +08:00
Nikita Popov 662f57ee21 [InstCombine] Add tests for memset with undef/poison value (NFC) 2022-04-21 15:45:54 +02:00
Nikita Popov 9001edc535 [InstCombine] Split up test for store with undef (NFC) 2022-04-21 15:41:38 +02:00
Markus Böck 850b2c6b3c [mlir] Fix `Region`s `takeBody` method if the region is not empty
The current implementation of takeBody first clears the Region, before then taking ownership of the blocks of the other regions. The issue here however, is that when clearing the region, it does not take into account references of operations to each other. In particular, blocks are deleted from front to back, and operations within a block are very likely to be deleted despite still having uses, causing an assertion to trigger [0].

This patch fixes that issue by simply calling dropAllReferences()before clearing the blocks.

[0] 9a8bb4bc63/mlir/lib/IR/Operation.cpp (L154)

Differential Revision: https://reviews.llvm.org/D123913
2022-04-21 15:32:59 +02:00
Fabian Wolff 95d77383f2 [clang-tidy] Fix behavior of `modernize-use-using` with nested structs/unions
Fixes https://github.com/llvm/llvm-project/issues/50334.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D113804
2022-04-21 15:18:31 +02:00
Andrew Savonichev 96e7487013 [NVPTX] Fix LIT tests with default nameTableKind
Default nameTableKind results in the following DWARF section:

    .section .debug_pubnames
    {
      .b32 LpubNames_end0-LpubNames_start0    // Length of Public Names Info
      LpubNames_start0:
      [...]
      LpubNames_end0:
    }

Without -mattr=+ptx75 ptxas complains about labels and label
expressions:

error   : Feature 'labels1 - labels2 expression in .section' requires
PTX ISA .version 7.5 or later
error   : Feature 'Defining labels in .section' requires PTX ISA
.version 7.0 or later

The patch modifies dbg-value-const-byref.ll to let it run without PTX
7.5 (available from CUDA 11.0), and adds a new test just for this
case.

Differential revision: https://reviews.llvm.org/D124108
2022-04-21 16:05:25 +03:00
Simon Pilgrim ac213375d9 [InstCombine] Add nonpow2 (negative) test for D123374 2022-04-21 13:58:43 +01:00
Aaron Ballman 408226f20a Fix Sphinx build 2022-04-21 08:52:29 -04:00