Commit Graph

387768 Commits

Author SHA1 Message Date
Saleem Abdulrasool d319005a37 lit: revert 134b103fc0
Revert the 32-process cap on Windows.  When testing with Swift, we found
that there was a time reduction for testing with the higher load.  This
should hopefully not matter much in practice.  In the case that the
original problem with python remains with a high subprocess count, we
can easily revert this change.
2021-05-07 10:22:43 -07:00
Roman Lebedev b8701dc174
[X86] AMD Zen 3: mark XMM/YMM (but not MMX!) reg moves as eliminatible in RegisterFile 2021-05-07 20:11:21 +03:00
Roman Lebedev 5b1610a250
[X86] AMD Zen 3: MOVSX32rr32 is a zero-cycle move
It measures as such, and the reference docs agree.

I can't easily add a MCA test, because there's no mnemonic for it,
it can only be disassembled or created as a MCInst.
2021-05-07 20:11:20 +03:00
Fangrui Song 6a2850f3fc [AArch64][ELF] Prefer to lower MC_GlobalAddress operands to .Lfoo$local
Similar to X86 D73230 & 46788a21f9

With this change, we can set dso_local in clang's -fpic -fno-semantic-interposition mode,
for default visibility external linkage non-ifunc-non-COMDAT definitions.

For such dso_local definitions, variable access/taking the address of a
function/calling a function will go through a local alias to avoid GOT/PLT.

Note: the 'S' inline assembly constraint refers to an absolute symbolic address
or a label reference (D46745).

Differential Revision: https://reviews.llvm.org/D101872
2021-05-07 09:44:26 -07:00
Matt Morehouse f09414499c [libFuzzer] Fix stack-overflow-with-asan.test.
Fix function return type and remove check for SUMMARY, since it doesn't
seem to be output in Windows.
2021-05-07 09:18:21 -07:00
Whitney Tsang 1006ac3963 [LoopNest] Consider loop nest with inner loop guard using outer loop
induction variable to be perfect

This patch allow more conditional branches to be considered as loop
guard, and so more loop nests can be considered perfect.

Reviewed By: bmahjour, sidbav

Differential Revision: https://reviews.llvm.org/D94717
2021-05-07 16:04:18 +00:00
Simon Pilgrim f744723f75 [X86] combineXor - limit fold to non-opaque constants (PR50254)
Ensure we don't try to fold when one might be an opaque constant - the constant fold will fail and then the reverse fold will happen in DAGCombine.....
2021-05-07 16:39:24 +01:00
Roman Lebedev 2819009b5a
[X86] AMD Zen 3: _REV variants of zero-cycles moves are also zero-cycles (PR50261)
Sometimes disassembler picks _REV variants of instructions
over the plain ones, which in this case exposed an issue
that the _REV variants aren't being modelled as optimizable moves.
2021-05-07 18:27:40 +03:00
Roman Lebedev a8e30e63ac
[NFC][X86][MCA] AMD Zen3: add test for zero-cycle X87 move 2021-05-07 18:27:40 +03:00
Sebastian Poeplau 70cbc6dbef [libFuzzer] Fix stack overflow detection
Address sanitizer can detect stack exhaustion via its SEGV handler, which is
executed on a separate stack using the sigaltstack mechanism. When libFuzzer is
used with address sanitizer, it installs its own signal handlers which defer to
those put in place by the sanitizer before performing additional actions. In the
particular case of a stack overflow, the current setup fails because libFuzzer
doesn't preserve the flag for executing the signal handler on a separate stack:
when we run out of stack space, the operating system can't run the SEGV handler,
so address sanitizer never reports the issue. See the included test for an
example.

This commit fixes the issue by making libFuzzer preserve the SA_ONSTACK flag
when installing its signal handlers; the dedicated signal-handler stack set up
by the sanitizer runtime appears to be large enough to support the additional
frames from the fuzzer.

Reviewed By: morehouse

Differential Revision: https://reviews.llvm.org/D101824
2021-05-07 08:18:28 -07:00
thomasraoux a970e69d6b [mlir][vector] add pattern to cast away leading unit dim for elementwise op
Differential Revision: https://reviews.llvm.org/D102034
2021-05-07 07:54:09 -07:00
thomasraoux 565ee6afc7 [mlir][spirv] add support lowering of extract_slice to scalar type
Differential Revision: https://reviews.llvm.org/D102041
2021-05-07 07:52:02 -07:00
Joseph Tremoulet bc302bfbef BasicAA: Recognize inttoptr as isEscapeSource
Pointers escape when converted to integers, so a pointer produced by
converting an integer to a pointer must not be a local non-escaping
object.

Reviewed By: nikic, nlopes, aqjune

Differential Revision: https://reviews.llvm.org/D101541
2021-05-07 07:48:50 -07:00
Sanjay Patel 0a6f11aabd [AArch64] add test for missed vectorization; NFC
This is a reduction of the example in:
https://llvm.org/PR50256
2021-05-07 10:45:11 -04:00
Joseph Huber a15f8589f4 [libomptarget] Add support for target memory allocators to cuda RTL
Summary:
The allocator interface added in D97883 allows the RTL to allocate shared and
host-pinned memory from the cuda plugin. This patch adds support for these to
the runtime.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D102000
2021-05-07 10:27:02 -04:00
Tobias Gysi f31531a30b [mlir][linalg] Remove redundant indexOp builder.
Remove the builder signature taking a signed dimension identifier.

Reviewed By: ergawy

Differential Revision: https://reviews.llvm.org/D102055
2021-05-07 14:22:12 +00:00
Tres Popp faab8c140a [mlir] Rename BufferAliasAnalysis to BufferViewFlowAnalysis
This it to make more clear the difference between this and
an AliasAnalysis.

For example, given a sequence of subviews that create values
A -> B -> C -> d:
BufferViewFlowAnalysis::resolve(B) => {B, C, D}
AliasAnalysis::resolve(B) => {A, B, C, D}

Differential Revision: https://reviews.llvm.org/D100838
2021-05-07 16:12:54 +02:00
Ahsan Saghir 25bbff632d [PowerPC] Provide MMA builtins for compatibility
Vector pair intrinsics and builtins were renamed in
https://reviews.llvm.org/D91974 to replace the _mma_ prefix by _vsx_.
However, some projects used the _mma_ version, so this patch adds
these intrinsics to provide compatibility.

Fixes Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=50159

Reviewed By: nemanjai, amyk

Differential Revision: https://reviews.llvm.org/D100482
2021-05-07 09:10:16 -05:00
Roman Lebedev 34de155f7e
[NFC][X86][MCA] AMD Zen3 Decrease iteration count in reg-move-elimination tests
Drop it just enough so it still produces the right IPC.
2021-05-07 17:06:45 +03:00
Roman Lebedev 758c173309
[X86] AMD Zen 3: throughput for renameable XMM/YMM moves is 6
They are resolved at the register rename stage without
using any execution units.
2021-05-07 17:06:45 +03:00
Roman Lebedev 715c0d0bd4
[X86] AMD Zen 3: AVX YMM moves are zero-cycle
I've verified this with llvm-exegesis.
This is not limited to zero registers.
2021-05-07 17:06:45 +03:00
Roman Lebedev ee020b930d
[X86] AMD Zen 3: AVX XMM moves are zero-cycle
I've verified this with llvm-exegesis.
This is not limited to zero registers.
2021-05-07 17:06:44 +03:00
Roman Lebedev 9db4203883
[X86] AMD Zen 3: SSE XMM moves are zero-cycle
I've verified this with llvm-exegesis.
This is not limited to zero registers.

Refs:
AMD SOG 19h, 2.9.4 Zero Cycle Move
The processor is able to execute certain register to register
mov operations with zero cycle delay.

Agner,
22.13 Instructions with no latency
Register-to-register move instructions are resolved at
the register rename stage without using any execution units.
These instructions have zero latency. It is possible to do six such
register renamings per clock cycle, and it is even possible to
rename the same register multiple times in one clock cycle.
2021-05-07 17:06:44 +03:00
Roman Lebedev 0d961fbd52
[NFC][X86][MCA] AMD Zen 3: Add tests for renameable AVX YMM moves 2021-05-07 17:06:44 +03:00
Roman Lebedev bcbfc22ff9
[NFC][X86][MCA] AMD Zen 3: Add tests for renameable AVX XMM moves 2021-05-07 17:06:44 +03:00
Roman Lebedev cbabe4f4d6
[NFC][X86][MCA] AMD Zen 3: Add tests for renameable SSE XMM moves 2021-05-07 17:06:44 +03:00
Roman Lebedev d8c6202576
[X86] AMD Zen 3: throughput for renameable GPR moves is 6
They are resolved at the register rename stage without
using any execution units.
2021-05-07 17:06:43 +03:00
Roman Lebedev c3cd8ed009
[NFC][X86] AMD Zen 3: move sched classes for renameables moves togeter 2021-05-07 17:06:43 +03:00
Roman Lebedev e6d688ec96
[NFC][X86][MCA] Increase iteration count in reg move elimination tests
So the IPC actually stabilizes at 6.
2021-05-07 17:06:43 +03:00
Arthur O'Dwyer f42355e17c [libc++] [test] Test that unordered_*::swap/move/assign does not invalidate iterators.
And remove the dedicated debug-iterator tests; we want to test this in all modes.
We have a CI step for testing the whole test suite with `--debug_level=1` now.

Part of https://reviews.llvm.org/D102003
2021-05-07 10:04:26 -04:00
Arthur O'Dwyer a1f75bf091 [libc++] [test] Simplify arithmetic in list.special/swap.pass.cpp. NFCI.
Part of https://reviews.llvm.org/D102003
2021-05-07 10:03:52 -04:00
Arthur O'Dwyer 8935c8449b [libc++] [test] Test that list::swap/move/move-assign does not invalidate iterators.
And remove the dedicated debug-iterator test; we want to test this in all modes.
We have a CI step for testing the whole test suite with `--debug_level=1` now.

Part of https://reviews.llvm.org/D102003
2021-05-07 10:03:23 -04:00
Stephen Tozer 7bc1dd1191 Reapply "[DebugInfo] Drop DBG_VALUE_LISTs with an excessive number of debug operands"
Reapply b623df3c, which was reverted while reverting a different patch
with a breaking change. There are no underlying issues with this patch,
so no changes have been made to the original patch.

This reverts commit b11e4c9907.
2021-05-07 14:55:02 +01:00
Simon Pilgrim c9d4b4173b [CodeGen] Ensure UserValue::getDebugLoc() and UserLabel::getDebugLoc() consistently return a const reference NFCI.
Avoids a lot of unnecessary tracking increments/decrements of the underlying TrackingMDNodeRef.
2021-05-07 14:48:23 +01:00
Simon Pilgrim dd21c6b843 [DAG] Ensure all SD classes consistently return a const reference with getDebugLoc(). NFCI.
Avoids a lot of unnecessary tracking increments/decrements of the underlying TrackingMDNodeRef.
2021-05-07 14:48:23 +01:00
Benjamin Kramer 6248d11190 Retire TargetRegisterInfo::getSpillAlignment
getSpillAlign does the same thing.
2021-05-07 15:16:22 +02:00
Sebastian Neubauer 13c0316239 [AMDGPU] Restrict immediate scratch offsets
gfx9 does not work with negative offsets, gfx10 works only with
aligned negative offsets, but not with unaligned negative offsets.

This is slightly more conservative than needed, gfx9 does support
negative offsets when a VGPR address is used and gfx10 supports
negative, unaligned offsets when an SGPR address is used, but we
do not make use of that with this patch.

Differential Revision: https://reviews.llvm.org/D101292
2021-05-07 14:51:32 +02:00
David Stuttard 606d4e8061 AMDGPU: Correct const_index_stride for wave 32 for PAL ABI
Retrying after revert and fix (removed implicit def flag from operand). Now
passes with expensive_checks enabled.

Since there is a single scratch resource descriptor for all shaders, if there is
a wave32 and a wave64 shader (for instance for VsFs pairs)
then the const_index_stride will be incorrect for wave32 shaders.

Differential Revision: https://reviews.llvm.org/D101830

Change-Id: Ie3b8b2921237968caca91527dd0c97b1b0cc0360
2021-05-07 13:42:57 +01:00
Stephen Tozer 14818a86d0 Fix: [DebugInfo] Fix crash when emitting an invalidated SDDbgValue
This patch is a fix for revision ce0c1f3c, which caused test failures on
bots without x86 as a registered target. This patch moves the test added
in the prior patch to the x86 folder, so that it only runs on bots with
the correct target available.
2021-05-07 13:38:19 +01:00
Malhar Jajoo dfe3ffaa4a [ARM] Transforming memset to Tail predicated Loop
This patch converts llvm.memset intrinsic into Tail Predicated
Hardware loops for a target that supports the Arm M-profile
Vector Extension (MVE).

The llvm.memset is converted to a TP loop for both
constant and non-constant input sizes (of llvm.memset).

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100435
2021-05-07 13:35:53 +01:00
Anastasia Stulova 76f1de10f4 [OpenCL] Fix optional image types.
This change allows the use of identifiers for image types
from `cl_khr_gl_msaa_sharing` freely in the kernel code if
the extension is not supported since they are not in the
list of the reserved identifiers.

This change also removed the need for pragma for the types
in the extensions since the spec does not require the pragma
uses.

Differential Revision: https://reviews.llvm.org/D100983
2021-05-07 13:29:28 +01:00
Joachim Meyer d9f2960c93 [NFC] Correctly assert the indents for printEnumValHelpStr.
Only verify that there's no negative indent.
Noted by @chapuni in https://reviews.llvm.org/D93494.

Reviewed By: chapuni

Differential Revision: https://reviews.llvm.org/D102021
2021-05-07 14:30:43 +02:00
Stephen Tozer ce0c1f3ced [DebugInfo] Fix crash when emitting an invalidated SDDbgValue
This patch fixes a crash in the compiler that occurs when certain
invalidated SDDbgValues are emitted. The cause of this was that we would
attempt to check the liveness of the debug value's operands, which
triggers an assert if any of those operands are invalid. This patch
changes this check such that it only occurs if the SDDbgValue is valid;
if not, the check is irrelevant anyway, so can be safely ignored.

Differential Revision: https://reviews.llvm.org/D101540
2021-05-07 13:13:56 +01:00
Simon Pilgrim 280aa3415e [DAG] Add a generic expansion for SHIFT_PARTS opcodes using funnel shifts
Based off a discussion on D89281 - where the AARCH64 implementations were being replaced to use funnel shifts.

Any target that has efficient funnel shift lowering can handle the shift parts expansion using the same expansion, avoiding a lot of duplication.

I've generalized the X86 implementation and moved it to TargetLowering - so far I've found that AARCH64 and AMDGPU benefit, but many other targets (ARM, PowerPC + RISCV in particular) could easily use this with a few minor improvements to their funnel shift lowering (or the folding of their target ops that funnel shifts lower to).

NOTE: I'm trying to avoid adding full SHIFT_PARTS legalizer handling as I think it might actually be possible to remove these opcodes in the medium-term and use funnel shift / libcall expansion directly.

Differential Revision: https://reviews.llvm.org/D101987
2021-05-07 13:12:30 +01:00
David Stuttard 793b4b2603 Revert "AMDGPU: Correct const_index_stride for wave 32 for PAL ABI"
This reverts commit 442de0c1ad.
2021-05-07 12:49:17 +01:00
Simon Pilgrim 2a3f60b5f5 [SLP] Regenerate tests to reduce diff in D98714. NFCI. 2021-05-07 12:33:00 +01:00
Simon Pilgrim 8e42024f79 [X86] Ensure we pass DebugLoc by const reference where possible. NFCI.
Avoids a lot of unnecessary tracking increments/decrements of the underlying TrackingMDNodeRef
2021-05-07 12:32:59 +01:00
Ole Strohm f372ff17f7 [NFC] (test commit) Changed example invocation of C++ for OpenCL 2021-05-07 12:31:37 +01:00
David Stuttard 442de0c1ad AMDGPU: Correct const_index_stride for wave 32 for PAL ABI
Since there is a single scratch resource descriptor for all shaders, if there is
a wave32 and a wave64 shader (for instance for VsFs pairs)
then the const_index_stride will be incorrect for wave32 shaders.

Differential Revision: https://reviews.llvm.org/D101830

Change-Id: Id8de5566b0d1a07a814e2e7db016df9d20bf6d2c
2021-05-07 12:19:49 +01:00
Roman Lebedev bda9ca3e44
[NFC][X86][MCA] AMD Zen 3: add tests with non-eliminatible MMX moves
In Zen3, MMX moves are *not* eliminated,
i've verified this with llvm-exegesis.
2021-05-07 13:56:07 +03:00