llvm-project

Commit Graph

Author	SHA1	Message	Date
Saleem Abdulrasool	d319005a37	lit: revert `134b103fc0` Revert the 32-process cap on Windows. When testing with Swift, we found that there was a time reduction for testing with the higher load. This should hopefully not matter much in practice. In the case that the original problem with Python remains with a high subprocess count, we can easily revert this change.	2021-05-07 10:22:43 -07:00
Roman Lebedev	b8701dc174	[X86] AMD Zen 3: mark XMM/YMM (but not MMX!) reg moves as eliminatible in RegisterFile	2021-05-07 20:11:21 +03:00
Roman Lebedev	5b1610a250	[X86] AMD Zen 3: MOVSX32rr32 is a zero-cycle move It measures as such, and the reference docs agree. I can't easily add an MCA test, because there's no mnemonic for it; it can only be disassembled or created as an MCInst.	2021-05-07 20:11:20 +03:00
Fangrui Song	6a2850f3fc	[AArch64][ELF] Prefer to lower MC_GlobalAddress operands to .Lfoo$local Similar to X86 D73230 & `46788a21f9` With this change, we can set dso_local in clang's -fpic -fno-semantic-interposition mode, for default visibility external linkage non-ifunc-non-COMDAT definitions. For such dso_local definitions, variable access/taking the address of a function/calling a function will go through a local alias to avoid GOT/PLT. Note: the 'S' inline assembly constraint refers to an absolute symbolic address or a label reference (D46745). Differential Revision: https://reviews.llvm.org/D101872	2021-05-07 09:44:26 -07:00
Matt Morehouse	f09414499c	[libFuzzer] Fix stack-overflow-with-asan.test. Fix function return type and remove check for SUMMARY, since it doesn't seem to be output in Windows.	2021-05-07 09:18:21 -07:00
Whitney Tsang	1006ac3963	[LoopNest] Consider loop nest with inner loop guard using outer loop induction variable to be perfect This patch allows more conditional branches to be considered as loop guards, so more loop nests can be considered perfect. Reviewed By: bmahjour, sidbav Differential Revision: https://reviews.llvm.org/D94717	2021-05-07 16:04:18 +00:00
Simon Pilgrim	f744723f75	[X86] combineXor - limit fold to non-opaque constants (PR50254) Ensure we don't try to fold when one might be an opaque constant - the constant fold will fail and then the reverse fold will happen in DAGCombine.....	2021-05-07 16:39:24 +01:00
Roman Lebedev	2819009b5a	[X86] AMD Zen 3: _REV variants of zero-cycles moves are also zero-cycles (PR50261) Sometimes the disassembler picks _REV variants of instructions over the plain ones. In this case, that exposed an issue: the _REV variants aren't being modelled as optimizable moves.	2021-05-07 18:27:40 +03:00
Roman Lebedev	a8e30e63ac	[NFC][X86][MCA] AMD Zen3: add test for zero-cycle X87 move	2021-05-07 18:27:40 +03:00
Sebastian Poeplau	70cbc6dbef	[libFuzzer] Fix stack overflow detection Address sanitizer can detect stack exhaustion via its SEGV handler, which is executed on a separate stack using the sigaltstack mechanism. When libFuzzer is used with address sanitizer, it installs its own signal handlers which defer to those put in place by the sanitizer before performing additional actions. In the particular case of a stack overflow, the current setup fails because libFuzzer doesn't preserve the flag for executing the signal handler on a separate stack: when we run out of stack space, the operating system can't run the SEGV handler, so address sanitizer never reports the issue. See the included test for an example. This commit fixes the issue by making libFuzzer preserve the SA_ONSTACK flag when installing its signal handlers; the dedicated signal-handler stack set up by the sanitizer runtime appears to be large enough to support the additional frames from the fuzzer. Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D101824	2021-05-07 08:18:28 -07:00
thomasraoux	a970e69d6b	[mlir][vector] add pattern to cast away leading unit dim for elementwise op Differential Revision: https://reviews.llvm.org/D102034	2021-05-07 07:54:09 -07:00
thomasraoux	565ee6afc7	[mlir][spirv] add support for lowering extract_slice to scalar type Differential Revision: https://reviews.llvm.org/D102041	2021-05-07 07:52:02 -07:00
Joseph Tremoulet	bc302bfbef	BasicAA: Recognize inttoptr as isEscapeSource Pointers escape when converted to integers, so a pointer produced by converting an integer to a pointer must not be a local non-escaping object. Reviewed By: nikic, nlopes, aqjune Differential Revision: https://reviews.llvm.org/D101541	2021-05-07 07:48:50 -07:00
Sanjay Patel	0a6f11aabd	[AArch64] add test for missed vectorization; NFC This is a reduction of the example in: https://llvm.org/PR50256	2021-05-07 10:45:11 -04:00
Joseph Huber	a15f8589f4	[libomptarget] Add support for target memory allocators to cuda RTL Summary: The allocator interface added in D97883 allows the RTL to allocate shared and host-pinned memory from the cuda plugin. This patch adds support for these to the runtime. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D102000	2021-05-07 10:27:02 -04:00
Tobias Gysi	f31531a30b	[mlir][linalg] Remove redundant indexOp builder. Remove the builder signature taking a signed dimension identifier. Reviewed By: ergawy Differential Revision: https://reviews.llvm.org/D102055	2021-05-07 14:22:12 +00:00
Tres Popp	faab8c140a	[mlir] Rename BufferAliasAnalysis to BufferViewFlowAnalysis This is to make the difference between this and an AliasAnalysis clearer. For example, given a sequence of subviews that create values A -> B -> C -> D: BufferViewFlowAnalysis::resolve(B) => {B, C, D} AliasAnalysis::resolve(B) => {A, B, C, D} Differential Revision: https://reviews.llvm.org/D100838	2021-05-07 16:12:54 +02:00
Ahsan Saghir	25bbff632d	[PowerPC] Provide MMA builtins for compatibility Vector pair intrinsics and builtins were renamed in https://reviews.llvm.org/D91974 to replace the _mma_ prefix with _vsx_. However, some projects used the _mma_ version, so this patch adds these intrinsics to provide compatibility. Fixes Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=50159 Reviewed By: nemanjai, amyk Differential Revision: https://reviews.llvm.org/D100482	2021-05-07 09:10:16 -05:00
Roman Lebedev	34de155f7e	[NFC][X86][MCA] AMD Zen3 Decrease iteration count in reg-move-elimination tests Drop it just enough so it still produces the right IPC.	2021-05-07 17:06:45 +03:00
Roman Lebedev	758c173309	[X86] AMD Zen 3: throughput for renameable XMM/YMM moves is 6 They are resolved at the register rename stage without using any execution units.	2021-05-07 17:06:45 +03:00
Roman Lebedev	715c0d0bd4	[X86] AMD Zen 3: AVX YMM moves are zero-cycle I've verified this with llvm-exegesis. This is not limited to zero registers.	2021-05-07 17:06:45 +03:00
Roman Lebedev	ee020b930d	[X86] AMD Zen 3: AVX XMM moves are zero-cycle I've verified this with llvm-exegesis. This is not limited to zero registers.	2021-05-07 17:06:44 +03:00
Roman Lebedev	9db4203883	[X86] AMD Zen 3: SSE XMM moves are zero-cycle I've verified this with llvm-exegesis. This is not limited to zero registers. Refs: AMD SOG 19h, 2.9.4 Zero Cycle Move The processor is able to execute certain register to register mov operations with zero cycle delay. Agner, 22.13 Instructions with no latency Register-to-register move instructions are resolved at the register rename stage without using any execution units. These instructions have zero latency. It is possible to do six such register renamings per clock cycle, and it is even possible to rename the same register multiple times in one clock cycle.	2021-05-07 17:06:44 +03:00
Roman Lebedev	0d961fbd52	[NFC][X86][MCA] AMD Zen 3: Add tests for renameable AVX YMM moves	2021-05-07 17:06:44 +03:00
Roman Lebedev	bcbfc22ff9	[NFC][X86][MCA] AMD Zen 3: Add tests for renameable AVX XMM moves	2021-05-07 17:06:44 +03:00
Roman Lebedev	cbabe4f4d6	[NFC][X86][MCA] AMD Zen 3: Add tests for renameable SSE XMM moves	2021-05-07 17:06:44 +03:00
Roman Lebedev	d8c6202576	[X86] AMD Zen 3: throughput for renameable GPR moves is 6 They are resolved at the register rename stage without using any execution units.	2021-05-07 17:06:43 +03:00
Roman Lebedev	c3cd8ed009	[NFC][X86] AMD Zen 3: move sched classes for renameable moves together	2021-05-07 17:06:43 +03:00
Roman Lebedev	e6d688ec96	[NFC][X86][MCA] Increase iteration count in reg move elimination tests So the IPC actually stabilizes at 6.	2021-05-07 17:06:43 +03:00
Arthur O'Dwyer	f42355e17c	[libc++] [test] Test that unordered_*::swap/move/assign does not invalidate iterators. And remove the dedicated debug-iterator tests; we want to test this in all modes. We have a CI step for testing the whole test suite with `--debug_level=1` now. Part of https://reviews.llvm.org/D102003	2021-05-07 10:04:26 -04:00
Arthur O'Dwyer	a1f75bf091	[libc++] [test] Simplify arithmetic in list.special/swap.pass.cpp. NFCI. Part of https://reviews.llvm.org/D102003	2021-05-07 10:03:52 -04:00
Arthur O'Dwyer	8935c8449b	[libc++] [test] Test that list::swap/move/move-assign does not invalidate iterators. And remove the dedicated debug-iterator test; we want to test this in all modes. We have a CI step for testing the whole test suite with `--debug_level=1` now. Part of https://reviews.llvm.org/D102003	2021-05-07 10:03:23 -04:00
Stephen Tozer	7bc1dd1191	Reapply "[DebugInfo] Drop DBG_VALUE_LISTs with an excessive number of debug operands" Reapply b623df3c, which was reverted while reverting a different patch with a breaking change. There are no underlying issues with this patch, so no changes have been made to the original patch. This reverts commit `b11e4c9907`.	2021-05-07 14:55:02 +01:00
Simon Pilgrim	c9d4b4173b	[CodeGen] Ensure UserValue::getDebugLoc() and UserLabel::getDebugLoc() consistently return a const reference NFCI. Avoids a lot of unnecessary tracking increments/decrements of the underlying TrackingMDNodeRef.	2021-05-07 14:48:23 +01:00
Simon Pilgrim	dd21c6b843	[DAG] Ensure all SD classes consistently return a const reference with getDebugLoc(). NFCI. Avoids a lot of unnecessary tracking increments/decrements of the underlying TrackingMDNodeRef.	2021-05-07 14:48:23 +01:00
Benjamin Kramer	6248d11190	Retire TargetRegisterInfo::getSpillAlignment getSpillAlign does the same thing.	2021-05-07 15:16:22 +02:00
Sebastian Neubauer	13c0316239	[AMDGPU] Restrict immediate scratch offsets gfx9 does not work with negative offsets; gfx10 works only with aligned negative offsets, not with unaligned ones. This is slightly more conservative than needed: gfx9 does support negative offsets when a VGPR address is used, and gfx10 supports negative, unaligned offsets when an SGPR address is used, but we do not make use of that with this patch. Differential Revision: https://reviews.llvm.org/D101292	2021-05-07 14:51:32 +02:00
David Stuttard	606d4e8061	AMDGPU: Correct const_index_stride for wave 32 for PAL ABI Retrying after revert and fix (removed implicit def flag from operand). Now passes with expensive_checks enabled. Since there is a single scratch resource descriptor for all shaders, if there is a wave32 and a wave64 shader (for instance for VsFs pairs) then the const_index_stride will be incorrect for wave32 shaders. Differential Revision: https://reviews.llvm.org/D101830 Change-Id: Ie3b8b2921237968caca91527dd0c97b1b0cc0360	2021-05-07 13:42:57 +01:00
Stephen Tozer	14818a86d0	Fix: [DebugInfo] Fix crash when emitting an invalidated SDDbgValue This patch is a fix for revision `ce0c1f3c`, which caused test failures on bots without x86 as a registered target. This patch moves the test added in the prior patch to the x86 folder, so that it only runs on bots with the correct target available.	2021-05-07 13:38:19 +01:00
Malhar Jajoo	dfe3ffaa4a	[ARM] Transforming memset to Tail predicated Loop This patch converts llvm.memset intrinsic into Tail Predicated Hardware loops for a target that supports the Arm M-profile Vector Extension (MVE). The llvm.memset is converted to a TP loop for both constant and non-constant input sizes (of llvm.memset). Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100435	2021-05-07 13:35:53 +01:00
Anastasia Stulova	76f1de10f4	[OpenCL] Fix optional image types. This change allows identifiers for the image types from `cl_khr_gl_msaa_sharing` to be used freely in kernel code when the extension is not supported, since they are not in the list of reserved identifiers. This change also removes the need for a pragma for the types in the extensions, since the spec does not require the pragma. Differential Revision: https://reviews.llvm.org/D100983	2021-05-07 13:29:28 +01:00
Joachim Meyer	d9f2960c93	[NFC] Correctly assert the indents for printEnumValHelpStr. Only verify that there's no negative indent. Noted by @chapuni in https://reviews.llvm.org/D93494. Reviewed By: chapuni Differential Revision: https://reviews.llvm.org/D102021	2021-05-07 14:30:43 +02:00
Stephen Tozer	ce0c1f3ced	[DebugInfo] Fix crash when emitting an invalidated SDDbgValue This patch fixes a crash in the compiler that occurs when certain invalidated SDDbgValues are emitted. The cause of this was that we would attempt to check the liveness of the debug value's operands, which triggers an assert if any of those operands are invalid. This patch changes this check such that it only occurs if the SDDbgValue is valid; if not, the check is irrelevant anyway, so can be safely ignored. Differential Revision: https://reviews.llvm.org/D101540	2021-05-07 13:13:56 +01:00
Simon Pilgrim	280aa3415e	[DAG] Add a generic expansion for SHIFT_PARTS opcodes using funnel shifts Based off a discussion on D89281 - where the AARCH64 implementations were being replaced to use funnel shifts. Any target that has efficient funnel shift lowering can handle the shift parts expansion using the same expansion, avoiding a lot of duplication. I've generalized the X86 implementation and moved it to TargetLowering - so far I've found that AARCH64 and AMDGPU benefit, but many other targets (ARM, PowerPC + RISCV in particular) could easily use this with a few minor improvements to their funnel shift lowering (or the folding of their target ops that funnel shifts lower to). NOTE: I'm trying to avoid adding full SHIFT_PARTS legalizer handling as I think it might actually be possible to remove these opcodes in the medium-term and use funnel shift / libcall expansion directly. Differential Revision: https://reviews.llvm.org/D101987	2021-05-07 13:12:30 +01:00
David Stuttard	793b4b2603	Revert "AMDGPU: Correct const_index_stride for wave 32 for PAL ABI" This reverts commit `442de0c1ad`.	2021-05-07 12:49:17 +01:00
Simon Pilgrim	2a3f60b5f5	[SLP] Regenerate tests to reduce diff in D98714. NFCI.	2021-05-07 12:33:00 +01:00
Simon Pilgrim	8e42024f79	[X86] Ensure we pass DebugLoc by const reference where possible. NFCI. Avoids a lot of unnecessary tracking increments/decrements of the underlying TrackingMDNodeRef	2021-05-07 12:32:59 +01:00
Ole Strohm	f372ff17f7	[NFC] (test commit) Changed example invocation of C++ for OpenCL	2021-05-07 12:31:37 +01:00
David Stuttard	442de0c1ad	AMDGPU: Correct const_index_stride for wave 32 for PAL ABI Since there is a single scratch resource descriptor for all shaders, if there is a wave32 and a wave64 shader (for instance for VsFs pairs) then the const_index_stride will be incorrect for wave32 shaders. Differential Revision: https://reviews.llvm.org/D101830 Change-Id: Id8de5566b0d1a07a814e2e7db016df9d20bf6d2c	2021-05-07 12:19:49 +01:00
Roman Lebedev	bda9ca3e44	[NFC][X86][MCA] AMD Zen 3: add tests with non-eliminatible MMX moves In Zen3, MMX moves are not eliminated; I've verified this with llvm-exegesis.	2021-05-07 13:56:07 +03:00


387768 Commits