llvm-project

Commit Graph

Author	SHA1	Message	Date
Mircea Trofin	14a16fae43	[llvm][NFC] Rename CallAnalyzer::onCommonInstructionSimplification Summary: It is called when instructions aren't simplified, and the implementation is expected to account for a penalty. Renamed to onCommonInstructionMissedSimplification. Reviewers: davidxl, eraman Reviewed By: davidxl Subscribers: hiraditya, baloghadamsoftware, haicheng, a.sidorin, Szelethus, donat.nagy, dkrupp, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73662	2020-01-29 21:07:36 -08:00
Johannes Doerfert	89c2e733e8	[Attributor] Pointer privatization attribute (argument promotion) A pointer is privatizeable if it can be replaced by a new, private one. Privatizing pointer reduces the use count, interaction between unrelated code parts. This is a first step towards replacing argument promotion. While we can already handle recursion (unlike argument promotion!) we are restricted to stack allocations for now because we do not analyze the uses in the callee. Reviewed By: uenoku Differential Revision: https://reviews.llvm.org/D68852	2020-01-29 21:31:04 -06:00
Johannes Doerfert	791c9f1145	[Attributor] Fix TODO to avoid recomputation of results The helpers AAReturnedFromReturnedValues and AACallSiteReturnedFromReturned are useful not only to avoid code duplication but also to avoid recomputation of results. If we have N call sites we should not recompute the function return information N times but once. These are mostly straightforward usages with some minor improvements on the helpers and addition of a new one (IRPosition::getAssociatedType) that knows about function return types.	2020-01-29 19:24:34 -06:00
Gabor Horvath	31ae0165c3	[LTO] Add optimization remarks for removed functions This only works with regular LTO for now. Differential Revision: https://reviews.llvm.org/D73597	2020-01-29 15:53:51 -08:00
Craig Topper	35625464c6	[X86] Fix the cost model for v16i16->v16i32 zero_extend/sign_extend with AVX2 We seem to be inheriting the cost from sse4.1. But if we have 256-bit registers we should be able to do this with just one extract to split the 16i16 and two v8i16->v8i32 operations so our cost should be 3 not 4. Differential Revision: https://reviews.llvm.org/D73646	2020-01-29 15:52:10 -08:00
Matt Arsenault	c5fffa4da3	GlobalISel: Add observer argument to legalizeIntrinsic This is passed to legalizeCustom, but not intrinsic. Also remove the MRI argument, since you can get that from the MachineIRBuilder. I'm not sure why MachineIRBuilder has a private observer member, and this is passed separately.	2020-01-29 18:33:45 -05:00
Matt Arsenault	7f3280ecdd	AMDGPU/GlobalISel: Select permlane16/permlanex16	2020-01-29 17:55:31 -05:00
Cameron McInally	4f2e2acc4b	[NFC][AArch64][SVE] Rename Destructive enumerator from DestructiveInstType Rename Destructive enumerator in preparation for a larger set of patches to support prefixing destructive operations with MOVPRFX. Differential Revision: https://reviews.llvm.org/D73212	2020-01-29 15:42:26 -06:00
Amara Emerson	c12f046eb9	[GlobalISel] Add new combine to convert scalar G_MUL to G_SHL. For pow2 constants we should use G_SHL for pattern matching (and perf) purposes later. Vector support not yet implemented. Differential Revision: https://reviews.llvm.org/D73659	2020-01-29 13:39:00 -08:00
Jessica Paquette	050cd443ca	[AArch64][GlobalISel] Fix TBNZ/TBZ opcode selection When the bit is <= 32, we have to use the W register variant for TB(N)Z. This is because of the way the instruction is encoded. Differential Revision: https://reviews.llvm.org/D73660	2020-01-29 13:11:18 -08:00
Hiroshi Yamauchi	24962ced81	[Loads] Handle simple cases with same base pointer with constant offsets in FindAvailableLoadedValue when AA is null. Summary: This will help with devirtualization (store forwarding with vtable pointers in the presence of other stores into members in the constructor.) During inlining, we don't have AA. Reviewers: davidxl Subscribers: mgorny, Prazek, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71307	2020-01-29 13:05:46 -08:00
Cameron McInally	00c2249910	[NFCI][AArch64][SVE] Set default DestructiveInstType in AArch64Inst class Some housekeeping for the DestructiveInstType enum before a larger set of patches to support prefixing destructive operations with MOVPRFX. Differential Revision: https://reviews.llvm.org/D73141	2020-01-29 15:00:19 -06:00
Victor Huang	1492b70a03	[PowerPC][Future] Add prefixed loads and stores for future CPU A previous patch should have added pld and pstd and any support code in the backend that is required for prefixed load and store type operations. This patch adds a number of additional prefixed load and store type instructions for the future CPU. Differential Revision: https://reviews.llvm.org/D72577	2020-01-29 14:45:56 -06:00
Sterling Augustine	c64b56617d	Print discriminators when printing .debug_line in GNU style. Summary: gnu addr2line prints DWARF line table discriminators like so: <file>:<line> (discriminator <Number>) This matches that behavior. Document how and when --output-style=GNU prints discriminators Add test for new GNU-style discriminator printing. Reviewers: rupprecht, labath, jhenderson Subscribers: aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73318	2020-01-29 12:22:12 -08:00
Nikita Popov	e086e23024	[InstCombine] Support non-splat vectors in icmp eq + add/sub fold For the icmp eq (add X, C1), C2 => icmp eq X, C2-C1 icmp eq (sub C1, X), C2 => icmp eq X, C1-C2 folds, this allows C1 to be non-splat and contain undefs. C2 is still splat, due to the structure of the code. This is to address the remaining part of the regression in D73411, where demanded element analysis replaces some elements with undef. Differential Revision: https://reviews.llvm.org/D73647	2020-01-29 20:56:58 +01:00
Amara Emerson	0da937bb5c	[GlobalISel][IRTranslator] Follow convention and put constant offset of getelementptr arithmetic on RHS. We were needlessly putting known constant values on the LHS of a G_MUL, which is suboptimal. Differential Revision: https://reviews.llvm.org/D73650	2020-01-29 11:37:19 -08:00
Huihui Zhang	8f6761aa41	Revert "[AArch64] Fix data race on RegisterBank initialization." Buildbot failure, revert first while looking at the issue. This reverts commit `a5a4a47d69`.	2020-01-29 11:17:19 -08:00
Huihui Zhang	af620fc36a	Revert "[AMDGPU] Fix data race on RegisterBank initialization." There appears to be a related buildbot failure. This reverts commit `8bb6c8a22a`.	2020-01-29 11:16:27 -08:00
Huihui Zhang	2ec954579a	Revert "[ARM] Fix data race on RegisterBank initialization." There appears to be a related buildbot failure. This reverts commit `91618d940e`.	2020-01-29 11:15:27 -08:00
Fangrui Song	8903e61b66	[AsmPrinter][ELF] Define local aliases (.Lfoo$local) for GlobalObjects For `MC_GlobalAddress` operands referencing certain GlobalObjects, we can lower them to STB_LOCAL aliases to avoid costs brought by assembler/linker's conservative decisions about symbol interposition: * An assembler conservatively assumes a global default visibility symbol interposable (ELF semantics). So relocations in object files are needed even if the code generator assumed the definition exact and non-interposable. * The relocations can cause the creation of PLT entries on some targets for -shared links. A linker conservatively assumes a global default visibility symbol interposable (if not otherwise constrained by -Bsymbolic/--dynamic-list/VER_NDX_LOCAL/etc). "certain" refers to GlobalObjects in the intersection of `hasExactDefinition() and !isInterposable()`: `external`, `appending`, `internal`, `private`. Local linkages (`internal` and `private`) cannot be interposed. `appending` is for very few objects LLVM interprets specially. So the set just includes `external`. This patch emits STB_LOCAL aliases (.Lfoo$local) for such GlobalObjects, so that targets can lower MC_GlobalAddress operands to STB_LOCAL aliases if applicable. We may extend the scope and include GlobalAlias in the future. LLVM's existing -fno-semantic-interposition behaviors give us license to do such optimizations: * Various optimizations (ipconstprop, inliner, sccp, sroa, etc) treat normal ExternalLinkage GlobalObjects as non-interposable. * Before D72197, MC resolved a PC-relative VK_None fixup to a non-local symbol at assembly time (no outstanding relocation), if the target is defined in the same section. Put simply, even if IR optimizations failed to optimize and allowed interposition for the function call in `void foo() {} void bar() { foo(); }`, the assembler would disallow it. This patch sets up AsmPrinter infrastructure to make -fno-semantic-interposition more so. With and without the patch, the object file output should be identical: `.Lfoo$local` does not take a symbol table entry. Reviewed By: sfertile Differential Revision: https://reviews.llvm.org/D73228	2020-01-29 10:58:43 -08:00
Sterling Augustine	0758ac4e0c	Handle non-absolute include dirs properly for both dwarf4 and dwarf5. Summary: Add test case for the same. This test case will also serve as a starting point for later symbolizer tests. Reviewers: dblaikie, jdoerfert Subscribers: hiraditya, llvm-commits, jhenderson Tags: #llvm Differential Revision: https://reviews.llvm.org/D73583	2020-01-29 10:51:51 -08:00
Simon Pilgrim	f7245ef897	[DAGCombiner] ISD::SHL/SRA/SRL - use general SelectionDAG::FoldConstantArithmetic This handles all the constant splat / opaque testing for us.	2020-01-29 18:49:42 +00:00
Huihui Zhang	d2e2fc450e	[ConstantFold][SVE] Fix constant folding for scalable vector binary operations. Summary: Scalable vector should not be evaluated element by element. Add support to handle scalable vector UndefValue. Reviewers: sdesmalen, huntergr, spatel, lebedev.ri, apazos, efriedma, willlovett Reviewed By: efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71445	2020-01-29 10:49:08 -08:00
Austin Kerbow	2605adb69c	[AMDGPU][GlobalISel] Select 8-byte LDS Ops with 4-byte alignment Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73585	2020-01-29 10:42:12 -08:00
Adrian Prantl	18dbe1b279	Run clang-format on DwarfExpression (NFC)	2020-01-29 10:23:12 -08:00
Adrian Prantl	816ee8a423	DwarfExpression: Factor out getOrCreateBaseType() (NFC)	2020-01-29 10:23:12 -08:00
Huihui Zhang	91618d940e	[ARM] Fix data race on RegisterBank initialization. Summary: The initialization of RegisterBank needs to be done only once. The logic of AlreadyInit has data race, use llvm::call_once instead. This is continuing work of D73587. Reviewers: arsenm, rovka, dsanders, t.p.northover, efriedma, apazos Reviewed By: arsenm Subscribers: wdng, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73605	2020-01-29 10:15:37 -08:00
Huihui Zhang	8bb6c8a22a	[AMDGPU] Fix data race on RegisterBank initialization. Summary: The initialization of RegisterBank needs to be done only once. The logic of AlreadyInit has data race, use llvm::call_once instead. This is continuing work of D73587. Reviewers: arsenm, tstellar, ronlieb, efriedma, apazos, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73604	2020-01-29 10:14:40 -08:00
Huihui Zhang	a5a4a47d69	[AArch64] Fix data race on RegisterBank initialization. Summary: The initialization of RegisterBank needs to be done only once. The logic of AlreadyInit has a data race, use llvm::call_once instead. This issue was identified through thread sanitizer. Reviewers: efriedma, apazos, qcolombet, dsanders Reviewed By: efriedma Subscribers: arsenm, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73587	2020-01-29 10:12:52 -08:00
Adrian Prantl	aa6ec19c5f	Add dwarfdump support for DW_OP_regval_type. Differential Revision: https://reviews.llvm.org/D73598	2020-01-29 10:02:23 -08:00
Simon Pilgrim	25b8e96388	[DAGCombiner] ISD::MUL - use general SelectionDAG::FoldConstantArithmetic This handles all the constant splat / opaque testing for us.	2020-01-29 17:26:22 +00:00
Craig Topper	90c31b0f42	[X86] Custom lower ISD::FROUND with SSE4.1 to avoid a libcall. ISD::FROUND is defined to round to nearest with ties rounding away from 0. This mode isn't supported in hardware on X86. But as long as we aren't compiling with trapping math, we can emulate this with floor(X + copysign(nextafter(0.5, 0.0), X)). We have to use nextafter to avoid some corner cases that adding 0.5 would have. For example, if X is nextafter(0.5, 0.0) it should round to 0.0, but adding 0.5 would need one extra bit of mantissa than can be stored so it rounds to 1.0. Adding nextafter(0.5, 0.0) instead will just increase the exponent by 1 and leave the mantissa as all 1s. This would be nextafter(1.0, 0.0) which will floor to 0.0. Technically this requires -fno-trapping-math which isn't our default. But if we care about exceptions we should be using constrained intrinsics. Constrained intrinsics would use STRICT_FROUND which won't go through this code. Fixes PR42195. Differential Revision: https://reviews.llvm.org/D73607	2020-01-29 09:10:02 -08:00
Jay Foad	d07a789579	[AMDGPU] Cluster FLAT instructions with both vaddr and saddr Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73634	2020-01-29 17:01:35 +00:00
Simon Pilgrim	4b04e11735	[DAGCombiner] Sub/SUBSAT - use general SelectionDAG::FoldConstantArithmetic This handles all the constant splat / opaque testing for us.	2020-01-29 16:57:13 +00:00
Simon Pilgrim	48bd6a0986	[DAGCombiner] visitIMINMAX - use general SelectionDAG::FoldConstantArithmetic This handles all the constant splat / opaque testing for us instead of the ConstantSDNode variant where we have to do it ourselves.	2020-01-29 16:57:13 +00:00
Craig Topper	e5edd641fd	[X86] Use a shorter sequence to implement FLT_ROUNDS This code needs to map from the FPCW 2-bit encoding for rounding mode to the 2-bit encoding defined for FLT_ROUNDS. The previous implementation did some clever swapping of bits and adding 1 modulo 4 to do the mapping. This patch instead uses an 8-bit immediate as a lookup table of four 2-bit values. Then we use the 2-bit FPCW encoding to index the lookup table by using a right shift and an AND. This requires extracting the 2-bit value from FPCW and multiplying it by 2 to make it usable as a shift amount. But still results in less code. Differential Revision: https://reviews.llvm.org/D73599	2020-01-29 08:56:33 -08:00
Matt Arsenault	62129878a6	AMDGPU/GlobalISel: Fix tablegen selection for scalar bin ops Fixes selection for scalar G_SMULH/G_UMULH. Also switches to using tablegen selected add/sub, which switch to the signed version of the opcode. This matches the current DAG behavior. We can't drop the manual selection for add/sub yet, because it's still both for VALU add/sub and for G_PTR_ADD.	2020-01-29 08:55:54 -08:00
Kazushi (Jam) Marukawa	fef80a2946	[VE] (conditional) branch modification & isel patterns Summary: InstInfo for branch modification, (conditional) branch isel patterns and tests. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D73632	2020-01-29 17:40:57 +01:00
Matt Arsenault	b63629a58d	GlobalISel: Fix mask computation in lowerInsert This is supposed to be the high bit index, not the width. Use the wrapping form of getBitsSet and avoid the bitflip.	2020-01-29 08:25:36 -08:00
Matt Arsenault	68b102b97a	AMDGPU: Directly select 16-bank LDS case of llvm.amdgcn.interp.p1.f16 Manually select this as a tablegen workaround. Both SelectionDAG and GlobalISel end up misplacing the copy to m0 when both instructions in the output need it. Neither considers that both output instructions depend on m0. I don't know of any other pattern we need to handle this case, so it's less effort to just work around this for now.	2020-01-29 08:24:31 -08:00
Jay Foad	0d7bd34312	[MachineScheduler] Ignore artificial edges when forming store chains Summary: BaseMemOpClusterMutation::apply forms store chains by looking for control (i.e. non-data) dependencies from one mem op to another. In the test case, clusterNeighboringMemOps successfully clusters the loads, and then adds artificial edges to the loads' successors as described in the comment: // Copy successor edges from SUa to SUb. Interleaving computation // dependent on SUa can prevent load combining due to register reuse. The effect of this is that data dependencies from one load to a store are copied as artificial dependencies from a different load to the same store. Then when BaseMemOpClusterMutation::apply looks at the stores, it finds that some of them have a control dependency on a previous load, which breaks the chains and means that the stores are not all considered part of the same chain and won't all be clustered. The fix is to only consider non-artificial control dependencies when forming chains. Subscribers: MatzeB, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71717	2020-01-29 16:23:01 +00:00
Matt Arsenault	96352e0a1b	AMDGPU/GlobalISel: Handle LDS with relocations case	2020-01-29 08:18:55 -08:00
Elia Geretto	ab2300bc15	[PassManagerBuilder] Remove global extension when a plugin is unloaded This commit fixes PR39321. GlobalExtensions is not guaranteed to be destroyed when optimizer plugins are unloaded. If it is indeed destroyed after a plugin is dlclose-d, the destructor of the corresponding ExtensionFn is not mapped anymore, causing a call to unmapped memory during destruction. This commit guarantees that extensions coming from external plugins are removed from GlobalExtensions when the plugin is unloaded if GlobalExtensions has not been destroyed yet. Differential Revision: https://reviews.llvm.org/D71959	2020-01-29 16:15:45 +00:00
Connor Abbott	87d98c1495	AMDGPU: Fix handling of infinite loops in fragment shaders Summary: Due to the fact that kill is just a normal intrinsic, even though it's supposed to terminate the thread, we can end up with provably infinite loops that are actually supposed to end successfully. The AMDGPUUnifyDivergentExitNodes pass breaks up these loops, but because there's no obvious place to make the loop branch to, it just makes it return immediately, which skips the exports that are supposed to happen at the end and hangs the GPU if all the threads end up being killed. While it would be nice if the fact that kill terminates the thread were modeled in the IR, I think that the structurizer as-is would make a mess if we did that when the kill is inside control flow. For now, we just add a null export at the end to make sure that it always exports something, which fixes the immediate problem without penalizing the more common case. This means that we sometimes do two "done" exports when only some of the threads enter the discard loop, but from tests the hardware seems ok with that. This fixes dEQP-VK.graphicsfuzz.while-inside-switch with radv. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70781	2020-01-29 17:13:25 +01:00
Matt Arsenault	94e8ef4d4c	AMDGPU/GlobalISel: Look through copies for source modifiers When all VOP instructions are legalized to VGPRs, any SGPR source modifiers will have a copy in the way.	2020-01-29 08:08:13 -08:00
Stanislav Mekhanoshin	c2ad7ee1a9	[AMDGPU] override isHighLatencyDef SIMachineScheduler uses isHighLatencyInstruction with the same semantics, but TargetInstrInfo has virtual isHighLatencyDef method, so override it instead. Added FLAT to the list of high latency opcodes and a check for mayLoad since stores are not technically high latency in terms of data dependency. This change did not produce any visible impact on our tests. Differential Revision: https://reviews.llvm.org/D73582	2020-01-29 08:01:29 -08:00
Matt Arsenault	f717483acd	GlobalISel: Assert on invalid bitcast in MIRBuilder The other casts validate, so this should too.	2020-01-29 07:49:39 -08:00
Simon Pilgrim	79748add70	Fix MSVC lambda default capture mode warning. NFCI.	2020-01-29 15:47:04 +00:00
Hans Wennborg	31e07692d7	Work around PR44697 in CrashRecoveryContext	2020-01-29 16:35:07 +01:00
Connor Abbott	08b205bb48	Revert "AMDGPU: Fix handling of infinite loops in fragment shaders" This reverts commit `0994c485e6`.	2020-01-29 16:14:52 +01:00
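
The rounding trick described in the 90c31b0f42 message ([X86] Custom lower ISD::FROUND) can be checked numerically. The following is a minimal Python sketch, not the LLVM lowering itself, and the function names are hypothetical. One assumption to flag: the message writes floor, while this sketch truncates toward zero, which gives the same result for non-negative inputs and also keeps negative inputs rounding away from zero.

```python
import math

# Largest double strictly below 0.5, i.e. nextafter(0.5, 0.0) from the message.
HALF_PRED = math.nextafter(0.5, 0.0)  # math.nextafter requires Python 3.9+


def round_half_away(x: float) -> float:
    """Round to nearest, ties away from zero (hypothetical helper)."""
    if not math.isfinite(x):
        return x  # NaN and infinities pass through unchanged
    # Add a signed "almost one half", then drop the fraction.
    # Assumption: truncation is used here instead of floor (see lead-in).
    result = float(math.trunc(x + math.copysign(HALF_PRED, x)))
    # Restore the sign so that small negative inputs yield -0.0.
    return math.copysign(result, x)


def naive_round(x: float) -> float:
    """The tempting version that adds exactly 0.5 (shown for contrast)."""
    return float(math.floor(x + 0.5))


if __name__ == "__main__":
    for value in (0.5, HALF_PRED, 2.5, -2.5, -2.3):
        print(value, round_half_away(value), naive_round(value))
```

With these definitions, the input nextafter(0.5, 0.0) rounds to 0.0 under round_half_away, whereas naive_round returns 1.0, which is the corner case the message calls out.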


130588 Commits