llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	661cc71a1c	[PassManager][PhaseOrdering] lower expects before running simplifyCFG Retry of `330619a3a6` that includes a clang test update. Original commit message: If we run passes before lowering llvm.expect intrinsics to metadata, then those passes have no way to act on the hints provided by llvm.expect. SimplifyCFG is the known offender, and we made it smarter about profile metadata in D98898 <https://reviews.llvm.org/D98898>. In the motivating example from https://llvm.org/PR49336 , this means we were ignoring the recommended method for a programmer to tell the compiler that a compare+branch is expensive. This change appears to solve that case - the metadata survives to the backend, the compare order is as expected in IR, and the backend does not do anything to reverse it. We make the same change to the old pass manager to keep things synchronized. Differential Revision: https://reviews.llvm.org/D100213	2021-04-12 15:07:53 -04:00
Sanjay Patel	23ac9d1e6e	Revert "[PassManager][PhaseOrdering] lower expects before running simplifyCFG" This reverts commit `330619a3a6`. There are clang tests that also need to be updated.	2021-04-12 13:58:54 -04:00
Sanjay Patel	330619a3a6	[PassManager][PhaseOrdering] lower expects before running simplifyCFG If we run passes before lowering llvm.expect intrinsics to metadata, then those passes have no way to act on the hints provided by llvm.expect. SimplifyCFG is the known offender, and we made it smarter about profile metadata in D98898. In the motivating example from https://llvm.org/PR49336 , this means we were ignoring the recommended method for a programmer to tell the compiler that a compare+branch is expensive. This change appears to solve that case - the metadata survives to the backend, the compare order is as expected in IR, and the backend does not do anything to reverse it. We make the same change to the old pass manager to keep things synchronized. Differential Revision: https://reviews.llvm.org/D100213	2021-04-12 12:23:31 -04:00
Sebastian Neubauer	6cc91adf1e	[AMDGPU] Kill temporary register after restoring Not a correctness issue, but the temporary register is not used afterwards and should be dead. Differential Revision: https://reviews.llvm.org/D100295	2021-04-12 14:20:03 +02:00
Sebastian Neubauer	b76c2a6c2b	[AMDGPU] Fix saving fp and bp Spilling the fp or bp to scratch could overwrite VGPRs of inactive lanes. Fix that by using only the active lanes of the scavenged VGPR. This builds on the assumptions that 1. a function is never called with exec=0 2. lanes do not die in a function, i.e. exec!=0 in the function epilog 3. no new lanes are active when exiting the function, i.e. exec in the epilog is a subset of exec in the prolog. Differential Revision: https://reviews.llvm.org/D96869	2021-04-12 11:52:55 +02:00
Sebastian Neubauer	ca3bae94c4	[AMDGPU] Autogenerate test. NFC	2021-04-12 11:51:28 +02:00
Sebastian Neubauer	32bc9a9bc3	[AMDGPU] Unify spill code Instead of reimplementing spilling in prolog and epilog, reuse buildSpillLoadStore. Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D99269	2021-04-12 11:19:08 +02:00
Sebastian Neubauer	f9a8c6a0e5	[AMDGPU] Save VGPR of whole wave when spilling Spilling SGPRs to scratch uses a temporary VGPR. LLVM currently cannot determine if a VGPR is used in other lanes or not, so we need to save all lanes of the VGPR. We even need to save the VGPR if it is marked as dead. The generated code depends on two things: - Can we scavenge an SGPR to save EXEC? - And can we scavenge a VGPR? If we can scavenge an SGPR, we - save EXEC into the SGPR - set the needed lane mask - save the temporary VGPR - write the spilled SGPR into VGPR lanes - save the VGPR again to the target stack slot - restore the VGPR - restore EXEC If we were not able to scavenge an SGPR, we do the same operations, but every time the temporary VGPR is written to memory, we - write VGPR to memory - flip exec (s_not exec, exec) - write VGPR again (previously inactive lanes) Surprisingly often, we are able to scavenge an SGPR, even though we are at the brink of running out of SGPRs. Scavenging a VGPR does not have a great effect (saves three instructions if no SGPR was scavenged), but we need to know if the VGPR we use is live before or not, otherwise the machine verifier complains. Differential Revision: https://reviews.llvm.org/D96336	2021-04-12 11:01:38 +02:00
dfukalov	8f4b7e94a2	[AMDGPU][CostModel] Refine cost model for control-flow instructions. Added cost estimation for switch instruction, updated costs of branches, fixed phi cost. Had to increase `-amdgpu-unroll-threshold-if` default value since conditional branch cost (size) was corrected to higher value. Test renamed to "control-flow.ll". Removed redundant code in `X86TTIImpl::getCFInstrCost()` and `PPCTTIImpl::getCFInstrCost()`. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D96805	2021-04-10 09:20:24 +03:00
Mitch Phillips	092f288d36	Revert "[AMDGPU] Remove MachineDCE after SIFoldOperands" This reverts commit `5a0117b2d0`. Reason: Dependent change `d19a42eba9` broke the ASan buildbots.	2021-04-09 15:47:44 -07:00
Jay Foad	5a0117b2d0	[AMDGPU] Remove MachineDCE after SIFoldOperands Remove the MachineDCE pass after the first SIFoldOperands pass now that SIFoldOperands deletes its own dead instructions. Differential Revision: https://reviews.llvm.org/D100189	2021-04-09 20:41:09 +01:00
Stanislav Mekhanoshin	034fe0e03d	[AMDGPU] Added udot2 op_sel test. NFC.	2021-04-09 12:19:42 -07:00
Jay Foad	a4ced03d34	[AMDGPU] SIFoldOperands: eagerly delete dead copies This is cheap to implement, means less work for future passes like MachineDCE, and slightly improves the folding in some cases. Differential Revision: https://reviews.llvm.org/D100117	2021-04-09 13:52:54 +01:00
Philip Reames	35393c865c	[funcattrs] Infer nosync from instruction walk Pretty straightforward use of existing infrastructure and port of the attributor inference rules for nosync. A couple points of interest: * I deliberately switched from "monotonic or better" to "unordered or better". This is simply me being conservative and is better in line with the rest of the optimizer. We treat monotonic conservatively pretty much everywhere. * The operand bundle test change is suspicious. It looks like we might have missed something here, but if so, it's an issue with the existing nofree inference as well. I'm going to take a closer look at that separately. * I needed to keep the previous inference from readnone. This surprised me, but made sense once I realized readonly inference goes to lengths to reason about local vs non-local memory and that writes to local memory are okay. This is fine for the purpose of nosync, but would e.g. prevent us from inferring nofree from readnone - which is slightly surprising. Differential Revision: https://reviews.llvm.org/D99769	2021-04-08 14:05:00 -07:00
Konstantin Zhuravlyov	4fae63c612	AMDGPU: Add gfx90c support to code object v2 for backwards compatibility Differential Revision: https://reviews.llvm.org/D100126	2021-04-08 16:42:43 -04:00
Stanislav Mekhanoshin	627dab3dbf	[AMDGPU] Check for all meta instrs in GCNRegBankReassign It used to work correctly even with a KILL, but there is no reason to consider meta instructions since they do not create real HW uses. Differential Revision: https://reviews.llvm.org/D100135	2021-04-08 13:41:10 -07:00
Nikita Popov	59a2f67011	[LoopRotate] Don't split loop pass manager After D99249 we use three different loop pass managers for LICM, LoopRotate and LICM+LoopUnswitch. This happens because LazyBFI and LazyBPI are not preserved by LoopRotate (note that D74640 is no longer needed). Avoid this by marking them as preserved. My understanding of D86156 is that it is okay to simply preserve them (which LoopUnswitch already does for the same reason) and rely on callbacks to deal with deleted blocks. Differential Revision: https://reviews.llvm.org/D99843	2021-04-08 22:05:18 +02:00
Stanislav Mekhanoshin	189310a140	[AMDGPU] Allow -amdgpu-unsafe-fp-atomics to ignore denorm mode Fixes: SWDEV-274276 Differential Revision: https://reviews.llvm.org/D100072	2021-04-08 12:46:36 -07:00
Jay Foad	e184eeaa3b	[AMDGPU] Add some implicit uses to tests. NFC. This is just to stop a future patch from optimizing away the things that we actually want to check for.	2021-04-08 16:37:48 +01:00
Jay Foad	c28f79a0e3	[AMDGPU] SIFoldOperands: try harder to fold cndmask instructions Look through copies to find more cases where the two values being selected are identical. The motivation for this is just to be able to remove the weird special case where tryFoldCndMask was called from foldInstOperand, part way through folding a move-immediate into its users, without regressing any lit tests.	2021-04-08 14:26:12 +01:00
Sebastian Neubauer	c10cc4ea27	[AMDGPU] Fix computing live registers in prolog ScratchExecCopy needs to be marked as live, we cannot use that register while EXEC is stored in there. Marking SGPRForFPSaveRestoreCopy and SGPRForBPSaveRestoreCopy as available is unnecessary, they should not be live at that point anyway. Differential Revision: https://reviews.llvm.org/D100098	2021-04-08 14:52:50 +02:00
Thomas Preud'homme	04419628e0	[AMDGPU, test] Fix use of undef FileCheck var Test CodeGen/AMDGPU/amdgpu.private-memory.ll and CodeGen/AMDGPU/private-memory-r600.ll have a block of CHECK directives whose prefix is inconsistent: R600-CHECK Vs R600. This leads to a R600-NOT directive using an undefined CHAN variable due to R600-CHECK directives never being considered by FileCheck. Fixing the prefix leads to the testcase failing. As per https://reviews.llvm.org/D99865#2675235 this commit removes the directives instead since it is not possible to write a reliable check. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D99865	2021-04-08 09:42:59 +01:00
hsmahesha	ac64995ceb	[AMDGPU] Only use ds_read/write_b128 for alignment >= 16 PS: Submitting on behalf of Jay. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D100008	2021-04-08 08:12:05 +05:30
hsmahesha	d5fee599c5	[AMDGPU] Add some exhaustive ds read/write alignment tests PS: Submitting on behalf of Jay. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D100007	2021-04-08 08:08:49 +05:30
Tony Tye	4658cd4c18	[AMDGPU] Update gfx90a memory model support Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D100070	2021-04-07 22:17:58 +00:00
Jay Foad	e9608a84d8	[AMDGPU][SDag] Add IMG init also for image_gather4 instructions This fixes an oversight in D99747 which moved the IMG init code from SIAddIMGInit to AdjustInstrPostInstrSelection, but did not set the hasPostISelHook flag on gather4 instructions. Differential Revision: https://reviews.llvm.org/D99953	2021-04-06 14:47:20 +01:00
Jay Foad	0bf4836dc4	[AMDGPU] Fix dubious regexes with unescaped brackets. NFC.	2021-04-06 13:17:41 +01:00
Jay Foad	6fec0a34ce	[AMDGPU] Fix typo in regular expression checks. NFC.	2021-04-06 12:29:48 +01:00
Jay Foad	6eb5b06ecf	[AMDGPU] Regenerate checks to fix prefixes broken in D96340. NFC.	2021-04-06 11:43:53 +01:00
Stanislav Mekhanoshin	30b3aab329	Copy syncscope when expanding atomicrmw into cmpxchg loop Fixes: SWDEV-280070 Differential Revision: https://reviews.llvm.org/D99902	2021-04-05 17:29:38 -07:00
Roman Lebedev	a26f1bf67e	[PassManager] Run additional LICM before LoopRotate

Loop rotation often has to perform code duplication from header into preheader, which introduces PHI nodes.

>>! In D99204, @thopre wrote:
>
> With loop peeling, it is important that unnecessary PHIs be avoided or it will leads to spurious peeling. One source of such PHIs is loop rotation which creates PHIs for invariant loads. Those PHIs are particularly problematic since loop peeling is now run as part of simple loop unrolling before GVN is run, and are thus a source of spurious peeling.
>
> Note that while some of the load can be hoisted and eventually eliminated by instruction combine, this is not always possible due to alignment issue. In particular, the motivating example [1] was a load inside a class instance which cannot be hoisted because the `this' pointer has an alignment of 1.
>
> [1] http://lists.llvm.org/pipermail/llvm-dev/attachments/20210312/4ce73c47/attachment.cpp

Now, we could enhance LoopRotate to avoid duplicating code when not needed, but instead hoist loop-invariant code, but isn't that a code duplication? (sic) We have LICM, and in fact we already run it right after LoopRotation.

We could try to move it to before LoopRotation, that is basically free from compile-time perspective: https://llvm-compile-time-tracker.com/compare.php?from=6c93eb4477d88af046b915bc955c03693b2cbb58&to=a4bee6d07732b1184c436da489040b912f0dc271&stat=instructions

But, looking at stats, i think it isn't great that we would no longer do LICM after LoopRotation, in particular:

| statistic name | LoopRotate-LICM | LICM-LoopRotate | Δ | % | abs(%) |
|---|---|---|---|---|---|
| asm-printer.EmittedInsts | 9015930 | 9015799 | -131 | 0.00% | 0.00% |
| indvars.NumElimCmp | 3536 | 3544 | 8 | 0.23% | 0.23% |
| indvars.NumElimExt | 36725 | 36580 | -145 | -0.39% | 0.39% |
| indvars.NumElimIV | 1197 | 1187 | -10 | -0.84% | 0.84% |
| indvars.NumElimIdentity | 143 | 136 | -7 | -4.90% | 4.90% |
| indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00% |
| indvars.NumLFTR | 29842 | 29890 | 48 | 0.16% | 0.16% |
| indvars.NumReplaced | 2293 | 2227 | -66 | -2.88% | 2.88% |
| indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33% |
| indvars.NumWidened | 26438 | 26329 | -109 | -0.41% | 0.41% |
| instcount.TotalBlocks | 1178338 | 1173840 | -4498 | -0.38% | 0.38% |
| instcount.TotalFuncs | 111825 | 111829 | 4 | 0.00% | 0.00% |
| instcount.TotalInsts | 9905442 | 9896139 | -9303 | -0.09% | 0.09% |
| lcssa.NumLCSSA | 425871 | 423961 | -1910 | -0.45% | 0.45% |
| licm.NumHoisted | 378357 | 378753 | 396 | 0.10% | 0.10% |
| licm.NumMovedCalls | 2193 | 2208 | 15 | 0.68% | 0.68% |
| licm.NumMovedLoads | 35899 | 31821 | -4078 | -11.36% | 11.36% |
| licm.NumPromoted | 11178 | 11154 | -24 | -0.21% | 0.21% |
| licm.NumSunk | 13359 | 13587 | 228 | 1.71% | 1.71% |
| loop-delete.NumDeleted | 8547 | 8402 | -145 | -1.70% | 1.70% |
| loop-instsimplify.NumSimplified | 12876 | 11890 | -986 | -7.66% | 7.66% |
| loop-peel.NumPeeled | 1008 | 925 | -83 | -8.23% | 8.23% |
| loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82% |
| loop-rotate.NumRotated | 42015 | 42003 | -12 | -0.03% | 0.03% |
| loop-simplifycfg.NumLoopBlocksDeleted | 240 | 242 | 2 | 0.83% | 0.83% |
| loop-simplifycfg.NumLoopExitsDeleted | 497 | 20 | -477 | -95.98% | 95.98% |
| loop-simplifycfg.NumTerminatorsFolded | 618 | 336 | -282 | -45.63% | 45.63% |
| loop-unroll.NumCompletelyUnrolled | 11028 | 11032 | 4 | 0.04% | 0.04% |
| loop-unroll.NumUnrolled | 12608 | 12529 | -79 | -0.63% | 0.63% |
| mem2reg.NumDeadAlloca | 10222 | 10221 | -1 | -0.01% | 0.01% |
| mem2reg.NumPHIInsert | 192110 | 192106 | -4 | 0.00% | 0.00% |
| mem2reg.NumSingleStore | 637650 | 637643 | -7 | 0.00% | 0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 814 | 812 | -2 | -0.25% | 0.25% |
| scalar-evolution.NumTripCountsComputed | 283108 | 282934 | -174 | -0.06% | 0.06% |
| scalar-evolution.NumTripCountsNotComputed | 106712 | 106718 | 6 | 0.01% | 0.01% |
| simple-loop-unswitch.NumBranches | 5178 | 4752 | -426 | -8.23% | 8.23% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 503 | -411 | -44.97% | 44.97% |
| simple-loop-unswitch.NumSwitches | 20 | 18 | -2 | -10.00% | 10.00% |
| simple-loop-unswitch.NumTrivial | 183 | 95 | -88 | -48.09% | 48.09% |

... but that actually regresses LICM (-12% `licm.NumMovedLoads`), loop-simplifycfg (`NumLoopExitsDeleted`, `NumTerminatorsFolded`), simple-loop-unswitch (`NumTrivial`). What if we instead have LICM both before and after LoopRotate?

| statistic name | LoopRotate-LICM | LICM-LoopRotate-LICM | Δ | % | abs(%) |
|---|---|---|---|---|---|
| asm-printer.EmittedInsts | 9015930 | 9014474 | -1456 | -0.02% | 0.02% |
| indvars.NumElimCmp | 3536 | 3546 | 10 | 0.28% | 0.28% |
| indvars.NumElimExt | 36725 | 36681 | -44 | -0.12% | 0.12% |
| indvars.NumElimIV | 1197 | 1185 | -12 | -1.00% | 1.00% |
| indvars.NumElimIdentity | 143 | 146 | 3 | 2.10% | 2.10% |
| indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00% |
| indvars.NumLFTR | 29842 | 29899 | 57 | 0.19% | 0.19% |
| indvars.NumReplaced | 2293 | 2299 | 6 | 0.26% | 0.26% |
| indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33% |
| indvars.NumWidened | 26438 | 26404 | -34 | -0.13% | 0.13% |
| instcount.TotalBlocks | 1178338 | 1173652 | -4686 | -0.40% | 0.40% |
| instcount.TotalFuncs | 111825 | 111829 | 4 | 0.00% | 0.00% |
| instcount.TotalInsts | 9905442 | 9895452 | -9990 | -0.10% | 0.10% |
| lcssa.NumLCSSA | 425871 | 425373 | -498 | -0.12% | 0.12% |
| licm.NumHoisted | 378357 | 383352 | 4995 | 1.32% | 1.32% |
| licm.NumMovedCalls | 2193 | 2204 | 11 | 0.50% | 0.50% |
| licm.NumMovedLoads | 35899 | 35755 | -144 | -0.40% | 0.40% |
| licm.NumPromoted | 11178 | 11163 | -15 | -0.13% | 0.13% |
| licm.NumSunk | 13359 | 14321 | 962 | 7.20% | 7.20% |
| loop-delete.NumDeleted | 8547 | 8538 | -9 | -0.11% | 0.11% |
| loop-instsimplify.NumSimplified | 12876 | 12041 | -835 | -6.48% | 6.48% |
| loop-peel.NumPeeled | 1008 | 924 | -84 | -8.33% | 8.33% |
| loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82% |
| loop-rotate.NumRotated | 42015 | 42005 | -10 | -0.02% | 0.02% |
| loop-simplifycfg.NumLoopBlocksDeleted | 240 | 241 | 1 | 0.42% | 0.42% |
| loop-simplifycfg.NumTerminatorsFolded | 618 | 619 | 1 | 0.16% | 0.16% |
| loop-unroll.NumCompletelyUnrolled | 11028 | 11029 | 1 | 0.01% | 0.01% |
| loop-unroll.NumUnrolled | 12608 | 12525 | -83 | -0.66% | 0.66% |
| mem2reg.NumPHIInsert | 192110 | 192073 | -37 | -0.02% | 0.02% |
| mem2reg.NumSingleStore | 637650 | 637652 | 2 | 0.00% | 0.00% |
| scalar-evolution.NumTripCountsComputed | 283108 | 282998 | -110 | -0.04% | 0.04% |
| scalar-evolution.NumTripCountsNotComputed | 106712 | 106691 | -21 | -0.02% | 0.02% |
| simple-loop-unswitch.NumBranches | 5178 | 5185 | 7 | 0.14% | 0.14% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 925 | 11 | 1.20% | 1.20% |
| simple-loop-unswitch.NumTrivial | 183 | 179 | -4 | -2.19% | 2.19% |
| simple-loop-unswitch.NumBranches | 5178 | 4752 | -426 | -8.23% | 8.23% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 503 | -411 | -44.97% | 44.97% |
| simple-loop-unswitch.NumSwitches | 20 | 18 | -2 | -10.00% | 10.00% |
| simple-loop-unswitch.NumTrivial | 183 | 95 | -88 | -48.09% | 48.09% |

I.e. we end up with less instructions, less peeling, more LICM activity, also note how none of those 4 regressions are here. Namely:

| statistic name | LICM-LoopRotate | LICM-LoopRotate-LICM | Δ | % | abs(%) |
|---|---|---|---|---|---|
| asm-printer.EmittedInsts | 9015799 | 9014474 | -1325 | -0.01% | 0.01% |
| indvars.NumElimCmp | 3544 | 3546 | 2 | 0.06% | 0.06% |
| indvars.NumElimExt | 36580 | 36681 | 101 | 0.28% | 0.28% |
| indvars.NumElimIV | 1187 | 1185 | -2 | -0.17% | 0.17% |
| indvars.NumElimIdentity | 136 | 146 | 10 | 7.35% | 7.35% |
| indvars.NumLFTR | 29890 | 29899 | 9 | 0.03% | 0.03% |
| indvars.NumReplaced | 2227 | 2299 | 72 | 3.23% | 3.23% |
| indvars.NumWidened | 26329 | 26404 | 75 | 0.28% | 0.28% |
| instcount.TotalBlocks | 1173840 | 1173652 | -188 | -0.02% | 0.02% |
| instcount.TotalInsts | 9896139 | 9895452 | -687 | -0.01% | 0.01% |
| lcssa.NumLCSSA | 423961 | 425373 | 1412 | 0.33% | 0.33% |
| licm.NumHoisted | 378753 | 383352 | 4599 | 1.21% | 1.21% |
| licm.NumMovedCalls | 2208 | 2204 | -4 | -0.18% | 0.18% |
| licm.NumMovedLoads | 31821 | 35755 | 3934 | 12.36% | 12.36% |
| licm.NumPromoted | 11154 | 11163 | 9 | 0.08% | 0.08% |
| licm.NumSunk | 13587 | 14321 | 734 | 5.40% | 5.40% |
| loop-delete.NumDeleted | 8402 | 8538 | 136 | 1.62% | 1.62% |
| loop-instsimplify.NumSimplified | 11890 | 12041 | 151 | 1.27% | 1.27% |
| loop-peel.NumPeeled | 925 | 924 | -1 | -0.11% | 0.11% |
| loop-rotate.NumRotated | 42003 | 42005 | 2 | 0.00% | 0.00% |
| loop-simplifycfg.NumLoopBlocksDeleted | 242 | 241 | -1 | -0.41% | 0.41% |
| loop-simplifycfg.NumLoopExitsDeleted | 20 | 497 | 477 | 2385.00% | 2385.00% |
| loop-simplifycfg.NumTerminatorsFolded | 336 | 619 | 283 | 84.23% | 84.23% |
| loop-unroll.NumCompletelyUnrolled | 11032 | 11029 | -3 | -0.03% | 0.03% |
| loop-unroll.NumUnrolled | 12529 | 12525 | -4 | -0.03% | 0.03% |
| mem2reg.NumDeadAlloca | 10221 | 10222 | 1 | 0.01% | 0.01% |
| mem2reg.NumPHIInsert | 192106 | 192073 | -33 | -0.02% | 0.02% |
| mem2reg.NumSingleStore | 637643 | 637652 | 9 | 0.00% | 0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 812 | 814 | 2 | 0.25% | 0.25% |
| scalar-evolution.NumTripCountsComputed | 282934 | 282998 | 64 | 0.02% | 0.02% |
| scalar-evolution.NumTripCountsNotComputed | 106718 | 106691 | -27 | -0.03% | 0.03% |
| simple-loop-unswitch.NumBranches | 4752 | 5185 | 433 | 9.11% | 9.11% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 503 | 925 | 422 | 83.90% | 83.90% |
| simple-loop-unswitch.NumSwitches | 18 | 20 | 2 | 11.11% | 11.11% |
| simple-loop-unswitch.NumTrivial | 95 | 179 | 84 | 88.42% | 88.42% |

{F15983613} {F15983615} {F15983616}

(this is vanilla llvm testsuite + rawspeed + darktable)

As an example of the code where early LICM only is bad, see: https://godbolt.org/z/GzEbacs4K

This does have an observable compile-time regression of +~0.5% geomean https://llvm-compile-time-tracker.com/compare.php?from=7c5222e4d1a3a14f029e5f614c9aefd0fa505f1e&to=5d81826c3411982ca26e46b9d0aff34c80577664&stat=instructions but i think that's basically nothing, and there's potential that it might be avoidable in the future by fixing clang to produce alignment information on function arguments, thus making the second run unneeded.

Differential Revision: https://reviews.llvm.org/D99249	2021-04-02 11:11:42 +03:00
Philip Reames	a8ac8816c9	Update a test missed in `6ef4505`	2021-04-01 12:17:01 -07:00
Brendon Cahoon	65c8bfb509	[AMDGPU] Enable output modifiers for double precision instructions Update SIFoldOperands pass to recognize v_add_f64 and v_mul_f64 instructions for folding output modifiers. Differential Revision: https://reviews.llvm.org/D99505	2021-04-01 10:08:17 -04:00
Dmitry Preobrazhensky	cd953434f2	[AMDGPU][MC][GFX10][GFX90A] Corrected _e32/_e64 suffixes Fixed bugs https://bugs.llvm.org//show_bug.cgi?id=49643, https://bugs.llvm.org//show_bug.cgi?id=49644, https://bugs.llvm.org//show_bug.cgi?id=49645. Differential Revision: https://reviews.llvm.org/D99413	2021-04-01 14:21:00 +03:00
Simonas Kazlauskas	777a58e05b	Support {S,U}REMEqFold before legalization This allows these optimisations to apply to e.g. `urem i16` directly before `urem` is promoted to i32 on architectures where i16 operations are not intrinsically legal (such as on Aarch64). The legalization then later can happen more directly and generated code gets a chance to avoid wasting time on computing results in types wider than necessary, in the end. Seems like mostly an improvement in terms of results at least as far as x86_64 and aarch64 are concerned, with a few regressions here and there. It also helps in preventing regressions in changes like {D87976}. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D88785	2021-04-01 01:35:41 +03:00
Jay Foad	b138cf115e	[AMDGPU] Add some image tests with enable-prt-strict-null disabled. NFC.	2021-03-31 17:27:20 +01:00
Jay Foad	a991ee330b	[AMDGPU] Use a common check prefix for some image tests. NFC.	2021-03-31 17:27:20 +01:00
Jay Foad	5d0e9ddfa5	[AMDGPU][GlobalISel] Add support for global atomicrmw fadd This includes gfx908 which only has a no-return version of the global_atomic_add_f32 instruction, using the same hack that was previously implemented for selecting from the llvm.amdgcn.global.atomic.fadd intrinsic. Differential Revision: https://reviews.llvm.org/D97767	2021-03-31 11:13:00 +01:00
Krasimir Georgiev	c51e91e046	Revert "[Passes] Add relative lookup table converter pass" This reverts commit `5178ffc7cf`. Compiling `llvm-profdata` with a compiler build from this produces a crashing binary.	2021-03-30 14:13:37 +02:00
Gulfem Savrun Yeniceri	5178ffc7cf	[Passes] Add relative lookup table converter pass Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in: https://bugs.llvm.org/show_bug.cgi?id=45244 This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly. Differential Revision: https://reviews.llvm.org/D94355	2021-03-29 21:53:32 +00:00
Joe Nash	45fd7c02af	Revert "[AMDGPU] Mark additional VOP3 as commutable" This reverts commit `d35d8da7d6`.	2021-03-29 14:48:11 -04:00
Joe Nash	d35d8da7d6	[AMDGPU] Mark additional VOP3 as commutable Note, only src0 and src1 will be commuted if the isCommutable flag is set. This patch does not change that, it just makes it possible to commute src0 and src1 of more instructions. Reviewed By: foad, rampitec Differential Revision: https://reviews.llvm.org/D99376 Change-Id: I61e20490962d95ea429beb355c55f55c024dafdc	2021-03-29 14:22:20 -04:00
Roger Ferrer Ibanez	489ca73ac4	[PrologEpilogInserter][AMDGPU] Only adjust offset for emergency spill slots if the stack grows down D89239 adjusts the stack offset of emergency spill slots for overaligned stacks. However the adjustment is not valid for targets whose stack grows up (such as AMDGPU). This change makes the adjustment conditional only to those targets whose stack grows down. Fixes https://bugs.llvm.org/show_bug.cgi?id=49686 Differential Revision: https://reviews.llvm.org/D99504	2021-03-29 17:26:58 +00:00
Petar Avramovic	b082e6f88a	[AMDGPU] Extend gfx10 test coverage. NFC. Differential Revision: https://reviews.llvm.org/D99267	2021-03-29 11:13:55 +02:00
Jay Foad	9d08f276d7	[AMDGPU] Use reductions instead of scans in the atomic optimizer If the result of an atomic operation is not used then it can be more efficient to build a reduction across all lanes instead of a scan. Do this for GFX10, where the permlanex16 instruction makes it viable. For wave64 this saves a couple of dpp operations. For wave32 it saves one readlane (which are generally bad for performance) and one dpp operation. Differential Revision: https://reviews.llvm.org/D98953	2021-03-26 15:38:14 +00:00
Gulfem Savrun Yeniceri	5fbe1fdf17	Revert "[Passes] Add relative lookup table converter pass" This reverts commit `5fd001a5ff` because it broke clang-with-thin-lto-ubuntu bot.	2021-03-24 18:59:33 +00:00
Gulfem Savrun Yeniceri	5fd001a5ff	[Passes] Add relative lookup table converter pass Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in: https://bugs.llvm.org/show_bug.cgi?id=45244 This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly. Differential Revision: https://reviews.llvm.org/D94355	2021-03-24 17:31:18 +00:00
Konstantin Zhuravlyov	f4ace63737	AMDGPU: Add target id and code object v4 support - Add target id support (https://clang.llvm.org/docs/ClangOffloadBundler.html#target-id) - Add code object v4 support (https://llvm.org/docs/AMDGPUUsage.html#elf-code-object) - Add kernarg_size to kernel descriptor - Change trap handler ABI to no longer move queue pointer into s[0:1] - Cleanup ELF definitions - Add V2, V3, V4 suffixes to make a clear distinction for code object version - Consolidate note names Differential Revision: https://reviews.llvm.org/D95638	2021-03-24 11:54:05 -04:00
alex-t	dccf83acf9	[AMDGPU] SIOptimizeExecMaskingPreRA should check constant bus constraint when folds EXEC copy Folding EXEC copy into its single use may lead to constant bus constraint violation as it adds one more SGPR operand. This change makes it validate the user instruction with the new SGPR operand and only fold it if it is legal. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D98888	2021-03-24 14:14:13 +03:00
Matt Arsenault	b24436ac96	GlobalISel: Lower funnel shifts	2021-03-23 09:11:17 -04:00

4448 Commits