llvm-project

Commit Graph

Author	SHA1	Message	Date
Hiroshi Yamauchi	ed50e6060b	[PGO][PGSO] Enable size optimizations in code gen / target passes for cold code. Summary: Split off of D67120. Reviewers: davidxl Subscribers: hiraditya, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71288	2019-12-13 11:01:19 -08:00
Joerg Sonnenberger	9681ea9560	Reapply r374743 with a fix for the ocaml binding Add a pass to lower is.constant and objectsize intrinsics This pass lowers is.constant and objectsize intrinsics not simplified by earlier constant folding, i.e. if the object given is not constant or if not using the optimized pass chain. The result is recursively simplified and constant conditionals are pruned, so that dead blocks are removed even for -O0. This allows inline asm blocks with operand constraints to work all the time. The new pass replaces the existing lowering in the codegen-prepare pass and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert on the intrinsics. Differential Revision: https://reviews.llvm.org/D65280 llvm-svn: 374784	2019-10-14 16:15:14 +00:00
Dmitri Gribenko	1a21f98ac3	Revert "Add a pass to lower is.constant and objectsize intrinsics" This reverts commit r374743. It broke the build with Ocaml enabled: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/19218 llvm-svn: 374768	2019-10-14 12:22:48 +00:00
Joerg Sonnenberger	e4300c392d	Add a pass to lower is.constant and objectsize intrinsics This pass lowers is.constant and objectsize intrinsics not simplified by earlier constant folding, i.e. if the object given is not constant or if not using the optimized pass chain. The result is recursively simplified and constant conditionals are pruned, so that dead blocks are removed even for -O0. This allows inline asm blocks with operand constraints to work all the time. The new pass replaces the existing lowering in the codegen-prepare pass and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert on the intrinsics. Differential Revision: https://reviews.llvm.org/D65280 llvm-svn: 374743	2019-10-13 23:00:15 +00:00
David Bolvansky	3d33e97be6	[NFC} Updated test llvm-svn: 372093	2019-09-17 09:45:52 +00:00
Fangrui Song	ac14f7b10c	[lit] Delete empty lines at the end of lit.local.cfg NFC llvm-svn: 363538	2019-06-17 09:51:07 +00:00
Sanjay Patel	c8d88ad1a9	[CodeGenPrepare][x86] shift both sides of a vector select when profitable This is based on the example/discussion in PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 Proper vector shift instructions don't appear until AVX2, so we may generate several extra instructions within a loop trying to compensate for that. It's difficult to recover from that shift expansion later than this, so use the existing TLI hook and splat analysis to enable better codegen. This extends CGP functionality introduced with: rL201655 Differential Revision: https://reviews.llvm.org/D63233 llvm-svn: 363511	2019-06-16 15:29:03 +00:00
Sanjay Patel	7ea378b940	[CodeGenPrepare] propagate debuginfo when copying a shuffle llvm-svn: 363409	2019-06-14 15:05:35 +00:00
Sanjay Patel	a1421e8347	[x86] add tests for vector shifts; NFC llvm-svn: 363203	2019-06-12 21:30:06 +00:00
Sanjay Patel	5ab41a7a05	[CodeGenPrepare] limit overflow intrinsic matching to a single basic block (2nd try) This is a subset of the original commit from rL359879 which was reverted because it could crash when using the 'RemovedInstructions' structure that enables delayed deletion of dead instructions. The motivating compile-time win does not require that change though. We should get most of that win from this change alone. Using/updating a dominator tree to match math overflow patterns may be very expensive in compile-time (because of the way CGP uses a DT), so just handle the single-block case. See post-commit thread for rL354298 for more details: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html Differential Revision: https://reviews.llvm.org/D61075 llvm-svn: 359969	2019-05-04 12:46:32 +00:00
Evgeniy Stepanov	46ec57e576	Revert "[CodeGenPrepare] limit overflow intrinsic matching to a single basic block" This reverts commit r359879, which introduced a compiler crash. llvm-svn: 359908	2019-05-03 17:31:49 +00:00
Sanjay Patel	8ff072e48e	[CodeGenPrepare] limit overflow intrinsic matching to a single basic block Using/updating a dominator tree to match math overflow patterns may be very expensive in compile-time (because of the way CGP uses a DT), so just handle the single-block case. Also, we were restarting the iterator loops when doing the overflow intrinsic transforms by marking the dominator tree for update. That was done to prevent iterating over a removed instruction. But we can postpone the deletion using the existing "RemovedInsts" structure, and that means we don't need to update the DT. See post-commit thread for rL354298 for more details: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html Differential Revision: https://reviews.llvm.org/D61075 llvm-svn: 359879	2019-05-03 13:09:18 +00:00
Eric Christopher	cee313d288	Revert "Temporarily Revert "Add basic loop fusion pass."" The reversion apparently deleted the test/Transforms directory. Will be re-reverting again. llvm-svn: 358552	2019-04-17 04:52:47 +00:00
Eric Christopher	a863435128	Temporarily Revert "Add basic loop fusion pass." As it's causing some bot failures (and per request from kbarton). This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda. llvm-svn: 358546	2019-04-17 02:12:23 +00:00
Sanjay Patel	d47eac59ef	[CodeGenPrepare] limit formation of overflow intrinsics (PR41129) This is probably a bigger limitation than necessary, but since we don't have any evidence yet that this transform led to real-world perf improvements rather than regressions, I'm making a quick, blunt fix. In the motivating x86 example from: https://bugs.llvm.org/show_bug.cgi?id=41129 ...and shown in the regression test, we want to avoid an extra instruction in the dominating block because that could be costly. The x86 LSR test diff is reversing the changes from D57789. There's no evidence that 1 version is any better than the other yet. Differential Revision: https://reviews.llvm.org/D59602 llvm-svn: 356665	2019-03-21 13:57:07 +00:00
Sanjay Patel	fb44f99b73	[CGP][x86] add tests for usubo regression (PR41129); NFC llvm-svn: 356559	2019-03-20 15:02:35 +00:00
Sanjay Patel	2c9275a790	[CGP] add another bailout for degenerate code (PR41064) This is almost the same as: rL355345 ...and should prevent any potential crashing from examples like: https://bugs.llvm.org/show_bug.cgi?id=41064 ...although the bug was masked by: rL355823 ...and I'm not sure how to repro the problem after that change. llvm-svn: 356218	2019-03-14 23:14:31 +00:00
Tim Northover	8935aca9c7	CodeGenPrep: preserve inbounds attribute when sinking GEPs. Targets can potentially emit more efficient code if they know address computations never overflow. For example ILP32 code on AArch64 (which only has 64-bit address computation) can ignore the possibility of overflow with this extra information. llvm-svn: 355926	2019-03-12 15:22:23 +00:00
Rong Xu	ce3be45cac	[CodeGenPrepare] Fix ModifiedDT flag in optimizeSelectInst r44412 fixed a huge compile time regression but it needed ModifiedDT flag to be maintained correctly in optimizations in optimizeBlock() and optimizeInst(). Function optimizeSelectInst() does not update the flag. This patch propagates the flag in optimizeSelectInst() back to optimizeBlock(). This patch also removes ModifiedDT in CodeGenPrepare class (which is not used). The property of ModifiedDT is now recorded in a ref parameter. Differential Revision: https://reviews.llvm.org/D59139 llvm-svn: 355751	2019-03-08 22:46:18 +00:00
Sanjay Patel	3b2d0bc7c2	[CodeGenPrepare] avoid crashing on non-canonical/degenerate code The test is reduced from an example in the post-commit thread for: rL354746 http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190304/632396.html While we must avoid dying here, the real question should be: Why is non-canonical and/or degenerate code making it to CGP when using the new pass manager? llvm-svn: 355345	2019-03-04 22:47:13 +00:00
Sanjay Patel	cb04ba032f	[CGP] add special-cases to form unsigned add with overflow (PR40486) There's likely a missed IR canonicalization for at least 1 of these patterns. Otherwise, we wouldn't have needed the pattern-matching enhancement in D57516. Note that -- unlike usubo added with D57789 -- the TLI hook for this transform defaults to 'on'. So if there's any perf fallout from this, targets should look at how they're lowering the uaddo node in SDAG and/or override that hook. The x86 diffs suggest that there's some missing pattern-matching for forming inc/dec. This should fix the remaining known problems in: https://bugs.llvm.org/show_bug.cgi?id=40486 https://bugs.llvm.org/show_bug.cgi?id=31754 llvm-svn: 354746	2019-02-24 15:31:27 +00:00
Sanjay Patel	973143ab79	[CGP] add tests for uaddo increment/decrement; NFC llvm-svn: 354699	2019-02-22 23:19:34 +00:00
Sanjay Patel	198cc305e9	[CGP] match a special-case of unsigned subtract overflow This is the 'sub0' (negate) pattern from PR31754: https://bugs.llvm.org/show_bug.cgi?id=31754 llvm-svn: 354519	2019-02-20 21:23:04 +00:00
Sanjay Patel	8d91faa481	[CGP][x86] add tests for usubo special-case; NFC This is another example from PR31754: https://bugs.llvm.org/show_bug.cgi?id=31754 llvm-svn: 354475	2019-02-20 15:40:58 +00:00
Sanjay Patel	d8b4efcb6b	[CGP] form usub with overflow from sub+icmp The motivating x86 cases for forming the intrinsic are shown in PR31754 and PR40487: https://bugs.llvm.org/show_bug.cgi?id=31754 https://bugs.llvm.org/show_bug.cgi?id=40487 ..and those are shown in the IR test file and x86 codegen file. Matching the usubo pattern is harder than uaddo because we have 2 independent values rather than a def-use. This adds a TLI hook that should preserve the existing behavior for uaddo formation, but disables usubo formation by default. Only x86 overrides that setting for now although other targets will likely benefit by forming usbuo too. Differential Revision: https://reviews.llvm.org/D57789 llvm-svn: 354298	2019-02-18 23:33:05 +00:00
Sanjay Patel	b696b771bf	[CGP] add test for unsigned subtract of 1 with overflow; NFC llvm-svn: 353179	2019-02-05 15:27:40 +00:00
Sanjay Patel	738e17b5df	[CGP] fix bogus test names/comments; NFC Inverted operand 0 and operand 1. llvm-svn: 353106	2019-02-04 22:37:05 +00:00
Sanjay Patel	1f9e23e3cc	[CGP] add tests for usubo; NFC llvm-svn: 353103	2019-02-04 22:27:08 +00:00
Sanjay Patel	c00bdab4c8	[CGP] use IRBuilder to simplify code This is no-functional-change-intended although there could be intermediate variations caused by a difference in the debug info produced by setting that from the builder's insertion point. I'm updating the IR test file associated with this code just to show that the naming differences from using the builder are visible. The motivation for adding a helper function is that we are likely to extend this code to deal with other overflow ops. llvm-svn: 353056	2019-02-04 16:30:46 +00:00
Sanjay Patel	84ceae6048	[CGP] adjust target constraints for forming uaddo There are 2 changes visible here: 1. There's no reason to limit this transform based on number of condition registers. That diff allows PPC to produce slightly better (dot-instructions should be generally good) code. Note: someone that cares about PPC codegen might want to look closer at that output because it seems like we could still improve this. 2. We (probably?) should not bother trying to form uaddo (or other overflow ops) when there's no target support for such an op. This goes beyond checking whether the op is expanded because both PPC and AArch64 show better codegen for standard types regardless of whether the op is legal/custom. llvm-svn: 353001	2019-02-03 17:53:09 +00:00
Sanjay Patel	837552fe9f	[PatternMatch] add special-case uaddo matching for increment-by-one (2nd try) This is the most important uaddo problem mentioned in PR31754: https://bugs.llvm.org/show_bug.cgi?id=31754 ...but that was overcome in x86 codegen with D57637. That patch also corrects the inc vs. add regressions seen with the previous attempt at this. Still, we want to make this matcher complete, so we can potentially canonicalize the pattern even if it's an 'add 1' operation. Pattern matching, however, shouldn't assume that we have canonicalized IR, so we match 4 commuted variants of uaddo. There's also a test with a crazy type to show that the existing CGP transform based on this matcher is not limited by target legality checks. I'm not sure if the Hexagon diff means the test is no longer testing what it intended to test, but that should be solvable in a follow-up. Differential Revision: https://reviews.llvm.org/D57516 llvm-svn: 352998	2019-02-03 16:16:48 +00:00
Sanjay Patel	b961bd68f0	[CGP] move test file to prevent bot failures The test specifiies the triple, so it needs to be in the x86 directory in case a bot has been configured without the x86 target. llvm-svn: 352992	2019-02-03 14:19:45 +00:00
David L. Jones	d81f23071c	Revert "Reapply "[CGP] Check for existing inttotpr before creating new one"" This change reverts r351626. The changes in r351626 cause quadratic work in several cases. (See r351626 thread on llvm-commits for details.) llvm-svn: 352722	2019-01-31 03:28:46 +00:00
Roman Tereshin	a0383d6c1f	Reapply "[CGP] Check for existing inttotpr before creating new one" Original commit: r351582 llvm-svn: 351626	2019-01-19 03:37:25 +00:00
Roman Tereshin	022bf3e8e7	Revert "Reapply "[CGP] Check for existing inttotpr before creating new one"" This reverts commit r351618. Compiler RT + ASAN tests are failing for PowerPC. Not sure how would I reproduce these on macOS, so reverting (again) until I do. llvm-svn: 351619	2019-01-19 01:53:26 +00:00
Roman Tereshin	dd6f9f68bb	Reapply "[CGP] Check for existing inttotpr before creating new one" Original commit: r351582 llvm-svn: 351618	2019-01-19 01:41:03 +00:00
Roman Tereshin	86ac532687	Revert "[CGP] Check for existing inttotpr before creating new one" This reverts commit r351582. Bots are failing. Reverting this to fix and re-commit later. llvm-svn: 351598	2019-01-18 21:38:44 +00:00
Roman Tereshin	85a0467a11	[CGP] Check for existing inttotpr before creating new one Make sure CodeGenPrepare doesn't emit multiple inttoptr instructions of the same integer value while sinking address computations, but rather CSEs them on the fly: excessive inttoptr's confuse SCEV into thinking that related pointers have nothing to do with each other. This problem blocks LoadStoreVectorizer from vectorizing some of the loads / stores in a downstream target. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D56838 llvm-svn: 351582	2019-01-18 20:13:42 +00:00
Vedant Kumar	4760686823	[CodeGenPrepare] Set debug loc when widening a switch condition Set a debug location on the cast instruction used to widen a switch condition. llvm-svn: 340379	2018-08-22 01:23:31 +00:00
Vedant Kumar	1e8a2c963c	[CodeGenPrepare] Set debug locations when splitting selects When splitting a select into a diamond, set debug locations on newly-created branch instructions and phi nodes. llvm-svn: 340371	2018-08-22 00:10:37 +00:00
Vedant Kumar	30406fd789	[CodeGenPrepare] Clean up dbg.value use-before-def as late as possible CodeGenPrepare has a strategy for moving dbg.values so that a value's definition always dominates its debug users. This cleanup was happening too early (before certain CGP transforms were run), resulting in some dbg.value use-before-def errors. Perform this cleanup as late as possible to avoid use-before-def. llvm-svn: 340370	2018-08-21 23:43:08 +00:00
Vedant Kumar	8d652b756e	[CodeGenPrepare] Pre-commit debug info test for optimizeSelectInst This test shows that optimizeSelectInst splits a select and sinks a `fdiv` operation to one side of the diamond. However, the dbg.value for the operation isn't moved. llvm-svn: 340369	2018-08-21 23:42:53 +00:00
Guozhi Wei	8c17f9a77d	[CodeGenPrepare] Add BothExtension type to PromotedInsts This patch fixes PR38125. Instruction extension types are recorded in PromotedInsts, it can be used later in function canGetThrough. If an instruction has two users with different extension types, it will be inserted into PromotedInsts two times in function promoteOperandForOther. The second one overwrites the first one, and the final extension type is wrong, later causes problem in canGetThrough. This patch changes the simple bool extension type to 2-bit enum type, add a BothExtension type in addition to zero/sign extension. When an user sees BothExtension for an instruction, it actually knows nothing about how that instruction is extended. Differential Revision: https://reviews.llvm.org/D49512 llvm-svn: 339822	2018-08-15 22:08:26 +00:00
Alina Sbirlea	dfd14adeb0	Generalize MergeBlockIntoPredecessor. Replace uses of MergeBasicBlockIntoOnlyPred. Summary: Two utils methods have essentially the same functionality. This is an attempt to merge them into one. 1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred 2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor Prior to the patch: 1. MergeBasicBlockIntoOnlyPred Updates either DomTree or DeferredDominance Moves all instructions from Pred to BB, deletes Pred Asserts BB has single predecessor If address was taken, replace the block address with constant 1 (?) 2. MergeBlockIntoPredecessor Updates DomTree, LoopInfo and MemoryDependenceResults Moves all instruction from BB to Pred, deletes BB Returns if doesn't have a single predecessor Returns if BB's address was taken After the patch: Method 2. MergeBlockIntoPredecessor is attempting to become the new default: Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults Moves all instruction from BB to Pred, deletes BB Returns if doesn't have a single predecessor Returns if BB's address was taken Uses of MergeBasicBlockIntoOnlyPred that need to be replaced: 1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp Updated in this patch. No challenges. 2. lib/CodeGen/CodeGenPrepare.cpp Updated in this patch. i. eliminateFallThrough is straightforward, but I added using a temporary array to avoid the iterator invalidation. ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks Some interesting aspects: - Since Pred is not deleted (BB is), the entry block does not need updating. - The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added assert to make obvious that BB=SinglePred. - isMergingEmptyBlockProfitable assumes BB is the one to be deleted. - eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead. - adding some test owner as subscribers for the interesting tests modified: test/CodeGen/X86/avx-cmp.ll test/CodeGen/AMDGPU/nested-loop-conditions.ll test/CodeGen/AMDGPU/si-annotate-cf.ll test/CodeGen/X86/hoist-spill.ll test/CodeGen/X86/2006-11-17-IllegalMove.ll 3. lib/Transforms/Scalar/JumpThreading.cpp Not covered in this patch. It is the only use case using the DeferredDominance. I would defer to Brian Rzycki to make this replacement. Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits Differential Revision: https://reviews.llvm.org/D48202 llvm-svn: 335183	2018-06-20 22:01:04 +00:00
Guozhi Wei	c4c6b548c5	[CodeGenPrepare] Move Extension Instructions Through Logical And Shift Instructions CodeGenPrepare pass move extension instructions close to load instructions in different BB, so they can be combined later. But the extension instructions can't move through logical and shift instructions in current implementation. This patch enables this enhancement, so we can eliminate more extension instructions. Differential Revision: https://reviews.llvm.org/D45537 This is re-commit of r331783, which was reverted by r333305. The performance regression was caused by some unlucky alignment, not a code generation problem. llvm-svn: 334049	2018-06-05 21:03:52 +00:00
Guozhi Wei	03005d3f03	[CodeGenPrepare] Revert r331783 The patch r331783 caused regression in one of our internal application. So revert it now, will investigate it further. llvm-svn: 333305	2018-05-25 20:30:26 +00:00
Shiva Chen	2c864551df	[DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label. In order to set breakpoints on labels and list source code around labels, we need collect debug information for labels, i.e., label name, the function label belong, line number in the file, and the address label located. In order to keep these information in LLVM IR and to allow backend to generate debug information correctly. We create a new kind of metadata for labels, DILabel. The format of DILabel is !DILabel(scope: !1, name: "foo", file: !2, line: 3) We hope to keep debug information as much as possible even the code is optimized. So, we create a new kind of intrinsic for label metadata to avoid the metadata is eliminated with basic block. The intrinsic will keep existing if we keep it from optimized out. The format of the intrinsic is llvm.dbg.label(metadata !1) It has only one argument, that is the DILabel metadata. The intrinsic will follow the label immediately. Backend could get the label metadata through the intrinsic's parameter. We also create DIBuilder API for labels to be used by Frontend. Frontend could use createLabel() to allocate DILabel objects, and use insertLabel() to insert llvm.dbg.label intrinsic in LLVM IR. Differential Revision: https://reviews.llvm.org/D45024 Patch by Hsiangkai Wang. llvm-svn: 331841	2018-05-09 02:40:45 +00:00
Guozhi Wei	1aea95a9ea	[CodeGenPrepare] Move Extension Instructions Through Logical And Shift Instructions CodeGenPrepare pass move extension instructions close to load instructions in different BB, so they can be combined later. But the extension instructions can't move through logical and shift instructions in current implementation. This patch enables this enhancement, so we can eliminate more extension instructions. Differential Revision: https://reviews.llvm.org/D45537 llvm-svn: 331783	2018-05-08 17:58:32 +00:00
Serguei Katkov	a20e05bb94	[CGP] Fix the remove of matched phis in complex addressing mode When we replace the Phi we created with matched ones it is possible that there are two identical phi nodes in IR. And matcher is smart enough to find that new created phi matches both of them. So we try to replace our phi node with matched ones twice and what is bad we delete our phi node twice causing a crash. As soon as we found that we have two identical Phi nodes it makes sense to do a clean-up and replace one phi node by other one. The patch implements it. Reviewers: john.brawn, reames Reviewed By: john.brawn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43758 llvm-svn: 327250	2018-03-12 03:50:07 +00:00
Simon Pilgrim	073f089c6e	[X86][XOP] Update isVectorShiftByScalarCheap with cases covered by XOP Similar to D42437, XOP supports variable shift for v16i8/v8i16/v4i32/v2i64 types. Differential Revision: https://reviews.llvm.org/D42526 llvm-svn: 323797	2018-01-30 18:10:21 +00:00

1 2 3

124 Commits