llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	962a8431ea	[InstSimplify] allow folds for bool vector div/rem llvm-svn: 297411	2017-03-09 21:56:03 +00:00
Sanjay Patel	7e56366204	[ConstantFold] vector div/rem with any zero element in divisor is undef Follow-up for: https://reviews.llvm.org/D30665 https://reviews.llvm.org/rL297390 llvm-svn: 297409	2017-03-09 20:42:30 +00:00
Matt Arsenault	efe949cc67	AMDGPU: Support for SimplifyDemandedVectorElts for load intrinsics llvm-svn: 297408	2017-03-09 20:34:27 +00:00
Sanjay Patel	bb47616aef	[InstSimplify] add tests for vector constant folding div/rem-by-0; NFC llvm-svn: 297407	2017-03-09 20:31:20 +00:00
Sanjay Patel	2b1f6f4b92	[InstSimplify] vector div/rem with any zero element in divisor is undef This was suggested as a DAG simplification in the review for rL297026 : http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170306/435253.html ...but let's start with IR since we have actual docs for IR (LangRef). Differential Revision: https://reviews.llvm.org/D30665 llvm-svn: 297390	2017-03-09 16:20:52 +00:00
Chandler Carruth	20e588e1af	[PM/Inliner] Make the new PM's inliner process call edges across an entire SCC before iterating on newly-introduced call edges resulting from any inlined function bodies. This more closely matches the behavior of the old PM's inliner. While it wasn't really clear to me initially, this behavior is actually essential to the inliner behaving reasonably in its current design. Because the inliner is fundamentally a bottom-up inliner and all of its cost modeling is designed around that it often runs into trouble within an SCC where we don't have any meaningful bottom-up ordering to use. In addition to potentially cyclic, infinite inlining that we block with the inline history mechanism, it can also take seemingly simple call graph patterns within an SCC and turn them into insanely large functions by accidentally working top-down across the SCC without any of the threshold limitations that traditional top-down inliners use. Consider this diabolical monster.cpp file that Richard Smith came up with to help demonstrate this issue: ``` template <int N> extern const char str; void g(const char ); template <bool K, int N> void f(bool B, bool E) { if (K) g(str<N>); if (B == E) return; if (B) f<true, N + 1>(B + 1, E); else f<false, N + 1>(B + 1, E); } template <> void f<false, MAX>(bool B, bool E) { return f<false, 0>(B, E); } template <> void f<true, MAX>(bool B, bool E) { return f<true, 0>(B, E); } extern bool arr, end; void test() { f<false, 0>(arr, end); } ``` When compiled with '-DMAX=N' for various values of N, this will create an SCC with a reasonably large number of functions. Previously, the inliner would try to exhaust the inlining candidates in a single function before moving on. This, unfortunately, turns it into a top-down inliner within the SCC. Because our thresholds were never built for that, we will incrementally decide that it is always worth inlining and proceed to flatten the entire SCC into that one function. What's worse, we'll then proceed to the next function, and do the exact same thing except we'll skip the first function, and so on. And at each step, we'll also make some of the constant factors larger, which is awesome. The fix in this patch is the obvious one which makes the new PM's inliner use the same technique used by the old PM: consider all the call edges across the entire SCC before beginning to process call edges introduced by inlining. The result of this is essentially to distribute the inlining across the SCC so that every function incrementally grows toward the inline thresholds rather than allowing the inliner to grow one of the functions vastly beyond the threshold. The code for this is a bit awkward, but it works out OK. We could consider in the future doing something more powerful here such as prioritized order (via lowest cost and/or profile info) and/or a code-growth budget per SCC. However, both of those would require really substantial work both to design the system in a way that wouldn't break really useful abstraction decomposition properties of the current inliner and to be tuned across a reasonably diverse set of code and workloads. It also seems really risky in many ways. I have only found a single real-world file that triggers the bad behavior here and it is generated code that has a pretty pathological pattern. I'm not worried about the inliner not doing an awesome* job here as long as it does ok. On the other hand, the cases that will be tricky to get right in a prioritized scheme with a budget will be more common and idiomatic for at least some frontends (C++ and Rust at least). So while these approaches are still really interesting, I'm not in a huge rush to go after them. Staying even closer to the existing PM's behavior, especially when this easy to do, seems like the right short to medium term approach. I don't really have a test case that makes sense yet... I'll try to find a variant of the IR produced by the monster template metaprogram that is both small enough to be sane and large enough to clearly show when we get this wrong in the future. But I'm not confident this exists. And the behavior change here should be unobservable without snooping on debug logging. So there isn't really much to test. The test case updates come from two incidental changes: 1) We now visit functions in an SCC in the opposite order. I don't think there really is a "right" order here, so I just update the test cases. 2) We no longer compute some analyses when an SCC has no call instructions that we consider for inlining. llvm-svn: 297374	2017-03-09 11:35:40 +00:00
Peter Collingbourne	0152c8156b	WholeProgramDevirt: Implement importing for uniform ret val opt. Differential Revision: https://reviews.llvm.org/D29854 llvm-svn: 297350	2017-03-09 01:11:15 +00:00
Peter Collingbourne	6d284fab20	WholeProgramDevirt: Implement importing for single-impl devirtualization. Differential Revision: https://reviews.llvm.org/D29844 llvm-svn: 297333	2017-03-09 00:21:25 +00:00
Changpeng Fang	1be9b9f816	AMDGPU/SI: Disable unrolling in the loop vectorizer if the loop is not vectorized. Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D30719 llvm-svn: 297328	2017-03-09 00:07:00 +00:00
Evgeniy Stepanov	8537d9994d	Don't merge global constants with non-dbg metadata. !type metadata can not be dropped. An alternative to this is adding !type metadata from the replaced globals to the replacement, but that may weaken type tests and make them slower at the same time. The merged global gets !dbg metadata from replaced globals, and can end up with multiple debug locations. llvm-svn: 297327	2017-03-09 00:03:37 +00:00
Matthew Simpson	3388de1349	[LV] Select legal insert point when fixing first-order recurrences Because IRBuilder performs constant-folding, it's not guaranteed that an instruction in the original loop map to an instruction in the vector loop. It could map to a constant vector instead. The handling of first-order recurrences was incorrectly making this assumption when setting the IRBuilder's insert point. llvm-svn: 297302	2017-03-08 18:18:20 +00:00
Matthew Simpson	8966848d17	[LV] Make the test case for PR30183 less fragile This patch also renames the PR number the test points to. The previous reference was PR29559, but that bug was somehow deleted and recreated under PR30183. llvm-svn: 297295	2017-03-08 17:03:38 +00:00
Matthew Simpson	903dd5aa9b	[LV] Add missing check labels to tests and reformat llvm-svn: 297294	2017-03-08 16:55:34 +00:00
Jun Bum Lim	ac170872b2	[JumpThread] Use AA in SimplifyPartiallyRedundantLoad() Summary: Use AA when scanning to find an available load value. Reviewers: rengolin, mcrosier, hfinkel, trentxintong, dberlin Reviewed By: rengolin, dberlin Subscribers: aemerson, dberlin, llvm-commits Differential Revision: https://reviews.llvm.org/D30352 llvm-svn: 297284	2017-03-08 15:22:30 +00:00
Sanjay Patel	62906af379	[InstCombine] avoid crashing on shuffle shrinkage when input type is not same as result type llvm-svn: 297280	2017-03-08 15:02:23 +00:00
Sam Parker	0f4db38c20	[LoopRotate] Propagate dbg.value intrinsics Recommitting patch which was previously reverted in r297159. These changes should address the casting issues. The original patch enables dbg.value intrinsics to be attached to newly inserted PHI nodes. Differential Review: https://reviews.llvm.org/D30701 llvm-svn: 297269	2017-03-08 09:56:22 +00:00
Sebastian Pop	4a4d245b19	Handle UnreachableInst in isGuaranteedToTransferExecutionToSuccessor A block with an UnreachableInst does not transfer execution to a successor. The problem was exposed by GVN-hoist. This patch fixes bug 32153. Patch by Aditya Kumar. Differential Revision: https://reviews.llvm.org/D30667 llvm-svn: 297254	2017-03-08 01:54:50 +00:00
Sanjay Patel	fe9705149b	[InstCombine] shrink truncated insertelement into undef vector This is the 2nd part of solving: http://lists.llvm.org/pipermail/llvm-dev/2017-February/110293.html D30123 moves the trunc ahead of the shuffle, and this moves the trunc ahead of the insertelement. We're limiting this transform to undef rather than any constant to avoid backend problems. Differential Revision: https://reviews.llvm.org/D30137 llvm-svn: 297242	2017-03-07 23:27:14 +00:00
Evgeniy Stepanov	7a5cfa9a11	Fix one-after-the-end type metadata handling in globalsplit. Itanium ABI may have an address point one byte after the end of a vtable. When such vtable global is split, the !type metadata needs to follow the right vtable. Differential Revision: https://reviews.llvm.org/D30716 llvm-svn: 297236	2017-03-07 22:18:48 +00:00
Sanjay Patel	53fa17a014	[InstCombine] shrink truncated splat shuffle (2nd try) This was committed at r297155 and reverted at r297166 because of an over-reaching clang test. That should be fixed with r297189. This is one part of solving a recent bug report: http://lists.llvm.org/pipermail/llvm-dev/2017-February/110293.html This keeps with our general approach: changing arbitrary shuffles is off-limts, but changing splat is ok. The transform is very similar to the existing shrinkBitwiseLogic() canonicalization. Differential Revision: https://reviews.llvm.org/D30123 llvm-svn: 297232	2017-03-07 21:45:16 +00:00
Gor Nishanov	c52006ab09	[coroutines] Add handling for unwind coro.ends Summary: The purpose of coro.end intrinsic is to allow frontends to mark the cleanup and other code that is only relevant during the initial invocation of the coroutine and should not be present in resume and destroy parts. In landing pads coro.end is replaced with an appropriate instruction to unwind to caller. The handling of coro.end differs depending on whether the target is using landingpad or WinEH exception model. For landingpad based exception model, it is expected that frontend uses the `coro.end`_ intrinsic as follows: ``` ehcleanup: %InResumePart = call i1 @llvm.coro.end(i8* null, i1 true) br i1 %InResumePart, label %eh.resume, label %cleanup.cont cleanup.cont: ; rest of the cleanup eh.resume: %exn = load i8, i8* %exn.slot, align 8 %sel = load i32, i32* %ehselector.slot, align 4 %lpad.val = insertvalue { i8, i32 } undef, i8 %exn, 0 %lpad.val29 = insertvalue { i8, i32 } %lpad.val, i32 %sel, 1 resume { i8, i32 } %lpad.val29 ``` The `CoroSpit` pass replaces `coro.end` with ``True`` in the resume functions, thus leading to immediate unwind to the caller, whereas in start function it is replaced with ``False``, thus allowing to proceed to the rest of the cleanup code that is only needed during initial invocation of the coroutine. For Windows Exception handling model, a frontend should attach a funclet bundle referring to an enclosing cleanuppad as follows: ``` ehcleanup: %tok = cleanuppad within none [] %unused = call i1 @llvm.coro.end(i8* null, i1 true) [ "funclet"(token %tok) ] cleanupret from %tok unwind label %RestOfTheCleanup ``` The `CoroSplit` pass, if the funclet bundle is present, will insert ``cleanupret from %tok unwind to caller`` before the `coro.end`_ intrinsic and will remove the rest of the block. Reviewers: majnemer Reviewed By: majnemer Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D25543 llvm-svn: 297223	2017-03-07 21:00:54 +00:00
Matthew Simpson	c86b2134c7	[LV] Consider users that are memory accesses in uniforms expansion step When expanding the set of uniform instructions beyond the seed instructions (e.g., consecutive pointers), we mark a new instruction uniform if all its loop-varying users are uniform. We should also allow users that are consecutive or interleaved memory accesses. This fixes cases where we have an instruction that is used as the pointer operand of a consecutive access but also used by a non-memory instruction that later becomes uniform as part of the expansion. llvm-svn: 297179	2017-03-07 18:47:30 +00:00
Sanjay Patel	6d30606168	revert r297155 because there's a clang test that depends on InstCombine: tools/clang/test/CodeGen/zvector.c llvm-svn: 297166	2017-03-07 17:41:45 +00:00
Adrian Prantl	d4056501fb	Revert "Strip debug info when inlining into a nodebug function." This reverts commit r296488. As noted by David Blaikie on llvm-commits, I overlooked the case of a debug function being inlined into a nodebug function being inlined into a debug function. llvm-svn: 297163	2017-03-07 17:28:57 +00:00
Nico Weber	3b2f0094d7	Revert r297132, it caused PR32171 llvm-svn: 297159	2017-03-07 17:23:52 +00:00
Sanjay Patel	defdb7bed5	[InstCombine] shrink truncated splat shuffle This is one part of solving a recent bug report: http://lists.llvm.org/pipermail/llvm-dev/2017-February/110293.html This keeps with our general approach: changing arbitrary shuffles is off-limts, but changing splat is ok. The transform is very similar to the existing shrinkBitwiseLogic() canonicalization. Differential Revision: https://reviews.llvm.org/D30123 llvm-svn: 297155	2017-03-07 16:10:36 +00:00
Sam Parker	6ec5fdbc94	[LoopRotate] Update dbg.value intrinsics Propagate debug info through the newly inserted PHI nodes. Differential Revision: https://reviews.llvm.org/D30190 llvm-svn: 297132	2017-03-07 09:34:25 +00:00
Sanjoy Das	30c3538e2e	[LoopUnrolling] Fix loop size check for peeling Summary: We should check if loop size allows us to peel at least one iteration before we do so. Patch by Max Kazantsev! Reviewers: sanjoy, mkuper, efriedma Reviewed By: mkuper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30632 llvm-svn: 297122	2017-03-07 06:03:15 +00:00
Michael Kuperstein	768d013a03	[SLP] Revert r296863 due to miscompiles. Details and reproducer are on the email thread for r296863. llvm-svn: 297103	2017-03-06 23:54:51 +00:00
Daniel Berlin	961b002714	NewGVN: We were not really failing this testcase, because the instructions it was looking for are unused. GVN value numbers unused instructions, NewGVN does not. Fix the instructions to be used, so we eliminate the redundancies it's checking for, and un-XFAIL it llvm-svn: 297058	2017-03-06 20:01:31 +00:00
Sanjay Patel	3bbee79d9e	[InstSimplify] add tests for vector div/rem with UB potential; NFC llvm-svn: 297048	2017-03-06 18:45:39 +00:00
Sanjay Patel	c494239bd8	[InstSimplify] regenerate checks; NFC llvm-svn: 297040	2017-03-06 18:13:01 +00:00
Dehao Chen	c632a393b7	Remove the sample pgo annotation heuristic that uses call count to annotate basic block count. Summary: We do not need that special handling because the debug info is more accurate now. Performance testing shows no regression on google internal benchmarks. Reviewers: davidxl, aprantl Reviewed By: aprantl Subscribers: llvm-commits, aprantl Differential Revision: https://reviews.llvm.org/D30658 llvm-svn: 297038	2017-03-06 17:49:59 +00:00
Alexey Bataev	d7db344c17	[SLP] A test for vectorization of users of extractelement instructions, NFC. llvm-svn: 297024	2017-03-06 16:26:00 +00:00
Evgeny Stupachenko	e4b0813d62	Add test missed in r296770. Differential Revision: http://reviews.llvm.org/D27004 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 296962	2017-03-04 05:20:02 +00:00
Evgeny Stupachenko	d6aa0d02c2	Set option enabling LSR alternative way to resolve complex solution to false. Differential Revision: http://reviews.llvm.org/D29862 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 296959	2017-03-04 03:14:05 +00:00
Peter Collingbourne	77a8d563a3	WholeProgramDevirt: Implement exporting for uniform ret val opt. Differential Revision: https://reviews.llvm.org/D29846 llvm-svn: 296948	2017-03-04 01:34:53 +00:00
Peter Collingbourne	2325bb34c1	WholeProgramDevirt: Implement exporting for single-impl devirtualization. Differential Revision: https://reviews.llvm.org/D29811 llvm-svn: 296945	2017-03-04 01:31:01 +00:00
Peter Collingbourne	b406baaeef	WholeProgramDevirt: Add any unsuccessful llvm.type.checked.load devirtualizations to the list of llvm.type.test users. Any unsuccessful llvm.type.checked.load devirtualizations will be translated into uses of llvm.type.test, so we need to add the resulting llvm.type.test intrinsics to the function summaries so that the LowerTypeTests pass will export them. Differential Revision: https://reviews.llvm.org/D29808 llvm-svn: 296939	2017-03-04 01:23:30 +00:00
Sanjoy Das	664c925a57	[LoopUnrolling] Peel loops with invariant backedge Phi input Summary: If a loop contains a Phi node which has an invariant input from back edge, it is profitable to peel such loops (rather than unroll them) to use the advantage that this Phi is always invariant starting from 2nd iteration. After the 1st iteration is peeled, other optimizations can potentially simplify calculations with this invariant. Patch by Max Kazantsev! Reviewers: sanjoy, apilipenko, igor-laevsky, anna, mkuper, reames Reviewed By: mkuper Subscribers: mkuper, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D30161 llvm-svn: 296898	2017-03-03 18:19:15 +00:00
Benjamin Kramer	9528f8c2fb	Revert "Re-apply "[GVNHoist] Move GVNHoist to function simplification part of pipeline."" This reverts commit r296759. Miscompiles bash. llvm-svn: 296872	2017-03-03 14:27:53 +00:00
Mohammad Shahid	bdac9f30c0	[SLP] Fixes the bug due to absence of in order uses of scalars which needs to be available for VectorizeTree() API.This API uses it for proper mask computation to be used in shufflevector IR. The fix is to compute the mask for out of order memory accesses while building the vectorizable tree instead of actual vectorization of vectorizable tree.It also needs to recompute the proper Lane for external use of vectorizable scalars based on shuffle mask. Reviewers: mkuper Differential Revision: https://reviews.llvm.org/D30159 Change-Id: Ide8773ce0ad3562f3cf4d1a0ad0f487e2f60ce5d llvm-svn: 296863	2017-03-03 10:02:47 +00:00
Peter Collingbourne	3baa72af7d	ThinLTOBitcodeWriter: Do not follow operand edges of type GlobalValue when looking for virtual functions. Such edges may otherwise result in infinite recursion if a pointer to a vtable is reachable from the vtable itself. This can happen in practice if a TU defines the ABI types used to implement RTTI, and is itself compiled with RTTI. Fixes PR32121. llvm-svn: 296839	2017-03-02 23:10:17 +00:00
Nikolai Bozhenov	4a04fb9e90	[BypassSlowDivision] Use ValueTracking to simplify run-time checks ValueTracking is used for more thorough analysis of operands. Based on the analysis, either run-time checks can be simplified (e.g. check only one operand instead of two) or the transformation can be avoided. For example, it is quite often the case that a divisor is promoted from a shorter type and run-time checks for it are redundant. With additional compile-time analysis of values, two special cases naturally arise and are addressed by the patch: 1) Both operands are known to be short enough. Then, the long division can be simply replaced with a short one without CFG modification. 2) If a division is unsigned and the dividend is known to be short then the long division is not needed at all. Because if the divisor is too big for short division then the quotient is obviously zero (and the remainder is equal to the dividend). Actually, the division is not needed when (divisor > dividend). Differential Revision: https://reviews.llvm.org/D29897 llvm-svn: 296832	2017-03-02 22:12:15 +00:00
Tobias Grosser	f818c3300b	Revert "Fix PR 24415 (at least), by making our post-dominator tree behavior sane." and also "clang-format GenericDomTreeConstruction.h, since the current formatting makes it look like their is a bug in the loop indentation, and there is not" This reverts commit r296535. There are still some open design questions which I would like to discuss. I revert this for Daniel (who gave the OK), as he is on vacation. llvm-svn: 296812	2017-03-02 21:08:37 +00:00
Evgeny Stupachenko	d655ec56c3	The patch fixes r296770 Summary: Extend -unroll-partial-threshold to 200 for runtime-loop3.ll test as epilogue unroll initially add 1 more IV to the loop. From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 296803	2017-03-02 19:41:38 +00:00
Evgeny Stupachenko	21bef2cb3c	The patch turns on epilogue unroll for loops with constant recurency start. Summary: Set unroll remainder to epilog if a loop contains a phi with constant parameter: loop: pn = phi [Const, PreHeader], [pn.next, Latch] ... Reviewer: hfinkel Differential Revision: http://reviews.llvm.org/D27004 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 296770	2017-03-02 17:38:46 +00:00
Geoff Berry	484d756583	Re-apply "[GVNHoist] Move GVNHoist to function simplification part of pipeline." This re-applies r289696, which caused TSan perf regression, which has since been addressed in separate changes (see PR for details). See PR31382. llvm-svn: 296759	2017-03-02 16:16:47 +00:00
Bjorn Pettersson	e5027cfbcc	[InstCombine] Avoid faulty combines of select-cmp-br Summary: When InstCombine is optimizing certain select-cmp-br patterns it replaces the result of the select in uses outside of the basic block containing the select. This is only legal if the path from the select to the outside use is disjoint from all other paths out from the originating basic block. The problem found was that InstCombiner::replacedSelectWithOperand did not consider the case when both edges out from the br pointed to the same label. In that case the paths aren't disjoint and the transformation is illegal. This patch avoids the faulty rewrites by verifying that there is a single flow to the successor where we want to replace uses. Reviewers: llvm-commits, spatel, majnemer Differential Revision: https://reviews.llvm.org/D30455 llvm-svn: 296752	2017-03-02 15:18:58 +00:00
Matthew Simpson	aee9771ae2	[ARM/AArch64] Update costs for interleaved accesses with wide types After r296750, we're able to match interleaved accesses having types wider than 128 bits. This patch updates the associated TTI costs. Differential Revision: https://reviews.llvm.org/D29675 llvm-svn: 296751	2017-03-02 15:15:35 +00:00

1 2 3 4 5 ...

8494 Commits