llvm-project

Commit Graph

Author	SHA1	Message	Date
David L. Jones	5ff7b8a04a	Revert r347934 "[SCEV] Guard movement of insertion point for loop-invariants" This change caused SEGVs in instcombine. (The r347934 change seems to me to be a precipitating cause, not a root cause. Details are on the llvm-commits thread for r347934.) llvm-svn: 348426	2018-12-05 23:13:50 +00:00
Sanjay Patel	998ececef0	[InstCombine] remove dead code from visitExtractElement Extracting from a splat constant is always handled by InstSimplify. Move the test for this from InstCombine to InstSimplify to make sure that stays true. llvm-svn: 348423	2018-12-05 23:09:33 +00:00
Sanjay Patel	de3db684b7	[InstCombine] add/move tests for extractelement; NFC llvm-svn: 348417	2018-12-05 21:56:13 +00:00
Vedant Kumar	09415a850e	[CodeExtractor] Do not mark outlined calls which may resume EH as noreturn Treat terminators which resume exception propagation as returning instructions (at least, for the purposes of marking outlined functions `noreturn`). This is to avoid inserting traps after calls to outlined functions which unwind. rdar://46129950 llvm-svn: 348404	2018-12-05 19:35:37 +00:00
Christian Bruel	4ead99b3ac	Allow norecurse attribute on functions that have debug infos. Summary: debug intrinsics might be marked norecurse to enable the caller function to be norecurse and optimized if needed. This avoids code gen optimisation differences when -g is used, as in globalOpt.cpp:processInternalGlobal checks. Reviewers: chandlerc, jmolloy, aprantl Reviewed By: aprantl Subscribers: aprantl, llvm-commits Differential Revision: https://reviews.llvm.org/D55187 llvm-svn: 348381	2018-12-05 16:48:00 +00:00
Sanjay Patel	baffae91b2	[InstCombine] simplify icmps with same operands based on dominating cmp The tests here are based on the motivating cases from D54827. More background: 1. We don't get these cases in general with SimplifyCFG because the root of the pattern match is an icmp, not a branch. I'm not sure how often we encounter this pattern vs. the seemingly more likely case with branches, but I don't see evidence to leave the minimal pattern unoptimized. 2. This has a chance of increasing compile-time because we're using a ValueTracking call to handle the match. The motivating cases could be handled with a simpler pair of calls to isImpliedTrueByMatchingCmp/ isImpliedFalseByMatchingCmp, but I saw that we have a more comprehensive wrapper around those, so we might as well use it here unless there's evidence that it's significantly slower. 3. Ideally, we'd handle the fold to constants in InstSimplify, but as with the existing code here, we could extend this to handle cases where the result is not a constant, but a new combined predicate. That would mean splitting the logic across the 2 passes and possibly duplicating the pattern-matching cost. 4. As mentioned in D54827, this seems like the kind of thing that should be handled in Correlated Value Propagation, but that pass is currently limited to dealing with instructions with constant operands, so extending this bit of InstCombine is the smallest/easiest way to get these patterns optimized. llvm-svn: 348367	2018-12-05 15:04:00 +00:00
Max Kazantsev	594cb55686	[NFC] Verify memoryssa in test for PR39783 llvm-svn: 348333	2018-12-05 05:20:08 +00:00
Sanjay Patel	1df1facae2	[InstCombine] add tests for implied simplifications; NFC Ideally, we would fold all of these in InstSimplify in a similar way to rL347896, but this is a bit awkward when we're trying to simplify a compare directly because the ValueTracking API expects the compare as an input, but in InstSimplify, we just have the operands of the compare. Given that we can do transforms besides just simplifications, we might as well just extend the code in InstCombine (which already does simplifications with constant operands). llvm-svn: 348312	2018-12-04 22:25:33 +00:00
Sanjay Patel	882555628b	[InstCombine] auto-generate full checks for icmp overflow tests; NFC llvm-svn: 348274	2018-12-04 15:41:34 +00:00
Sanjay Patel	320cf5dde5	[InstCombine] auto-generate full checks for icmp dominator tests; NFC llvm-svn: 348270	2018-12-04 15:00:35 +00:00
Simon Pilgrim	924f98e579	Add common check prefix. NFCI. llvm-svn: 348265	2018-12-04 14:32:42 +00:00
Alina Sbirlea	a2eebb828e	Update MemorySSA in SimpleLoopUnswitch. Summary: Teach SimpleLoopUnswitch to preserve MemorySSA. Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits Differential Revision: https://reviews.llvm.org/D47022 llvm-svn: 348263	2018-12-04 14:23:37 +00:00
Vedant Kumar	d129569e34	[CodeExtractor] Split PHI nodes with incoming values from outlined region (PR39433) If a PHI node outside the extracted region has multiple incoming values from it, split the PHI into two parts. The first PHI has incoming values only from the region and is extracted with it (such PHIs are placed in a separate basic block that is added to the list of outlined blocks), and the incoming values in the original PHI are replaced by the first PHI. A similar solution is already used in CodeExtractor for PHIs in the entry block (severSplitPHINodes method). This fixes PR39433. Patch by Sergei Kachkov! Differential Revision: https://reviews.llvm.org/D55018 llvm-svn: 348205	2018-12-03 22:40:21 +00:00
Sanjay Patel	8c65515082	[InstCombine] fix undef propagation bug with shuffle+binop When we have a shuffle that extends a source vector with undefs and then do some binop on that, we must make sure that the extra elements remain undef with that binop if we reverse the order of the binop and shuffle. 'or' is probably the easiest example to show the bug because 'or C, undef --> -1' (not undef). But there are other opcode/constant combinations where this is true as shown by the 'shl' test. llvm-svn: 348191	2018-12-03 21:15:17 +00:00
Roman Lebedev	7bf2fed167	[InstCombine] foldICmpWithLowBitMaskedVal(): disable 2 faulty folds. These two folds are invalid for this non-constant pattern when the mask ends up being all-ones: https://rise4fun.com/Alive/9au https://rise4fun.com/Alive/UcQM Fixes https://bugs.llvm.org/show_bug.cgi?id=39861 llvm-svn: 348181	2018-12-03 20:07:58 +00:00
Sanjay Patel	3e66d81ec6	[InstCombine] add tests for shuffle+binop fold; NFC llvm-svn: 348173	2018-12-03 19:41:21 +00:00
Sanjay Patel	8918a511a1	[SimplifyCFG] add tests for cross block compare folding; NFC These are the baseline tests for D54827. Patch based on code originally written by: @yinyuefengyi (luo xionghu) Differential Revision: https://reviews.llvm.org/D54994 llvm-svn: 348151	2018-12-03 16:55:29 +00:00
Michal Gorny	ff13c24cfe	[test] Fix use of 'sort -b' in SimpleLoopUnswitch on NetBSD Add '-k 1' to 'sort -b' calls in SimpleLoopUnswitch tests, as required for sort implementation on NetBSD. The '-b' modifier is ineffective if specified without any key. Per the manpage: Note that the -b option has no effect unless key fields are specified. Differential Revision: https://reviews.llvm.org/D55168 llvm-svn: 348097	2018-12-02 16:49:33 +00:00
Simon Pilgrim	102854f4d4	[TTI] Reduction costs only need to include a single extract element cost (REAPPLIED) We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type. For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed. Fixes PR37731 Rebased and reapplied after being reverted in rL347541 due to PR39774 - which was fixed by D54955/rL347759 and D55017/rL347997 Differential Revision: https://reviews.llvm.org/D54585 llvm-svn: 348076	2018-12-01 14:18:31 +00:00
Nikita Popov	0c5d6ccbfc	[InstCombine] Support ssub.sat canonicalization for non-splats Extend ssub.sat(X, C) -> sadd.sat(X, -C) canonicalization to also support non-splat vector constants. This is done by generalizing the implementation of the isNotMinSignedValue() helper to return true for constants that are non-splat, but don't contain any signed min elements. Differential Revision: https://reviews.llvm.org/D55011 llvm-svn: 348072	2018-12-01 10:58:34 +00:00
Teresa Johnson	5b8ff375c8	[ThinLTO] Allow importing of functions with var args Summary: Follow up to D54270, which allowed importing of var args functions unless they called va_start. As pointed out in the post-commit comments on that patch, the inliner can handle functions that call va_start in certain situations as well. Go ahead and enable importing of all var args functions. Measurements on a large binary show that this increases imports and binary size by an insignificant amount. Reviewers: davidxl Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D54607 llvm-svn: 348068	2018-12-01 05:11:46 +00:00
Craig Topper	88270231f8	[X86][LoopVectorize] Replace -mcpu=skylake-avx512 with -mattr=avx512f in some tests that failed when experimenting with defaulting to -mprefer-vector-width=256 for skylake-avx512. llvm-svn: 348063	2018-12-01 01:38:44 +00:00
Craig Topper	502fc1bdd5	[X86] Split skylake-avx512 run lines in SLP vectorizer tests to cover -mprefer-vector-width=256 and -mprefer-vector-width=512. This will make these tests immune if we ever change the default behavior of -march=skylake-avx512 to prefer 256 bit vectors. llvm-svn: 348046	2018-11-30 22:53:21 +00:00
Sanjay Patel	398728732e	[InstSimplify] add tests for undef + partial undef constant folding; NFC These tests should probably go under a separate test file because they should fold with just -constprop, but they're similar to the scalar tests already in here. llvm-svn: 348045	2018-11-30 22:51:34 +00:00
Joseph Tremoulet	27b1e3bd4f	[Mem2Reg] Fix nondeterministic corner case Summary: When mem2reg inserts phi nodes in blocks with unreachable predecessors, it adds undef operands for those incoming edges. When there are multiple such predecessors, the order is currently based on the address of the BasicBlocks. This change fixes that by using the BBNumbers in the sort/search predicates, as is done elsewhere in mem2reg to ensure determinism. Also adds a testcase with a bunch of unreachable preds, which (nondeterministically) fails without the fix. Reviewers: majnemer Reviewed By: majnemer Subscribers: mgrang, llvm-commits Differential Revision: https://reviews.llvm.org/D55077 llvm-svn: 348024	2018-11-30 19:20:02 +00:00
Alexey Bataev	3689747619	[SLP]PR39774: Update references of the replaced external instructions. Summary: An additional fix for PR39774. Need to update the references for the ReductionRoot instruction when it is replaced during the vectorization phase to avoid compiler crash on reduction vectorization. Reviewers: RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55017 llvm-svn: 347997	2018-11-30 15:14:20 +00:00
Renato Golin	135e72e1b9	Add a new reduction pattern match Adding a new reduction pattern match for vectorizing code similar to TSVC s3111: for (int i = 0; i < N; i++) if (a[i] > b) sum += a[i]; This patch adds support for fadd, fsub and fmul, as well as multiple branches and different (but compatible) instructions (ex. add+sub) in different branches. The difference from the previous patch (https://reviews.llvm.org/D49168) is as follows: - Added check of fast-math property of fp-instruction to the previous patch - Fix/add some pattern for if-reduction.ll Differential Revision: https://reviews.llvm.org/D54464 Patch by Takahiro Miyoshi <takahiro.miyoshi@linaro.org> and Masakazu Ueno <masakazu.ueno@linaro.org> llvm-svn: 347989	2018-11-30 13:40:10 +00:00
Max Kazantsev	9cf417db78	[LoopSimplifyCFG] Update MemorySSA in terminator folding. PR39783 Terminator folding transform lacks MemorySSA update for memory Phis, while they exist within MemorySSA analysis. They need exactly the same type of updates as regular Phis. Failing to update them properly ends up with inconsistent MemorySSA and manifests in various assertion failures. This patch adds Memory Phi updates to this transform. Thanks to @jonpa for finding this! Differential Revision: https://reviews.llvm.org/D55050 Reviewed By: asbirlea llvm-svn: 347979	2018-11-30 10:06:23 +00:00
Max Kazantsev	deaa3e2068	[NFC] Simplify and reduce tests for PR39783 llvm-svn: 347976	2018-11-30 09:51:25 +00:00
Warren Ristow	72d1f3a285	[SCEV] Guard movement of insertion point for loop-invariants r320789 suppressed moving the insertion point of SCEV expressions with div/rem operations to the loop header in non-loop-invariant situations. This, and similar, hoisting is also unsafe in the loop-invariant case, since there may be a guard against a zero denominator. This is an adjustment to the fix of r320789 to suppress the movement even in the loop-invariant case. This fixes PR30806. Differential Revision: https://reviews.llvm.org/D54713 llvm-svn: 347934	2018-11-30 00:02:54 +00:00
David Stuttard	c6603861d8	Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic" Also revert fix r347876 One of the buildbots was reporting a failure in some relevant tests that I can't repro or explain at present, so reverting until I can isolate. llvm-svn: 347911	2018-11-29 20:14:17 +00:00
Sanjay Patel	d802270808	[InstSimplify] fold select with implied condition This is an almost direct move of the functionality from InstCombine to InstSimplify. There's no reason not to do this in InstSimplify because we never create a new value with this transform. (There's a question of whether any dominance-based transform belongs in either of these passes, but that's a separate issue.) I've changed 1 of the conditions for the fold (1 of the blocks for the branch must be the block we started with) into an assert because I'm not sure how that could ever be false. We need 1 extra check to make sure that the instruction itself is in a basic block because passes other than InstCombine may be using InstSimplify as an analysis on values that are not wired up yet. The 3-way compare changes show that InstCombine has some kind of phase-ordering hole. Otherwise, we would have already gotten the intended final result that we now show here. llvm-svn: 347896	2018-11-29 18:44:39 +00:00
John Brawn	a7eb2c863f	[LICM] Reapply r347776 "Make LICM able to hoist phis" with fix This commit caused a large compile-time slowdown in some cases when NDEBUG is off due to the dominator tree verification it added. Fix this by only doing dominator tree and loop info verification when something has been hoisted. Differential Revision: https://reviews.llvm.org/D52827 llvm-svn: 347889	2018-11-29 17:10:00 +00:00
Sanjay Patel	515b91cdef	[SimplifyCFG] auto-generate complete checks; NFC llvm-svn: 347882	2018-11-29 16:28:37 +00:00
Sanjay Patel	81449c6b0e	[InstCombine] auto-generate complete checks; NFC llvm-svn: 347881	2018-11-29 16:26:03 +00:00
Joseph Tremoulet	926ee459c4	[CallSiteSplitting] Report edge deletion to DomTreeUpdater Summary: When splitting musttail calls, the split blocks' original terminators get removed; inform the DTU when this happens. Also add a testcase that fails an assertion in the DTU without this fix. Reviewers: fhahn, junbuml Reviewed By: fhahn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55027 llvm-svn: 347872	2018-11-29 15:27:04 +00:00
David Stuttard	de02e4b1cc	Add support for TFE/LWE in image intrinsics TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accommodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accommodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. 
The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda llvm-svn: 347871	2018-11-29 15:21:13 +00:00
Martin Storsjo	bfd1d27585	Revert "[LICM] Enable control flow hoisting by default" and "[LICM] Reapply r347190 "Make LICM able to hoist phis" with fix" This reverts commits r347776 and r347778. The first one, r347776, caused significant compile time regressions for certain input files, see PR39836 for details. llvm-svn: 347867	2018-11-29 14:39:39 +00:00
Sanjay Patel	83d1d3f167	[CVP] auto-generate complete test checks; NFC llvm-svn: 347866	2018-11-29 14:28:47 +00:00
Max Kazantsev	a63b275285	[NFC] Add two XFAIL tests from PR39783 llvm-svn: 347845	2018-11-29 09:38:22 +00:00
Sam Parker	d6ebf0108e	[LoopStrengthReduce] ComplexityLimit as an option Convert ComplexityLimit into a command line value. Differential Revision: https://reviews.llvm.org/D54899 llvm-svn: 347843	2018-11-29 08:34:22 +00:00
Craig Topper	961b956eb4	[Inliner] Modify the merging of min-legal-vector-width attribute to better handle when the caller or callee don't have the attribute. Lack of an attribute means that the function hasn't been checked for what vector width it requires. So if the caller or the callee doesn't have the attribute we should make sure the combined function after inlining does not have the attribute. If the caller already doesn't have the attribute we can just avoid adding it. Otherwise if the callee doesn't have the attribute just remove the caller's attribute. llvm-svn: 347841	2018-11-29 07:27:38 +00:00
Craig Topper	645cc6e331	[Inliner] Add test for merging of min-legal-vector-width function attribute. This should have been added in r337844, but apparently I failed to 'git add' the file. llvm-svn: 347840	2018-11-29 07:02:47 +00:00
Paul Robinson	adcdc1bd0a	[DebugInfo] IR/Bitcode changes for DISubprogram flags. Packing the flags into one bitcode word will save effort in adding new flags in the future. Differential Revision: https://reviews.llvm.org/D54755 llvm-svn: 347806	2018-11-28 21:14:32 +00:00
Jeremy Morse	9b4cfa55b1	[DebugInfo] Give inlinable calls DILocs (PR39807) In PR39807 we incorrectly handle circumstances where calls are common'd from conditional blocks into the parent BB. Calls that can be inlined must always have DebugLocs, however we strip them during commoning, which the IR verifier asserts on. Fix this by using applyMergedLocation: it will perform the same DebugLoc stripping of conditional Locs, but will also generate an unknown location DebugLoc that satisfies the requirement for inlinable calls to always have locations. Some of the prior logic for selecting a DebugLoc is now likely redundant; I'll generate a follow-up to remove it (involves editing more regression tests). Differential Revision: https://reviews.llvm.org/D54997 llvm-svn: 347782	2018-11-28 17:58:45 +00:00
John Brawn	4557ffeb63	[LICM] Enable control flow hoisting by default Differential Revision: https://reviews.llvm.org/D54949 llvm-svn: 347778	2018-11-28 17:23:03 +00:00
John Brawn	31c9769580	[LICM] Reapply r347190 "Make LICM able to hoist phis" with fix This commit caused failures because it failed to correctly handle cases where we hoist a phi, then hoist a use of that phi, then have to rehoist that use. We need to make sure that we rehoist the use to _after_ the hoisted phi, which we do by always rehoisting to the immediate dominator instead of just rehoisting everything to the original preheader. An option is also added to control whether control flow is hoisted, which is off in this commit but will be turned on in a subsequent commit. Differential Revision: https://reviews.llvm.org/D52827 llvm-svn: 347776	2018-11-28 17:21:49 +00:00
Nikita Popov	8d63aed459	[InstCombine] Combine saturating add/sub with constant operands Combine sat(sat(X + C1) + C2) -> sat(X + (C1+C2)) and sat(sat(X - C1) - C2) -> sat(X - (C1+C2)) if the sign of C1 and C2 matches. In the unsigned case we can compute C1+C2 with saturating arithmetic, and InstSimplify will reduce this just to the saturation value. For the signed case, we cannot perform the simplification if the result of the addition overflows. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347773	2018-11-28 16:37:15 +00:00
Nikita Popov	42f89989a1	[InstCombine] Canonicalize ssub.sat to sadd.sat Canonicalize ssub.sat(X, C) to sadd.sat(X, -C) if C is constant and not signed minimum. This will help further optimizations to apply. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347772	2018-11-28 16:37:09 +00:00
Nikita Popov	cf596a8c26	[ValueTracking] Determine always-overflow condition for unsigned sub Always-overflow was already determined for unsigned addition, but not subtraction. This patch establishes parity. This allows us to perform some additional simplifications for signed saturating subtractions. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347771	2018-11-28 16:37:04 +00:00
Nikita Popov	78a9295e15	[InstCombine] Use known overflow information for saturating add/sub If ValueTracking can determine that the add/sub can never overflow, replace it with the corresponding nuw/nsw add/sub. Additionally, for the unsigned case, if ValueTracking determines that the add/sub always overflows, replace the result with the saturation value. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347770	2018-11-28 16:36:59 +00:00
Nikita Popov	085d24a8b3	[InstCombine] Canonicalize const arg for saturating adds If a saturating add intrinsic has one constant argument, make sure it is on the RHS. This will simplify further transformations. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347769	2018-11-28 16:36:52 +00:00
Alexey Bataev	579c2d9d64	[SLP]Fix PR39774: Set ReductionRoot if the original instruction is vectorized. Summary: If the original reduction root instruction was vectorized, it might be removed from the tree. It means that the insertion point may become invalidated and the whole vectorization of the reduction leads to the incorrect output result. The ReductionRoot instruction must be marked as externally used so it could not be removed. Otherwise it might cause inconsistency with the cost model and we may end up with too optimistic optimization. Reviewers: RKSimon, spatel, hfinkel, mkuper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54955 llvm-svn: 347759	2018-11-28 14:34:11 +00:00
Nikita Popov	e20e6b4a53	[InstCombine] Add tests for saturating add/sub; NFC These are baseline tests for D54534. llvm-svn: 347700	2018-11-27 19:52:56 +00:00
Florian Hahn	fd6ea134f4	[PartialInliner] Make PHIs free in cost computation. InlineCost also treats them as free and the current implementation can cause assertion failures if PHI nodes are moved outside the region from entry BBs to the region. It also updates the code to use the instructionsWithoutDebug iterator. Reviewers: davidxl, davide, vsk, graham-yiu-huawei Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D54748 llvm-svn: 347683	2018-11-27 18:17:27 +00:00
Max Kazantsev	b0a9b75e2a	Add missing REQUIRES: asserts llvm-svn: 347644	2018-11-27 07:51:18 +00:00
Max Kazantsev	c4e4d6449a	[LoopSimplifyCFG] Fix corner case with duplicating successors It fixes a bug that doesn't update Phi inputs of the only live successor that is in the list of block's successors more than once. Thanks @uabelho for finding this. Differential Revision: https://reviews.llvm.org/D54849 Reviewed By: anna llvm-svn: 347640	2018-11-27 06:17:21 +00:00
Sanjay Patel	703299e5e9	[InstCombine] add tests for rotate/bswap equality; NFC llvm-svn: 347618	2018-11-27 00:08:21 +00:00
Xin Tong	04d49779a1	[ICP] Remove incompatible attributes at indirect-call promoted callsites. Summary: Remove incompatible attributes at indirect-call promoted callsites; not removing them results in at least an IR verification error. Reviewers: davidxl, xur, mssimpso Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54913 llvm-svn: 347605	2018-11-26 22:03:52 +00:00
Fedor Sergeev	8cd9d1b5ce	Revert "[TTI] Reduction costs only need to include a single extract element cost" This reverts commit r346970. It was causing PR39774, a crash in slp-vectorizer on a rather simple loop with just a bunch of 'and's in the body. llvm-svn: 347541	2018-11-26 10:17:27 +00:00
Florian Hahn	6615a7132a	[IPSCCP] Use input operand instead of OriginalOp for ssa_copy. OriginalOp of a Predicate refers to the original IR value, before renaming. While solving in IPSCCP, we have to use the operand of the ssa_copy instead, to avoid missing updates for nested conditions on the same IR value. Fixes PR39772. llvm-svn: 347524	2018-11-25 16:32:02 +00:00
Nikita Popov	2c779c0e34	[InstCombine] Determine demanded and known bits for funnel shifts Support funnel shifts in InstCombine demanded bits simplification. If the shift amount is constant, we can determine both the demanded bits of the operands, as well as the known bits of the result. If one of the operands has no demanded bits, it will be replaced by undef and the funnel shift will be simplified into a simple shift due to the simplifications added in D54778. Differential Revision: https://reviews.llvm.org/D54869 llvm-svn: 347515	2018-11-24 19:00:45 +00:00
Joel Jones	7459398a43	Revert unapproved commit llvm-svn: 347511	2018-11-24 07:26:55 +00:00
Joel Jones	5f533c5fe1	[AArch64] Enable libm vectorized functions via SLEEF This changeset is modeled after Intel's submission for SVML. It enables trigonometry functions vectorization via SLEEF: http://sleef.org/. * A new vectorization library enum is added to TargetLibraryInfo.h: SLEEF. * A new option is added to TargetLibraryInfoImpl - ClVectorLibrary: SLEEF. * A comprehensive test case is included in this changeset. * In a separate changeset (for clang), a new vectorization library argument is added to -fveclib: -fveclib=SLEEF. Trigonometry functions that are vectorized by sleef: acos asin atan atanh cos cosh exp exp2 exp10 lgamma log10 log2 log sin sinh sqrt tan tanh tgamma Patch by Stefan Teleman Differential Revision: https://reviews.llvm.org/D53927 llvm-svn: 347510	2018-11-24 06:41:39 +00:00
Nikita Popov	6e81d421e1	[InstCombine] Simplify funnel shift with zero/undef operand to shift The following simplifications are implemented: * `fshl(X, 0, C) -> shl X, C%BW` * `fshl(X, undef, C) -> shl X, C%BW` (assuming undef = 0) * `fshl(0, X, C) -> lshr X, BW-C%BW` * `fshl(undef, X, C) -> lshr X, BW-C%BW` (assuming undef = 0) * `fshr(X, 0, C) -> shl X, BW-C%BW` * `fshr(X, undef, C) -> shl X, BW-C%BW` (assuming undef = 0) * `fshr(0, X, C) -> lshr X, C%BW` * `fshr(undef, X, C) -> lshr X, C%BW` (assuming undef = 0) The simplification is only performed if the shift amount C is constant, because we can explicitly compute C%BW and BW-C%BW in this case. Differential Revision: https://reviews.llvm.org/D54778 llvm-svn: 347505	2018-11-23 22:45:08 +00:00
Max Kazantsev	7231009b78	[NFC] Add test that demonstrates buggy behavior on term folding of LoopSimplifyCFG llvm-svn: 347488	2018-11-23 10:34:22 +00:00
Max Kazantsev	e1c2dc27d3	Disable LoopSimplifyCFG terminator folding by default llvm-svn: 347486	2018-11-23 09:14:53 +00:00
Max Kazantsev	cb8e240334	[LoopSimplifyCFG] Don't delete LCSSA Phis When removing edges, we also update Phi inputs and may end up removing a Phi if it has only one input. We should not do it for edges that leave the current loop because these Phis are LCSSA Phis and need to be preserved. Thanks @dmgreen for finding this! Differential Revision: https://reviews.llvm.org/D54841 llvm-svn: 347484	2018-11-23 07:56:47 +00:00
Max Kazantsev	a10c1c7412	[NFC] Add verification flags to tests llvm-svn: 347483	2018-11-23 05:21:53 +00:00
Nikita Popov	a70fdf8635	[InstCombine] Add tests for funnel shift with zero operand; NFC These are additional baseline tests for D54778. llvm-svn: 347414	2018-11-21 20:34:11 +00:00
Nikita Popov	6f54fb0052	[MergeFuncs] Generate alias instead of thunk if possible The MergeFunctions pass was originally intended to emit aliases instead of thunks where possible (unnamed_addr). However, for a long time this functionality was behind a flag hardcoded to false, bitrotted and was eventually removed in r309313. Originally the functionality was first disabled in r108417 due to lack of support for aliases in Mach-O. I believe that this is no longer the case nowadays, but I am not really familiar with this area. In the interest of being conservative, this patch reintroduces the aliasing functionality behind a default disabled -mergefunc-use-aliases flag. Differential Revision: https://reviews.llvm.org/D53285 llvm-svn: 347407	2018-11-21 19:37:19 +00:00
Mikael Holmen	b6f76002d9	[PM] Port Scalarizer to the new pass manager. Patch by: markus (Markus Lavin) Reviewers: chandlerc, fedor.sergeev Reviewed By: fedor.sergeev Subscribers: llvm-commits, Ka-Ka, bjope Differential Revision: https://reviews.llvm.org/D54695 llvm-svn: 347392	2018-11-21 14:00:17 +00:00
Max Kazantsev	bcd3f55827	[NFC] More complex tests for LoopSimplifyCFG llvm-svn: 347384	2018-11-21 09:55:09 +00:00
Max Kazantsev	6d9e7918ec	[NFC] Add some sophisticated tests on LoopSimplifyCFG llvm-svn: 347381	2018-11-21 07:22:06 +00:00
John Regehr	3a1c9d55cc	[LVI] run transfer function for binary operator even when the RHS isn't a constant LVI was symbolically executing binary operators only when the RHS was constant, missing the case where we have a ConstantRange for the RHS, but not an actual constant. Tested using check-all and by bootstrapping. Compile time is not impacted measurably. Differential Revision: https://reviews.llvm.org/D19859 llvm-svn: 347379	2018-11-21 05:24:12 +00:00
Sanjay Patel	96152dcb1c	[InstCombine] add tests for funnel shifts; NFC These are included in D54666, so adding them first with baseline results. Patch by: @nikic (Nikita Popov) llvm-svn: 347333	2018-11-20 17:51:49 +00:00
Sanjay Patel	14ab9170b8	[InstSimplify] fold funnel shifts with undef operands Splitting these off from the D54666. Patch by: nikic (Nikita Popov) llvm-svn: 347332	2018-11-20 17:34:59 +00:00
Sanjay Patel	2778f56a40	[InstSimplify] add tests for funnel shift with undef operands; NFC These are part of D54666, so adding them here before the patch to show the baseline (currently unoptimized) results. Patch by: @nikic (Nikita Popov) llvm-svn: 347331	2018-11-20 17:30:09 +00:00
Sanjay Patel	eea21da12a	[InstructionSimplify] Add support for saturating add/sub Add support for saturating add/sub in InstructionSimplify. In particular, the following simplifications are supported: sat(X + 0) -> X sat(X + undef) -> -1 sat(X uadd MAX) -> MAX (and commutative variants) sat(X - 0) -> X sat(X - X) -> 0 sat(X - undef) -> 0 sat(undef - X) -> 0 sat(0 usub X) -> 0 sat(X usub MAX) -> 0 Patch by: @nikic (Nikita Popov) Differential Revision: https://reviews.llvm.org/D54532 llvm-svn: 347330	2018-11-20 17:20:26 +00:00
Guozhi Wei	c21fba1bab	[LoopSink] Add preheader to alias set This patch fixes PR39695. The original LoopSink only considers memory alias in loop body. But PR39695 shows that instructions following sink candidate in preheader should also be checked. This is a conservative patch, it simply adds whole preheader block to alias set. It may lose some optimization opportunity, but I think that is very rare because: 1 in the most common case st/ld to the same address, the load should already be optimized away. 2 usually preheader is not very large. Differential Revision: https://reviews.llvm.org/D54659 llvm-svn: 347325	2018-11-20 16:49:07 +00:00
Sanjay Patel	f5ead29b78	[PatternMatch] Handle undef vectors consistently This patch fixes the issue noticed in D54532. The problem is that cst_pred_ty-based matchers like m_Zero() currently do not match scalar undefs (as expected), but do match vector undefs. This may lead to optimization inconsistencies in rare cases. There is only one existing test for which output changes, reverting the change from D53205. The reason here is that vector fsub undef, %x is no longer matched as an m_FNeg(). While I think that the new output is technically worse than the previous one, it is consistent with scalar, and I don't think it's really important either way (generally that undef should have been folded away prior to reassociation.) I've also added another test case for this issue based on InstructionSimplify. It took some effort to find that one, as in most cases undef folds are either checked first -- and in the cases where they aren't it usually happens to not make a difference in the end. This is the only case I was able to come up with. Prior to this patch the test case simplified to undef in the scalar case, but zeroinitializer in the vector case. Patch by: @nikic (Nikita Popov) Differential Revision: https://reviews.llvm.org/D54631 llvm-svn: 347318	2018-11-20 16:08:19 +00:00
Max Kazantsev	c04b5307d1	Recommit "[LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches" The initial version of patch lacked Phi nodes updates in destinations of removed edges. This version contains this update and tests on this situation. Differential Revision: https://reviews.llvm.org/D54021 llvm-svn: 347289	2018-11-20 05:43:32 +00:00
Benjamin Kramer	fdd9b4fc8f	Revert "[LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches" This reverts commits r347183 & r347184. Crashes while building libxml. llvm-svn: 347260	2018-11-19 20:01:20 +00:00
Vedant Kumar	238533ec2e	[InstCombine] Set debug loc on `mergeStoreIntoSuccessor` phi Assigning a merged debug location to the `mergeStoreIntoSuccessor` phi improves backtrace quality. Fixes llvm.org/PR38083. llvm-svn: 347257	2018-11-19 19:55:02 +00:00
Benjamin Kramer	2cad359c91	Revert "[LICM] Make LICM able to hoist phis" This reverts commit r347190. llvm-svn: 347225	2018-11-19 16:51:57 +00:00
Anna Thomas	5e9215f02b	[LV] Avoid vectorizing unsafe dependencies in uniform address Summary: Currently, when vectorizing stores to uniform addresses, the only instance we prevent vectorization is if there are multiple stores to the same uniform address causing an unsafe dependency. This patch teaches LAA to avoid vectorizing loops that have an unsafe cross-iteration dependency between a load and a store to the same uniform address. Fixes PR39653. Reviewers: Ayal, efriedma Subscribers: rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D54538 llvm-svn: 347220	2018-11-19 15:39:59 +00:00
John Brawn	12c046fba0	[LICM] Make LICM able to hoist phis The general approach taken is to make note of loop invariant branches, then when we see something conditional on that branch, such as a phi, we create a copy of the branch and (empty versions of) its successors and hoist using that. This has no impact by itself that I've been able to see, as LICM typically doesn't see such phis as they will have been converted into selects by the time LICM is run, but once we start doing phi-to-select conversion later it will be important. Differential Revision: https://reviews.llvm.org/D52827 llvm-svn: 347190	2018-11-19 11:31:24 +00:00
Fangrui Song	209cfbe60e	[LoopSimplifyCFG] Add requires: asserts after rL347183 llvm-svn: 347184	2018-11-19 06:28:15 +00:00
Max Kazantsev	8e3e33d138	[LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches This patch introduces infrastructure and the simplest case for constant-folding of branch and switch instructions within loop into unconditional branches. It is useful as a cleanup for such passes as loop unswitching that sometimes produce such branches. Only the simplest case supported in this patch: after the folding, no block should become dead or stop being part of the loop. Support for more sophisticated cases will go separately in follow-up patches. Differential Revision: https://reviews.llvm.org/D54021 Reviewed By: anna llvm-svn: 347183	2018-11-19 05:54:38 +00:00
Vedant Kumar	35f504c113	[CorrelatedValuePropagation] Preserve debug locations (PR38178) Fix all of the missing debug location errors in CVP found by debugify. This includes the missing-location-after-udiv-truncation case described in llvm.org/PR38178. llvm-svn: 347147	2018-11-18 00:29:58 +00:00
Fedor Sergeev	2e3e224e71	[SimpleLoopUnswitch] adding cost multiplier to cap exponential unswitch with We need to control exponential behavior of loop-unswitch so we do not get run-away compilation. Suggested solution is to introduce a multiplier for an unswitch cost that makes cost prohibitive as soon as there are too many candidates and too many sibling loops (meaning we have already started duplicating loops by unswitching). It does solve the currently known problem with compile-time degradation (PR 39544). Tests are built on top of a recently implemented CHECK-COUNT-<num> FileCheck directives. Reviewed By: chandlerc, mkazantsev Differential Revision: https://reviews.llvm.org/D54223 llvm-svn: 347097	2018-11-16 21:16:43 +00:00
Adrian Prantl	83d87520ed	GlobalDCE: Teach isEmptyFunction() to ignore debug intrinsics. This fixes PR39669. https://bugs.llvm.org/show_bug.cgi?id=39669 llvm-svn: 347065	2018-11-16 17:47:21 +00:00
Sanjay Patel	f967328e24	[InstSimplify] add tests for saturating add/sub; NFC These are baseline tests for D54532. Patch based on the original tests by: @nikic (Nikita Popov) llvm-svn: 347060	2018-11-16 16:32:34 +00:00
Sanjay Patel	5ebd2a785e	[InstSimplify] add test to demonstrate undef matching differences; NFC This is a baseline test for D54631. Patch by: @nikic (Nikita Popov) llvm-svn: 347055	2018-11-16 15:35:58 +00:00
Sanjay Patel	c92aa7618f	[InstCombine] adjust rotate direction in tests; NFC Copy/paste errors - all of the changed tests rotated left before. llvm-svn: 346982	2018-11-15 19:15:41 +00:00
Sanjay Patel	6cda87463f	[InstCombine] add tests for funnel shift (rotate) canonicalization; NFC llvm-svn: 346975	2018-11-15 18:19:56 +00:00
Simon Pilgrim	924f193419	[TTI] Reduction costs only need to include a single extract element cost We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type. For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed. Fixes PR37731 Differential Revision: https://reviews.llvm.org/D54585 llvm-svn: 346970	2018-11-15 17:42:53 +00:00
Sanjay Patel	bc56b2432d	[InstCombine] fix rotate narrowing bug for non-pow-2 types llvm-svn: 346968	2018-11-15 17:19:14 +00:00
Sanjay Patel	712bdb275c	[InstCombine] add rotate narrowing tests with odd types; NFC There's a potential miscompile here. It's unlikely in the real world because this transform is guarded with shouldChangeType(), but this test file doesn't include a standard data-layout for some reason (despite including a custom 1), so we can see the bug. llvm-svn: 346966	2018-11-15 16:34:26 +00:00
Simon Pilgrim	5a1b7cea91	[SLPVectorizer][X86] Regenerate reduction minmax tests and cleanup check prefixes llvm-svn: 346965	2018-11-15 16:34:15 +00:00
Simon Pilgrim	4dd692ec2a	[SLPVectorizer][X86] Regenerate reduction tests and add PR37731 test Cleanup check prefixes llvm-svn: 346964	2018-11-15 16:08:25 +00:00
Sanjay Patel	e98ec77a95	[InstSimplify] delete shift-of-zero guard ops around funnel shifts This is a problem seen in common rotate idioms as noted in: https://bugs.llvm.org/show_bug.cgi?id=34924 Note that we are not canonicalizing standard IR (shifts and logic) to the intrinsics yet. (Although I've written this before...) I think this is the last step before we enable that transform. Ie, we could regress code by doing that transform without this simplification in place. In PR34924, I questioned whether this is a valid transform for target-independent IR, but I convinced myself this is ok. If we're speculating a funnel shift by turning cmp+br into select, then SimplifyCFG has already determined that the transform is justified. It's possible that SimplifyCFG is not taking into account profile or other metadata, but if that's true, then it's a bug independent of funnel shifts. Also, we do have CGP code to restore a guard like this around an intrinsic if it can't be lowered cheaply. But that isn't necessary for funnel shift because the default expansion in SelectionDAGBuilder includes this same cmp+select. Differential Revision: https://reviews.llvm.org/D54552 llvm-svn: 346960	2018-11-15 14:53:37 +00:00
Sanjay Patel	4832ffee39	[InstSimplify] add more tests for funnel shift with select; NFC The cases are just different enough that we should have complete tests to avoid bugs from typos in the code. llvm-svn: 346902	2018-11-14 22:34:25 +00:00
Vedant Kumar	808e157356	Mark @llvm.trap cold A call to @llvm.trap can be expected to be cold (i.e. unlikely to be reached in a normal program execution). Outlining paths which unconditionally trap is an important memory saving. As the hot/cold splitting pass (imho) should not treat all noreturn calls as cold, explicitly mark @llvm.trap cold so that it can be outlined. Split out of https://reviews.llvm.org/D54244. Differential Revision: https://reviews.llvm.org/D54329 llvm-svn: 346885	2018-11-14 19:53:41 +00:00
Teresa Johnson	32dc5b9bf1	[ThinLTO] Update handling of vararg functions to match inliner Summary: Previously we marked all vararg functions as non-inlinable in the function summary, which prevented their importing. However, the corresponding inliner restriction was loosened in r321940/r342675 to only apply to functions calling va_start. Adjust the summary flag computation to match. Reviewers: davidxl Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D54270 llvm-svn: 346883	2018-11-14 19:30:13 +00:00
Sanjay Patel	7d028670f6	[InstSimplify] add tests for funnel shift with select; NFC llvm-svn: 346881	2018-11-14 19:12:54 +00:00
Mandeep Singh Grang	0905fc77c1	[InstCombine] Remove a couple of asserts based on incorrect assumptions Summary: These asserts are based on the assumption that the order of true/false operands in a select and those in the compare would always be the same. This fixes PR39595. Reviewers: craig.topper, spatel, dmgreen Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54359 llvm-svn: 346874	2018-11-14 17:55:07 +00:00
John Brawn	9fd8c20c4f	[SimplifyCFG] Regenerate preserve-branchweights.ll test. NFC Regenerate this test using update_test_checks.py in preparation for an upcoming commit, to make it not depend on the names of instructions. llvm-svn: 346869	2018-11-14 15:27:07 +00:00
Florian Hahn	505091a8f2	Recommit r346483: [CallSiteSplitting] Only record conditions up to the IDom(call site). The underlying problem causing the expensive-check failure was fixed in rL346769. llvm-svn: 346843	2018-11-14 10:04:30 +00:00
Reid Kleckner	41390b47de	Revert r346810 "Preserve loop metadata when splitting exit blocks" It broke the Windows self-host: http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/1457 llvm-svn: 346823	2018-11-14 01:47:32 +00:00
Sanjay Patel	a139564896	[InstCombine] fold funnel shift amount based on demanded bits The shift amount of a funnel shift is modulo the scalar bitwidth: http://llvm.org/docs/LangRef.html#llvm-fshl-intrinsic ...so we can use demanded bits analysis on that operand to simplify it when we have a power-of-2 bitwidth. This is another step towards canonicalizing {shift/shift/or} to the intrinsics in IR. Differential Revision: https://reviews.llvm.org/D54478 llvm-svn: 346814	2018-11-13 23:27:23 +00:00
Craig Topper	3c87c2a3c5	Preserve loop metadata when splitting exit blocks LoopUtils.cpp contains a utility that splits a loop exit block, so that the new block contains only edges coming from the loop. In the case of nested loops, the exit path for the inner loop might also be the back-edge of the outer loop. The new block, which is inserted on this path, is now a latch for the outer loop, and it needs to hold the loop metadata for the outer loop. (The test case gives a more concrete view of the situation.) Patch by Chang Lin (clin1) Differential Revision: https://reviews.llvm.org/D53876 llvm-svn: 346810	2018-11-13 23:06:49 +00:00
Sanjay Patel	f8f12272e8	[InstCombine] canonicalize rotate patterns with cmp/select The cmp+branch variant of this pattern is shown in: https://bugs.llvm.org/show_bug.cgi?id=34924 ...and as discussed there, we probably can't transform that without a rotate intrinsic. We do have that now via funnel shift, but we're not quite ready to canonicalize IR to that form yet. The case with 'select' should already be transformed though, so that's this patch. The sequence with negation followed by masking is what we use in the backend and partly in clang (though that part should be updated). https://rise4fun.com/Alive/TplC %cmp = icmp eq i32 %shamt, 0 %sub = sub i32 32, %shamt %shr = lshr i32 %x, %shamt %shl = shl i32 %x, %sub %or = or i32 %shr, %shl %r = select i1 %cmp, i32 %x, i32 %or => %neg = sub i32 0, %shamt %masked = and i32 %shamt, 31 %maskedneg = and i32 %neg, 31 %shl2 = lshr i32 %x, %masked %shr2 = shl i32 %x, %maskedneg %r = or i32 %shl2, %shr2 llvm-svn: 346807	2018-11-13 22:47:24 +00:00
Cameron McInally	cbde0d9c7b	[IR] Add a dedicated FNeg IR Instruction The IEEE-754 Standard makes it clear that fneg(x) and fsub(-0.0, x) are two different operations. The former is a bitwise operation, while the latter is an arithmetic operation. This patch creates a dedicated FNeg IR Instruction to model that behavior. Differential Revision: https://reviews.llvm.org/D53877 llvm-svn: 346774	2018-11-13 18:15:47 +00:00
Florian Hahn	107d0a8756	[CSP, Cloning] Update DuplicateInstructionsInSplitBetween to use DomTreeUpdater. This patch updates DuplicateInstructionsInSplitBetween to update a DTU instead of applying updates to the DT directly. Given that there only are 2 users, also updated them in this patch to avoid churn. I slightly moved the code in CallSiteSplitting around to reduce the places where we have to pass in DTU. If necessary, I could split those changes in a separate patch. This fixes missing DT updates when dealing with musttail calls in CallSiteSplitting, by using DTU->deleteBB. Reviewers: junbuml, kuhar, NutshellySima, indutny, brzycki Reviewed By: NutshellySima llvm-svn: 346769	2018-11-13 17:54:43 +00:00
Sanjay Patel	bcc5a74261	[InstCombine] add tests for funnel shift demanded bits; NFC llvm-svn: 346762	2018-11-13 16:47:16 +00:00
Sanjay Patel	02f289e587	[InstCombine] add rotate variants that include select; NFC llvm-svn: 346719	2018-11-12 23:58:59 +00:00
Sanjay Patel	35b1c2d19d	[InstCombine] narrow width of rotate patterns, part 3 This is a longer variant for the pattern handled in rL346713 This one includes zexts. Eventually, we should canonicalize all rotate patterns to the funnel shift intrinsics, but we need a bit more infrastructure to make sure the vectorizers handle those intrinsics as well as the shift+logic ops. https://rise4fun.com/Alive/FMn Name: narrow rotateright %neg = sub i8 0, %shamt %rshamt = and i8 %shamt, 7 %rshamtconv = zext i8 %rshamt to i32 %lshamt = and i8 %neg, 7 %lshamtconv = zext i8 %lshamt to i32 %conv = zext i8 %x to i32 %shr = lshr i32 %conv, %rshamtconv %shl = shl i32 %conv, %lshamtconv %or = or i32 %shl, %shr %r = trunc i32 %or to i8 => %maskedShAmt2 = and i8 %shamt, 7 %negShAmt2 = sub i8 0, %shamt %maskedNegShAmt2 = and i8 %negShAmt2, 7 %shl2 = lshr i8 %x, %maskedShAmt2 %shr2 = shl i8 %x, %maskedNegShAmt2 %r = or i8 %shl2, %shr2 llvm-svn: 346716	2018-11-12 22:52:25 +00:00
Sanjay Patel	98e427ccf2	[InstCombine] narrow width of rotate patterns, part 2 (PR39624) The sub-pattern for the shift amount in a rotate can take on several different forms, and there's apparently no way to canonicalize those without seeing the entire rotate sequence. This is the form noted in: https://bugs.llvm.org/show_bug.cgi?id=39624 https://rise4fun.com/Alive/qnT %zx = zext i8 %x to i32 %maskedShAmt = and i32 %shAmt, 7 %shl = shl i32 %zx, %maskedShAmt %negShAmt = sub i32 0, %shAmt %maskedNegShAmt = and i32 %negShAmt, 7 %shr = lshr i32 %zx, %maskedNegShAmt %rot = or i32 %shl, %shr %r = trunc i32 %rot to i8 => %truncShAmt = trunc i32 %shAmt to i8 %maskedShAmt2 = and i8 %truncShAmt, 7 %shl2 = shl i8 %x, %maskedShAmt2 %negShAmt2 = sub i8 0, %truncShAmt %maskedNegShAmt2 = and i8 %negShAmt2, 7 %shr2 = lshr i8 %x, %maskedNegShAmt2 %r = or i8 %shl2, %shr2 llvm-svn: 346713	2018-11-12 22:11:09 +00:00
Sanjay Patel	b32d03dfed	[InstCombine] add more tests for rotate narrowing; NFC llvm-svn: 346703	2018-11-12 20:32:59 +00:00
Sanjay Patel	8512e5909e	[InstCombine] regenerate checks; NFC llvm-svn: 346689	2018-11-12 18:41:08 +00:00
Simon Pilgrim	47d38198eb	[CostModel] Add more realistic SK_InsertSubvector generic costs. Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles. llvm-svn: 346662	2018-11-12 15:20:24 +00:00
Sanjay Patel	1456fd7614	[VectorUtils] add funnel-shifts to the list of vectorizable intrinsics This just identifies the intrinsics as candidates for vectorization. It does not mean we will attempt to vectorize under normal conditions (the test file is forcing vectorization). The cost model must be fixed to show that the transform is profitable in general. Allowing vectorization with these intrinsics is required to avoid potential regressions from canonicalizing to the intrinsics from generic IR: https://bugs.llvm.org/show_bug.cgi?id=37417 llvm-svn: 346661	2018-11-12 15:20:14 +00:00
Sanjay Patel	75120dcb06	[LoopVectorize] add tests for funnel shifts; NFC llvm-svn: 346658	2018-11-12 14:52:01 +00:00
Max Kazantsev	7d49a3a816	[LICM] Hoist guards from non-header blocks This patch relaxes overconservative checks on whether or not we could write memory before we execute an instruction. This allows us to hoist guards out of loops even if they are not in the header block. Differential Revision: https://reviews.llvm.org/D50891 Reviewed By: fedor.sergeev llvm-svn: 346643	2018-11-12 09:29:58 +00:00
Florian Hahn	9026d4ee9b	[IPSCCP,PM] Preserve PDT in the new pass manager. Reviewers: kuhar, chandlerc, NutshellySima, brzycki Reviewed By: NutshellySima, brzycki Differential Revision: https://reviews.llvm.org/D54317 llvm-svn: 346618	2018-11-11 20:22:45 +00:00
Sanjay Patel	3482801dea	[InstCombine] auto-generate full checks; NFC llvm-svn: 346594	2018-11-10 18:51:10 +00:00
Eli Friedman	15930bf352	[JumpThreading] Fix exponential time algorithm computing known values. ComputeValueKnownInPredecessors has a "visited" set to prevent infinite loops, since a value can be visited more than once. However, the implementation didn't prevent the algorithm from taking exponential time. Instead of removing elements from the RecursionSet one at a time, we should keep around the whole set until ComputeValueKnownInPredecessors finishes, then discard it. The testcase is synthetic because I was having trouble effectively reducing the original. But it's basically the same idea. Instead of failing, we could theoretically cache the result instead. But I don't think it would help substantially in practice. Differential Revision: https://reviews.llvm.org/D54239 llvm-svn: 346562	2018-11-09 22:35:26 +00:00
Simon Pilgrim	fc8f1d7da7	[CostModel][X86] SK_ExtractSubvector is free if the subvector is at the start of the source vector llvm-svn: 346538	2018-11-09 19:04:27 +00:00
Florian Hahn	9f878e9bae	Revert r346483: [CallSiteSplitting] Only record conditions up to the IDom(call site). This caused a failure with EXPENSIVE_CHECKS llvm-svn: 346492	2018-11-09 13:28:58 +00:00
Florian Hahn	a1062f4b68	[IPSCCP,PM] Preserve DT in the new pass manager. After D45330, Dominators are required for IPSCCP and can be preserved. This patch preserves DominatorTreeAnalysis in the new pass manager. AFAIK the legacy pass manager cannot preserve function analysis required by a module analysis. Reviewers: davide, dberlin, chandlerc, efriedma, kuhar, NutshellySima Reviewed By: chandlerc, kuhar, NutshellySima Differential Revision: https://reviews.llvm.org/D47259 llvm-svn: 346486	2018-11-09 11:52:27 +00:00
Florian Hahn	52578f95c9	[CallSiteSplitting] Only record conditions up to the IDom(call site). We can stop recording conditions once we reached the immediate dominator for the block containing the call site. Conditions in predecessors of that node will be the same for all paths to the call site and splitting is not beneficial. This patch makes CallSiteSplitting dependent on the DT analysis, because the immediate dominators seem to be the easiest way of finding the node to stop at. I had to update some existing tests, because they were checking for conditions that were true/false on all paths to the call site. Those should now be handled by instcombine/ipsccp. Reviewers: davide, junbuml Reviewed By: junbuml Differential Revision: https://reviews.llvm.org/D44627 llvm-svn: 346483	2018-11-09 10:23:46 +00:00
Florian Hahn	a684a99441	[LoopInterchange] Support reductions across inner and outer loop. This patch adds logic to detect reductions across the inner and outer loop by following the incoming values of PHI nodes in the outer loop. If the incoming values take part in a reduction in the inner loop or come from outside the outer loop, we found a reduction spanning across inner and outer loop. With this change, ~10% more loops are interchanged in the LLVM test-suite + SPEC2006. Fixes https://bugs.llvm.org/show_bug.cgi?id=30472 Reviewers: mcrosier, efriedma, karthikthecool, davide, hfinkel, dmgreen Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D43245 llvm-svn: 346438	2018-11-08 20:44:19 +00:00
Pirama Arumuga Nainar	e61652a384	[LTO] Drop non-prevailing definitions only if linkage is not local or appending Summary: This fixes PR 37422 In ELF, non-weak symbols can also be non-prevailing. In this particular PR, the __llvm_profile_* symbols are non-prevailing but weren't getting dropped - causing multiply-defined errors with lld. Also add a test, strong_non_prevailing.ll, to ensure that multiple copies of a strong symbol are dropped. To fix the test regressions exposed by this fix, - do not mark prevailing copies for symbols with 'appending' linkage. There's no one prevailing copy for such symbols. - fix the prevailing version in dead-strip-fulllto.ll - explicitly pass exported symbols to llvm-lto in funcimport.ll and funcimport_var.ll Reviewers: tejohnson, pcc Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, dang, srhines, llvm-commits Differential Revision: https://reviews.llvm.org/D54125 llvm-svn: 346436	2018-11-08 20:10:07 +00:00
Tom Stellard	28d662164d	InstCombine: Avoid introducing poison values when lowering llvm.amdgcn.[us]bfe Summary: When the 3rd argument to these intrinsics is zero, lowering them to shift instructions produces poison values, since we end up with shift amounts equal to the number of bits in the shifted value. This means we can only lower these intrinsics if we can prove that the 3rd argument is not zero. Reviewers: arsenm Reviewed By: arsenm Subscribers: bnieuwenhuizen, jvesely, wdng, nhaehnle, llvm-commits Differential Revision: https://reviews.llvm.org/D53739 llvm-svn: 346422	2018-11-08 17:57:57 +00:00
Vedant Kumar	d6699423f1	[CodeExtractor] Mark functions noreturn when applicable This eliminates the outlining penalty for llvm.trap/unreachable, because callers no longer have to emit cleanup/ret instructions after calling an outlined `noreturn` function. rdar://45523626 llvm-svn: 346421	2018-11-08 17:57:09 +00:00
Max Kazantsev	266c087b9d	Return "[IndVars] Smart hard uses detection" The patch has been reverted because it ended up prohibiting propagation of a constant to exit value. For such values, we should skip all checks related to hard uses because propagating a constant is always profitable. Differential Revision: https://reviews.llvm.org/D53691 llvm-svn: 346397	2018-11-08 11:54:35 +00:00
Gil Rapaport	7b88bab386	[LSR] Combine unfolded offset into invariant register LSR reassociates constants as unfolded offsets when the constants fit as immediate add operands, which currently prevents such constants from being combined later with loop invariant registers. This patch modifies GenerateCombinations() to generate a second formula which includes the unfolded offset in the combined loop-invariant register. This commit fixes a bug in the original patch (committed at r345114, reverted at r345123). Differential Revision: https://reviews.llvm.org/D51861 llvm-svn: 346390	2018-11-08 09:01:19 +00:00
whitequark	73cb978495	[MergeFuncs] Improve ordering of equal functions Summary: MergeFunctions currently tries to process strong functions before weak functions, because weak functions can simply call strong functions, while a strong/weak function cannot call a weak function (a backing strong function is needed). This patch additionally tries to process external functions before local functions, because we definitely have to keep the external function, but may be able to drop the local one (and definitely can if it is also unnamed_addr). Unfortunately, this exposes an existing bug in the implementation: The FnTree and FNodesInTree structures can currently go out of sync in the case where two weak functions are merged, because the function in FnTree/FNodesInTree is RAUWed. This leaves it behind in FnTree (this is intended, as it is the strong backing function which should be used for further merges), while it is replaced in FNodesInTree (this is not intended). This is fixed by switching FNodesInTree from using a ValueMap to using a DenseMap of AssertingVH. This exposes another minor issue: Currently FNodesInTree is not cleared after MergeFunctions finishes running. Currently, this is potentially dangerous (e.g. if something else wants to RAUW a function with a non-function), but at the very least it is unnecessary/inefficient. After the change to use AssertingVH it becomes more problematic, because there are certainly passes that remove functions. This issue is fixed by clearing FNodesInTree at the end of the pass. Reviewers: jfb, whitequark Reviewed By: whitequark Subscribers: rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D53271 llvm-svn: 346386	2018-11-08 03:58:01 +00:00
whitequark	3580ac6125	[MergeFuncs] Call removeUsers() prior to unnamed_addr RAUW Summary: For unnamed_addr functions we RAUW instead of only replacing direct callers. However, functions in which replacements were performed currently are not added back to the worklist, resulting in missed merging opportunities. Fix this by calling removeUsers() prior to RAUW. Reviewers: jfb, whitequark Reviewed By: whitequark Subscribers: rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D53262 llvm-svn: 346385	2018-11-08 03:57:55 +00:00
Rong Xu	fb4bcc452c	[PGO] Exit early if all count values are zero If all the edge counts for a function are zero, skip count population and annotation, as nothing will happen. This can save some compile time. Differential Revision: https://reviews.llvm.org/D54212 llvm-svn: 346370	2018-11-07 23:51:20 +00:00
Fedor Sergeev	f9a02a7006	[SimpleLoopUnswitch] partial unswitch needs to be careful when replacing invariants with constants When partial unswitch operates on multiple conditions at once, e.g.: if (Cond1 || Cond2 || NonInv) ... it should infer (and replace) values for individual conditions only on one side of unswitch and not another. More precisely only these derivations hold true: (Cond1 || Cond2) == false => Cond1 == Cond2 == false (Cond1 && Cond2) == true => Cond1 == Cond2 == true By the way we organize unswitching it means only replacing on "continue" blocks and never on "unswitched" ones. Since trivial unswitch does not have "unswitched" blocks it does not have this problem. Fixes PR 39568. Reviewers: chandlerc, asbirlea Differential Revision: https://reviews.llvm.org/D54211 llvm-svn: 346350	2018-11-07 20:05:11 +00:00
Mandeep Singh Grang	d47d188b6f	[LoopSink] Do not sink instructions into non-cold blocks Summary: This fixes PR39570. Reviewers: danielcdh, rnk, bkramer Reviewed By: rnk Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54181 llvm-svn: 346337	2018-11-07 18:26:24 +00:00
Florian Hahn	ac86038b40	[NewGVN] Make sure we do not add a user to itself. If we simplify an instruction to itself, we do not need to add a user to itself. For congruence classes with a defining expression, we already use a similar logic. Fixes PR38259. Reviewers: davide, efriedma, mcrosier Reviewed By: davide Differential Revision: https://reviews.llvm.org/D51168 llvm-svn: 346335	2018-11-07 17:20:07 +00:00
Sanjay Patel	57a08b3343	[InstCombine] propagate FMF for fcmp+fabs folds By morphing the instruction rather than deleting and creating a new one, we retain fast-math-flags and potentially other metadata (profile info?). llvm-svn: 346331	2018-11-07 16:15:01 +00:00
Sanjay Patel	bb521e63af	[InstCombine] peek through fabs() when checking isnan() That should be the end of the missing cases for this fold. See earlier patches in this series: rL346321 rL346324 llvm-svn: 346327	2018-11-07 15:44:26 +00:00
Sanjay Patel	d80ec9e11a	[InstCombine] add tests for isnan(fabs(X)); NFC llvm-svn: 346325	2018-11-07 15:36:23 +00:00
Sanjay Patel	fa5f146872	[InstCombine] add folds for fcmp Pred fabs(X), 0.0 Similar to rL346321, we had folds for the ordered versions of these compares already, so add the unordered siblings for completeness. llvm-svn: 346324	2018-11-07 15:33:03 +00:00
Sanjay Patel	16a527e7de	[InstCombine] add tests for more fcmp+fabs preds; NFC llvm-svn: 346323	2018-11-07 15:27:02 +00:00
James Y Knight	72f76bf230	Add support for llvm.is.constant intrinsic (PR4898) This adds the llvm-side support for post-inlining evaluation of the __builtin_constant_p GCC intrinsic. Also fixed SCCPSolver::visitCallSite to not blow up when seeing a call to a function where canConstantFoldTo returns true, and one of the arguments is a struct. Updated from patch initially by Janusz Sobczak. Differential Revision: https://reviews.llvm.org/D4276 llvm-svn: 346322	2018-11-07 15:24:12 +00:00
Sanjay Patel	76faf5145d	[InstCombine] add fold for fabs(X) u< 0.0 The sibling fold for 'oge' --> 'ord' was already here, but this half was missing. The result of fabs() must be positive or nan, so asking if the result is negative or nan is the same as asking if the result is nan. This is another step towards fixing: https://bugs.llvm.org/show_bug.cgi?id=39475 llvm-svn: 346321	2018-11-07 15:11:32 +00:00
Sanjay Patel	c006a0ad4b	[InstCombine] add test for fcmp+fabs; NFC llvm-svn: 346320	2018-11-07 15:01:09 +00:00
Sanjay Patel	46a2510d01	[InstCombine] add FMF to fcmp to show failure to propagate; NFC llvm-svn: 346317	2018-11-07 14:44:09 +00:00
Sanjay Patel	7552d0d2e6	[InstCombine] do not shrink switch conditions to illegal types (PR29009) This patch makes shrinking switch conditions less aggressive which was introduced by: rL274233 Note that we have 2 new bugs to track potential follow-ups that might have solved PR29009 in different ways: https://bugs.llvm.org/show_bug.cgi?id=39569 https://bugs.llvm.org/show_bug.cgi?id=39578 Patch by: @dendibakh (Denis Bakhvalov) Differential Revision: https://reviews.llvm.org/D54115 llvm-svn: 346315	2018-11-07 14:12:41 +00:00
Max Kazantsev	68b2ad7e63	[NFC] Add missing test case, some test renaming llvm-svn: 346295	2018-11-07 05:58:10 +00:00
Vedant Kumar	1e209e284f	[CodeExtractor] Do not extract calls to eh_typeid_for (PR39545) The lowering for a call to eh_typeid_for changes when it's moved from one function to another. There are several proposals for fixing this issue in llvm.org/PR39545. Until some solution is in place, do not allow CodeExtractor to extract calls to eh_typeid_for, as that results in serious miscompilations. llvm-svn: 346256	2018-11-06 19:06:08 +00:00
Vedant Kumar	09b7aa443d	[CodeExtractor] Erase use-without-def debug intrinsics in parent func When CodeExtractor moves instructions to a new function, debug intrinsics referring to those instructions within the parent function become invalid. This results in the same verifier failure which motivated r344545, about function-local metadata being used in the wrong function. llvm-svn: 346255	2018-11-06 19:05:53 +00:00
Eli Friedman	e3a5fc6d80	Disable calls to _finite and other glibc-only functions on Musl. Non-GNU environments don't have __finite_, so treat them as unavailable. Differential Revision: https://reviews.llvm.org/D51282 llvm-svn: 346250	2018-11-06 18:23:32 +00:00
Sanjay Patel	724014adde	[InstCombine] allow vector types for fcmp+fpext fold llvm-svn: 346245	2018-11-06 17:20:20 +00:00
Sanjay Patel	db272b3720	[InstCombine] add vector test for fcmp+fpext; NFC llvm-svn: 346243	2018-11-06 17:06:58 +00:00
Sanjay Patel	46bf3922c1	[InstCombine] propagate fast-math-flags when folding fcmp+fpext, part 2 llvm-svn: 346242	2018-11-06 16:45:27 +00:00
Sanjay Patel	1b85f00201	[InstCombine] propagate fast-math-flags when folding fcmp+fpext llvm-svn: 346240	2018-11-06 16:23:03 +00:00
Sanjay Patel	6aea3071e8	[InstCombine] adjust tests to show dropping FMF; NFC llvm-svn: 346239	2018-11-06 16:07:39 +00:00
Sanjay Patel	2fd5b0ebfb	[InstCombine] propagate fast-math-flags when folding fcmp+fneg, part 2 llvm-svn: 346238	2018-11-06 15:58:57 +00:00
Sanjay Patel	a166d19d93	[InstCombine] adjust tests to show dropping FMF; NFC Also, remove some stale FIXME comments ( rL346234 ). llvm-svn: 346236	2018-11-06 15:57:52 +00:00
Sanjay Patel	70282a0501	[InstCombine] propagate fast-math-flags when folding fcmp+fneg This is another part of solving PR39475: https://bugs.llvm.org/show_bug.cgi?id=39475 This might be enough to fix that particular issue, but as noted with the FIXME, we're still dropping FMF on other folds around here. llvm-svn: 346234	2018-11-06 15:49:45 +00:00
Sanjay Patel	be985e33f0	[InstCombine] add tests for FMF propagation failure; NFC llvm-svn: 346232	2018-11-06 15:21:44 +00:00
Simon Pilgrim	c1da5f757e	[InstCombine] Ensure nested shifts are in range (OSS-Fuzz #9880 ) llvm-svn: 346225	2018-11-06 11:28:22 +00:00
Max Kazantsev	69f6dfa0f8	[LICM] Use ICFLoopSafetyInfo in LICM This patch makes LICM use `ICFLoopSafetyInfo` that is a smarter version of LoopSafetyInfo that leverages power of Implicit Control Flow Tracking to keep track of throwing instructions and give less pessimistic answers to queries related to throws. The ICFLoopSafetyInfo itself has been introduced in rL344601. This patch enables it in LICM only. Differential Revision: https://reviews.llvm.org/D50377 Reviewed By: apilipenko llvm-svn: 346201	2018-11-06 02:44:49 +00:00
Max Kazantsev	c210c65e77	[NFC] Add motivating test case for revert in rL346198 llvm-svn: 346199	2018-11-06 02:12:44 +00:00
Max Kazantsev	e059f4452b	Revert "[IndVars] Smart hard uses detection" This reverts commit 2f425e9c7946b9d74e64ebbfa33c1caa36914402. It seems that the check that we still should do the transform if we know the result is constant is missing in this code. So the logic that has been deleted by this change is still sometimes accidentally useful. I revert the change to see what can be done about it. The motivating case is the following: @Y = global [400 x i16] zeroinitializer, align 1 define i16 @foo() { entry: br label %for.body for.body: ; preds = %entry, %for.body %i = phi i16 [ 0, %entry ], [ %inc, %for.body ] %arrayidx = getelementptr inbounds [400 x i16], [400 x i16]* @Y, i16 0, i16 %i store i16 0, i16* %arrayidx, align 1 %inc = add nuw nsw i16 %i, 1 %cmp = icmp ult i16 %inc, 400 br i1 %cmp, label %for.body, label %for.end for.end: ; preds = %for.body %inc.lcssa = phi i16 [ %inc, %for.body ] ret i16 %inc.lcssa } We should be able to figure out that the result is constant, but the patch breaks it. Differential Revision: https://reviews.llvm.org/D51584 llvm-svn: 346198	2018-11-06 02:02:05 +00:00
Sanjay Patel	1440107821	[InstSimplify] fold select (fcmp X, Y), X, Y This is NFCI for InstCombine because it calls InstSimplify, so I left the tests for this transform there. As noted in the code comment, we can allow this fold more often by using FMF and/or value tracking. llvm-svn: 346169	2018-11-05 21:51:39 +00:00
Sanjay Patel	72c2d355b7	[InstSimplify] add tests for select+fcmp; NFC These are translated from InstCombine's test file with the same name. We should move the transform from InstCombine to InstSimplify. llvm-svn: 346168	2018-11-05 21:42:01 +00:00
Taewook Oh	2b7ae47ccb	[MergeICmps] Do not perform the transformation if GEP is used outside of block Summary: This patch prevents MergeICmps from performing the transformation if the address operand GEP of the load instruction has a use outside of the load's parent block. Without this patch, compiler crashes with the given test case because the use of `%first.i` is still around when the basic block is erased from https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Scalar/MergeICmps.cpp#L620. I think checking `isUsedOutsideOfBlock` with `GEP` is the original intention of the code, as the checking for `LoadI` is already performed in the same function. This patch is incomplete though, as this makes the pass overly conservative and fails the test `tuple-four-int8.ll`. I believe what needs to be done is checking if GEP has a use outside of block that is not the part of "Comparisons" chain. Submit the patch as of now to prevent compiler crash. Reviewers: courbet, trentxintong Reviewed By: courbet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54089 llvm-svn: 346151	2018-11-05 18:16:32 +00:00
Sanjay Patel	1cfba9b5ed	[InstCombine] add/adjust tests for fcmp+select substitution; NFC There was no coverage for at least 2 out of the 4 patterns because of fcmp canonicalization. The tests and code should be moved to InstSimplify in a follow-up because this doesn't create any new values. llvm-svn: 346150	2018-11-05 18:09:10 +00:00
Sanjay Patel	c26fd1e772	[InstCombine] canonicalize -0.0 to +0.0 in fcmp As stated in IEEE-754 and discussed in: https://bugs.llvm.org/show_bug.cgi?id=38086 ...the sign of zero does not affect any FP compare predicate. Known regressions were fixed with: rL346097 (D54001) rL346143 The transform will help reduce pattern-matching complexity to solve: https://bugs.llvm.org/show_bug.cgi?id=39475 ...as well as improve CSE and codegen (a zero constant is almost always easier to produce than 0x80..00). llvm-svn: 346147	2018-11-05 17:26:42 +00:00
Sanjay Patel	87aa10062c	[InstCombine] loosen FP 0.0 constraint for fcmp+select substitution It looks like we correctly removed edge cases with 0.0 from D50714, but we were a bit conservative because getBinOpIdentity() doesn't distinguish between +0.0 and -0.0 and 'nsz' is effectively always true for fcmp (see discussion in: https://bugs.llvm.org/show_bug.cgi?id=38086 ). Without this change, we would get regressions by canonicalizing to +0.0 in all fcmp, and that's a step towards solving: https://bugs.llvm.org/show_bug.cgi?id=39475 llvm-svn: 346143	2018-11-05 16:50:44 +00:00
Sanjay Patel	8b2a1f7fd9	[InstCombine] adjust tests for select with FP identity op; NFC These are mislabeled as negative tests. llvm-svn: 346142	2018-11-05 16:27:03 +00:00
Sanjay Patel	92a53eabc6	[InstCombine] add/adjust tests for select with fsub identity op; NFC llvm-svn: 346138	2018-11-05 15:45:01 +00:00
Sanjay Patel	278db2fba1	[InstCombine] add tests for select with FP identity op; NFC llvm-svn: 346136	2018-11-05 15:08:36 +00:00
David Green	ba9f245b0d	[Inliner] Penalise inlining of calls with loops at Oz We currently seem to underestimate the size of functions with loops in them, both in terms of absolute code size and in the difficulties of dealing with such code. (Calls, for example, can be tail merged to further reduce codesize). At -Oz, we can then increase code size by inlining small loops multiple times. This attempts to penalise functions with loops at -Oz by adding a CallPenalty for each top level loop in the function. It uses LI (and hence DT) to calculate the number of loops. As we are dealing with minsize, the inline threshold is small and functions at this point should be relatively small, making the construction of these cheap. Differential Revision: https://reviews.llvm.org/D52716 llvm-svn: 346134	2018-11-05 14:54:34 +00:00
Vedant Kumar	d2a895a972	[HotColdSplitting] Use TTI to inform outlining threshold Using TargetTransformInfo allows the splitting pass to factor in the code size cost of instructions as it decides whether or not outlining is profitable. This did not regress the overall amount of outlining seen on the handful of internal frameworks I tested. Thanks to Jun Bum Lim for suggesting this! Differential Revision: https://reviews.llvm.org/D53835 llvm-svn: 346108	2018-11-04 23:11:57 +00:00
Sanjay Patel	e7c94ef1de	[ValueTracking] determine sign of 0.0 from select when matching min/max FP In PR39475: https://bugs.llvm.org/show_bug.cgi?id=39475 ..we may fail to recognize/simplify fabs() in some cases because we do not canonicalize fcmp with a -0.0 operand. Adding that canonicalization can cause regressions on min/max FP tests, so that's this patch: for the purpose of determining whether something is min/max, let the value returned by the select determine how we treat a 0.0 operand in the fcmp. This patch doesn't actually change the -0.0 to +0.0. It just changes the analysis, so we don't fail to recognize equivalent min/max patterns that only differ in the signbit of 0.0. Differential Revision: https://reviews.llvm.org/D54001 llvm-svn: 346097	2018-11-04 14:28:48 +00:00
Sanjay Patel	cac28b452e	[ValueTracking] peek through 2-input shuffles in ComputeNumSignBits This patch gives the IR ComputeNumSignBits the same functionality as the DAG version (the code is derived from the existing code). This is an extension of the single input shuffle analysis added with D53659. Differential Revision: https://reviews.llvm.org/D53987 llvm-svn: 346071	2018-11-03 13:18:55 +00:00
Jordan Rupprecht	80e7e86c29	[DebugInfo][InstMerge] Fix -debugify for phi node created by -mldst-motion Summary: -mldst-motion creates a new phi node without any debug info. Use the merged debug location from the incoming stores to fix this. Fixes PR38177. The test case here is (somewhat) simplified from: ``` struct S { int foo; void fn(int bar); }; void S::fn(int bar) { if (bar) foo = 1; else foo = 0; } ``` Reviewers: dblaikie, gbedwell, aprantl, vsk Reviewed By: vsk Subscribers: vsk, JDevlieghere, llvm-commits Tags: #debug-info Differential Revision: https://reviews.llvm.org/D54019 llvm-svn: 346027	2018-11-02 18:25:41 +00:00
Jonas Paulsson	79f2441eee	[SystemZ] Rework getInterleavedMemoryOpCost() Model this function more closely after the BasicTTIImpl version, with separate handling of loads and stores. For loads, the set of actually loaded vectors is checked. This makes it more readable and just slightly more accurate generally. Review: Ulrich Weigand https://reviews.llvm.org/D53071 llvm-svn: 345998	2018-11-02 17:15:36 +00:00
Ayal Zaks	45a3ca7be7	[LV] Avoid vectorizing loops under opt for size that involve SCEV checks Fix PR39417, PR39497 The loop vectorizer may generate runtime SCEV checks for overflow and stride==1 cases, leading to execution of original scalar loop. The latter is forbidden when optimizing for size. An assert introduced in r344743 triggered the above PR's showing it does happen. This patch fixes this behavior by preventing vectorization in such cases. Differential Revision: https://reviews.llvm.org/D53612 llvm-svn: 345959	2018-11-02 09:16:12 +00:00
Florian Hahn	c8bd6ea35e	[LoopInterchange] Remove support for inner-only reductions. Inner-loop only reductions require additional checks to make sure they form a load-phi-store cycle across inner and outer loop. Otherwise the reduction value is not properly preserved. This patch disables interchanging such loops for now, as it causes miscompiles in some cases and it seems to apply only for a tiny amount of loops. Across the test-suite, SPEC2000 and SPEC2006, 61 instead of 62 loops are interchanged with inner loop reduction support disabled. With -loop-interchange-threshold=-1000, 3256 instead of 3267. See the discussion and history of D53027 for an outline of what such legality checks could look like. Reviewers: efriedma, mcrosier, davide Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D53027 llvm-svn: 345877	2018-11-01 19:25:00 +00:00
Sanjay Patel	73bb119940	[InstCombine] add test for ComputeNumSignBits on 2-input shuffle; NFC llvm-svn: 345852	2018-11-01 16:57:54 +00:00
Sanjay Patel	746ebb4ee8	[InstSimplify] fold icmp based on range of abs/nabs (2nd try) This is retrying the fold from rL345717 (reverted at rL345780) ...with a fix for the miscompile demonstrated by PR39510: https://bugs.llvm.org/show_bug.cgi?id=39510 Original commit message: This is a fix for PR39475: https://bugs.llvm.org/show_bug.cgi?id=39475 We managed to get some of these patterns using computeKnownBits in https://reviews.llvm.org/D47041, but that can't be used for nabs(). Instead, put in some range-based logic, so we can fold both abs/nabs with icmp with a constant value. Alive proofs: https://rise4fun.com/Alive/21r Name: abs_nsw_is_positive %cmp = icmp slt i32 %x, 0 %negx = sub nsw i32 0, %x %abs = select i1 %cmp, i32 %negx, i32 %x %r = icmp sgt i32 %abs, -1 => %r = i1 true Name: abs_nsw_is_not_negative %cmp = icmp slt i32 %x, 0 %negx = sub nsw i32 0, %x %abs = select i1 %cmp, i32 %negx, i32 %x %r = icmp slt i32 %abs, 0 => %r = i1 false Name: nabs_is_negative_or_0 %cmp = icmp slt i32 %x, 0 %negx = sub i32 0, %x %nabs = select i1 %cmp, i32 %x, i32 %negx %r = icmp slt i32 %nabs, 1 => %r = i1 true Name: nabs_is_not_over_0 %cmp = icmp slt i32 %x, 0 %negx = sub i32 0, %x %nabs = select i1 %cmp, i32 %x, i32 %negx %r = icmp sgt i32 %nabs, 0 => %r = i1 false Differential Revision: https://reviews.llvm.org/D53844 llvm-svn: 345832	2018-11-01 14:07:39 +00:00
Sanjay Patel	056807b01e	[InstSimplify] add tests for icmp fold bug (PR39510); NFC Verify that set intersection/subset are not confused. llvm-svn: 345831	2018-11-01 14:03:22 +00:00
Max Kazantsev	3d347bf545	[IndVars] Smart hard uses detection When rewriting loop exit values, IndVars considers this transform not profitable if the loop instruction has a loop user which it believes cannot be optimized away. In current implementation only calls that immediately use the instruction are considered as such. This patch extends the definition of "hard" users to any side-effecting instructions (which usually cannot be optimized away from the loop) and also allows handling of not just immediate users, but use chains. Differential Revision: https://reviews.llvm.org/D51584 Reviewed By: etherzhhb llvm-svn: 345814	2018-11-01 06:47:01 +00:00
Sanjay Patel	72fe03f93b	revert rL345717 : [InstSimplify] fold icmp based on range of abs/nabs This can miscompile as shown in PR39510: https://bugs.llvm.org/show_bug.cgi?id=39510 llvm-svn: 345780	2018-10-31 21:37:40 +00:00
Sanjay Patel	b041831a1a	[InstCombine] add tests for fmin/fmax pattern matching failure; NFC llvm-svn: 345771	2018-10-31 20:03:27 +00:00
Sanjay Patel	886893883a	[InstCombine] regenerate test checks; NFC llvm-svn: 345757	2018-10-31 18:17:51 +00:00
Sanjay Patel	5bcec66c55	[InstCombine] add tests for fcmp with -0.0; NFC From IEEE754: "Comparisons shall ignore the sign of zero (so +0 = −0)." llvm-svn: 345752	2018-10-31 17:55:40 +00:00
Volkan Keles	3ca146d083	[InstCombine] Combine nested min/max intrinsics with constants Reviewers: arsenm, spatel Reviewed By: spatel Subscribers: lebedev.ri, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D53774 llvm-svn: 345751	2018-10-31 17:50:52 +00:00
Sanjay Patel	1c254c6716	[InstCombine] refactor fabs+fcmp fold; NFC Also, remove/replace/minimize/enhance the tests for this fold. The code drops FMF, so it needs more tests and at least 1 fix. llvm-svn: 345734	2018-10-31 16:34:43 +00:00
Sanjay Patel	d4dc30c20d	[InstSimplify] fold 'fcmp nnan ult X, 0.0' when X is not negative This is the inverted case for the transform added with D53874 / rL345725. llvm-svn: 345728	2018-10-31 15:35:46 +00:00
Sanjay Patel	85cba3b6fb	[InstSimplify] fold 'fcmp nnan oge X, 0.0' when X is not negative This re-raises some of the open questions about how to apply and use fast-math-flags in IR from PR38086: https://bugs.llvm.org/show_bug.cgi?id=38086 ...but given the current implementation (no FMF on casts), this is likely the only way to predicate the transform. This is part of solving PR39475: https://bugs.llvm.org/show_bug.cgi?id=39475 Differential Revision: https://reviews.llvm.org/D53874 llvm-svn: 345725	2018-10-31 14:57:23 +00:00

11991 Commits