llvm-project

Commit Graph

Author	SHA1	Message	Date
Orlando Cazalet-Hyams	78a6062c24	[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion Summary: Bug: https://bugs.llvm.org/show_bug.cgi?id=39024 The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here: A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins. B) Instructions in the middle block have different line numbers which give the impression of another iteration. In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks. Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel Reviewed By: hfinkel Subscribers: bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits Tags: #llvm, #debug-info Differential Revision: https://reviews.llvm.org/D60831 llvm-svn: 360162	2019-05-07 15:37:38 +00:00
Keno Fischer	a1a4adf4b9	[SCEV] Add explicit representations of umin/smin Summary: Currently we express umin as `~umax(~x, ~y)`. However, this becomes a problem for operands in non-integral pointer spaces, because `~x` is not something we can compute for `x` non-integral. However, since comparisons are generally still allowed, we are actually able to express `umin(x, y)` directly as long as we don't try to express is as a umax. Support this by adding an explicit umin/smin representation to SCEV. We do this by factoring the existing getUMax/getSMax functions into a new function that does all four. The previous two functions were largely identical. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D50167 llvm-svn: 360159	2019-05-07 15:28:47 +00:00
Robert Lougher	07298c9b1e	Precommit tests for or/add transform. NFC. llvm-svn: 360149	2019-05-07 14:14:29 +00:00
Jordan Rupprecht	8f14e7cacf	Revert "Re-commit r357452: SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)" This reverts r357452 (git commit `21eb771dcb`). This was causing strange optimization-related test failures on an internal test. Will followup with more details offline. llvm-svn: 360086	2019-05-06 21:55:05 +00:00
Sanjay Patel	a6019d5164	[InstCombine] sink FP negation of operands through select We don't always get this: Cond ? -X : -Y --> -(Cond ? X : Y) ...even with the legacy IR form of fneg in the case with extra uses, and we miss matching with the newer 'fneg' instruction because we are expecting binops through the rest of the path. Differential Revision: https://reviews.llvm.org/D61604 llvm-svn: 360075	2019-05-06 20:34:05 +00:00
Sanjay Patel	473dbf0301	[InstCombine] add tests for fneg+sel; NFC llvm-svn: 360058	2019-05-06 17:29:22 +00:00
Cameron McInally	c3167696bc	Add FNeg support to InstructionSimplify Differential Revision: https://reviews.llvm.org/D61573 llvm-svn: 360053	2019-05-06 16:05:10 +00:00
Sanjay Patel	3379fb599d	[InstCombine] regenerate test checks; NFC llvm-svn: 360052	2019-05-06 16:03:53 +00:00
Clement Courbet	9e1f2a7fe7	[SimplifyLibCalls] Simplify bcmp too. Summary: Fixes PR40699. Reviewers: gchatelet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61585 llvm-svn: 360021	2019-05-06 09:15:22 +00:00
Markus Lavin	a778074165	[DebugInfo] GlobalOpt DW_OP_deref_size instead of DW_OP_deref. Optimization pass lib/Transforms/IPO/GlobalOpt.cpp needs to insert DW_OP_deref_size instead of DW_OP_deref to be compatible with big-endian targets for same reasons as in D59687. Differential Revision: https://reviews.llvm.org/D60611 llvm-svn: 360013	2019-05-06 07:20:56 +00:00
Cameron McInally	1d0c845d9d	Add FNeg IR constant folding support llvm-svn: 359982	2019-05-05 16:07:09 +00:00
Cameron McInally	fd254e429e	Add InstCombine tests for FNeg instruction. llvm-svn: 359970	2019-05-04 14:56:08 +00:00
Sanjay Patel	5ab41a7a05	[CodeGenPrepare] limit overflow intrinsic matching to a single basic block (2nd try) This is a subset of the original commit from rL359879 which was reverted because it could crash when using the 'RemovedInstructions' structure that enables delayed deletion of dead instructions. The motivating compile-time win does not require that change though. We should get most of that win from this change alone. Using/updating a dominator tree to match math overflow patterns may be very expensive in compile-time (because of the way CGP uses a DT), so just handle the single-block case. See post-commit thread for rL354298 for more details: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html Differential Revision: https://reviews.llvm.org/D61075 llvm-svn: 359969	2019-05-04 12:46:32 +00:00
Evgeniy Stepanov	46ec57e576	Revert "[CodeGenPrepare] limit overflow intrinsic matching to a single basic block" This reverts commit r359879, which introduced a compiler crash. llvm-svn: 359908	2019-05-03 17:31:49 +00:00
Robert Lougher	e28ab93546	Revert r359549 - incorrect update of test checks. NFC llvm-svn: 359897	2019-05-03 15:14:19 +00:00
Sanjay Patel	d3cfaae243	[LICM] auto-generate complete test checks; NFC llvm-svn: 359881	2019-05-03 13:25:06 +00:00
Sanjay Patel	8ff072e48e	[CodeGenPrepare] limit overflow intrinsic matching to a single basic block Using/updating a dominator tree to match math overflow patterns may be very expensive in compile-time (because of the way CGP uses a DT), so just handle the single-block case. Also, we were restarting the iterator loops when doing the overflow intrinsic transforms by marking the dominator tree for update. That was done to prevent iterating over a removed instruction. But we can postpone the deletion using the existing "RemovedInsts" structure, and that means we don't need to update the DT. See post-commit thread for rL354298 for more details: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html Differential Revision: https://reviews.llvm.org/D61075 llvm-svn: 359879	2019-05-03 13:09:18 +00:00
Bob Haarman	a78ab77b6b	remove inalloca parameters in globalopt and simplify argpromotion Summary: Inalloca parameters require special handling in some optimizations. This change causes globalopt to strip the inalloca attribute from function parameters when it is safe to do so, removes the special handling for inallocas from argpromotion, and replaces it with a simple check that causes argpromotion to skip functions that receive inallocas (for when the pass is invoked on code that didn't run through globalopt first). This also avoids a case where argpromotion would incorrectly try to pass an inalloca in a register. Fixes PR41658. Reviewers: rnk, efriedma Reviewed By: rnk Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61286 llvm-svn: 359743	2019-05-02 00:37:36 +00:00
Hiroshi Yamauchi	1620104034	[PGO][CHR] A bug fix. Summary: Fix a transformation bug where two scopes share a common instrution to hoist. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61405 llvm-svn: 359736	2019-05-01 22:49:52 +00:00
Hubert Tong	02d055a269	[tests] Add host-byteorder-*-endian; update XFAILs of big-endian triples Summary: Triple components in `XFAIL` lines are tested against the target triple. Various tests that are expected to fail on big-endian hosts are marked as being `XFAIL` for big-endian targets. This patch corrects these tests by having them test against a new `host-byteorder-big-endian` feature. Reviewers: xingxue, sfertile, jasonliu Reviewed By: xingxue Subscribers: jvesely, nhaehnle, fedor.sergeev, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60551 llvm-svn: 359689	2019-05-01 15:36:18 +00:00
Philip Reames	84e54eb471	[InstCombine] Limit a vector demanded elts rule which was producing invalid IR. The demanded elts rules introduced for GEPs in https://reviews.llvm.org/rL356293 replaced vector constants with undefs (by design). It turns out that the LangRef disallows such cases when indexing structs. The right fix is probably to relax the langref requirement, and update other passes to expect the result, but for the moment, limit the transform to avoid compiler crashes. This should fix https://bugs.llvm.org/show_bug.cgi?id=41624. llvm-svn: 359633	2019-04-30 23:09:26 +00:00
Alina Sbirlea	4e1ac95cf5	[PassManagerBuilder] Add option for interleaved loops, for loop vectorize. Summary: Match NewPassManager behavior: add option for interleaved loops in the old pass manager, and use that instead of the flag used to disable loop unroll. No changes in the defaults. Reviewers: chandlerc Subscribers: mehdi_amini, jlebar, dmgreen, hsaito, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61030 llvm-svn: 359615	2019-04-30 21:29:20 +00:00
Simon Pilgrim	83098d28a1	[SLP] Lit test that cannot get vectorized due to lack of look-ahead operand reordering heuristic. The code in this test is not vectorized by SLP because its operand reordering cannot look beyond the immediate predecessors. This will get fixed in a follow-up patch that introduces the look-ahead operand reordering heuristic. Committed on behalf of @vporpo (Vasileios Porpodas) Differential Revision: https://reviews.llvm.org/D61283 llvm-svn: 359553	2019-04-30 11:03:09 +00:00
Jeremy Morse	562f5f04f5	Update checks in an instcombine test, NFC This reduces the delta in some incoming work that changes this test. llvm-svn: 359549	2019-04-30 10:56:33 +00:00
Quentin Colombet	ae2cbb3400	[BlockExtractor] Change the basic block separator from ',' to ';' This change aims at making the file format be compatible with the way LLVM handles command line options. Differential Revision: https://reviews.llvm.org/D60970 llvm-svn: 359462	2019-04-29 16:14:00 +00:00
Simon Pilgrim	46128cdf08	[InstCombine][X86] Add PACKSS tests for truncation of sign-extended comparisons llvm-svn: 359435	2019-04-29 10:36:20 +00:00
Dan Robertson	9e441aee50	[NFC] Add baseline tests for int isKnownNonZero Add baseline tests for improvements of isKnownNonZero for integer types. Differential Revision: https://reviews.llvm.org/D60932 llvm-svn: 359267	2019-04-26 02:55:54 +00:00
Akira Hatanaka	8edf8f317b	[ObjC][ARC] Let ARC optimizer bail out if the number of pointer states it keeps track of becomes too large ARC optimizer does a top-down and a bottom-up traversal of the whole function to pair up retain and release instructions and remove them. This can be expensive if the number of instructions in the function and pointer states it tracks are large since it has to look at each pointer state and determine whether the instruction being visited can potentially use the pointer. This patch adds a command line option that sets a limit to the number of pointers it tracks. rdar://problem/49477063 Differential Revision: https://reviews.llvm.org/D61100 llvm-svn: 359226	2019-04-25 19:42:55 +00:00
Robert Lougher	d469133f95	[Evaluator] Walk initial elements when handling load through bitcast When evaluating a store through a bitcast, the evaluator tries to move the bitcast from the pointer onto the stored value. If the cast is invalid, it tries to "introspect" the type to get a valid cast by obtaining a pointer to the initial element (if the type is nested, this may require walking several initial elements). In some situations it is possible to get a bitcast on a load (e.g. with unions, where the bitcast may not be the same type as the store). However, equivalent logic to the store to introspect the type is missing. This patch add this logic. Note, when developing the patch I was unhappy with adding similar logic directly to the load case as it could get out of step. Instead, I have abstracted the "introspection" into a helper function, with the specifics being handled by a passed-in lambda function. Differential Revision: https://reviews.llvm.org/D60793 llvm-svn: 359205	2019-04-25 17:00:01 +00:00
Simon Pilgrim	86ff9d313a	[InstCombine][X86] Add PACKSS/PACKUS tests for truncation where saturation won't occur llvm-svn: 359185	2019-04-25 12:45:11 +00:00
Roman Lebedev	445c22b7eb	[NFC][LoopIdiomRecognize] Some basic baseline tests for bcmp loop idiom Doubt this is the final test coverage, but this appears to have good coverage already, so i figure i might as well precommit it. llvm-svn: 359173	2019-04-25 08:33:47 +00:00
Alina Sbirlea	733c8c40c8	Enable LoopVectorization by default. Summary: When refactoring vectorization flags, vectorization was disabled by default in the new pass manager. This patch re-enables is for both managers, and changes the assumptions opt makes, based on the new defaults. Comments in opt.cpp should clarify the intended use of all flags to enable/disable vectorization. Reviewers: chandlerc, jgorbe Subscribers: jlebar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61091 llvm-svn: 359167	2019-04-25 04:49:48 +00:00
Alexey Bataev	ef3c1884ec	[SLP] Fix crash after r358519, by V. Porpodas. Summary: The code did not check if operand was undef before casting it to Instruction. Reviewers: RKSimon, ABataev, dtemirbulatov Reviewed By: ABataev Subscribers: uabelho Tags: #llvm Differential Revision: https://reviews.llvm.org/D61024 llvm-svn: 359136	2019-04-24 20:21:32 +00:00
Dmitry Mikulin	312b5f86b7	The error message for mismatched value sites is very cryptic. Make it more readable for an average user. Differential Revision: https://reviews.llvm.org/D60896 llvm-svn: 359043	2019-04-23 22:26:55 +00:00
Akira Hatanaka	5c3117b0a9	[ObjC][ARC] Check the basic block size before calling DominatorTree::dominate. ARC contract pass has an optimization that replaces the uses of the argument of an ObjC runtime function call with the call result. For example: ; Before optimization %1 = tail call i8* @foo1() %2 = tail call i8* @llvm.objc.retainAutoreleasedReturnValue(i8* %1) store i8* %1, i8** @g0, align 8 ; After optimization %1 = tail call i8* @foo1() %2 = tail call i8* @llvm.objc.retainAutoreleasedReturnValue(i8* %1) store i8* %2, i8** @g0, align 8 // %1 is replaced with %2 Before replacing the argument use, DominatorTree::dominate is called to determine whether the user instruction is dominated by the ObjC runtime function call instruction. The call to DominatorTree::dominate can be expensive if the two instructions belong to the same basic block and the size of the basic block is large. This patch checks the basic block size and just bails out if the size exceeds the limit set by command line option "arc-contract-max-bb-size". rdar://problem/49477063 Differential Revision: https://reviews.llvm.org/D60900 llvm-svn: 359027	2019-04-23 19:49:03 +00:00
Philip Reames	2ce017026a	[InstCombine] Convert a masked.load of a dereferenceable address to an unconditional load If we have a masked.load from a location we know to be dereferenceable, we can simply issue a speculative unconditional load against that address. The key advantage is that it produces IR which is well understood by the optimizer. The select (cnd, load, passthrough) form produced should be pattern matchable back to hardware predication if profitable. Differential Revision: https://reviews.llvm.org/D59703 llvm-svn: 359000	2019-04-23 15:25:14 +00:00
David Green	63a2aa715a	[LSR] Limit the recursion for setup cost In some circumstances we can end up with setup costs that are very complex to compute, even though the scevs are not very complex to create. This can also lead to setupcosts that are calculated to be exactly -1, which LSR treats as an invalid cost. This patch puts a limit on the recursion depth for setup cost to prevent them taking too long. Thanks to @reames for the report and test case. Differential Revision: https://reviews.llvm.org/D60944 llvm-svn: 358958	2019-04-23 08:52:21 +00:00
Philip Reames	d748689c7f	[InstCombine] Eliminate stores to constant memory If we have a store to a piece of memory which is known constant, then we know the store must be storing back the same value. As a result, the store (or memset, or memmove) must either be down a dead path, or a noop. In either case, it is valid to simply remove the store. The motivating case for this involves a memmove to a buffer which is constant down a path which is dynamically dead. Note that I'm choosing to implement the less aggressive of two possible semantics here. We could simply say that the store is undefined, and prune the path. Consensus in the review was that the more aggressive form might be a good follow on change at a later date. Differential Revision: https://reviews.llvm.org/D60659 llvm-svn: 358919	2019-04-22 20:28:19 +00:00
Philip Reames	f01583d097	[Tests] Revise a test as requested by reviewer in D59703 llvm-svn: 358907	2019-04-22 18:51:58 +00:00
Philip Reames	8f47089034	[Tests] Add a negative test for masked.gather part of D59703 llvm-svn: 358906	2019-04-22 18:28:44 +00:00
Serguei Katkov	40a3b96196	[NewPM] Add Option handling for SimpleLoopUnswitch This patch enables passing options to SimpleLoopUnswitch via the passes pipeline. Reviewers: chandlerc, fedor.sergeev, leonardchan, philip.pfaffe Reviewed By: fedor.sergeev Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D60676 llvm-svn: 358880	2019-04-22 10:35:07 +00:00
Serguei Katkov	5614f4a3a5	[NewPM] Add dummy Test for LoopVectorize option parsing. llvm-svn: 358878	2019-04-22 09:53:26 +00:00
Luqman Aden	2993661cc0	[CorrelatedValuePropagation] Mark subs that we know not to wrap with nuw/nsw. Summary: Teach CorrelatedValuePropagation to also handle sub instructions in addition to add. Relatively simple since makeGuaranteedNoWrapRegion already understood sub instructions. Only subtle change is which range is passed as "Other" to that function, since sub isn't commutative. Note that CorrelatedValuePropagation::processAddSub is still hidden behind a default-off flag as IndVarSimplify hasn't yet been fixed to strip the added nsw/nuw flags and causes a miscompile. (PR31181) Reviewers: sanjoy, apilipenko, nikic Reviewed By: nikic Subscribers: hiraditya, jfb, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60036 llvm-svn: 358816	2019-04-20 13:14:18 +00:00
Nikita Popov	d89de3f7f4	[IndVarSimplify] Generate full checks for some LFTR tests; NFC llvm-svn: 358813	2019-04-20 12:05:53 +00:00
Nikita Popov	aa0c5a022f	[IndVarSimplify] Add tests for PR31181; NFC llvm-svn: 358812	2019-04-20 12:05:43 +00:00
Nikita Popov	2e33f8de57	[CVP] Add tests for sub nowrap inference; NFC These are baseline tests for D60036. Patch by Luqman Aden. llvm-svn: 358808	2019-04-20 07:43:15 +00:00
Vedant Kumar	282b26ec4d	[GVN+LICM] Use line 0 locations for better crash attribution This is a follow-up to r291037+r291258, which used null debug locations to prevent jumpy line tables. Using line 0 locations achieves the same effect, but works better for crash attribution because it preserves the right inline scope. Differential Revision: https://reviews.llvm.org/D60913 llvm-svn: 358791	2019-04-19 22:36:40 +00:00
Fangrui Song	884f557bb2	[MergeFunc] removeUsers: call remove() only on direct users removeUsers uses a work list to collect indirect users and call remove() on those functions. However it has a bug (`if (!Visited.insert(UU).second)`). Actually, we don't have to collect indirect users. After the merge of F and G, G's callers will be considered (added to Deferred). If G's callers can be merged, G's callers' callers will be considered. Update the test unnamed-addr-reprocessing.ll to make it clear we can still merge indirect callers. llvm-svn: 358741	2019-04-19 07:57:51 +00:00
Saleem Abdulrasool	b96d9b3419	MergeFunc: preserve COMDAT information when creating a thunk We would previously drop the COMDAT on the thunk we generated when replacing a function body with the forwarding thunk. This would result in a function that may have been multiply emitted and multiply merged to be emitted with the same name without the COMDAT. This is a hard error with PE/COFF where the COMDAT is used for the deduplication of Value Witness functions for Swift. llvm-svn: 358728	2019-04-19 01:48:36 +00:00
Philip Reames	137995d8da	[GuardWidening] Wire up a NPM version of the LoopGuardWidening pass llvm-svn: 358704	2019-04-18 19:17:14 +00:00
Quentin Colombet	ea3364bf85	[BlockExtractor] Extend the file format to support the grouping of basic blocks Prior to this patch, each basic block listed in the extrack-blocks-file would be extracted to a different function. This patch adds the support for comma separated list of basic blocks to form group. When the region formed by a group is not extractable, e.g., not single entry, all the blocks of that group are left untouched. Let us see this new format in action (comments are not part of the file format): ;; funcName bbName[,bbName...] foo bb1 ;; Extract bb1 in its own function foo bb2,bb3 ;; Extract bb2,bb3 in their own function bar bb1,bb4 ;; Extract bb1,bb4 in their own function bar bb2 ;; Extract bb2 in its own function Assuming all regions are extractable, this will create one function and thus one call per region. Differential Revision: https://reviews.llvm.org/D60746 llvm-svn: 358701	2019-04-18 18:28:30 +00:00
Philip Reames	adf288c5d9	[LoopPred] Fix a blatantly obvious bug in r358684 The bug is that I didn't check whether the operand of the invariant_loads were themselves invariant. I don't know how this got missed in the patch and review. I even had an unreduced test case locally, and I remember handling this case, but I must have lost it in one of the rebases. Oops. llvm-svn: 358688	2019-04-18 17:01:19 +00:00
Philip Reames	92a7177e6b	[LoopPredication] Allow predication of loop invariant computations (within the loop) The purpose of this patch is to eliminate a pass ordering dependence between LoopPredication and LICM. To understand the purpose, consider the following snippet of code inside some loop 'L' with IV 'i' A = _a.length; guard (i < A) a = _a[i] B = _b.length; guard (i < B); b = _b[i]; ... Z = _z.length; guard (i < Z) z = _z[i] accum += a + b + ... + z; Today, we need LICM to hoist the length loads, LoopPredication to make the guards loop invariant, and TrivialUnswitch to eliminate the loop invariant guard to establish must execute for the next length load. Today, if we can't prove speculation safety, we'd have to iterate these three passes 26 times to reduce this example down to the minimal form. Using the fact that the array lengths are known to be invariant, we can short circuit this iteration. By forming the loop invariant form of all the guards at once, we remove the need for LoopPredication from the iterative cycle. At the moment, we'd still have to iterate LICM and TrivialUnswitch; we'll leave that part for later. As a secondary benefit, this allows LoopPred to expose peeling oppurtunities in a much more obvious manner. See the udiv test changes as an example. If the udiv was not hoistable (i.e. we couldn't prove speculation safety) this would be an example where peeling becomes obviously profitable whereas it wasn't before. A couple of subtleties in the implementation: - SCEV's isSafeToExpand guarantees speculation safety (i.e. let's us expand at a new point). It is not a precondition for expansion if we know the SCEV corresponds to a Value which dominates the requested expansion point. - SCEV's isLoopInvariant returns true for expressions which compute the same value across all iterations executed, regardless of where the original Value is located. (i.e. it can be in the loop) This implies we have a speculation burden to prove before expanding them outside loops. - invariant_loads and AA->pointsToConstantMemory are two cases that SCEV currently does not handle, but meets the SCEV definition of invariance. I plan to sink this part into SCEV once this has baked for a bit. Differential Revision: https://reviews.llvm.org/D60093 llvm-svn: 358684	2019-04-18 16:33:17 +00:00
Kit Barton	3cdf87940f	Add basic loop fusion pass. This patch adds a basic loop fusion pass. It will fuse loops that conform to the following 4 conditions: 1. Adjacent (no code between them) 2. Control flow equivalent (if one loop executes, the other loop executes) 3. Identical bounds (both loops iterate the same number of iterations) 4. No negative distance dependencies between the loop bodies. The pass does not make any changes to the IR to create opportunities for fusion. Instead, it checks if the necessary conditions are met and if so it fuses two loops together. The pass has not been added to the pass pipeline yet, and thus is not enabled by default. It can be run stand alone using the -loop-fusion option. Differential Revision: https://reviews.llvm.org/D55851 llvm-svn: 358607	2019-04-17 18:53:27 +00:00
Steven Wu	05a358cdcd	[ThinLTO] Fix ThinLTOCodegenerator to export llvm.used symbols Summary: Reapply r357931 with fixes to ThinLTO testcases and llvm-lto tool. ThinLTOCodeGenerator currently does not preserve llvm.used symbols and it can internalize them. In order to pass the necessary information to the legacy ThinLTOCodeGenerator, the input to the code generator is rewritten to be based on lto::InputFile. Now ThinLTO using the legacy LTO API will requires data layout in Module. "internalize" thinlto action in llvm-lto is updated to run both "promote" and "internalize" with the same configuration as ThinLTOCodeGenerator. The old "promote" + "internalize" option does not produce the same output as ThinLTOCodeGenerator. This fixes: PR41236 rdar://problem/49293439 Reviewers: tejohnson, pcc, kromanova, dexonsmith Reviewed By: tejohnson Subscribers: ormris, bd1976llvm, mehdi_amini, inglorion, eraman, hiraditya, jkorous, dexonsmith, arphaman, dang, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60421 llvm-svn: 358601	2019-04-17 17:38:09 +00:00
Nikita Popov	2039581002	[LVI][CVP] Constrain values in with.overflow branches If a branch is conditional on extractvalue(op.with.overflow(%x, C), 1) then we can constrain the value of %x inside the branch based on makeGuaranteedNoWrapRegion(). We do this by extending the edge-value handling in LVI. This allows CVP to then fold comparisons against %x, as illustrated in the tests. Differential Revision: https://reviews.llvm.org/D60650 llvm-svn: 358597	2019-04-17 16:57:42 +00:00
Florian Hahn	893aea58ea	[LoopUnroll] Allow unrolling if the unrolled size does not exceed loop size. Summary: In the following cases, unrolling can be beneficial, even when optimizing for code size: 1) very low trip counts 2) potential to constant fold most instructions after fully unrolling. We can unroll in those cases, by setting the unrolling threshold to the loop size. This might highlight some cost modeling issues and fixing them will have a positive impact in general. Reviewers: vsk, efriedma, dmgreen, paquette Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D60265 llvm-svn: 358586	2019-04-17 15:57:43 +00:00
Roman Lebedev	0080645846	[CVP] processOverflowIntrinsic(): don't crash if constant-holding happened As reported by Mikael Holmén in post-commit review in https://reviews.llvm.org/D60791#1469765 llvm-svn: 358559	2019-04-17 06:35:07 +00:00
Eric Christopher	e29874eaa0	Revert "Add basic loop fusion pass." Per request. This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda. llvm-svn: 358553	2019-04-17 04:55:24 +00:00
Eric Christopher	cee313d288	Revert "Temporarily Revert "Add basic loop fusion pass."" The reversion apparently deleted the test/Transforms directory. Will be re-reverting again. llvm-svn: 358552	2019-04-17 04:52:47 +00:00
Eric Christopher	a863435128	Temporarily Revert "Add basic loop fusion pass." As it's causing some bot failures (and per request from kbarton). This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda. llvm-svn: 358546	2019-04-17 02:12:23 +00:00
Kit Barton	ab70da0728	Add basic loop fusion pass. This patch adds a basic loop fusion pass. It will fuse loops that conform to the following 4 conditions: 1. Adjacent (no code between them) 2. Control flow equivalent (if one loop executes, the other loop executes) 3. Identical bounds (both loops iterate the same number of iterations) 4. No negative distance dependencies between the loop bodies. The pass does not make any changes to the IR to create opportunities for fusion. Instead, it checks if the necessary conditions are met and if so it fuses two loops together. The pass has not been added to the pass pipeline yet, and thus is not enabled by default. It can be run stand alone using the -loop-fusion option. Phabricator: https://reviews.llvm.org/D55851 llvm-svn: 358543	2019-04-17 01:37:00 +00:00
Sanjay Patel	e08783e2f5	[EarlyCSE] detect equivalence of selects with inverse conditions and commuted operands (PR41101) This is 1 of the problems discussed in the post-commit thread for: rL355741 / http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190311/635516.html and filed as: https://bugs.llvm.org/show_bug.cgi?id=41101 Instcombine tries to canonicalize some of these cases (and there's room for improvement there independently of this patch), but it can't always do that because of extra uses. So we need to recognize these commuted operand patterns here in EarlyCSE. This is similar to how we detect commuted compares and commuted min/max/abs. Differential Revision: https://reviews.llvm.org/D60723 llvm-svn: 358523	2019-04-16 20:41:20 +00:00
Nikita Popov	52b24ee932	[CVP] Simplify umulo and smulo that cannot overflow If a umul.with.overflow or smul.with.overflow operation cannot overflow, simplify it to a simple mul nuw / mul nsw. After the refactoring in D60668 this is just a matter of removing an explicit check against multiplications. Differential Revision: https://reviews.llvm.org/D60791 llvm-svn: 358521	2019-04-16 20:31:41 +00:00
Simon Pilgrim	82ffa88a04	[SLP] Refactoring of the operand reordering code. This is a refactoring patch which should have all the functionality of the current code. Its goal is twofold: i. Cleanup and simplify the reordering code, and ii. Generalize reordering so that it will work for an arbitrary number of operands, not just 2. This is the second patch in a series of patches that will enable operand reordering across chains of operations. An example of this was presented in EuroLLVM'18 https://www.youtube.com/watch?v=gIEn34LvyNo . Committed on behalf of @vporpo (Vasileios Porpodas) Differential Revision: https://reviews.llvm.org/D59973 llvm-svn: 358519	2019-04-16 19:27:00 +00:00
Nikita Popov	5a30177906	[CVP] Add tests for non-overflowing mulo; NFC Should be simplified to simple mul. llvm-svn: 358517	2019-04-16 19:25:35 +00:00
Nikita Popov	5ecd6a48b9	[InstCombine] Prune fshl/fshr with masked operands If a constant shift amount is used, then only some of the LHS/RHS operand bits are demanded and we may be able to simplify based on that. InstCombineSimplifyDemanded already had the necessary support for that, we just weren't calling it with fshl/fshr as root. In particular, this allows us to relax some masked funnel shifts into simple shifts, as shown in the tests. Patch by Shawn Landden. Differential Revision: https://reviews.llvm.org/D60660 llvm-svn: 358515	2019-04-16 19:05:49 +00:00
Nikita Popov	f700081a7d	[InstCombine] Add tests for fshl/fshr with masked operands; NFC Baseline tests for D60660. Patch by Shawn Landden. Differential Revision: https://reviews.llvm.org/D60688 llvm-svn: 358514	2019-04-16 19:05:40 +00:00
Philip Reames	c44b68e2b7	[Tests] Add branch_weights to latches so that test is not effected by future profitability patch to LoopPredication llvm-svn: 358506	2019-04-16 16:32:59 +00:00
Hans Wennborg	21eb771dcb	Re-commit r357452: SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259) The original commit caused false positives from AddressSanitizer's use-after-scope checks, which have now been fixed in r358478. > The code was previously checking that candidates for sinking had exactly > one use or were a store instruction (which can't have uses). This meant > we could sink call instructions only if they had a use. > > That limitation seemed a bit arbitrary, so this patch changes it to > "instruction has zero or one use" which seems more natural and removes > the need to special-case stores. > > Differential revision: https://reviews.llvm.org/D59936 llvm-svn: 358483	2019-04-16 12:13:25 +00:00
Quentin Colombet	fda0426888	[LSR] Rewrite misses some fixup locations if it splits critical edge If LSR split critical edge during rewriting phi operands and phi node has other pending fixup operands, we need to update those pending fixups. Otherwise formulae will not be implemented completely and some instructions will not be eliminated. llvm.org/PR41445 Differential Revision: https://reviews.llvm.org/D60645 Patch by: Denis Bakhvalov <denis.bakhvalov@intel.com> llvm-svn: 358457	2019-04-15 22:23:46 +00:00
Sanjay Patel	800a0c3e4b	[EarlyCSE] add more tests for double-negated select condition; NFC llvm-svn: 358454	2019-04-15 21:51:51 +00:00
Sanjay Patel	5ae05d810c	[EarlyCSE] add test for select condition double-negation; NFC llvm-svn: 358444	2019-04-15 20:25:31 +00:00
Philip Reames	af808ee2ee	[Tests] Add a few more tests for LoopPredication w/invariant loads Making sure to cover an important legality cornercase. llvm-svn: 358439	2019-04-15 19:45:27 +00:00
Wolfgang Pieb	4fe42214e2	[DEBUGINFO] Prevent Instcombine from dropping debuginfo when removing zexts Zexts can be treated like no-op casts when it comes to assessing whether their removal affects debug info. Reviewer: aprantl Differential Revision: https://reviews.llvm.org/D60641 llvm-svn: 358431	2019-04-15 17:36:29 +00:00
Hiroshi Yamauchi	09e539fcae	[PGO] Profile guided code size optimization. Summary: Enable some of the existing size optimizations for cold code under PGO. A ~5% code size saving in big internal app under PGO. The way it gets BFI/PSI is discussed in the RFC thread http://lists.llvm.org/pipermail/llvm-dev/2019-March/130894.html Note it doesn't currently touch loop passes. Reviewers: davidxl, eraman Reviewed By: eraman Subscribers: mgorny, javed.absar, smeenai, mehdi_amini, eraman, zzheng, steven_wu, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59514 llvm-svn: 358422	2019-04-15 16:49:00 +00:00
Sanjay Patel	0e0bb0e24a	[EarlyCSE] add tests for selects with commuted operands (PR41101); NFC llvm-svn: 358420	2019-04-15 16:01:05 +00:00
Philip Reames	fbe64a2cfb	[LoopPred] Hoist and of predicated checks where legal If we have multiple range checks which can be predicated, hoist the and of the results outside the loop. This minorly cleans up the resulting IR, but the main motivation is as a building block for D60093. llvm-svn: 358419	2019-04-15 15:53:25 +00:00
Sanjay Patel	c71433335a	[EarlyCSE] regenerate test checks; NFC llvm-svn: 358407	2019-04-15 14:02:37 +00:00
Sanjay Patel	5e13cd2e61	[InstCombine] canonicalize fdiv after fmul if reassociation is allowed (X / Y) * Z --> (X * Z) / Y This can allow other optimizations/reassociations as shown in the test diffs. llvm-svn: 358404	2019-04-15 13:23:38 +00:00
Serguei Katkov	f54328372b	[NewPM] Add Option handling for SimplifyCFG This patch enables passing options to SimplifyCFGPass via the passes pipeline. Reviewers: chandlerc, fedor.sergeev, leonardchan, philip.pfaffe Reviewed By: fedor.sergeev Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D60675 llvm-svn: 358379	2019-04-15 08:57:53 +00:00
Philip Reames	0eeb2cd491	[Tests] Add tests for D60659, and make adjustments to others to make diff clear Three related changes: 1) auto-gen several test files 2) Add the new tests at the bottom of said files 3) Adjust a couple of other test files not to use stores to constants when trying to test constexpr address handling llvm-svn: 358344	2019-04-13 22:12:56 +00:00
Nikita Popov	040871db48	[CVP] Add tests for range of with.overflow result; NFC Test range of with.overflow result in the no-overflow branch. llvm-svn: 358341	2019-04-13 19:43:51 +00:00
Nikita Popov	41e284b9c3	[CVP] Fix inverted predicates in test; NFC Checked the wrong direction in the umul tests... fix predicated to line up with the test name. llvm-svn: 358331	2019-04-13 11:47:36 +00:00
Nikita Popov	25c1aa15a7	[CVP] Add tests for with.overflow used as condition; NFC llvm-svn: 358330	2019-04-13 11:40:16 +00:00
Chen Zheng	87dd0e06dc	[InstCombine] Canonicalize (-X srem Y) to -(X srem Y). Differential Revision: https://reviews.llvm.org/D60647 llvm-svn: 358328	2019-04-13 09:21:22 +00:00
Chen Zheng	fc59a0326b	[InstCombine] [NFC] add testcases for canonicalizing (-X srem Y) to -(X srem Y). llvm-svn: 358327	2019-04-13 07:34:55 +00:00
Philip Reames	b091cc081d	[InstCombine] Fix a nasty miscompile introduced w/masked.gather demanded elts This fixes a miscompile which was introduced in r356510 (https://reviews.llvm.org/D57372). The problem is that the original patch removed pointer operands where the load results we're demanded, but without considering the legality of the load itself. If the masked.gather had active, but undemanded, lanes, then we could end up creating a load which loaded from an undef address. The result could be a segfault, or, in theory, an arbitrary read from a random memory location into an used register. llvm-svn: 358299	2019-04-12 18:26:56 +00:00
Nikita Popov	00a0d5d1de	[CVP] Set NSW/NUW flags when simplifying with.overflow When CVP determines that a with.overflow intrinsic cannot overflow, it currently inserts a simple add/sub. As we already determined that there can be no overflow, we should add the appropriate NUW/NSW flag. Differential Revision: https://reviews.llvm.org/D60585 llvm-svn: 358298	2019-04-12 18:18:17 +00:00
Philip Reames	7a60cd38af	[Tests] Checkin a test demonstrating a miscompile so that patch which fixes it shows a clear diff llvm-svn: 358296	2019-04-12 18:11:58 +00:00
Jeremy Morse	32afe6a1f8	[DebugInfo] Fix pr41175 Dead Store Elimination missing debug loc Bug: https://bugs.llvm.org/show_bug.cgi?id=41175 In the bug test case the DSE pass is shortening the range of memory that a memset is working on. A getelementptr is generated so that the new starting address can be passed to memset. This instruction was not given a DebugLoc. To fix the bug, copy the DebugLoc from the memset instruction. Patch by Orlando Cazalet-Hyams! Differential Revision: https://reviews.llvm.org/D60556 llvm-svn: 358270	2019-04-12 09:47:35 +00:00
Fangrui Song	d5c404246f	[ConstantFold] Don't evaluate FP or FP vector casts or truncations when simplifying icmp Fix PR41476 llvm-svn: 358262	2019-04-12 07:34:30 +00:00
Nikita Popov	6ffa1511ea	[CVP] Generate full test checks for overflows.ll; NFC llvm-svn: 358229	2019-04-11 21:10:39 +00:00
Rong Xu	959ef16859	[PGO] Better handling of profile hash mismatch We currently assume profile hash conflicts will be caught by an upfront check and we assert for the cases that escape the check. The assumption is not always true as there are chances of conflict. This patch prints a warning and skips annotating the function for the escaped cases,. Differential Revision: https://reviews.llvm.org/D60154 llvm-svn: 358225	2019-04-11 20:54:17 +00:00
Simon Pilgrim	8d083c5e0b	[ConstantFold] ExtractConstantBytes - handle shifts on large integer types Use APInt instead of getZExtValue from the ConstantInt until we can confirm that the shift amount is in range. Reduced from OSS-Fuzz #14169 - https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=14169 llvm-svn: 358192	2019-04-11 16:39:31 +00:00
Erik Pilkington	cb5c7bd9eb	Fix a hang when lowering __builtin_dynamic_object_size If the ObjectSizeOffsetEvaluator fails to fold the object size call, then it may litter some unused instructions in the function. When done repeatably in InstCombine, this results in an infinite loop. Fix this by tracking the set of instructions that were inserted, then removing them on failure. rdar://49172227 Differential revision: https://reviews.llvm.org/D60298 llvm-svn: 358146	2019-04-10 23:42:11 +00:00
Nikita Popov	0a8228fd28	[InstCombine] Handle ssubo always overflow Following D60483 and D60497, this adds support for AlwaysOverflows handling for ssubo. This is the last case we can handle right now. Differential Revision: https://reviews.llvm.org/D60518 llvm-svn: 358100	2019-04-10 16:32:15 +00:00
Nikita Popov	7a543c3758	[InstCombine] ssubo X, C -> saddo X, -C ssubo X, C is equivalent to saddo X, -C. Make the transformation in InstCombine and allow the logic implemented for saddo to fold prior usages of add nsw or sub nsw with constants. Patch by Dan Robertson. Differential Revision: https://reviews.llvm.org/D60061 llvm-svn: 358099	2019-04-10 16:27:36 +00:00
Nikita Popov	ef23e88480	[InstCombine] Handle saddo always overflow Followup to D60483: Handle AlwaysOverflow conditions for saddo as well. Differential Revision: https://reviews.llvm.org/D60497 llvm-svn: 358095	2019-04-10 16:18:01 +00:00
David Stenberg	fab4bdf4b9	Add REQUIRES: asserts to test using -debug-only llvm-svn: 358057	2019-04-10 08:44:57 +00:00
Florian Hahn	db1a69c250	[VPLAN] Minor improvement to testing and debug messages. 1. Use computed VF for stress testing. 2. If the computed VF does not produce vector code (VF smaller than 2), force VF to be 4. 3. Test vectorization of i64 data on AArch64 to make sure we generate VF != 4 (on X86 that was already tested on AVX). Patch by Francesco Petrogalli <francesco.petrogalli@arm.com> Differential Revision: https://reviews.llvm.org/D59952 llvm-svn: 358056	2019-04-10 08:17:28 +00:00
Nikita Popov	09020ec2a7	[InstCombine] Handle usubo always overflow Check AlwaysOverflow condition for usubo. The implementation is the same as the existing handling for uaddo and umulo. Handling for saddo and ssubo will follow (smulo doesn't have the necessary ValueTracking support). Differential Revision: https://reviews.llvm.org/D60483 llvm-svn: 358052	2019-04-10 07:10:53 +00:00
Chen Zheng	5e13ff1da2	[InstCombine] Canonicalize (-X s/ Y) to -(X s/ Y). Differential Revision: https://reviews.llvm.org/D60395 llvm-svn: 358050	2019-04-10 06:52:09 +00:00
Akira Hatanaka	9ca9d32b6b	[ObjC][ARC] Convert the retainRV marker that is passed as a named metadata into a module flag in the auto-upgrader and make the ARC contract pass read the marker as a module flag. This is needed to fix a bug where ARC contract wasn't inserting the retainRV marker when LTO was enabled, which caused objects returned from a function to be auto-released. rdar://problem/49464214 Differential Revision: https://reviews.llvm.org/D60303 llvm-svn: 358047	2019-04-10 06:20:20 +00:00
Nikita Popov	c176b708e4	[InstCombine] Add with.overflow always overflow tests; NFC The uadd and umul cases are currently handled, the usub, sadd, ssub and smul cases are not. usub, sadd and ssub already have the necessary ValueTracking support, smul doesn't. llvm-svn: 358031	2019-04-09 20:02:23 +00:00
Nikita Popov	2f5e9de8d1	Revert "[InstCombine] [InstCombine] Canonicalize (-X s/ Y) to -(X s/ Y)." This reverts commit `1383a91689`. sdiv-canonicalize.ll fails after this revision. The fold needs to be moved outside the branch handling constant operands. However when this is done there are further test changes, so I'm reverting this in the meantime. llvm-svn: 358026	2019-04-09 18:32:38 +00:00
Nikita Popov	4b2323d1a3	[ValueTracking] Use computeConstantRange() for signed sub overflow determination This is the same change as D60420 but for signed sub rather than signed add: Range information is intersected into the known bits result, allows to detect more no/always overflow conditions. Differential Revision: https://reviews.llvm.org/D60469 llvm-svn: 358020	2019-04-09 17:01:49 +00:00
Chen Zheng	1383a91689	[InstCombine] [InstCombine] Canonicalize (-X s/ Y) to -(X s/ Y). Differential Revision: https://reviews.llvm.org/D60395 llvm-svn: 358017	2019-04-09 16:34:31 +00:00
Nikita Popov	10edd2b79d	[ValueTracking] Use computeConstantRange() in signed add overflow determination This is D59386 for the signed add case. The computeConstantRange() result is now intersected into the existing known bits information, allowing to detect additional no-overflow/always-overflow conditions (though the latter isn't used yet). This (finally...) covers the motivating case from D59071. Differential Revision: https://reviews.llvm.org/D60420 llvm-svn: 358014	2019-04-09 16:12:59 +00:00
Sanjay Patel	49d9d17a77	[InstCombine] prevent possible miscompile with sdiv+negate of vector op Similar to: rL358005 Forego folding arbitrary vector constants to fix a possible miscompile bug. We can enhance the transform if we do want to handle the more complicated vector case. llvm-svn: 358013	2019-04-09 15:13:03 +00:00
Sanjay Patel	d5173f5acf	[InstCombine] add tests for sdiv with negated dividend and constant divisor; NFC llvm-svn: 358010	2019-04-09 14:48:44 +00:00
Sanjay Patel	7563b65ad4	[InstCombine] add tests for sdiv-by-int-min; NFC llvm-svn: 358008	2019-04-09 14:27:07 +00:00
Sanjay Patel	d469954d61	[InstCombine] auto-generate complete test checks; NFC llvm-svn: 358007	2019-04-09 14:27:03 +00:00
Sanjay Patel	f62dcea7ed	[InstCombine] prevent possible miscompile with negate+sdiv of vector op // 0 - (X sdiv C) -> (X sdiv -C) provided the negation doesn't overflow. This fold has been around for many years and nobody noticed the potential vector miscompile from overflow until recently... So it seems unlikely that there's much demand for a vector sdiv optimization on arbitrary vector constants, so just limit the matching to splat constants to avoid the possible bug. Differential Revision: https://reviews.llvm.org/D60426 llvm-svn: 358005	2019-04-09 14:09:06 +00:00
Sanjay Patel	a230bb5fc0	[InstCombine] add tests/comments for negate+sdiv; NFC llvm-svn: 358003	2019-04-09 13:41:29 +00:00
Chen Zheng	11cf397292	[InstCombine] add more testcases for canonicalize (-X s/ Y) to -(X s/ Y). llvm-svn: 358000	2019-04-09 12:47:29 +00:00
Sanjay Patel	74ccef1f4f	[InstCombine] add tests for negate+sdiv; NFC PR41425: https://bugs.llvm.org/show_bug.cgi?id=41425 llvm-svn: 357953	2019-04-08 22:55:10 +00:00
Sanjay Patel	773e04c883	[InstCombine] peek through fdiv to find a squared sqrt A more general canonicalization between fdiv and fmul would not handle this case because that would have to be limited by uses to prevent 2 values from becoming 3 values: (x/y) * (x/y) --> (xx) / (yy) (But we probably should still have that limited -- but more general -- canonicalization independently of this change.) llvm-svn: 357943	2019-04-08 21:23:50 +00:00
Sanjay Patel	bf1417d7e4	[InstCombine] add extra-use tests for fmul+sqrt; NFC llvm-svn: 357939	2019-04-08 20:37:34 +00:00
Nikita Popov	15abd74de7	[InstCombine] Add more tests for signed saturing math overflow; NFC Overflow conditions for sadd.sat and ssub.sat which can be determined based on constant ranges, but not necessarily known bits. llvm-svn: 357938	2019-04-08 20:02:47 +00:00
Brian M. Rzycki	887865c1ad	[JumpThreading] Fix incorrect fold conditional after indirectbr/callbr Fixes bug 40992: https://bugs.llvm.org/show_bug.cgi?id=40992 There is potential for miscompiled code emitted from JumpThreading when analyzing a block with one or more indirectbr or callbr predecessors. The ProcessThreadableEdges() function incorrectly folds conditional branches into an unconditional branch. This patch prevents incorrect branch folding without fully pessimizing other potential threading opportunities through the same basic block. This IR shape was manually fed in via opt and is unclear if clang and the full pass pipeline will ever emit similar code shapes. Thanks to Matthias Liedtke for the bug report and simplified IR example. Differential Revision: https://reviews.llvm.org/D60284 llvm-svn: 357930	2019-04-08 18:20:35 +00:00
Sanjay Patel	b33938df7a	[InstCombine] remove overzealous assert for shuffles (PR41419) As the TODO indicates, instsimplify could be improved. Should fix: https://bugs.llvm.org/show_bug.cgi?id=41419 llvm-svn: 357910	2019-04-08 13:28:29 +00:00
Simon Pilgrim	b4f1bfa659	[InstCombine][X86] Expand MOVMSK to generic IR (PR39927) First step towards removing the MOVMSK intrinsics completely - this patch expands MOVMSK to the pattern: e.g. PMOVMSKB(v16i8 x): %cmp = icmp slt <16 x i8> %x, zeroinitializer %int = bitcast <16 x i8> %cmp to i16 %res = zext i16 %int to i32 Which is correctly handled by ISel and FastIsel (give or take an annoying movzx move....): https://godbolt.org/z/rkrSFW Differential Revision: https://reviews.llvm.org/D60256 llvm-svn: 357909	2019-04-08 13:17:51 +00:00
Chen Zheng	923c7c9daa	[InstCombine] sdiv exact flag fixup. Differential Revision: https://reviews.llvm.org/D60396 llvm-svn: 357904	2019-04-08 12:08:03 +00:00
Chen Zheng	edf91ed855	[InstCombine] add more testcases for sdiv exact flag fixup. llvm-svn: 357894	2019-04-08 09:19:42 +00:00
Chen Zheng	d3b1d74624	[InstCombine] add testcases for sdiv exact flag fixing - NFC. llvm-svn: 357884	2019-04-08 05:49:15 +00:00
Chen Zheng	c84107612a	[InstCombine]add testcase for sdiv canonicalizetion - NFC llvm-svn: 357883	2019-04-08 03:07:32 +00:00
Nikita Popov	3db93ac5d6	Reapply [ValueTracking] Support min/max selects in computeConstantRange() Add support for min/max flavor selects in computeConstantRange(), which allows us to fold comparisons of a min/max against a constant in InstSimplify. This fixes an infinite InstCombine loop, with the test case taken from D59378. Relative to the previous iteration, this contains some adjustments for AMDGPU med3 tests: The AMDGPU target runs InstSimplify prior to codegen, which ends up constant folding some existing med3 tests after this change. To preserve these tests a hidden -amdgpu-scalar-ir-passes option is added, which allows disabling scalar IR passes (that use InstSimplify) for testing purposes. Differential Revision: https://reviews.llvm.org/D59506 llvm-svn: 357870	2019-04-07 17:22:16 +00:00
Sanjay Patel	c538c50113	[InstCombine] add more tests for fmul+fdiv+sqrt; NFC llvm-svn: 357816	2019-04-05 20:54:35 +00:00
Sanjay Patel	79df4454e1	[InstCombine] add tests for fdiv+fmul; NFC llvm-svn: 357782	2019-04-05 17:00:57 +00:00
Sanjay Patel	7e3e7f8040	[InstCombine] add tests for sqrt+fdiv+fmul; NFC Examples based on recent llvm-dev thread. These are specific patterns of more general enhancements that would solve these. llvm-svn: 357780	2019-04-05 16:52:57 +00:00
Sanjay Patel	9965f5aa70	[InstCombine] add test to show reassociation that creates a denormal constant; NFC llvm-svn: 357776	2019-04-05 16:42:21 +00:00
Simon Pilgrim	5ad10f4df9	[SLP][X86] Regenerate operandorder tests with arguments on same line. NFCI. Stops update_test_checks.py from splitting the later arguments after the CHECKs. llvm-svn: 357679	2019-04-04 09:31:12 +00:00
Luqman Aden	8911c5be46	[InstCombine] Combine no-wrap sub and icmp w/ constant. Teach InstCombine the transformation `(icmp P (sub nuw\|nsw C2, Y), C) -> (icmp swap(P) Y, C2-C)` Reviewers: majnemer, apilipenko, sanjoy, spatel, lebedev.ri Reviewed By: lebedev.ri Subscribers: dmgreen, lebedev.ri, nikic, hiraditya, JDevlieghere, jfb, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59916 llvm-svn: 357674	2019-04-04 07:08:30 +00:00
David L. Jones	8b8a02175a	Revert r357452 - 'SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)' This revision causes tests to fail under ASAN. Since the cause of the failures is not clear (could be ASAN, could be a Clang bug, could be a bug in this revision), the safest course of action seems to be to revert while investigating. llvm-svn: 357667	2019-04-04 02:27:57 +00:00
Taewook Oh	a960f89962	[ProfileSummary] Count callsite samples when computing total samples. Summary: Currently ProfileSummaryBuilder doesn't count into callsite samples when computing total samples. Considering that ProfileSummaryInfo is used to checked the hotness of not only body samples but also callsite samples (from SampleProfileLoader), I think the callsite sample counts should be considered when computing total samples. Reviewers: eraman, danielcdh, wmi Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59835 llvm-svn: 357627	2019-04-03 19:54:43 +00:00
David Bolvansky	937720e75b	[InstCombine] Simplify ctpop with bitreverse/bswap Summary: Fixes PR41337 Reviewers: spatel Reviewed By: spatel Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60148 llvm-svn: 357564	2019-04-03 08:08:44 +00:00
Matt Arsenault	f426ddbfc7	AMDGPU: Assume ECC is enabled by default if supported The test should really be checking for the property directly in the code object headers, but there are problems with this. I don't see this directly represented in the text form, and for the binary emission this is depending on a function level subtarget feature to emit a global flag. llvm-svn: 357558	2019-04-03 01:58:57 +00:00
Matt Arsenault	03e7492876	InstSimplify: Fold round intrinsics from sitofp/uitofp https://godbolt.org/z/gEMRZb llvm-svn: 357549	2019-04-03 00:25:06 +00:00
David Bolvansky	9f179b2c65	[InstCombine] Added tests for PR41337 llvm-svn: 357522	2019-04-02 20:21:26 +00:00
David Bolvansky	5ba60b22a4	[InstCombine] Simplify ctlz/cttz with bitreverse Summary: Fixes PR41273 Reviewers: spatel Reviewed By: spatel Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60096 llvm-svn: 357521	2019-04-02 20:13:28 +00:00
David Bolvansky	9bba938de4	[InstCombine] Added tests for PR41273 llvm-svn: 357508	2019-04-02 18:33:54 +00:00
Vedant Kumar	9da8a68d6b	[ArgPromotion] Set debug location at updated callsites Set the correct debug location on instructions which load arguments in preparation for a call to an arg-promoted function. This prevents location cascade from misattributing the line/scope of one of these loads to the location of the instruction preceding the call. Differential Revision: https://reviews.llvm.org/D60113 llvm-svn: 357500	2019-04-02 17:42:17 +00:00
Vedant Kumar	c6bceec01a	[DebugInfo] Fix pr41180 : Loop Vectorization Debugify Failure Bug: https://bugs.llvm.org/show_bug.cgi?id=41180 In the bug test case the debug location was missing for the cmp instruction in the "middle block" BB. This patch fixes the bug by copying the debug location from the cmp of the scalar loop's terminator branch, if it exists. The patch also fixes the debug location on the subsequent branch instruction. It was previously using the location of the of the original loop's pre-header block terminator. Both of these instructions will now map to the source line of the conditional branch in the original loop. A regression test has been added that covers these issues. Patch by Orlando Cazalet-Hyams! Differential Revision: https://reviews.llvm.org/D59944 llvm-svn: 357499	2019-04-02 17:28:34 +00:00
Philip Reames	d3d5d76a7b	[WideableCond] Fix a nasty bug in detection of "explicit guards" The code was failing to actually check for the presence of the call to widenable_condition. The whole point of specifying the widenable_condition intrinsic was allowing widening transforms. A normal branch is not widenable. A normal branch leading to a deopt is not widenable (in general). I added a test case via LoopPredication, but GuardWidening has an analogous bug. Those are the only two passes actually using this utility just yet. Noticed while working on LoopPredication for non-widenable branches; POC in D60111. llvm-svn: 357493	2019-04-02 16:51:43 +00:00
Joseph Tremoulet	fb4d9f7287	[SimplifyCFG] Don't split musttail call from ret Summary: When inserting an `unreachable` after a noreturn call, we must ensure that it's not a musttail call to avoid breaking the IR invariants for musttail calls. Reviewers: fedor.sergeev, majnemer Reviewed By: majnemer Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60080 llvm-svn: 357485	2019-04-02 15:48:58 +00:00
Taewook Oh	6a27c48be2	[SampleProfile] Repeat indirect call promotion only when the target is actually hot. Summary: It is possible that multiple indirect call targets have been promoted for a single callsite from the profiled binary. Current implementation repeats promotion for all these targets as far as the callsite itself is hot (the callsite is assumed to be hot if any one of these targets was "hot" during the profiling). However, even when one of the ICPed target is hot other targets may not, and we should not repeat promotion for "cold" targets. Reviewers: danielcdh, wmi Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59940 llvm-svn: 357484	2019-04-02 15:48:21 +00:00
Joseph Tremoulet	b69afa8e9b	[PruneEH] Don't split musttail call from ret Summary: When inserting an `unreachable` after a noreturn call, we must ensure that it's not a musttail call to avoid breaking the IR invariants for musttail calls. Reviewers: fedor.sergeev, majnemer Reviewed By: majnemer Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60079 llvm-svn: 357483	2019-04-02 15:47:11 +00:00
Hans Wennborg	b669fea42f	SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259) The code was previously checking that candidates for sinking had exactly one use or were a store instruction (which can't have uses). This meant we could sink call instructions only if they had a use. That limitation seemed a bit arbitrary, so this patch changes it to "instruction has zero or one use" which seems more natural and removes the need to special-case stores. Differential revision: https://reviews.llvm.org/D59936 llvm-svn: 357452	2019-04-02 08:01:38 +00:00
Matt Arsenault	fa0a2c529b	InstSimplify: Add missing case from r357386 llvm-svn: 357443	2019-04-02 00:46:19 +00:00
Matt Arsenault	294e07cf03	AMDGPU: Fix test filename llvm-svn: 357441	2019-04-02 00:36:04 +00:00
Philip Reames	05e3e554b4	[LoopPred] Be uniform about proving generated conditions We'd been optimizing the case where the predicate was obviously true, do the same for the false case. Mostly just for completeness sake, but also may improve compile time in loops which will exit through the guard. Such loops are presumed rare in fastpath code, but may be present down untaken paths, so optimizing for them is still useful. llvm-svn: 357408	2019-04-01 16:26:08 +00:00
Philip Reames	d109e2a7c3	[LoopPred] Delete the old condition expressions if unused LoopPredication was replacing the original condition, but leaving the instructions to compute the old conditions around. This would get cleaned up by other passes of course, but we might as well do it eagerly. That also makes the test output less confusing. llvm-svn: 357406	2019-04-01 16:05:15 +00:00
Philip Reames	7eee62b5d4	[Tests] Autogen all the LoopPredication tests I'm about to make some changes to the pass which cause widespread - but uninteresting - test diffs. Prepare the tests for easy updating. llvm-svn: 357404	2019-04-01 15:35:30 +00:00
Philip Reames	9ef7708bbb	[Tests] Add tests for a possible loop predication transform variant As highlighted by tests, if one of the operands is loop variant, but guaranteed to have the same value on all iterations, we have a missed oppurtunity. llvm-svn: 357403	2019-04-01 15:32:07 +00:00
Mikael Holmen	150a7ec2dc	[InstCombine] Handle vector gep with scalar argument in evaluateInDifferentElementOrder Summary: This fixes PR41270. The recursive function evaluateInDifferentElementOrder expects to be called on a vector Value, so when we call it on a vector GEP's arguments, we must first check that the argument is indeed a vector. Reviewers: reames, spatel Reviewed By: spatel Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60058 llvm-svn: 357389	2019-04-01 14:10:10 +00:00
Mikael Holmen	3e527cd823	Revert "[InstCombine] Handle vector gep with scalar argument in evaluateInDifferentElementOrder" This reverts commit 75216a6dbcfe5fb55039ef06a07e419fa875f4a5. I'll recommit with a better commit message with reference to the phabricator review. llvm-svn: 357387	2019-04-01 14:06:45 +00:00
Matt Arsenault	0276b94356	InstSimplify: Add baseline test for upcoming change llvm-svn: 357386	2019-04-01 14:03:44 +00:00
Mikael Holmen	d66a47f90a	[InstCombine] Handle vector gep with scalar argument in evaluateInDifferentElementOrder This fixes PR41270. The recursive function evaluateInDifferentElementOrder expects to be called on a vector Value, so when we call it on a vector GEP's arguments, we must first check that the argument is indeed a vector. llvm-svn: 357385	2019-04-01 13:48:56 +00:00
Sanjay Patel	97d1bc4454	[InstCombine] eliminate commuted select-shuffles + binop (PR41304) If we have a commutable vector binop with inverted select-shuffles, we don't care about the order of the operands in each vector lane: LHS = shuffle V1, V2, <0, 5, 6, 3> RHS = shuffle V2, V1, <0, 5, 6, 3> LHS + RHS --> <V1[0]+V2[0], V2[1]+V1[1], V2[2]+V1[2], V1[3]+V2[3]> --> V1 + V2 PR41304: https://bugs.llvm.org/show_bug.cgi?id=41304 ...is currently titled as an SLP enhancement, but at least for the given example, we can reduce that in instcombine because we are just eliminating shuffles. As noted in the TODO, this could be generalized, but I haven't thought through those patterns completely, so this is limited to what appears to be always safe. Differential Revision: https://reviews.llvm.org/D60048 llvm-svn: 357382	2019-04-01 13:36:40 +00:00
Sanjay Patel	7ac1186b58	[InstCombine] add tests for inverted select-shuffles + binop (PR41304); NFC llvm-svn: 357368	2019-03-31 15:45:47 +00:00
Sanjay Patel	b276dd195a	[InstCombine] canonicalize select shuffles by commuting In PR41304: https://bugs.llvm.org/show_bug.cgi?id=41304 ...we have a case where we want to fold a binop of select-shuffle (blended) values. Rather than try to match commuted variants of the pattern, we can canonicalize the shuffles and check for mask equality with commuted operands. We don't produce arbitrary shuffle masks in instcombine, but select-shuffles are a special case that the backend is required to handle because we already canonicalize vector select to this shuffle form. So there should be no codegen difference from this change. It's possible that this improves CSE in IR though. Differential Revision: https://reviews.llvm.org/D60016 llvm-svn: 357366	2019-03-31 15:01:30 +00:00
Luqman Aden	7c67dbdc65	[NFC][InstCombine] Add tests for combining icmp of no-wrap sub w/ constant. llvm-svn: 357360	2019-03-31 08:58:50 +00:00
Matt Arsenault	055e4dce45	AMDGPU: Remove dx10-clamp from subtarget features Since this can be set with s_setreg*, it should not be a subtarget property. Set a default based on the calling convention, and Introduce a new amdgpu-dx10-clamp attribute to override this if desired. Also introduce a new amdgpu-ieee attribute to match. The values need to match to allow inlining. I think it is OK for the caller's dx10-clamp attribute to override the callee, but there doesn't appear to be the infrastructure to do this currently without definining the attribute in the generic Attributes.td. Eventually the calling convention lowering will need to insert a mode switch somewhere for these. llvm-svn: 357302	2019-03-29 19:14:54 +00:00
Sanjay Patel	01c07b1a45	[InstCombine] autogenerate complete checks; NFC llvm-svn: 357291	2019-03-29 17:51:39 +00:00
Sanjay Patel	2bff8b4272	[InstCombine] regenerate test checks; NFC llvm-svn: 357288	2019-03-29 17:47:51 +00:00
Simon Pilgrim	6a75c36ea9	[SLP] Add support for commutative icmp/fcmp predicates For the cases where the icmp/fcmp predicate is commutative, use reorderInputsAccordingToOpcode to collect and commute the operands. This requires a helper to recognise commutativity in both general Instruction and CmpInstr types - the CmpInst::isCommutative doesn't overload the Instruction::isCommutative method for reasons I'm not clear on (maybe because its based on predicate not opcode?!?). Differential Revision: https://reviews.llvm.org/D59992 llvm-svn: 357266	2019-03-29 15:28:25 +00:00
Simon Pilgrim	62f0d1650a	[SLP] Add support for swapping icmp/fcmp predicates to permit vectorization We should be able to match elements with the swapped predicate as well - as long as we commute the source operands. Differential Revision: https://reviews.llvm.org/D59956 llvm-svn: 357243	2019-03-29 10:41:00 +00:00
Florian Hahn	45682fd633	[LSR] Fix signed overflow in GenerateCrossUseConstantOffsets. For the attached test case, unchecked addition of immediate starts and ends overflows, as they can be arbitrary i64 constants. Proof: https://rise4fun.com/Alive/Plqc Reviewers: qcolombet, gilr, efriedma Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D59218 llvm-svn: 357217	2019-03-28 22:17:29 +00:00
Eli Friedman	96f295e23b	[InterleavedAccessPass] Don't increase the number of bytes loaded. Even if the interleaving transform would otherwise be legal, we shouldn't introduce an interleaved load that is wider than the original load: it might have undefined behavior. It might be possible to perform some sort of mask-narrowing transform in some cases (using a narrower interleaved load, then extending the results using shufflevectors). But I haven't tried to implement that, at least for now. Fixes https://bugs.llvm.org/show_bug.cgi?id=41245 . Differential Revision: https://reviews.llvm.org/D59954 llvm-svn: 357212	2019-03-28 20:44:50 +00:00
Simon Pilgrim	ceb3de5d25	[SLP][X86] Add tests showing failure to commute icmp/fcmp by swapping predicate By swapping icmp/fcmp predicates we can commute their operands to improve vectorization llvm-svn: 357204	2019-03-28 19:13:38 +00:00
Simon Pilgrim	66b5e322fc	[SLP][X86] Add tests showing failure to commute icmp/fcmp operands Some predicates are fully commutative - we should be able to easily commute their operands to improve vectorization llvm-svn: 357202	2019-03-28 19:03:53 +00:00
Clement Courbet	699dc025a6	[X86MacroFusion] Handle branch fusion (AMD CPUs). Summary: This adds a BranchFusion feature to replace the usage of the MacroFusion for AMD CPUs. See D59688 for context. Reviewers: andreadb, lebedev.ri Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59872 llvm-svn: 357171	2019-03-28 14:12:46 +00:00
Florian Hahn	e21ed594d8	[VPlan] Determine Vector Width programmatically. With this change, the VPlan native path is triggered with the directive: #pragma clang loop vectorize(enable) There is no need to specify the vectorize_width(N) clause. Patch by Francesco Petrogalli <francesco.petrogalli@arm.com> Differential Revision: https://reviews.llvm.org/D57598 llvm-svn: 357156	2019-03-28 10:37:12 +00:00
Chandler Carruth	923ff550b9	[NewPM] Fix a nasty bug with analysis invalidation in the new PM. The issue here is that we actually allow CGSCC passes to mutate IR (and therefore invalidate analyses) outside of the current SCC. At a minimum, we need to support mutating parent and ancestor SCCs to support the ArgumentPromotion pass which rewrites all calls to a function. However, the analysis invalidation infrastructure is heavily based around not needing to invalidate the same IR-unit at multiple levels. With Loop passes for example, they don't invalidate other Loops. So we need to customize how we handle CGSCC invalidation. Doing this without gratuitously re-running analyses is even harder. I've avoided most of these by using an out-of-band preserved set to accumulate the cross-SCC invalidation, but it still isn't perfect in the case of re-visiting the same SCC repeatedly but it coming off the worklist. Unclear how important this use case really is, but I wanted to call it out. Another wrinkle is that in order for this to successfully propagate to function analyses, we have to make sure we have a proxy from the SCC to the Function level. That requires pre-creating the necessary proxy. The motivating test case now works cleanly and is added for ArgumentPromotion. Thanks for the review from Philip and Wei! Differential Revision: https://reviews.llvm.org/D59869 llvm-svn: 357137	2019-03-28 00:51:36 +00:00
Nikita Popov	7462303e06	[InstCombine] Use uadd.sat and usub.sat for canonicalization Start using the uadd.sat and usub.sat intrinsics for the existing canonicalizations. These intrinsics should optimize better than expanded IR, have better handling in the X86 backend and should be no worse than expanded IR in other backends, as far as we know. rL357012 already introduced use of uadd.sat for the add+umin pattern. Differential Revision: https://reviews.llvm.org/D58872 llvm-svn: 357103	2019-03-27 17:56:15 +00:00
Clement Courbet	f8666b0649	[X86MacroFusion][NFC] Add a bulldozer test. llvm-svn: 357099	2019-03-27 17:44:16 +00:00
Nikita Popov	7f15dd097e	[InstCombine] Add tests for ssubo X, C -> saddo X, -C; NFC Add baseline tests for canonicalization of ssubo X, C -> saddo X, -C. Patch by Dan Robertson. Differential Revision: https://reviews.llvm.org/D59653 llvm-svn: 357013	2019-03-26 18:05:43 +00:00
Sanjay Patel	81e8d76f5b	[InstCombine] form uaddsat from add+umin (PR14613) This is the last step towards solving the examples shown in: https://bugs.llvm.org/show_bug.cgi?id=14613 With this change, x86 should end up with psubus instructions when those are available. All known codegen issues with expanding the saturating intrinsics were resolved with: D59006 / rL356855 We also have some early evidence in D58872 that using the intrinsics will lead to better perf. If some target regresses from this, custom lowering of the intrinsics (as in the above for x86) may be needed. llvm-svn: 357012	2019-03-26 17:50:08 +00:00
Sanjay Patel	0dd67ed462	[InstCombine] add tests for uaddsat using min; NFC llvm-svn: 357005	2019-03-26 16:19:13 +00:00
Sanjay Patel	418ee7b7bb	[InstCombine] update tests to use FileCheck; NFC llvm-svn: 357004	2019-03-26 15:58:33 +00:00
Simon Pilgrim	6f96795b88	[SLPVectorizer] Merge reorderAltShuffleOperands into reorderInputsAccordingToOpcode As discussed on D59738, this generalizes reorderInputsAccordingToOpcode to handle multiple + non-commutative instructions so we can get rid of reorderAltShuffleOperands and make use of the extra canonicalizations that reorderInputsAccordingToOpcode brings. Differential Revision: https://reviews.llvm.org/D59784 llvm-svn: 356939	2019-03-25 20:05:27 +00:00
Simon Pilgrim	77749567a1	[SLPVectorizer] Update file missed in rL356913 Differential Revision: https://reviews.llvm.org/D59738 llvm-svn: 356915	2019-03-25 16:14:21 +00:00
Simon Pilgrim	ff3abef395	[SLPVectorizer] reorderInputsAccordingToOpcode - remove non-Instruction canonicalization Remove attempts to commute non-Instructions to the LHS - the codegen changes appear to rely on chance more than anything else and also have a tendency to fight existing instcombine canonicalization which moves constants to the RHS of commutable binary ops. This is prep work towards: (a) reusing reorderInputsAccordingToOpcode for alt-shuffles and removing the similar reorderAltShuffleOperands (b) improving reordering to optimized cases with commutable and non-commutable instructions to still find splat/consecutive ops. Differential Revision: https://reviews.llvm.org/D59738 llvm-svn: 356913	2019-03-25 15:53:55 +00:00
Simon Pilgrim	9eb0de8573	[X86][SLP] Show example of failure to uniformly commute splats for 'alt' shuffles. If either the main/alt opcodes isn't commutable we may end up with the splats not correctly commuted to the same side. llvm-svn: 356837	2019-03-23 16:14:04 +00:00
Daniel Sanders	ef8761fd3b	Fix non-determinism in Reassociate caused by address coincidences Summary: Between building the pair map and querying it there are a few places that erase and create Values. It's rare but the address of these newly created Values is occasionally the same as a just-erased Value that we already have in the pair map. These coincidences should be accounted for to avoid non-determinism. Thanks to Roman Tereshin for the test case. Reviewers: rtereshin, bogner Reviewed By: rtereshin Subscribers: mgrang, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59401 llvm-svn: 356803	2019-03-22 20:16:35 +00:00
Sanjay Patel	a0aaa11afc	[SLP] fix variables names in test; NFC 'tmpXXX' conflicts with the auto-generated script regex names. That could cause mask a bug or fail if the output changes. llvm-svn: 356790	2019-03-22 18:33:11 +00:00
James Y Knight	c0e6b8ac3a	IR: Support parsing numeric block ids, and emit them in textual output. Just as as llvm IR supports explicitly specifying numeric value ids for instructions, and emits them by default in textual output, now do the same for blocks. This is a slightly incompatible change in the textual IR format. Previously, llvm would parse numeric labels as string names. E.g. define void @f() { br label %"55" 55: ret void } defined a label named "55", even without needing to be quoted, while the reference required quoting. Now, if you intend a block label which looks like a value number to be a name, you must quote it in the definition too (e.g. `"55":`). Previously, llvm would print nameless blocks only as a comment, and would omit it if there was no predecessor. This could cause confusion for readers of the IR, just as unnamed instructions did prior to the addition of "%5 = " syntax, back in 2008 (PR2480). Now, it will always print a label for an unnamed block, with the exception of the entry block. (IMO it may be better to print it for the entry-block as well. However, that requires updating many more tests.) Thus, the following is supported, and is the canonical printing: define i32 @f(i32, i32) { %3 = add i32 %0, %1 br label %4 4: ret i32 %3 } New test cases covering this behavior are added, and other tests updated as required. Differential Revision: https://reviews.llvm.org/D58548 llvm-svn: 356789	2019-03-22 18:27:13 +00:00
Philip Reames	d627048c07	[Tests] Add masked.gather tests for non-constant masks + speculation possibilities llvm-svn: 356782	2019-03-22 16:39:04 +00:00
Bixia Zheng	bdf0230cff	[ConstantFolding] Fix GetConstantFoldFPValue to avoid cast overflow. Summary: In C++, the behavior of casting a double value that is beyond the range of a single precision floating-point to a float value is undefined. This change replaces such a cast with APFloat::convert to convert the value, which is consistent with how we convert a double value to a half value. Reviewers: sanjoy Subscribers: lebedev.ri, sanjoy, jlebar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59500 llvm-svn: 356781	2019-03-22 16:37:37 +00:00
Philip Reames	f032e85d64	[tests] Add a generic masked.gather test to show sometimes we can't transform llvm-svn: 356779	2019-03-22 16:30:56 +00:00
Philip Reames	e234fd6118	[tests] Add tests for converting masked.load to load speculatively llvm-svn: 356778	2019-03-22 16:26:57 +00:00
Philip Reames	4a518c7055	[Tests] Use valid alignment in masked.gather tests llvm-svn: 356775	2019-03-22 16:20:24 +00:00
Tim Renouf	94c163c34e	InstCombineSimplifyDemanded: Allow v3 results for AMDGCN buffer and image intrinsics This helps to avoid the situation where RA spots that only 3 of the v4f32 result of a load are used, and immediately reallocates the 4th register for something else, requiring a stall waiting for the load. Differential Revision: https://reviews.llvm.org/D58906 Change-Id: I947661edfd5715f62361a02b100f14aeeada29aa llvm-svn: 356768	2019-03-22 15:53:50 +00:00
Dinar Temirbulatov	f95351b918	[SLPVectorizer] Add test related to SLP Throttling support, NFCI. llvm-svn: 356754	2019-03-22 14:50:53 +00:00
Nikita Popov	b86576a5b9	[InstSimplify] Add tests for signed icmp of and/or; NFC Even if a signed predicate is used, the ranges computed for and/or are unsigned, resulting in missed simplifications. llvm-svn: 356720	2019-03-21 21:13:08 +00:00
Akira Hatanaka	b576c77a9e	Don't add a tail keyword to calls to ObjC runtime functions if the calls are annotated with notail. r356705 annotated calls to objc_retainAutoreleasedReturnValue with notail on x86-64. This commit teaches ARC optimizer to check the notail marker on the call before turning it into a tail call. rdar://problem/38675807 llvm-svn: 356707	2019-03-21 20:16:09 +00:00
Craig Topper	16dc165046	[InstCombine] Don't transform ((C1 OP zext(X)) & C2) -> zext((C1 OP X) & C2) if either zext or OP has another use. If they have other users we'll just end up increasing the instruction count. We might be able to weaken this to only one of them having a single use if we can prove that the and will be removed. Fixes PR41164. Differential Revision: https://reviews.llvm.org/D59630 llvm-svn: 356690	2019-03-21 17:50:49 +00:00
Craig Topper	9f0b17a248	[ScalarizeMaskedMemIntrin] Add support for scalarizing expandload and compressstore intrinsics. This adds support for scalarizing these intrinsics as well the X86TargetTransformInfo support to avoid scalarizing them in the cases X86 can handle. I've omitted handling special cases for constant masks for this first pass. Though CodeGenPrepare can constant fold the branch conditions and remove some of the control flow anyway. Fixes PR40994 and is covers most of PR3666. Might want to implement constant masks to close that. Differential Revision: https://reviews.llvm.org/D59180 llvm-svn: 356687	2019-03-21 17:38:52 +00:00
Nikita Popov	3af5b28f47	[ValueTracking] Use ConstantRange based overflow check for signed sub This is D59450, but for signed sub. This case is not NFC, because the overflow logic in ConstantRange is more powerful than the existing check. This resolves the TODO in the function. I've added two tests to show that this indeed catches more cases than the previous logic, but the main correctness test coverage here is in the existing ConstantRange unit tests. Differential Revision: https://reviews.llvm.org/D59617 llvm-svn: 356685	2019-03-21 17:23:51 +00:00
Sanjay Patel	d47eac59ef	[CodeGenPrepare] limit formation of overflow intrinsics (PR41129) This is probably a bigger limitation than necessary, but since we don't have any evidence yet that this transform led to real-world perf improvements rather than regressions, I'm making a quick, blunt fix. In the motivating x86 example from: https://bugs.llvm.org/show_bug.cgi?id=41129 ...and shown in the regression test, we want to avoid an extra instruction in the dominating block because that could be costly. The x86 LSR test diff is reversing the changes from D57789. There's no evidence that 1 version is any better than the other yet. Differential Revision: https://reviews.llvm.org/D59602 llvm-svn: 356665	2019-03-21 13:57:07 +00:00
Craig Topper	72d888ba9f	[InstCombine] Add test case for PR41164. NFC llvm-svn: 356645	2019-03-21 05:33:10 +00:00
Nikita Popov	03dbfc2eef	[InstCombine] Add additional sub nsw inference tests; NFC nsw can be determined based on known bits here, but currently isn't. llvm-svn: 356620	2019-03-20 21:42:17 +00:00
Philip Reames	e4588bbf80	Simplify operands of masked stores and scatters based on demanded elements If we know we're not storing a lane, we don't need to compute the lane. This could be improved by using the undef element result to further prune the mask, but I want to separate that into its own change since it's relatively likely to expose other problems. Differential Revision: https://reviews.llvm.org/D57247 llvm-svn: 356590	2019-03-20 18:44:58 +00:00
Alina Sbirlea	5baa72ea74	[LICM & MemorySSA] Don't sink/hoist stores in the presence of ordered loads. Summary: Before this patch, if any Use existed in the loop, with a defining access in the loop, we conservatively decide to not move the store. What this approach was missing, is that ordered loads are not Uses, they're Defs in MemorySSA. So, even when the clobbering walker does not find that volatile load to interfere, we still cannot hoist a store past a volatile load. Resolves PR41140. Reviewers: george.burgess.iv Subscribers: sanjoy, jlebar, Prazek, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59564 llvm-svn: 356588	2019-03-20 18:33:37 +00:00
Nikita Popov	00b5ecab5d	[ValueTracking] Compute range for abs without nsw This is a small followup to D59511. The code that was moved into computeConstantRange() there is a bit overly conversative: If the abs is not nsw, it does not compute any range. However, abs without nsw still has a well-defined contiguous unsigned range from 0 to SIGNED_MIN. This is a lot less useful than the usual 0 to SIGNED_MAX range, but if we're already here we might as well specify it... Differential Revision: https://reviews.llvm.org/D59563 llvm-svn: 356586	2019-03-20 18:16:02 +00:00
Nikita Popov	37cf25c3c6	[InstCombine] Fold add nuw + uadd.with.overflow Fold add nuw and uadd.with.overflow with constants if the addition does not overflow. Part of https://bugs.llvm.org/show_bug.cgi?id=38146. Patch by Dan Robertson. Differential Revision: https://reviews.llvm.org/D59471 llvm-svn: 356584	2019-03-20 18:00:27 +00:00
Sanjay Patel	fb44f99b73	[CGP][x86] add tests for usubo regression (PR41129); NFC llvm-svn: 356559	2019-03-20 15:02:35 +00:00
Nikita Popov	2dd1566e8b	[InstSimplify] Add additional cmp of abs without nsw tests; NFC llvm-svn: 356520	2019-03-19 21:12:21 +00:00
Philip Reames	70537abe52	Demanded elements support for masked.load and masked.gather Teach instcombine to propagate demanded elements through a masked load or masked gather instruction. This is in the broader context of improving vector pointer instcombine under https://reviews.llvm.org/D57140. Differential Revision: https://reviews.llvm.org/D57372 llvm-svn: 356510	2019-03-19 20:10:00 +00:00
Nikita Popov	208381953b	[ValueTracking] Use computeConstantRange() for unsigned add/sub overflow Improve computeOverflowForUnsignedAdd/Sub in ValueTracking by intersecting the computeConstantRange() result into the ConstantRange created from computeKnownBits(). This allows us to detect some additional never/always overflows conditions that can't be determined from known bits. This revision also adds basic handling for constants to computeConstantRange(). Non-splat vectors will be handled in a followup. The signed case will also be handled in a followup, as it needs some more groundwork. Differential Revision: https://reviews.llvm.org/D59386 llvm-svn: 356489	2019-03-19 17:53:56 +00:00
Sanjay Patel	5b820323ca	[InstCombine] fold logic-of-nan-fcmps (PR41069) Combine 2 fcmps that are checking for nan-ness: and (fcmp ord X, 0), (and (fcmp ord Y, 0), Z) --> and (fcmp ord X, Y), Z or (fcmp uno X, 0), (or (fcmp uno Y, 0), Z) --> or (fcmp uno X, Y), Z This is an exact match for a minimal reassociation pattern. If we want to handle this more generally that should go in the reassociate pass and allow removing this code. This should fix: https://bugs.llvm.org/show_bug.cgi?id=41069 llvm-svn: 356471	2019-03-19 16:39:17 +00:00
Teresa Johnson	bda581b831	[InstCombine] Add missing test for icmp transformation (NFC) This was split out of D59378. There was no testing for the EQ case in foldICmpWithDominatingICmp, add one here. llvm-svn: 356463	2019-03-19 15:43:56 +00:00
Simon Pilgrim	8ee477a2ab	[InstSimplify] SimplifyICmpInst - icmp eq/ne %X, undef -> undef As discussed on PR41125 and D59363, we have a mismatch between icmp eq/ne cases with an undef operand: When the other operand is constant we fold to undef (handled in ConstantFoldCompareInstruction) When the other operand is non-constant we fold to a bool constant based on isTrueWhenEqual (handled in SimplifyICmpInst). Neither is really wrong, but this patch changes the logic in SimplifyICmpInst to consistently fold to undef. The NewGVN test change is annoying (as with most heavily reduced tests) but AFAICT I have kept the purpose of the test based on rL291968. Differential Revision: https://reviews.llvm.org/D59541 llvm-svn: 356456	2019-03-19 14:08:23 +00:00
Sanjay Patel	423b958306	[InstCombine] add FMF to tests for extra coverage; NFC ninf is probably the only relevant possible flag here (nnan allows simplification and nsz never makes a difference). llvm-svn: 356453	2019-03-19 13:39:29 +00:00
Markus Lavin	b86ce219f4	[DebugInfo] Introduce DW_OP_LLVM_convert Introduce a DW_OP_LLVM_convert Dwarf expression pseudo op that allows for a convenient way to perform type conversions on the Dwarf expression stack. As an additional bonus it paves the way for using other Dwarf v5 ops that need to reference a base_type. The new DW_OP_LLVM_convert is used from lib/Transforms/Utils/Local.cpp to perform sext/zext on debug values but mainly the patch is about preparing terrain for adding other Dwarf v5 ops that need to reference a base_type. For Dwarf v5 the op maps to DW_OP_convert and for earlier versions a complex shift & mask pattern is generated to emulate sext/zext. This is a recommit of r356442 with trivial fixes for the failing tests. Differential Revision: https://reviews.llvm.org/D56587 llvm-svn: 356451	2019-03-19 13:16:28 +00:00
Simon Pilgrim	9497b2b2f7	[InstCombine] Regenerate + add icmp with undef tests Better test coverage for PR41125 and D59363 llvm-svn: 356448	2019-03-19 11:44:22 +00:00
Markus Lavin	ad78768d59	Revert "[DebugInfo] Introduce DW_OP_LLVM_convert" This reverts commit 1cf4b593a7ebd666fc6775f3bd38196e8e65fafe. Build bots found failing tests not detected locally. Failing Tests (3): LLVM :: DebugInfo/Generic/convert-debugloc.ll LLVM :: DebugInfo/Generic/convert-inlined.ll LLVM :: DebugInfo/Generic/convert-linked.ll llvm-svn: 356444	2019-03-19 09:17:28 +00:00
Markus Lavin	cd8a940b37	[DebugInfo] Introduce DW_OP_LLVM_convert Introduce a DW_OP_LLVM_convert Dwarf expression pseudo op that allows for a convenient way to perform type conversions on the Dwarf expression stack. As an additional bonus it paves the way for using other Dwarf v5 ops that need to reference a base_type. The new DW_OP_LLVM_convert is used from lib/Transforms/Utils/Local.cpp to perform sext/zext on debug values but mainly the patch is about preparing terrain for adding other Dwarf v5 ops that need to reference a base_type. For Dwarf v5 the op maps to DW_OP_convert and for earlier versions a complex shift & mask pattern is generated to emulate sext/zext. Differential Revision: https://reviews.llvm.org/D56587 llvm-svn: 356442	2019-03-19 08:48:19 +00:00
Nikita Popov	3e9770d2dc	Revert "[ValueTracking][InstSimplify] Support min/max selects in computeConstantRange()" This reverts commit `106f0cdefb`. This change impacts the AMDGPU smed3.ll and umed3.ll codegen tests. llvm-svn: 356424	2019-03-18 22:26:27 +00:00
Nikita Popov	106f0cdefb	[ValueTracking][InstSimplify] Support min/max selects in computeConstantRange() Add support for min/max flavor selects in computeConstantRange(), which allows us to fold comparisons of a min/max against a constant in InstSimplify. This was suggested by spatel as an alternative approach to D59378. I've also added the infinite looping test from that revision here. Differential Revision: https://reviews.llvm.org/D59506 llvm-svn: 356415	2019-03-18 21:35:19 +00:00
Nikita Popov	930341ba30	[InstCombine] Add tests for add nuw + uaddo; NFC Baseline tests for D59471 (InstCombine of `add nuw` and `uaddo` with constants). Patch by Dan Robertson. Differential Revision: https://reviews.llvm.org/D59472 llvm-svn: 356414	2019-03-18 21:35:09 +00:00
Nikita Popov	05baa9ee1a	[InstSimplify] Add additional icmp of min/max tests; NFC These are baseline tests for D59506. llvm-svn: 356408	2019-03-18 21:19:56 +00:00
Nikita Popov	c1d4fc8a62	[InstCombine] Improve with.overflow intrinsic tests; NFC - Do not use unnamed values in saddo tests - Add tests for canonicalization of a constant arg0 Patch by Dan Robertson. Differential Revision: https://reviews.llvm.org/D59476 llvm-svn: 356403	2019-03-18 20:08:35 +00:00
Warren Ristow	ad7d0ded2e	[SCEV] Guard movement of insertion point for loop-invariants This reinstates r347934, along with a tweak to address a problem with PHI node ordering that that commit created (or exposed). (That commit was reverted at r348426, due to the PHI node issue.) Original commit message: r320789 suppressed moving the insertion point of SCEV expressions with dev/rem operations to the loop header in non-loop-invariant situations. This, and similar, hoisting is also unsafe in the loop-invariant case, since there may be a guard against a zero denominator. This is an adjustment to the fix of r320789 to suppress the movement even in the loop-invariant case. This fixes PR30806. Differential Revision: https://reviews.llvm.org/D57428 llvm-svn: 356392	2019-03-18 18:52:35 +00:00
Sanjay Patel	08b5e68ef6	[InstCombine] add/adjust test for NaN checks; NFC llvm-svn: 356383	2019-03-18 17:37:05 +00:00
Sanjay Patel	6063393536	[InstCombine] allow general vector constants for funnel shift to shift transforms Follow-up to: rL356338 rL356369 We can calculate an arbitrary vector constant minus the bitwidth, so there's no need to limit this transform to scalars and splats. llvm-svn: 356372	2019-03-18 14:27:51 +00:00
Sanjay Patel	84de8a30a0	[InstCombine] extend rotate-left-by-constant canonicalization to funnel shift Follow-up to: rL356338 Rotates are a special case of funnel shift where the 2 input operands are the same value, but that does not need to be a restriction for the canonicalization when the shift amount is a constant. llvm-svn: 356369	2019-03-18 14:10:11 +00:00
Sanjay Patel	d7f1539322	[InstCombine] add funnel shift tests with arbitrary constants; NFC llvm-svn: 356367	2019-03-18 13:35:51 +00:00
Matt Arsenault	4873056ced	Remove immarg from llvm.expect The LangRef claimed this was required to be a constant, but this appears to be wrong. Fixes bug 41079. llvm-svn: 356353	2019-03-17 23:16:18 +00:00
Sanjay Patel	b3bcd95771	[InstCombine] canonicalize rotate right by constant to rotate left This was noted as a backend problem: https://bugs.llvm.org/show_bug.cgi?id=41057 ...and subsequently fixed for x86: rL356121 But we should canonicalize these in IR for the benefit of all targets and improve IR analysis such as CSE. llvm-svn: 356338	2019-03-17 19:08:00 +00:00
Sanjay Patel	a3a2f9424e	[InstCombine] add tests for rotate by constant using funnel intrinsics; NFC llvm-svn: 356337	2019-03-17 18:50:39 +00:00
Philip Reames	68a2e4d48b	[SimplifyDemandedVec] Strengthen handling all undef lanes (particularly GEPs) A change of two parts: 1) A generic enhancement for all callers of SDVE to exploit the fact that if all lanes are undef, the result is undef. 2) A GEP specific piece to strengthen/fix the vector index undef element handling, and call into the generic infrastructure when visiting the GEP. The result is that we replace a vector gep with at least one undef in each lane with a undef. We can also do the same for vector intrinsics. Once the masked.load patch (D57372) has landed, I'll update to include call tests as well. Differential Revision: https://reviews.llvm.org/D57468 llvm-svn: 356293	2019-03-15 19:54:06 +00:00
Sanjay Patel	052d1b7b66	[InstCombine] add tests for logic of NaN fcmps; NFC llvm-svn: 356287	2019-03-15 18:14:25 +00:00
Philip Reames	6867b1f7de	[tests] Add a test for constexpr mask as requested in D57372 llvm-svn: 356285	2019-03-15 18:06:32 +00:00
Sanjay Patel	a70c9d49af	[InstCombine] add tests for masked store/scatter; NFC Baseline tests for D57247 llvm-svn: 356283	2019-03-15 18:00:28 +00:00
Florian Hahn	728293ac87	[LSR] Update test from rL356256 after rebase. llvm-svn: 356257	2019-03-15 12:37:50 +00:00
Florian Hahn	d9e88f7b7f	[LSR] Check for signed overflow in NarrowSearchSpaceByDetectingSupersets. We are adding a sign extended IR value to an int64_t, which can cause signed overflows, as in the attached test case, where we have a formula with BaseOffset = -1 and a constant with numeric_limits<int64_t>::min(). If the addition would overflow, skip the simplification for this formula. Note that the target triple is required to trigger the failure. Reviewers: qcolombet, gilr, kparzysz, efriedma Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D59211 llvm-svn: 356256	2019-03-15 12:17:36 +00:00
Matt Arsenault	1d83670dbd	AMDGPU: Remove intrinsic operand assert Before r355981, this was under LLVM_DEBUG. I don't think the assert is quite right, but this really should be a verifier check. Instcombine should not be asserting on this sort of thing. llvm-svn: 356219	2019-03-14 23:45:09 +00:00
Sanjay Patel	2c9275a790	[CGP] add another bailout for degenerate code (PR41064) This is almost the same as: rL355345 ...and should prevent any potential crashing from examples like: https://bugs.llvm.org/show_bug.cgi?id=41064 ...although the bug was masked by: rL355823 ...and I'm not sure how to repro the problem after that change. llvm-svn: 356218	2019-03-14 23:14:31 +00:00
Paul Robinson	96c1f2cd6c	Tighten up tests that use -debugify as a shortcut. NFC These now verify that a given instruction has a specific source location, rather than any old location. We want to make sure we propagate the correct locations from one instruction to another. llvm-svn: 356217	2019-03-14 23:09:17 +00:00
Nikita Popov	48eb21ee5f	[InstCombine] Add tests for range-based saturing math overflow; NFC Tests for cases where overflow can be determined, but not based on known bits. llvm-svn: 356203	2019-03-14 21:06:46 +00:00
Sanjay Patel	38f07b1966	[InstCombine] remove duplicate tests These got accidentally doubled with rL356191. llvm-svn: 356195	2019-03-14 19:41:21 +00:00
Sanjay Patel	de1d5d3675	[InstCombine] canonicalize funnel shift constant shift amount to be modulo bitwidth The shift argument is defined to be modulo the bitwidth, so if that argument is a constant, we can always reduce the constant to its minimal form to allow better CSE and other follow-on transforms. We need to be careful to ignore constant expressions here, or we will likely infinite loop. I'm adding a general vector constant query for that case. Differential Revision: https://reviews.llvm.org/D59374 llvm-svn: 356192	2019-03-14 19:22:08 +00:00
Sanjay Patel	6e86216531	[InstCombine] add tests for funnel shift constant shift amount mod bitwidth; NFC llvm-svn: 356191	2019-03-14 19:22:00 +00:00
Sanjay Patel	43570a0a62	[InstCombine] add tests for funnel shift constant shift amount mod bitwidth; NFC llvm-svn: 356175	2019-03-14 17:39:40 +00:00
Sam Parker	3b2ba20afd	[ARM] Run ARMParallelDSP in the IRPasses phase Run EarlyCSE before ParallelDSP and do this in the backend IR opt phase. Differential Revision: https://reviews.llvm.org/D59257 llvm-svn: 356130	2019-03-14 10:57:40 +00:00
Matt Arsenault	0253620f89	Verifier: Make sure masked load/store alignment is a power of 2 The same should also be done for scatter/gather, but the verifier doesn't check those at all now. llvm-svn: 356094	2019-03-13 19:46:34 +00:00
Craig Topper	9bae5ba076	[X86] Add ImmArg markings to intrinsics. Remove test cases that checked for not crashing when immediate operands were passed not an immediate. These are now considered ill-formed in IR. This was done by manually scanning the intrinsic file for llvm_i32_ty and llvm_i8_ty which are the predominant types we use for immediates. Most of them are on vector intrinsics. I might have missed some other intrinsics. Differential Revision: https://reviews.llvm.org/D58302 llvm-svn: 355993	2019-03-12 23:48:07 +00:00
Matt Arsenault	caf1316f71	IR: Add immarg attribute This indicates an intrinsic parameter is required to be a constant, and should not be replaced with a non-constant value. Add the attribute to all AMDGPU and generic intrinsics that comments indicate it should apply to. I scanned other target intrinsics, but I don't see any obvious comments indicating which arguments are intended to be only immediates. This breaks one questionable testcase for the autoupgrade. I'm unclear on whether the autoupgrade is supposed to really handle declarations which were never valid. The verifier fails because the attributes now refer to a parameter past the end of the argument list. llvm-svn: 355981	2019-03-12 21:02:54 +00:00
Philip Reames	9b6b4fac83	[SROA] Fix a crash when trying to convert a memset to an non-integral pointer type The included test case currently crashes on tip of tree. Rather than adding a bailout, I chose to restructure the code so that the existing helper function could be used. Given that, the majority of the diff is NFC-ish, but the key difference is that canConvertValue returns false when only one side is a non-integral pointer. Thanks to Cherry Zhang for the test case. Differential Revision: https://reviews.llvm.org/D59000 llvm-svn: 355962	2019-03-12 20:15:05 +00:00
Tim Northover	8935aca9c7	CodeGenPrep: preserve inbounds attribute when sinking GEPs. Targets can potentially emit more efficient code if they know address computations never overflow. For example ILP32 code on AArch64 (which only has 64-bit address computation) can ignore the possibility of overflow with this extra information. llvm-svn: 355926	2019-03-12 15:22:23 +00:00
Fangrui Song	f260967055	[SimplifyLibCalls] Fix comments about fputs, memchr, and s[n]printf. NFC llvm-svn: 355905	2019-03-12 10:31:52 +00:00
Sanjoy Das	3f5ce18658	Reland "Relax constraints for reduction vectorization" Change from original commit: move test (that uses an X86 triple) into the X86 subdirectory. Original description: Gating vectorizing reductions on all fastmath flags seems unnecessary; `reassoc` should be sufficient. Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal Reviewed By: sdesmalen Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57728 llvm-svn: 355889	2019-03-12 01:31:44 +00:00
Sanjoy Das	2136a5bc49	Revert "Relax constraints for reduction vectorization" This reverts commit r355868. Breaks hexagon. llvm-svn: 355873	2019-03-11 22:37:31 +00:00
Sanjoy Das	93f8cc186a	Relax constraints for reduction vectorization Summary: Gating vectorizing reductions on all fastmath flags seems unnecessary; `reassoc` should be sufficient. Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal Reviewed By: sdesmalen Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57728 llvm-svn: 355868	2019-03-11 21:36:41 +00:00
Brian Gesiak	d7b68132d8	[coroutines][PR40979] Ignore unreachable uses across suspend points Summary: Depends on https://reviews.llvm.org/D59069. https://bugs.llvm.org/show_bug.cgi?id=40979 describes a bug in which the -coro-split pass would assert that a use was across a suspend point from a definition. Normally this would mean that a value would "spill" across a suspend point and thus need to be stored in the coroutine frame. However, in this case the use was unreachable, and so it would not be necessary to store the definition on the frame. To prevent the assert, simply remove unreachable basic blocks from a coroutine function before computing spills. This avoids the assert reported in PR40979. Reviewers: GorNishanov, tks2103 Reviewed By: GorNishanov Subscribers: EricWF, jdoerfert, llvm-commits, lewissbaker Tags: #llvm Differential Revision: https://reviews.llvm.org/D59068 llvm-svn: 355852	2019-03-11 18:31:28 +00:00
Jeremy Morse	90ede5f4bf	[SimplifyCFG] Retain debug info when threading jumps with critical edges Fixes bug 38023: https://bugs.llvm.org/show_bug.cgi?id=38023 The SimplifyCFG pass will perform jump threading in some cases where doing so is trivial and would simplify the CFG. When folding a series of blocks with redundant conditional branches into an unconditional "critical edge" block, it does not keep the debug location associated with the previous conditional branch. This patch fixes the bug described by copying the debug info from the old conditional branch to the new unconditional branch instruction, and adds a regression test for the SimplifyCFG pass that covers this case. Patch by Stephen Tozer! Differential Revision: https://reviews.llvm.org/D59206 llvm-svn: 355833	2019-03-11 16:23:59 +00:00
Sam Parker	52760bf435	[CGP] Limit distance between overflow math and cmp Inserting an overflowing arithmetic intrinsic can increase register pressure by producing two values at a point where only one is needed, while the second use maybe several blocks away. This increase in pressure is likely to be more detrimental on performance than rematerialising one of the original instructions. So, check that the arithmetic and compare instructions are no further apart than their immediate successor/predecessor. Differential Revision: https://reviews.llvm.org/D59024 llvm-svn: 355823	2019-03-11 13:19:46 +00:00
Jeremy Morse	b60aea4131	[JumpThreading] Retain debug info when replacing branch instructions Fixes bug 37966: https://bugs.llvm.org/show_bug.cgi?id=37966 The Jump Threading pass will replace certain conditional branch instructions with unconditional branches when it can prove that only one branch can occur. Prior to this patch, it would not carry the debug info from the old instruction to the new one. This patch fixes the bug described by copying the debug info from the conditional branch instruction to the new unconditional branch instruction, and adds a regression test for the Jump Threading pass that covers this case. Patch by Stephen Tozer! Differential Revision: https://reviews.llvm.org/D58963 llvm-svn: 355822	2019-03-11 11:48:57 +00:00
Craig Topper	69f8c1653d	[ScalarizeMaskedMemIntrin] Use IRBuilder functions that take uint32_t/uint64_t for getelementptr, extractelement, and insertelement. This saves needing to call getInt32 ourselves. Making the code a little shorter. The test changes are because insert/extract use getInt64 internally. Shouldn't be a functional issue. This cleanup because I plan to write similar code for expandload/compressstore. llvm-svn: 355767	2019-03-09 02:08:41 +00:00
Rong Xu	ce3be45cac	[CodeGenPrepare] Fix ModifiedDT flag in optimizeSelectInst r44412 fixed a huge compile time regression but it needed ModifiedDT flag to be maintained correctly in optimizations in optimizeBlock() and optimizeInst(). Function optimizeSelectInst() does not update the flag. This patch propagates the flag in optimizeSelectInst() back to optimizeBlock(). This patch also removes ModifiedDT in CodeGenPrepare class (which is not used). The property of ModifiedDT is now recorded in a ref parameter. Differential Revision: https://reviews.llvm.org/D59139 llvm-svn: 355751	2019-03-08 22:46:18 +00:00
Clement Courbet	8e16d73346	[SelectionDAG] Allow the user to specify a memeq function. Summary: Right now, when we encounter a string equality check, e.g. `if (memcmp(a, b, s) == 0)`, we try to expand to a comparison if `s` is a small compile-time constant, and fall back on calling `memcmp()` else. This is sub-optimal because memcmp has to compute much more than equality. This patch replaces `memcmp(a, b, s) == 0` by `bcmp(a, b, s) == 0` on platforms that support `bcmp`. `bcmp` can be made much more efficient than `memcmp` because equality compare is trivially parallel while lexicographic ordering has a chain dependency. Subscribers: fedor.sergeev, jyknight, ckennelly, gchatelet, llvm-commits Differential Revision: https://reviews.llvm.org/D56593 llvm-svn: 355672	2019-03-08 09:07:45 +00:00
Florian Hahn	6ca0985aa5	[InterleavedAccessAnalysis] Fix integer overflow in insertMember. Without checking for integer overflow, invalid members can be added e.g. if the calculated key overflows, becomes positive and the largest key. This fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=7560 https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13128 https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13229 Reviewers: Ayal, anna, hsaito, efriedma Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D55538 llvm-svn: 355613	2019-03-07 17:50:16 +00:00
David Green	ffc922ec35	[LSR] Attempt to increase the accuracy of LSR's setup cost In some loops, we end up generating loop induction variables that look like: {(-1 * (zext i16 (%i0 * %i1) to i32))<nsw>,+,1} As opposed to the simpler: {(zext i16 (%i0 * %i1) to i32),+,-1} i.e we count up from -limit to 0, not the simpler counting down from limit to 0. This is because the scores, as LSR calculates them, are the same and the second is filtered in place of the first. We end up with a redundant SUB from 0 in the code. This patch tries to make the calculation of the setup cost a little more thoroughly, recursing into the scev members to better approximate the setup required. The cost function for comparing LSR costs is: return std::tie(C1.NumRegs, C1.AddRecCost, C1.NumIVMuls, C1.NumBaseAdds, C1.ScaleCost, C1.ImmCost, C1.SetupCost) < std::tie(C2.NumRegs, C2.AddRecCost, C2.NumIVMuls, C2.NumBaseAdds, C2.ScaleCost, C2.ImmCost, C2.SetupCost); So this will only alter results if none of the other variables turn out to be different. Differential Revision: https://reviews.llvm.org/D58770 llvm-svn: 355597	2019-03-07 13:44:40 +00:00
Nick Desaulniers	212c8ac23f	[LoopRotate] fix crash encountered with callbr Summary: While implementing inlining support for callbr (https://bugs.llvm.org/show_bug.cgi?id=40722), I hit a crash in Loop Rotation when trying to build the entire x86 Linux kernel (drivers/char/random.c). This is a small fix up to r353563. Test case is drivers/char/random.c (with callbr's inlined), then ran through creduce, then `opt -opt-bisect-limit=<limit>`, then bugpoint. Thanks to Craig Topper for immediately spotting the fix, and teaching me how to fish. Reviewers: craig.topper, jyknight Reviewed By: craig.topper Subscribers: hiraditya, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D58929 llvm-svn: 355564	2019-03-06 23:04:40 +00:00
Rong Xu	3ee1524afc	[PGO] Fix hexagon buildbot errors in r355541 Add "REQUIRES: x86-registered-target" to thinlto test cases. llvm-svn: 355556	2019-03-06 22:16:47 +00:00
Rong Xu	05c0afe842	[PGO] Context sensitive PGO (part 4) Part 4 of CSPGO changes: (1) add support in cmake for cspgo build. (2) fix an issue in big endian. (3) test cases. Differential Revision: https://reviews.llvm.org/D54175 llvm-svn: 355541	2019-03-06 19:31:37 +00:00
Philip Reames	9549f7560f	[AtomicExpand] Allow libcall expansion for non-zero address spaces (try 2) Restore a reverted commit, with the silly mistake fixed. Sorry for the previous breakage. Be consistent about how we treat atomics in non-zero address spaces. If we get to the backend, we tend to lower them as if in address space 0. Do the same if we need to insert a libcall instead. Differential Revision: https://reviews.llvm.org/D58760 llvm-svn: 355540	2019-03-06 19:27:13 +00:00
Nikita Popov	884feb1b69	[InstCombine] Fold add nsw + sadd.with.overflow Fold `add nsw` and `sadd.with.overflow` with constants if the addition does not overflow. Part of https://bugs.llvm.org/show_bug.cgi?id=38146. Patch by Dan Robertson. Differential Revision: https://reviews.llvm.org/D58881 llvm-svn: 355530	2019-03-06 18:30:00 +00:00
Mitch Phillips	f0c21e2ff5	Revert "[AtomicExpand] Allow libcall expansion for non-zero address spaces" for buildbot failures. llvm-svn: 355461	2019-03-06 00:25:40 +00:00
Florian Hahn	13bbcb3264	[ARM] Sink zext/sext operands for add and sub to enable vsubl generation. This uses the infrastructure added in rL353152 to sink zext and sexts to sub/add users, to enable vsubl/vaddl generation when NEON is available. See https://bugs.llvm.org/show_bug.cgi?id=40025. Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D58063 llvm-svn: 355460	2019-03-06 00:10:03 +00:00
Philip Reames	1e4c5d3611	[AtomicExpand] Allow libcall expansion for non-zero address spaces Be consistent about how we treat atomics in non-zero address spaces. If we get to the backend, we tend to lower them as if in address space 0. Do the same if we need to insert a libcall instead. Differential Revision: https://reviews.llvm.org/D58760 llvm-svn: 355453	2019-03-05 23:00:14 +00:00
Florian Hahn	add2d2e304	[SLP] Fix invalid triple in X86 tests x86-64 is an invalid architecture in triples. Changing it to the correct triple (x86_64) changes some tests, because SLP is not deemed profitable any more. Reviewers: ABataev, RKSimon, spatel Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D58931 llvm-svn: 355420	2019-03-05 17:56:35 +00:00
David Green	4511f3fa86	[SCEV] Ensure that isHighCostExpansion takes into account what is being divided A SCEV is not low-cost just because you can divide it by a power of 2. We need to also check what we are dividing to make sure it too is not a high-code expansion. This helps to not expand the exit value of certain loops, helping not to bloat the code. The change in no-iv-rewrite.ll is reverting back to what it was testing before rL194116, and looks a lot like the other tests in replace-loop-exit-folds.ll. Differential Revision: https://reviews.llvm.org/D58435 llvm-svn: 355393	2019-03-05 12:12:18 +00:00
David Green	3bcb0aa7f9	[SCEV] Add some extra tests for IndVarSimplifys loop exit values. NFC. Add some tests for various loops of the form: while(S >= 32) { S -= 32; something(); }; return S; llvm-svn: 355389	2019-03-05 11:18:55 +00:00
Florian Hahn	fd2d89f98b	Fix invalid target triples in tests. (NFC) llvm-svn: 355349	2019-03-04 23:37:41 +00:00
Sanjay Patel	3b2d0bc7c2	[CodeGenPrepare] avoid crashing on non-canonical/degenerate code The test is reduced from an example in the post-commit thread for: rL354746 http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190304/632396.html While we must avoid dying here, the real question should be: Why is non-canonical and/or degenerate code making it to CGP when using the new pass manager? llvm-svn: 355345	2019-03-04 22:47:13 +00:00
Sanjay Patel	6e32b46b1d	[ConstantHoisting] avoid hang/crash from unreachable blocks (PR40930) I'm not too familiar with this pass, so there might be a better solution, but this appears to fix the degenerate: PR40930 PR40931 PR40932 PR40934 ...without affecting any real-world code. As we've seen in several other passes, when we have unreachable blocks, they can contain semi-bogus IR and/or cause unexpected conditions. We would not typically expect these patterns to make it this far, but we have to guard against them anyway. llvm-svn: 355337	2019-03-04 20:57:14 +00:00
Nikita Popov	8670faf939	[InstCombine] Add tests for add nsw + sadd.with.overflow; NFC Baseline tests for D58881, which fixes part of PR38146. Patch by Dan Robertson. llvm-svn: 355328	2019-03-04 19:35:46 +00:00
Davide Italiano	672bec223d	[InstCombine] Mark debug values as unavailable after DCE. Fixes PR40838. llvm-svn: 355301	2019-03-04 04:38:58 +00:00
Sanjay Patel	e076491759	[InstCombine] remove stale FIXME comment from test; NFC llvm-svn: 355293	2019-03-03 19:08:54 +00:00
Sanjay Patel	2a70703770	[ValueTracking] do not try to peek through bitcasts in computeKnownBitsFromAssume() There are no tests for this case, and I'm not sure how it could ever work, so I'm just removing this option from the matcher. This should fix PR40940: https://bugs.llvm.org/show_bug.cgi?id=40940 llvm-svn: 355292	2019-03-03 18:59:33 +00:00
Amaury Sechet	d341a94261	Add extra ops in add to sub transform test in order to enforce proper operand ordering. NFC llvm-svn: 355291	2019-03-03 15:11:13 +00:00
Amaury Sechet	315d0bbb9c	Add test case for add to sub transformation. NFC llvm-svn: 355277	2019-03-02 20:12:25 +00:00
Sanjay Patel	1f65903dc1	[InstCombine] move add after smin/smax Follow-up to rL355221. This isn't specifically called for within PR14613, but we'll get there eventually if it's not already requested in some other bug report. https://rise4fun.com/Alive/5b0 Name: smax Pre: WillNotOverflowSignedSub(C1,C0) %a = add nsw i8 %x, C0 %cond = icmp sgt i8 %a, C1 %r = select i1 %cond, i8 %a, i8 C1 => %c2 = icmp sgt i8 %x, C1-C0 %u2 = select i1 %c2, i8 %x, i8 C1-C0 %r = add nsw i8 %u2, C0 Name: smin Pre: WillNotOverflowSignedSub(C1,C0) %a = add nsw i32 %x, C0 %cond = icmp slt i32 %a, C1 %r = select i1 %cond, i32 %a, i32 C1 => %c2 = icmp slt i32 %x, C1-C0 %u2 = select i1 %c2, i32 %x, i32 C1-C0 %r = add nsw i32 %u2, C0 llvm-svn: 355272	2019-03-02 16:45:10 +00:00
Sanjay Patel	42ad8685c6	[InstCombine] add tests for add+smin/smax; NFC llvm-svn: 355271	2019-03-02 16:45:05 +00:00
Xing GUO	23b1dfe675	[Transforms] fix typo in test case. NFC. llvm-svn: 355265	2019-03-02 08:32:32 +00:00
Florian Hahn	98f11a7d75	[SCEV] Handle case where MaxBECount is less precise than ExactBECount for OR. In some cases, MaxBECount can be less precise than ExactBECount for AND and OR (the AND case was PR26207). In the OR test case, both ExactBECounts are undef, but MaxBECount are different, so we hit the assertion below. This patch uses the same solution the AND case already uses. Assertion failed: ((isa<SCEVCouldNotCompute>(ExactNotTaken) \|\| !isa<SCEVCouldNotCompute>(MaxNotTaken)) && "Exact is not allowed to be less precise than Max"), function ExitLimit This patch also consolidates test cases for both AND and OR in a single test case. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13245 Reviewers: sanjoy, efriedma, mkazantsev Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D58853 llvm-svn: 355259	2019-03-02 02:31:44 +00:00
Craig Topper	35f55d72f6	[X86] Remove IntrArgMemOnly from target specific gather/scatter intrinsics IntrArgMemOnly implies that only memory pointed to by pointer typed arguments will be accessed. But these intrinsics allow you to pass null to the pointer argument and put the full address into the index argument. Other passes won't be able to understand this. A colleague found that ISPC was creating gathers like this and then dead store elimination removed some stores because it didn't understand what the gather was doing since the pointer argument was null. Differential Revision: https://reviews.llvm.org/D58805 llvm-svn: 355228	2019-03-01 21:02:40 +00:00
Craig Topper	e6bfb0919c	[X86] Add test case for D58805. NFC This demonstrates dead store elimination removing a store that may alias a gather that uses null as its base. llvm-svn: 355227	2019-03-01 21:02:34 +00:00
Philip Reames	cf0a978e1f	[InstCombine] Extend saturating idempotent atomicrmw transform to FP I'm assuming that the nan propogation logic for InstructonSimplify's handling of fadd and fsub is correct, and applying the same to atomicrmw. Differential Revision: https://reviews.llvm.org/D58836 llvm-svn: 355222	2019-03-01 19:50:36 +00:00
Sanjay Patel	6e1e7e1c3e	[InstCombine] move add after umin/umax In the motivating cases from PR14613: https://bugs.llvm.org/show_bug.cgi?id=14613 ...moving the add enables us to narrow the min/max which eliminates zext/trunc which enables signficantly better vectorization. But that bug is still not completely fixed. https://rise4fun.com/Alive/5KQ Name: umax Pre: C1 u>= C0 %a = add nuw i8 %x, C0 %cond = icmp ugt i8 %a, C1 %r = select i1 %cond, i8 %a, i8 C1 => %c2 = icmp ugt i8 %x, C1-C0 %u2 = select i1 %c2, i8 %x, i8 C1-C0 %r = add nuw i8 %u2, C0 Name: umin Pre: C1 u>= C0 %a = add nuw i32 %x, C0 %cond = icmp ult i32 %a, C1 %r = select i1 %cond, i32 %a, i32 C1 => %c2 = icmp ult i32 %x, C1-C0 %u2 = select i1 %c2, i32 %x, i32 C1-C0 %r = add nuw i32 %u2, C0 llvm-svn: 355221	2019-03-01 19:42:40 +00:00
Sanjay Patel	20292a0526	[InstCombine] add tests for umin/umax narrowing (PR14613); NFC llvm-svn: 355220	2019-03-01 19:42:34 +00:00
Philip Reames	2226e9a745	[LICM] Infer proper alignment from loads during scalar promotion This patch fixes an issue where we would compute an unnecessarily small alignment during scalar promotion when no store is not to be guaranteed to execute, but we've proven load speculation safety. Since speculating a load requires proving the existing alignment is valid at the new location (see Loads.cpp), we can use the alignment fact from the load. For non-atomics, this is a performance problem. For atomics, this is a correctness issue, though an incredibly rare one to see in practice. For atomics, we might not be able to lower an improperly aligned load or store (i.e. i32 align 1). If such an instruction makes it all the way to codegen, we may fail to codegen the operation, or we may simply generate a slow call to a library function. The part that makes this super hard to see in practice is that the memory location actually is well aligned, and instcombine knows that. So, to see a failure, you have to have a) hit the bug in LICM, b) somehow hit a depth limit in InstCombine/ValueTracking to avoid fixing the alignment, and c) then have generated an instruction which fails codegen rather than simply emitting a slow libcall. All around, pretty hard to hit. Differential Revision: https://reviews.llvm.org/D58809 llvm-svn: 355217	2019-03-01 18:45:05 +00:00
Philip Reames	1648f95eb1	[Tests] More missing atomicrmw combines llvm-svn: 355215	2019-03-01 18:24:05 +00:00
Philip Reames	21f7c35df1	[Tests] Add tests for missed optimizations of saturating and idempotent FP atomicrmws llvm-svn: 355212	2019-03-01 18:10:37 +00:00
Philip Reames	77982868c5	[InstCombine] Extend "idempotent" atomicrmw optimizations to floating point An idempotent atomicrmw is one that does not change memory in the process of execution. We have already added handling for the various integer operations; this patch extends the same handling to floating point operations which were recently added to IR. Note: At the moment, we canonicalize idempotent fsub to fadd when ordering requirements prevent us from using a load. As discussed in the review, I will be replacing this with canonicalizing both floating point ops to integer ops in the near future. Differential Revision: https://reviews.llvm.org/D58251 llvm-svn: 355210	2019-03-01 18:00:07 +00:00
Sanjay Patel	12b1f2418d	[InstCombine] add tests for add+umin/umax canonicalization; NFC Fixing this should solve the biggest part of the vector problems seen in: https://bugs.llvm.org/show_bug.cgi?id=14613 llvm-svn: 355206	2019-03-01 17:29:10 +00:00
Fangrui Song	f4b25f700a	[ConstantHoisting] Call cleanup() in ConstantHoistingPass::runImpl to avoid dangling elements in ConstIntInfoVec for new PM Summary: ConstIntInfoVec contains elements extracted from the previous function. In new PM, releaseMemory() is not called and the dangling elements can cause segfault in findConstantInsertionPoint. Rename releaseMemory() to cleanup() to deliver the idea that it is mandatory and call cleanup() in ConstantHoistingPass::runImpl to fix this. Reviewers: ormris, zzheng, dmgreen, wmi Reviewed By: ormris, wmi Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58589 llvm-svn: 355174	2019-03-01 05:27:01 +00:00

... 4 5 6 7 8 ...

12759 Commits