llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	d0cdb2f861	[InstCombine] allow fmul fold with less than 'fast' This is a retry of r326502 with updates to the reassociate test file that I missed the first time. @test15_reassoc in the supposed -reassociate test file (except that it tests 2 other passes too...) shows that there's no clear responsiblity for reassociation transforms. Instcombine now gets that case, but only because the constant values are identical. Otherwise, it would still miss that pattern. Reassociate doesn't get that case because it hasn't been updated to use less than 'fast' FMF. llvm-svn: 326513	2018-03-02 00:14:51 +00:00
Sanjay Patel	f2664d0663	[Reassociate] regenerate checks; NFC llvm-svn: 326511	2018-03-01 23:41:03 +00:00
Sanjay Patel	eb5d046890	revert r326502: [InstCombine] allow fmul fold with less than 'fast' I forgot that I added tests for 'reassoc' to -reassociate, but suprisingly that file calls -instcombine too, so it is affected. I'll update that file and try again. llvm-svn: 326510	2018-03-01 23:39:24 +00:00
Sanjay Patel	7373ae5c9a	[InstCombine] allow fmul fold with less than 'fast' llvm-svn: 326502	2018-03-01 22:53:47 +00:00
Craig Topper	f1a7c6755d	[InstCombine] Auto-generate complete checks. NFC llvm-svn: 326474	2018-03-01 20:05:07 +00:00
Sanjay Patel	d696e93cb6	[InstCombine] remove stale comments for tests; NFC llvm-svn: 326448	2018-03-01 16:28:32 +00:00
Sanjay Patel	3fd43a843b	[InstCombine] move/add tests for fmul reassociation; NFC This transform may be out-of-scope for instcombine, but this is only documenting the current behavior. llvm-svn: 326442	2018-03-01 15:30:44 +00:00
Sanjay Patel	237e52674f	[InstCombine] auto-generate full checks; NFC llvm-svn: 326440	2018-03-01 15:13:42 +00:00
Max Kazantsev	f8d2969abb	[SCEV] Smart range calculation for SCEVUnknown Phis The range of SCEVUnknown Phi which merges values `X1, X2, ..., XN` can be evaluated as `U(Range(X1), Range(X2), ..., Range(XN))`. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D43810 llvm-svn: 326418	2018-03-01 06:56:48 +00:00
Reid Kleckner	3762a089d7	[IPSCCP] do not break musttail invariant (PR36485) Do not replace results of `musttail` calls with a constant if the call itself can't be removed. Do not zap returns of `musttail` callees, if the call site can't be removed and replaced with a constant. Do not zap returns of `musttail`-calling blocks, this breaks invariant too. Patch by Fedor Indutny Differential Revision: https://reviews.llvm.org/D43695 llvm-svn: 326404	2018-03-01 01:19:18 +00:00
Reid Kleckner	cb9611ca67	[DAE] don't remove args of musttail target/caller `musttail` requires identical signatures of caller and callee. Removing arguments breaks `musttail` semantics. PR36441 Patch by Fedor Indutny Differential Revision: https://reviews.llvm.org/D43708 llvm-svn: 326394	2018-03-01 00:09:35 +00:00
Sanjay Patel	eaf5a120ed	[InstCombine] simplify code for X * -1.0 --> -X; NFC I've added random FMF to one of the tests to show those are propagated. llvm-svn: 326377	2018-02-28 22:30:04 +00:00
Jonas Devlieghere	9ca064552a	[GlobalOpt] don't change CC of musttail calle(e\|r) When the function has musttail call - its cc is fixed to be equal to the cc of the musttail callee. In such case (and in the case of the musttail callee), GlobalOpt should not change the cc to fastcc as it will break the invariant. This fixes PR36546 Patch by: Fedor Indutny (indutny) Differential revision: https://reviews.llvm.org/D43859 llvm-svn: 326376	2018-02-28 22:28:44 +00:00
Sanjay Patel	356e77f550	[InstCombine] auto-generate complete checks; NFC llvm-svn: 326331	2018-02-28 16:53:45 +00:00
Xin Tong	8ba674e43b	[MergeICmp] Fix a bug in MergeICmp that can lead to a block being processed more than once. Summary: Fix a bug in MergeICmp that can lead to a BCECmp block being processed more than once and eventually lead to a broken LLVM module. The problem is that if the non-constant value is not produced by the last block, the producer will be processed once when the its parent block is processed and second time when the last block is processed. We end up having 2 same BCECmpBlock in the merge queue. And eventually lead to a broken LLVM module. Reviewers: courbet, davide Reviewed By: courbet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43825 llvm-svn: 326318	2018-02-28 12:08:00 +00:00
Mohammad Shahid	ddeee12f59	[SLP] Added new tests and updated existing for jumbled load, NFC. llvm-svn: 326303	2018-02-28 04:19:34 +00:00
Sanjay Patel	bf28a8fc01	[InstSimplify] add tests for FP with undef operand; NFC Are any of these correct? llvm-svn: 326241	2018-02-27 20:17:18 +00:00
Craig Topper	301991080e	[ValueTracking] Teach cannotBeOrderedLessThanZeroImpl to look through ExtractElement. This is similar to what's done in computeKnownBits and computeSignBits. Don't do anything fancy just collect information valid for any element. Differential Revision: https://reviews.llvm.org/D43789 llvm-svn: 326237	2018-02-27 19:53:45 +00:00
Sanjay Patel	8529dd5ee1	[ARM] add loop vectorizer test based on 482.sphinx3 from SPEC2006; NFC This is a slight reduction of one of the benchmarks that suffered with D43079. Cost model changes should not cause this test to remain scalarized. llvm-svn: 326221	2018-02-27 18:33:24 +00:00
Sanjay Patel	04d1d79ee5	[AArch64] add SLP test based on TSVC; NFC This is a slight reduction of one of the benchmarks that suffered with D43079. Cost model changes should not cause this test to remain scalarized. llvm-svn: 326217	2018-02-27 18:06:15 +00:00
Florian Hahn	1807c516c7	[NewGVN] Update phi-of-ops def block when updating existing ValuePHI. In case we update a ValuePHI node created earlier, we could update it based on a different OpPHI which could be in a different block. We need to update the TempToBlock mapping reflecting the new block, otherwise we would end up placing the new phi node in a wrong block. This problem is exposed by the test case in https://bugs.llvm.org/show_bug.cgi?id=36504. This patch fixes a slightly simpler problem than in the bug report. In the bug's re-producer, the additional problem is that we are re-using a ValuePHI node with to few incoming values for the new OpPHI. If this patch makes sense, I will follow it up with a patch that creates a new PHI node if the existing PHI node has a different number of incoming values. Reviewers: davide, dberlin Reviewed By: dberlin Differential Revision: https://reviews.llvm.org/D43770 llvm-svn: 326181	2018-02-27 09:34:51 +00:00
Adam Nemet	b424cd5d61	Make test agnostic to cost model This was causing bot failures on greendragon llvm-svn: 326169	2018-02-27 05:41:16 +00:00
Evgeny Stupachenko	f1c058d99b	Fix r326154 buildbots test fail Summary: Add specific mtriples to tests added in r326154. From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 326158	2018-02-27 01:33:11 +00:00
Evgeny Stupachenko	a732611ac8	Fix PR36032, PR35432 Summary: The change fix an assert fail at ScalarEvolutionExpander.cpp: assert(ExitCount != SE.getCouldNotCompute() && "Invalid loop count"); Reviewers: sbaranga Differential Revision: http://reviews.llvm.org/D42604 From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 326154	2018-02-27 00:17:31 +00:00
Sanjay Patel	66911b16e6	[InstCombine, InstSimplify] add tests with undef elements in constant FP vectors; NFC llvm-svn: 326148	2018-02-26 23:23:02 +00:00
Craig Topper	69c8972fd1	[ValueTracking] Teach cannotBeOrderedLessThanZeroImpl to handle vector constants. Summary: This allows vector fabs to be removed in more cases. Reviewers: spatel, arsenm, RKSimon Reviewed By: spatel Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D43739 llvm-svn: 326138	2018-02-26 22:33:17 +00:00
Simon Pilgrim	9929f90740	[X86][SSE] Reduce FADD/FSUB/FMUL costs on later targets (PR36280) Agner's tables indicate that for SSE42+ targets (Core2 and later) we can reduce the FADD/FSUB/FMUL costs down to 1, which should fix the Himeno benchmark. Note: the AVX512 FDIV costs look rather dodgy, but this isn't part of this patch. Differential Revision: https://reviews.llvm.org/D43733 llvm-svn: 326133	2018-02-26 22:10:17 +00:00
Alexey Bataev	b44e2b75e8	[SLP] Added new test + fixed some checks, NFC. llvm-svn: 326117	2018-02-26 20:01:24 +00:00
Craig Topper	43fb1cdef7	[InstCombine] Add test cases with vector constants to fpextend.ll llvm-svn: 326115	2018-02-26 19:36:37 +00:00
Craig Topper	b284e8b9b4	[InstCombine] Switch to using FileCheck instead of grep. Auto-generate checks. NFC llvm-svn: 326114	2018-02-26 19:36:36 +00:00
Sanjay Patel	31a90468e1	[InstCombine] allow fdiv folds with less than fully 'fast' ops Note: gcc appears to allow this fold with -freciprocal-math alone, but clang/llvm require more than that with this patch. The wording in the definitions seems fuzzy enough that it could go either way, but we'll err on the conservative side of FMF interpretation. This patch also changes the newly created fmul to have FMF propagated by the last fdiv rather than intersecting the FMF of the fdivs. This matches the behavior of other folds near here. The new fmul is only used to produce an intermediate op for the final fdiv result, so it shouldn't be any stricter than that result. The previous behavior could result in dropping FMF via other folds in instcombine or CSE. Differential Revision: https://reviews.llvm.org/D43398 llvm-svn: 326098	2018-02-26 16:02:45 +00:00
Renato Golin	9d1b2acaaa	[LV] Move isLegalMasked* functions from Legality to CostModel All SIMD architectures can emulate masked load/store/gather/scatter through element-wise condition check, scalar load/store, and insert/extract. Therefore, bailing out of vectorization as legality failure, when they return false, is incorrect. We should proceed to cost model and determine profitability. This patch is to address the vectorizer's architectural limitation described above. As such, I tried to keep the cost model and vectorize/don't-vectorize behavior nearly unchanged. Cost model tuning should be done separately. Please see http://lists.llvm.org/pipermail/llvm-dev/2018-January/120164.html for RFC and the discussions. Closes D43208. Patch by: Hideki Saito <hideki.saito@intel.com> llvm-svn: 326079	2018-02-26 11:06:36 +00:00
Florian Hahn	ed45836253	[LoopInterchange] Add test case for D43236. llvm-svn: 326078	2018-02-26 10:46:25 +00:00
Craig Topper	aee341ef28	[InstSimplify] Add test cases for removal of vector fabs on known positive. llvm-svn: 326050	2018-02-25 06:51:52 +00:00
Craig Topper	2b8f051aaa	[InstSimplify] Remove unused parameter from test cases. llvm-svn: 326049	2018-02-25 06:51:51 +00:00
Adam Nemet	e4e1de60aa	Revert "StructurizeCFG: Test for branch divergence correctly" This reverts commit r325881. Breaks many bots llvm-svn: 326037	2018-02-24 17:29:09 +00:00
Sanjay Patel	db53d1847b	[InstSimplify] sqrt(X) * sqrt(X) --> X This was misplaced in InstCombine. We can loosen the FMF as a follow-up step. llvm-svn: 325965	2018-02-23 22:20:13 +00:00
Sanjay Patel	d32104e1b2	[InstCombine] allow fmul-sqrt folds with less than full -ffast-math Also, add a Builder method for intrinsics to reduce code duplication for clients. llvm-svn: 325960	2018-02-23 21:16:12 +00:00
Matt Davis	708271849a	[Test] Fix the test to output to /dev/null instead of redirecting. The redirection was confusing the windows build machine. llvm-svn: 325937	2018-02-23 19:03:04 +00:00
Matt Davis	523c656e25	[Debug] Add dbg.value intrinsics for PHIs created during LCSSA. Summary: This patch is an enhancement to propagate dbg.value information when Phis are created on behalf of LCSSA. I noticed a case where a value carried across a loop was reported as <optimized out>. Specifically this case: ``` int bar(int x, int y) { return x + y; } int foo(int size) { int val = 0; for (int i = 0; i < size; ++i) { val = bar(val, i); // Both val and i are correct } return val; // <optimized out> } ``` In the above case, after all of the interesting computation completes our value is reported as "optimized out." This change will add a dbg.value to correct this. This patch also moves the dbg.value insertion routine from LoopRotation.cpp into Local.cpp, so that we can share it in both places (LoopRotation and LCSSA). Reviewers: mzolotukhin, aprantl, vsk, davide Reviewed By: aprantl, vsk Subscribers: dberlin, llvm-commits Differential Revision: https://reviews.llvm.org/D42551 llvm-svn: 325926	2018-02-23 17:38:27 +00:00
Nicolai Haehnle	43c1115cd4	StructurizeCFG: Test for branch divergence correctly Summary: This fixes cases like the new test @nonuniform. In that test, %cc itself is a uniform value; however, when reading it after the end of the loop in basic block %if, its value is effectively non-uniform. This problem was encountered in https://bugs.freedesktop.org/show_bug.cgi?id=103743; however, this change in itself is not sufficient to fix that bug, as there is another issue in the AMDGPU backend. Change-Id: I32bbffece4a32f686fab54964dae1a5dd72949d4 Reviewers: arsenm, rampitec, jlebar Subscribers: wdng, tpr, llvm-commits Differential Revision: https://reviews.llvm.org/D40546 llvm-svn: 325881	2018-02-23 10:45:46 +00:00
Bjorn Steinbrink	983d6c3f18	Mark MergedLoadStoreMotion as not preserving MemDep results Summary: MemDep caches results that signify that a dependence is non-local, and there is currently no way to invalidate such cache entries. Unfortunately, when MLSM sinks a store that can result in a non-local dependence becoming a local one, and then MemDep gives wrong answers. The easiest way out here is to just say that MLSM does indeed not preserve MemDep results. Reviewers: davide, Gerolf Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43177 llvm-svn: 325880	2018-02-23 10:41:57 +00:00
Daniel Neilson	20c9207be3	[AlignmentFromAssumptions] Set source and dest alignments of memory intrinsiscs separately Summary: This change is part of step five in the series of changes to remove alignment argument from memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the AlignmentFromAssumptions pass to cease using the old getAlignment()/setAlignment API of MemoryIntrinsic in favour of getting/setting source & dest specific alignments through the new API. This allows us to simplify some of the code in this pass and also be more aggressive about setting the source and destination alignments separately. Steps: Step 1) Remove alignment parameter and create alignment parameter attributes for memcpy/memmove/memset. ( rL322965, rC322964, rL322963 ) Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. ( rL323597 ) Step 3) Update Clang to use the new IRBuilder API. ( rC323617 ) Step 4) Update Polly to use the new IRBuilder API. ( rL323618 ) Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use [get\|set]DestAlignment() and [get\|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278, rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774, rL324781, rL324784, rL324955, rL324960 ) Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reference http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html Reviewers: hfinkel, bollu, reames Reviewed By: reames Subscribers: reames, llvm-commits Differential Revision: https://reviews.llvm.org/D43081 llvm-svn: 325816	2018-02-22 18:55:59 +00:00
Luke Cheeseman	6c1e6bbe0c	[FunctionAttrs][ArgumentPromotion][GlobalOpt] Disable some optimisations passes for naked functions - Fix for bug 36078. - Prevent the functionattrs, function-attrs, globalopt and argpromotion passes from changing naked functions. - These passes can perform some alterations to the functions that should not be applied. An example is removing parameters that are seemingly not used because they are only referenced in the inline assembly. Another example is marking the function as fastcc. llvm-svn: 325788	2018-02-22 14:42:08 +00:00
Sanjay Patel	92b7371113	[InstCombine] add fmul multi-use test; NFC Also, rename tests to make their intent clearer. llvm-svn: 325785	2018-02-22 14:27:16 +00:00
Simon Pilgrim	864949d5e9	[SLPVectorizer][X86] Add load extend tests (PR36091) llvm-svn: 325772	2018-02-22 12:19:34 +00:00
Sanjay Patel	9befaeb582	[InstCombine] add some random FMF to tests so we know it's not dropped; NFC llvm-svn: 325734	2018-02-21 22:48:28 +00:00
Sanjay Patel	d53da082a0	[AArch64] fix IR names to not be 'tmp' because that gives the CHECK script problems llvm-svn: 325718	2018-02-21 20:48:14 +00:00
Sanjay Patel	ffe51e450f	[AArch64] add SLP test for matmul (PR36280); NFC This is a slight reduction of one of the benchmarks that suffered with D43079. Cost model changes should not cause this test to remain scalarized. llvm-svn: 325717	2018-02-21 20:34:16 +00:00
Alexey Bataev	650f639d33	[LV] Fix test checks, NFC llvm-svn: 325699	2018-02-21 16:48:23 +00:00
Alexey Bataev	cdd0675ddc	[SLP] Fix test checks, NFC. llvm-svn: 325689	2018-02-21 15:32:58 +00:00
Silviu Baranga	10ad93c6bf	[SCEV] Temporarily disable loop versioning for the purpose of turning SCEVUnknowns of PHIs into AddRecExprs. This feature is now hidden behind the -scev-version-unknown flag. Fixes PR36032 and PR35432. llvm-svn: 325687	2018-02-21 15:20:32 +00:00
Vedant Kumar	56492f9177	[BDCE] Salvage debug info from dying insts This results in 15 additional unique source variables in a stage2 build of FileCheck (at '-Os -g'), with a negligible increase in the size of the .debug_loc section. llvm-svn: 325660	2018-02-21 01:55:33 +00:00
Sanjay Patel	e6143904b9	revert r325515: [TTI CostModel] change default cost of FP ops to 1 (PR36280) There are too many perf regressions resulting from this, so we need to investigate (and add tests for) targets like ARM and AArch64 before trying to reinstate. llvm-svn: 325658	2018-02-21 01:42:52 +00:00
Sanjay Patel	6f716a7c5e	[InstCombine] C / -X --> -C / X We already do this in DAGCombiner, but it should also be good to eliminate the fsub use in IR. This is similar to rL325648. llvm-svn: 325649	2018-02-21 00:01:45 +00:00
Sanjay Patel	d8dd0151fc	[InstCombine] -X / C --> X / -C for FP We already do this in DAGCombiner, but it should also be good to eliminate the fsub use in IR. llvm-svn: 325648	2018-02-20 23:51:16 +00:00
Sanjay Patel	8357371861	[InstCombine] add tests for fdiv with negated op and constant op; NFC llvm-svn: 325644	2018-02-20 23:34:43 +00:00
Sanjay Patel	3e569ac0cc	[PatternMatch] allow vector matches with m_FNeg llvm-svn: 325642	2018-02-20 23:29:05 +00:00
Sanjoy Das	737fa40ffa	[DSE] Don't DSE stores that subsequent memmove calls read from Summary: We used to remove the first memmove in cases like this: memmove(p, p+2, 8); memmove(p, p+2, 8); which is incorrect. Fix this by changing isPossibleSelfRead to what was most likely the intended behavior. Historical note: the buggy code was added in https://reviews.llvm.org/rL120974 to address PR8728. Reviewers: rsmith Subscribers: mcrosier, llvm-commits, jlebar Differential Revision: https://reviews.llvm.org/D43425 llvm-svn: 325641	2018-02-20 23:19:34 +00:00
Sanjay Patel	4f65e0d008	[InstCombine] auto-generate full checks; NFC llvm-svn: 325639	2018-02-20 23:08:47 +00:00
Sanjay Patel	088f4690f5	[InstCombine] add test for vector -X/-Y; NFC m_FNeg doesn't match vector types. llvm-svn: 325637	2018-02-20 22:46:38 +00:00
Benjamin Kramer	1516dd70bb	Fix broken test from r325630. llvm-svn: 325634	2018-02-20 22:30:16 +00:00
Benjamin Kramer	fd0630665b	[MemoryBuiltins] Check nobuiltin status when identifying calls to free. This is usually not a problem because this code's main purpose is eliminating unused new/delete pairs. We got deletes of nullptr or nobuiltin deletes of builtin new wrong though. llvm-svn: 325630	2018-02-20 22:00:33 +00:00
Sanjay Patel	e29caaa9c5	[PatternMatch] enhance m_SignMask() to ignore undef elements in vectors llvm-svn: 325623	2018-02-20 21:02:40 +00:00
Sanjay Patel	ff7b777bbe	[InstSimplify] add tests for m_SignMask with undef vector elements; NFC llvm-svn: 325622	2018-02-20 20:53:35 +00:00
Alexey Bataev	42bcec7d38	[LV] Fix test checks, NFC. llvm-svn: 325617	2018-02-20 19:49:25 +00:00
Alexey Bataev	47dfd249f0	[SLP] Fix tests checks, NFC. llvm-svn: 325605	2018-02-20 18:11:50 +00:00
Sanjay Patel	90f4c8ec29	[InstCombine] fold fdiv with non-splat divisor to fmul: X/C --> X * (1/C) llvm-svn: 325590	2018-02-20 16:08:15 +00:00
Sanjay Patel	1d14779aed	[InstCombine] allow fdiv with constant dividend folds with less than full -ffast-math It's possible that we could allow this either 'arcp' or 'reassoc' alone, but this should be conservatively better than what we have right now. GCC allows this with only -freciprocal-math. The last test is changed to show a case that is expected to fold, but we need D43398. llvm-svn: 325533	2018-02-19 21:46:52 +00:00
Sanjay Patel	e82cc6fcc5	[InstCombine] move fdiv tests; NFC Also, use vector constants just to prove that already works. llvm-svn: 325530	2018-02-19 21:13:39 +00:00
Sanjay Patel	3e8a76abfd	[TTI CostModel] change default cost of FP ops to 1 (PR36280) This change was mentioned at least as far back as: https://bugs.llvm.org/show_bug.cgi?id=26837#c26 ...and I found a real program that is harmed by this: Himeno running on AMD Jaguar gets 6% slower with SLP vectorization: https://bugs.llvm.org/show_bug.cgi?id=36280 ...but the change here appears to solve that bug only accidentally. The div/rem costs for x86 look very wrong in some cases, but that's already true, so we can fix those in follow-up patches. There's also evidence that more cost model changes are needed to solve SLP problems as shown in D42981, but that's an independent problem (though the solution may be adjusted after this change is made). Differential Revision: https://reviews.llvm.org/D43079 llvm-svn: 325515	2018-02-19 16:11:44 +00:00
Ivan A. Kosarev	f03f579d1d	[Transforms] Propagate new-format TBAA tags on simplification of memory-transfer intrinsics With this patch in place, when a new-format TBAA tag is available for a memory-transfer intrinsic call, we prefer propagating that new-format tag. Otherwise, we fallback to the old approach where we try to construct a proper TBAA access tag from 'tbaa.struct' metadata. Differential Revision: https://reviews.llvm.org/D41543 llvm-svn: 325488	2018-02-19 12:10:20 +00:00
Sanjay Patel	adf6e88c74	[PatternMatch, InstSimplify] enhance m_AllOnes() to ignore undef elements in vectors Loosening the matcher definition reveals a subtle bug in InstSimplify (we should not assume that because an operand constant matches that it's safe to return it as a result). So I'm making that change here too (that diff could be independent, but I'm not sure how to reveal it before the matcher change). This also seems like a good reason to not include matchers that capture the value. We don't want to encourage the potential misstep of propagating undef values when it's not allowed/intended. I didn't include the capture variant option here or in the related rL325437 (m_One), but it already exists for other constant matchers. llvm-svn: 325466	2018-02-18 18:05:08 +00:00
Sanjay Patel	7faceaed31	[InstSimplify] add tests with vector undef elts; NFC llvm-svn: 325465	2018-02-18 17:39:09 +00:00
Sanjay Patel	f569578373	[PatternMatch] enhance m_One() to ignore undef elements in vectors llvm-svn: 325437	2018-02-17 16:00:42 +00:00
Sanjay Patel	a6a1426cf1	[InstSimplify, InstCombine] add tests with vector undef elts; NFC These would fold if the m_One pattern matcher accounted for undef elts. llvm-svn: 325436	2018-02-17 15:55:40 +00:00
Sanjay Patel	841ca95219	[InstSimplify] add vector select tests with undef elts in condition; NFC llvm-svn: 325419	2018-02-17 01:18:53 +00:00
Sanjay Patel	870fbda805	[InstCombine] add FMF to better show current fdiv fold behavior; NFC llvm-svn: 325365	2018-02-16 17:46:50 +00:00
Eugene Leviant	8c83b9b8c5	[ThinLTO] Fix data race in test #2 Switched to the right option (-thinlto-threads) llvm-svn: 325362	2018-02-16 17:25:03 +00:00
Eugene Leviant	c9724d9149	[ThinLTO] Fix data race in test llvm-svn: 325361	2018-02-16 16:56:33 +00:00
Brian M. Rzycki	f1a7df5ef2	[JumpThreading] PR36133 enable/disable DominatorTree for LVI analysis Summary: The LazyValueInfo pass caches a copy of the DominatorTree when available. Whenever there are pending DominatorTree updates within JumpThreading's DeferredDominance object we cannot use the cached DT for LVI analysis. This commit adds the new methods enableDT() and disableDT() to LVI. JumpThreading also sets the appropriate usage model before calling LVI analysis methods. Fixes https://bugs.llvm.org/show_bug.cgi?id=36133 Reviewers: sebpop, dberlin, kuhar Reviewed by: sebpop, kuhar Subscribers: uabelho, llvm-commits, aprantl, hiraditya, a.elovikov Differential Revision: https://reviews.llvm.org/D42717 llvm-svn: 325356	2018-02-16 16:35:17 +00:00
Ivan A. Kosarev	53270d0fa6	[Transforms] Propagate TBAA info in SROA Now that we have the new TBAA metadata format that is capable of representing accesses to aggregates, we can propagate TBAA access tags from memory setting and transferring intrinsics to load and store instructions and vice versa. Since SROA produces lots of new loads and stores on optimized builds, this change significantly decreases the share of undecorated memory accesses on such builds. Differential Revision: https://reviews.llvm.org/D41563 llvm-svn: 325329	2018-02-16 10:10:29 +00:00
Eugene Leviant	7331a0bf1c	[ThinLTO] Import global variables Differential revision: https://reviews.llvm.org/D43077 llvm-svn: 325320	2018-02-16 08:11:04 +00:00
Vedant Kumar	3dc6de619a	Remove brittle check lines from a test, NFC llvm-svn: 325310	2018-02-16 01:21:01 +00:00
Vedant Kumar	616fdb00df	[GVN] Partially revert debug info salvage change (r325063) In r325063, we salvaged debug values from dying instructions in GVN::processBlock() and GVN::performScalarPRE(). The change in performScalarPRE(), while correct, is unhelpful. It introduced a call to salvageDebugInfo() which was immediately followed by a RAUW, meaning it prevented the RAUW from efficiently updating dbg.value intrinsics. This commit reverts the mistake and tightens up the affected test case. llvm-svn: 325308	2018-02-16 01:15:20 +00:00
Vedant Kumar	1df820ecd7	[DCE] Salvage debug info from dead insts This results in small increases in the size of the .debug_loc section and the number of unique source variables in a stage2 build of opt. llvm-svn: 325301	2018-02-15 22:26:18 +00:00
Brian Gesiak	a5e3675bd3	[Coroutines] Don't move stores for allocator args Summary: The behavior described in Coroutines TS `[dcl.fct.def.coroutine]/7` allows coroutine parameters to be passed into allocator functions. The instructions to store values into the alloca'd parameters must not be moved past the frame allocation, otherwise uninitialized values are passed to the allocator. Test Plan: `check-llvm` Reviewers: rsmith, GorNishanov, eric_niebler Reviewed By: GorNishanov Subscribers: compnerd, EricWF, llvm-commits Differential Revision: https://reviews.llvm.org/D43000 llvm-svn: 325285	2018-02-15 19:31:45 +00:00
Vedant Kumar	24953dc876	[SCCP] Test that constant propagation updates debug info, NFC This extends an existing test to check that SCCP updates the operands of relevant dbg.value instructions as it does its work. llvm-svn: 325281	2018-02-15 19:13:04 +00:00
Alexey Bataev	862c476fc2	[SLP] Fix the test for the reversed stores, NFC. llvm-svn: 325268	2018-02-15 17:11:50 +00:00
Alexey Bataev	ac619599d8	[SLP] Added test for reversed stores, NFC. llvm-svn: 325265	2018-02-15 16:56:49 +00:00
Sanjay Patel	9174416e89	[InstCombine] test fdiv folds better; NFC We had redundant tests, but no tests for extra uses or vectors. 'fast' is an overly conservative requirement for these folds. llvm-svn: 325262	2018-02-15 16:28:15 +00:00
Sanjay Patel	339b4d338d	[InstCombine] allow sin/cos transforms with 'reassoc' The variable name 'AllowReassociate' is a lie at this point because it's set to 'isFast()' which is more than the 'reassoc' FMF after rL317488. In D41286, we showed that this transform may be valid even with strict math by brute force checking every 32-bit float result. There's a potential problem here because we're replacing with a tan() libcall rather than a hypothetical LLVM tan intrinsic. So we might set errno when we should be guaranteed not to do that. But that's independent of this change. llvm-svn: 325247	2018-02-15 15:07:12 +00:00
Sanjay Patel	6a0f667077	[InstCombine] allow X / C -> X * (1.0/C) for vector splat FP constants llvm-svn: 325237	2018-02-15 13:55:52 +00:00
Max Kazantsev	6e4ce23add	[NFC] Fix metadata placement in test llvm-svn: 325215	2018-02-15 07:13:18 +00:00
Max Kazantsev	c5941d12f4	[SCEV] Favor isKnownViaSimpleReasoning over constant ranges check There is a more powerful but still simple function `isKnownViaSimpleReasoning ` that does constant range check and few more additional checks. We use it some places (e.g. when proving implications) and in some other places we only check constant ranges. Currently, indvar simplifier fails to remove the check in following loop: int inc = ...; for (int i = inc, j = inc - 1; i < 200; ++i, ++j) if (i > j) { ... } This patch replaces all usages of `isKnownPredicateViaConstantRanges` with `isKnownViaSimpleReasoning` to have smarter proofs. In particular, it fixes the case above. Reviewed-By: sanjoy Differential Revision: https://reviews.llvm.org/D43175 llvm-svn: 325214	2018-02-15 07:09:00 +00:00
Sanjay Patel	af5d499cb9	[InstCombine] add tests and comments for fdiv X, C; NFC llvm-svn: 325161	2018-02-14 19:54:51 +00:00
Craig Topper	1c19cc1745	[InstCombine] Don't fold select(C, Z, binop(select(C, X, Y), W)) -> select(C, Z, binop(Y, W)) if the binop is rem or div. The select may have been preventing a division by zero or INT_MIN/-1 so removing it might not be safe. Fixes PR36362. Differential Revision: https://reviews.llvm.org/D43276 llvm-svn: 325148	2018-02-14 18:08:33 +00:00
Sanjay Patel	48f671bb27	[InstCombine] regenerate checks; NFC llvm-svn: 325144	2018-02-14 17:37:32 +00:00
Alexey Bataev	7f246e003a	[SLP] Allow vectorization of reversed loads. Summary: Reversed loads are handled as gathering. But we can just reshuffle these values. Patch adds support for vectorization of reversed loads. Reviewers: RKSimon, spatel, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43022 llvm-svn: 325134	2018-02-14 15:29:15 +00:00
Florian Hahn	b4e3bad89b	Recommit r325001: [CallSiteSplitting] Support splitting of blocks with instrs before call. For basic blocks with instructions between the beginning of the block and a call we have to duplicate the instructions before the call in all split blocks and add PHI nodes for uses of the duplicated instructions after the call. Currently, the threshold for the number of instructions before a call is quite low, to keep the impact on binary size low. Reviewers: junbuml, mcrosier, davidxl, davide Reviewed By: junbuml Differential Revision: https://reviews.llvm.org/D41860 llvm-svn: 325126	2018-02-14 13:59:12 +00:00
Florian Hahn	c6296fea3f	[LoopInterchange] Incrementally update the dominator tree. We can use incremental dominator tree updates to avoid re-calculating the dominator tree after interchanging 2 loops. Reviewers: dmgreen, kuhar Reviewed By: kuhar Differential Revision: https://reviews.llvm.org/D43176 llvm-svn: 325122	2018-02-14 13:13:15 +00:00
Petar Jovanovic	1768957c82	[Utils] Salvage the debug info of DCE'ed 'and' instructions Preserve debug info from a dead 'and' instruction with a constant. Patch by Djordje Todorovic. Differential Revision: https://reviews.llvm.org/D43163 llvm-svn: 325119	2018-02-14 13:10:35 +00:00
Elena Demikhovsky	945b7e5aa6	Adding a width of the GEP index to the Data Layout. Making a width of GEP Index, which is used for address calculation, to be one of the pointer properties in the Data Layout. p[address space]:size:memory_size:alignment:pref_alignment:index_size_in_bits. The index size parameter is optional, if not specified, it is equal to the pointer size. Till now, the InstCombiner normalized GEPs and extended the Index operand to the pointer width. It works fine if you can convert pointer to integer for address calculation and all registered targets do this. But some ISAs have very restricted instruction set for the pointer calculation. During discussions were desided to retrieve information for GEP index from the Data Layout. http://lists.llvm.org/pipermail/llvm-dev/2018-January/120416.html I added an interface to the Data Layout and I changed the InstCombiner and some other passes to take the Index width into account. This change does not affect any in-tree target. I added tests to cover data layouts with explicitly specified index size. Differential Revision: https://reviews.llvm.org/D42123 llvm-svn: 325102	2018-02-14 06:58:08 +00:00
Sanjay Patel	3ce76ad26f	[InstCombine] put tests of mul with neg operand(s) together; NFC llvm-svn: 325066	2018-02-13 23:02:12 +00:00
Vedant Kumar	1d5d31b706	[GVN] Salvage debug info from dead insts This preserves an additional 581 unique source variables in a stage2 build of clang (according to `llvm-dwarfdump --statistics`). It increases the size of the .debug_loc section by 0.1% (or 87139 bytes). Differential Revision: https://reviews.llvm.org/D43255 llvm-svn: 325063	2018-02-13 22:27:17 +00:00
Sanjay Patel	7558d860af	[InstCombine] (lshr X, 31) * Y --> (ashr X, 31) & Y This replaces the bit-tracking based fold that did the same thing, but it only worked for scalars and not directly. There is no evidence in existing regression tests that the greater power of bit-tracking was needed here, but we should be aware of this potential loss of optimization. llvm-svn: 325062	2018-02-13 22:24:37 +00:00
Sanjay Patel	fdb3b036cc	[InstCombine] add vector tests, fix comments; NFC The scalar folds are done indirectly and use potentially expensive value tracking calls. That can be improved along with the enhancement to support vector types. llvm-svn: 325051	2018-02-13 21:19:42 +00:00
Sanjay Patel	cb8ac00f73	[InstCombine] (bool X) * Y --> X ? Y : 0 This is both a functional improvement for vectors and an efficiency improvement for scalars. The existing code below the new folds does the same thing for scalars, but in an indirect and expensive way. llvm-svn: 325048	2018-02-13 20:41:22 +00:00
Sanjay Patel	2e2958497f	[InstCombine] fix test comment and add vector test; NFC llvm-svn: 325039	2018-02-13 18:48:27 +00:00
Sanjay Patel	b13fcd52ed	[InstCombine, InstSimplify] (re)move tests, regenerate checks; NFC The InstCombine integer mul test file had tests that belong in InstSimplify (including fmul tests). Move things to where they belong and auto-generate complete checks for everything. llvm-svn: 325037	2018-02-13 18:22:53 +00:00
Vedant Kumar	35fc103e1e	[DeadStoreElimination] Salvage debug info from dead insts According to `llvm-dwarfdump --statistics` this salvages 43 additional unique source variables in a stage2 build of clang. It increases the size of the .debug_loc section by 0.002% (or 2864 bytes). Differential Revision: https://reviews.llvm.org/D43220 llvm-svn: 325035	2018-02-13 18:15:26 +00:00
Yaxun Liu	0124b5484c	[AMDGPU] Change constant addr space to 4 Differential Revision: https://reviews.llvm.org/D43170 llvm-svn: 325030	2018-02-13 18:00:25 +00:00
Florian Hahn	35d744d388	Revert r325001: [CallSiteSplitting] Support splitting of blocks with instrs before call. Due to memsan not being happy with the array of ValueToValue maps. llvm-svn: 325009	2018-02-13 14:48:39 +00:00
Ivan A. Kosarev	4a381b444e	[IR] Fix creating mutable versions of TBAA access tags Due to a typo in D41565, mutable TBAA tags created with createMutableTBAAAccessTag() lose their base types. This patch fixes that typo and updates tests respectively. Differential Revision: https://reviews.llvm.org/D42364 llvm-svn: 325008	2018-02-13 14:44:25 +00:00
Florian Hahn	b0884b6443	[CallSiteSplitting] Support splitting of blocks with instrs before call. For basic blocks with instructions between the beginning of the block and a call we have to duplicate the instructions before the call in all split blocks and add PHI nodes for uses of the duplicated instructions after the call. Currently, the threshold for the number of instructions before a call is quite low, to keep the impact on binary size low. Reviewers: junbuml, mcrosier, davidxl, davide Reviewed By: junbuml Differential Revision: https://reviews.llvm.org/D41860 llvm-svn: 325001	2018-02-13 12:00:48 +00:00
Florian Hahn	1f95ef1815	[LoopInterchange] Check number of latch successors before accessing them. In cases where the OuterMostLoopLatchBI only has a single successor, accessing the second successor will fail. This fixes a failure when building the test-suite with loop-interchange enabled. Reviewers: mcrosier, karthikthecool, davide Reviewed by: karthikthecool Differential Revision: https://reviews.llvm.org/D42906 llvm-svn: 324994	2018-02-13 10:02:52 +00:00
Vedant Kumar	388fac5de6	[Utils] Salvage debug info from all no-op casts We already try to salvage debug values from no-op bitcasts and inttoptr instructions: we should handle ptrtoint instructions as well. This saves an additional 24,444 debug values in a stage2 build of clang, and (according to llvm-dwarfdump --statistics) provides an additional 289 unique source variables. llvm-svn: 324982	2018-02-13 03:34:23 +00:00
Vedant Kumar	4011c26cc7	[Utils] Salvage debug info of DCE'ed mul/sdiv/srem instructions Here are the number of additional debug values salvaged in a stage2 build of clang: 63 SALVAGE: MUL 1250 SALVAGE: SDIV (No values were salvaged from `srem` instructions in this experiment, but it's a simple case to handle so we might as well.) llvm-svn: 324976	2018-02-13 01:09:52 +00:00
Vedant Kumar	31ec356a48	[Utils] Salvage debug info of DCE'ed shl/lhsr/ashr instructions Here are the number of additional debug values salvaged in a stage2 build of clang: 1912 SALVAGE: ASHR 405 SALVAGE: LSHR 249 SALVAGE: SHL llvm-svn: 324975	2018-02-13 01:09:49 +00:00
Vedant Kumar	47b16c45d7	[Utils] Salvage the debug info of DCE'ed 'sub' instructions This salvages 14 debug values in a stage2 build of clang. llvm-svn: 324974	2018-02-13 01:09:47 +00:00
Vedant Kumar	96b7dc041b	[Utils] Salvage the debug info of DCE'ed 'xor' instructions This salvages 259 debug values in a stage2 build of clang. Differential Revision: https://reviews.llvm.org/D43207 llvm-svn: 324973	2018-02-13 01:09:46 +00:00
Sanjay Patel	246d769232	[InstSimplify] allow exp/log simplifications with only 'reassoc' FMF These intrinsic folds were added with D41381, but only allowed with isFast(). That's more than necessary because FMF has 'reassoc' to apply to these kinds of folds after D39304, and that's all we need in these cases. Differential Revision: https://reviews.llvm.org/D43160 llvm-svn: 324967	2018-02-12 23:51:23 +00:00
Sanjay Patel	c5d5933bd5	[InstSimplify] change tests to 'fast' to reflect current folds The diff to use 'reassoc' is part of D43160; it should not have been made with rL324961. Reverting that part here, so we'll see the intended diff with the code change. llvm-svn: 324963	2018-02-12 23:39:10 +00:00
Sanjay Patel	de3e889a88	[InstSimplify] consolidate tests for log-exp inverse folds Some tests didn't add much value because we already show stronger constraints for the folds in other tests, so the weaker versions were deleted. Moved the remaining tests into 1 file because the folds are very similar and handled from 1 place in the code. llvm-svn: 324961	2018-02-12 23:18:11 +00:00
Daniel Neilson	2363da9236	[InstCombine] Simplify MemTransferInst's source and dest alignments separately Summary: This change is part of step five in the series of changes to remove alignment argument from memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the InstCombine pass to cease using the deprecated MemoryIntrinsic::getAlignment() method, and instead we use the separate getSourceAlignment and getDestAlignment APIs to simplify the source and destination alignment attributes separately. Steps: Step 1) Remove alignment parameter and create alignment parameter attributes for memcpy/memmove/memset. ( rL322965, rC322964, rL322963 ) Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. ( rL323597 ) Step 3) Update Clang to use the new IRBuilder API. ( rC323617 ) Step 4) Update Polly to use the new IRBuilder API. ( rL323618 ) Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use [get\|set]DestAlignment() and [get\|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278, rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774, rL324781, rL324784, rL324955 ) Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reference http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html Reviewers: majnemer, bollu, efriedma Reviewed By: efriedma Subscribers: efriedma, llvm-commits Differential Revision: https://reviews.llvm.org/D42871 llvm-svn: 324960	2018-02-12 23:06:55 +00:00
Daniel Neilson	095d72989d	[SafeStack] Use updated CreateMemCpy API to set more accurate source and destination alignments. Summary: This change is part of step five in the series of changes to remove alignment argument from memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the creation of memcpys in the SafeStack pass to set the alignment of the destination object to its stack alignment while separately setting the source byval arguments alignment to its alignment. Steps: Step 1) Remove alignment parameter and create alignment parameter attributes for memcpy/memmove/memset. ( rL322965, rC322964, rL322963 ) Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. ( rL323597 ) Step 3) Update Clang to use the new IRBuilder API. ( rC323617 ) Step 4) Update Polly to use the new IRBuilder API. ( rL323618 ) Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use [get\|set]DestAlignment() and [get\|set]SourceAlignment() instead. (rL323886, rL323891, rL324148, rL324273, rL324278, rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774, rL324781, rL324784 ) Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reference http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html Reviewers: eugenis, bollu Reviewed By: eugenis Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42710 llvm-svn: 324955	2018-02-12 22:39:47 +00:00
Vedant Kumar	a8b8d3225b	Move the debuginfo-dce-or test into debuginfo-variables.ll, NFC llvm-svn: 324933	2018-02-12 21:02:45 +00:00
Sanjay Patel	4a4f35f324	[InstCombine] X / (X * Y) --> 1.0 / Y This is similar to the instsimplify fold added with D42385 ( rL323716 ) ...but this can't be in instsimplify because we're creating/morphing a different instruction. llvm-svn: 324927	2018-02-12 19:39:21 +00:00
Sanjay Patel	ee4257f676	[InstCombine] add tests for missing fdiv fold; NFC llvm-svn: 324926	2018-02-12 19:23:39 +00:00
Sanjay Patel	e4fcb00290	[InstCombine] regenerate checks; NFC llvm-svn: 324924	2018-02-12 19:14:01 +00:00
Jun Bum Lim	144eb593dd	[LICM] update BlockColors after splitting predecessors Update BlockColors after splitting predecessors. Do not allow splitting EHPad for sinking when the BlockColors is not empty, so we can simply assign predecessor's color to the new block. Fixes PR36184 llvm-svn: 324916	2018-02-12 17:56:55 +00:00
Alexey Bataev	ca2396e673	[SLP] Take user instructions cost into consideration in insertelement vectorization. Summary: For better vectorization result we should take into consideration the cost of the user insertelement instructions when we try to vectorize sequences that build the whole vector. I.e. if we have the following scalar code: ``` <Scalar code> insertelement <ScalarCode>, ... ``` we should consider the cost of the last `insertelement ` instructions as the cost of the scalar code. Reviewers: RKSimon, spatel, hfinkel, mkuper Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D42657 llvm-svn: 324893	2018-02-12 14:54:48 +00:00
Sanjay Patel	510d647a4d	[InstCombine] X / (X * Y) -> 1 / Y if the multiplication does not overflow The related cases for (X * Y) / X were handled in rL124487. https://rise4fun.com/Alive/6k9 The division in these tests is subsequently eliminated by existing instcombines for 1/X. llvm-svn: 324843	2018-02-11 17:20:32 +00:00
Sanjay Patel	aee107f30d	[InstCombine] add tests for div-mul folds; NFC The related cases for (X * Y) / X were handled in rL124487. llvm-svn: 324840	2018-02-11 16:52:44 +00:00
Craig Topper	4dccffc84a	[X86] Change signatures of avx512 packed fp compare intrinsics to return a vXi1 mask type to be closer to an fcmp. Summary: This patch changes the signature of the avx512 packed fp compare intrinsics to return a vXi1 vector and no longer take a mask as input. The casts to scalar type will now need to be explicit in the IR. The masking node will now be an explicit and in the IR. This makes the intrinsic look much more similar to an fcmp instruction that we wish we could use for these but can't. We already use icmp instructions for integer compares. Previously the lowering step of isel would turn the intrinsic into an X86 specific ISD node and a emit the masking nodes as well as some bitcasts. This means DAG combines can't see the vXi1 type until somewhat late, making it more difficult to combine out gpr<->mask transition sequences. By exposing the vXi1 type explicitly in the IR and initial SelectionDAG we give earlier DAG combines and even InstCombine the chance to see it and optimize it. This should make any issues with gpr<->mask sequences the same between integer and fp. Meaning we only have to fix them once. Reviewers: spatel, delena, RKSimon, zvi Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43137 llvm-svn: 324827	2018-02-10 23:33:55 +00:00
Simon Pilgrim	19495198af	[InstCombine] Add constant vector support for ~(C >> Y) --> ~C >> Y Includes adding m_NonNegative constant pattern matcher llvm-svn: 324825	2018-02-10 21:46:09 +00:00
Mircea Trofin	73b96d6dcf	[LV] Fix analyzeInterleaving when -pass-remarks enabled Summary: If -pass-remarks=loop-vectorize, atomic ops will be seen by analyzeInterleaving(), even though canVectorizeMemory() == false. This is because we are requesting extra analysis instead of bailing out. In such a case, we end up with a Group in both Load- and StoreGroups, and then we'll try to access freed memory when traversing LoadGroups after having had released the Group when iterating over StoreGroups. The fix is to include mayWriteToMemory() when validating that two instructions are the same kind of memory operation. Reviewers: mssimpso, davidxl Reviewed By: davidxl Subscribers: hsaito, fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D43064 llvm-svn: 324786	2018-02-10 00:07:45 +00:00
Vedant Kumar	04386d8e3d	[Utils] Salvage debug info from dead 'or' instructions Extend salvageDebugInfo to preserve the debug info from a dead 'or' with a constant. Patch by Ismail Badawi! Differential Revision: https://reviews.llvm.org/D43129 llvm-svn: 324764	2018-02-09 19:19:55 +00:00
Simon Pilgrim	0919a8c130	[InstCombine] Add vector xor tests This doesn't cover everything in InstCombiner.visitXor yet, but increases coverage for a lot of tests llvm-svn: 324753	2018-02-09 17:45:45 +00:00
Simon Pilgrim	9620f4b746	[InstCombine] Add constant vector support for X udiv C, where C >= signbit llvm-svn: 324728	2018-02-09 10:43:59 +00:00
Dmitry Mikulin	87e1c4c8de	Minor tweak to test case. llvm-svn: 324670	2018-02-08 23:10:07 +00:00
Dmitry Mikulin	5cf73cea9c	[ThinLTO] Skip BlockAddresses while replacing uses in function import. Differential Revision: https://reviews.llvm.org/D43027 llvm-svn: 324658	2018-02-08 22:14:56 +00:00
Simon Pilgrim	3f462e90cc	Regenerate test llvm-svn: 324639	2018-02-08 19:28:05 +00:00
Simon Pilgrim	f30656add3	[InstCombine] Add vector udiv tests Tests for X udiv C, where C >= signbit llvm-svn: 324635	2018-02-08 18:58:00 +00:00
Simon Pilgrim	1889f26b94	[InstCombine] Add m_Negative pattern matching Allows us to add non-uniform constant vector support for "X urem C -> X < C ? X : X - C, where C >= signbit." llvm-svn: 324631	2018-02-08 18:36:01 +00:00
Simon Pilgrim	11a02589c1	[InstCombine] Add vector urem tests. Improve coverage of InstCombiner::visitURem for vector types llvm-svn: 324629	2018-02-08 18:10:08 +00:00
Simon Pilgrim	ab689cb638	[InstCombine] Regenerate vector mul tests. llvm-svn: 324627	2018-02-08 17:54:24 +00:00
Daniel Neilson	fb99a493be	[LoopIdiom] Be more aggressive when setting alignment in memcpy Summary: This change is part of step five in the series of changes to remove alignment argument from memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the LoopIdiom pass to cease using the old IRBuilder CreateMemCpy single-alignment APIs in favour of the new API that allows setting source and destination alignments independently. This allows us to be slightly more aggressive in setting the alignment of memcpy calls that loop idiom creates. Steps: Step 1) Remove alignment parameter and create alignment parameter attributes for memcpy/memmove/memset. ( rL322965, rC322964, rL322963 ) Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. ( rL323597 ) Step 3) Update Clang to use the new IRBuilder API. ( rC323617 ) Step 4) Update Polly to use the new IRBuilder API. ( rL323618 ) Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use [get\|set]DestAlignment() and [get\|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278, rL324384, rL324395, rL324402 ) Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reference http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html llvm-svn: 324626	2018-02-08 17:33:08 +00:00
Sanjay Patel	574fb73c89	[SLPVectorizer] auto-generate complete checks; NFC llvm-svn: 324616	2018-02-08 15:32:28 +00:00
Sanjay Patel	124392f038	[SLPVectorizer] auto-generate complete checks; NFC llvm-svn: 324615	2018-02-08 15:30:39 +00:00
Sanjay Patel	e2c5e9a970	[SLPVectorizer] move RUN line to top-of-file; NFC I was confused what we were checking because the RUN line was in the middle of the file. llvm-svn: 324614	2018-02-08 15:28:49 +00:00
Simon Pilgrim	2a90acd17a	[InstCombine] Fix issue with X udiv (POW2_C1 << N) for non-splat constant vectors foldUDivShl was assuming that the input was a scalar or a splat constant llvm-svn: 324613	2018-02-08 15:19:38 +00:00
Sanjay Patel	cfa5c03039	[SLPVectorizer] auto-generate complete checks; NFC llvm-svn: 324612	2018-02-08 15:16:26 +00:00
Sanjay Patel	42b8c23cc6	[LoopVectorize] auto-generate complete checks; NFC llvm-svn: 324611	2018-02-08 15:13:47 +00:00
Sanjay Patel	a60aec1ab7	[ValueTracking] don't crash when assumptions conflict (PR36270) The last assume in the test says that %B12 is 0. The first assume says that %and1 is less than %B12. Therefore, %and1 is unsigned less than 0...does not compute. That means this line: Known.Zero.setHighBits(RHSKnown.countMinLeadingZeros() + 1); ...tries to set more bits than exist. Differential Revision: https://reviews.llvm.org/D43052 llvm-svn: 324610	2018-02-08 14:52:40 +00:00
Simon Pilgrim	94cc89d5f2	[InstCombine] Fix issue with X udiv 2^C -> X >> C for non-splat constant vectors foldUDivPow2Cst was assuming that the input was a scalar or a splat constant llvm-svn: 324608	2018-02-08 14:46:10 +00:00
Simon Pilgrim	0b9f3912ce	[InstCombine] Improve mul(x, pow2) -> shl combine for vector constants Refactor getLogBase2Vector into getLogBase2 to accept all scalars/vectors. Generalize from ConstantDataVector to support all constant vectors. llvm-svn: 324603	2018-02-08 14:10:01 +00:00
Serguei Katkov	c8016e7a65	[Loop Predication] Teach LP about reverse loops with uge and sge latch conditions Add support of uge and sge latch condition to Loop Prediction for reverse loops. Reviewers: apilipenko, mkazantsev, sanjoy, anna Reviewed By: anna Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42837 llvm-svn: 324589	2018-02-08 10:34:08 +00:00
Serguei Katkov	66182d6c38	[SimplifyCFG] Re-apply Relax restriction for folding unconditional branches The commit rL308422 introduces a restriction for folding unconditional branches. Specifically if empty block with unconditional branch leads to header of the loop then elimination of this basic block is prohibited. However it seems this condition is redundantly strict. If elimination of this basic block does not introduce more back edges then we can eliminate this block. The patch implements this relax of restriction. The test profile/Linux/counter_promo_nest.c in compiler-rt project is updated to meet this change. Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl Reviewed By: pacxx Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42691 llvm-svn: 324572	2018-02-08 07:16:29 +00:00
Mircea Trofin	06ac8cfbd1	Verify profile data confirms large loop trip counts. Summary: Loops with inequality comparers, such as: // unsigned bound for (unsigned i = 1; i < bound; ++i) {...} have getSmallConstantMaxTripCount report a large maximum static trip count - in this case, 0xffff fffe. However, profiling info may show that the trip count is much smaller, and thus counter-recommend vectorization. This change: - flips loop-vectorize-with-block-frequency on by default. - validates profiled loop frequency data supports vectorization, when static info appears to not counter-recommend it. Absence of profile data means we rely on static data, just as we've done so far. Reviewers: twoh, mkuper, davidxl, tejohnson, Ayal Reviewed By: davidxl Subscribers: bkramer, llvm-commits Differential Revision: https://reviews.llvm.org/D42946 llvm-svn: 324543	2018-02-07 23:29:52 +00:00
Alexey Bataev	cd8d6de381	[SLP] Add a tests for PR36280, NFC. llvm-svn: 324510	2018-02-07 20:11:37 +00:00
Max Kazantsev	b299ade2c5	Re-enable "[SCEV] Make isLoopEntryGuardedByCond a bit smarter" The failures happened because of assert which was overconfident about SCEV's proving capabilities and is generally not valid. Differential Revision: https://reviews.llvm.org/D42835 llvm-svn: 324473	2018-02-07 11:16:29 +00:00
Serguei Katkov	69246ca787	Revert [SCEV] Make isLoopEntryGuardedByCond a bit smarter Revert rL324453 commit which causes buildbot failures. Differential Revision: https://reviews.llvm.org/D42835 llvm-svn: 324462	2018-02-07 09:10:08 +00:00
Max Kazantsev	dd5ee6f5d9	[SCEV] Make isLoopEntryGuardedByCond a bit smarter Sometimes `isLoopEntryGuardedByCond` cannot prove predicate `a > b` directly. But it is a common situation when `a >= b` is known from ranges and `a != b` is known from a dominating condition. Thia patch teaches SCEV to sum these facts together and prove strict comparison via non-strict one. Differential Revision: https://reviews.llvm.org/D42835 llvm-svn: 324453	2018-02-07 07:56:26 +00:00
Michael Zolotukhin	cae66ba5f8	The xfailed test from r324448 passed on one of the bots: remove it entirely for now. llvm-svn: 324451	2018-02-07 06:54:11 +00:00
Michael Zolotukhin	1713dd5b8d	Xfail the test added in r324445 until the underlying issue in LoopSink is fixed. llvm-svn: 324448	2018-02-07 06:11:50 +00:00
Michael Zolotukhin	e82e83fcce	Follow-up for r324429: "[LCSSAVerification] Run verification only when asserts are enabled." Before r324429 we essentially didn't have a verification of LCSSA, so no wonder that it has been broken: currently loop-sink breaks it (the attached test illustrates the failure). It was detected during a stage2 RA build, so to unbreak it I'm disabling the check for now. llvm-svn: 324445	2018-02-07 04:24:44 +00:00
Alexey Bataev	1e593fe73e	[SLP] Update test checks, NFC. llvm-svn: 324387	2018-02-06 20:00:05 +00:00
Simon Pilgrim	9f2ae7e2d1	[InstCombine][ValueTracking] Match non-uniform constant power-of-two vectors Generalize existing constant matching to work with non-uniform constant vectors as well. Differential Revision: https://reviews.llvm.org/D42818 llvm-svn: 324369	2018-02-06 18:39:23 +00:00
Simon Pilgrim	e11c64162c	Regenerate vector-urem test. NFCI. llvm-svn: 324357	2018-02-06 16:10:12 +00:00
Clement Courbet	a7a1746865	[MergeICmps] Handle chains with several complex BCE basic blocks. - Fix condition for detecting that a complex basic block was the first in the chain. - Add tests. This was caught by buildbots when submitting rL324319. llvm-svn: 324341	2018-02-06 12:25:33 +00:00
Hiroshi Inoue	ba3585eaf2	[ThinLTO] fix test failure without x86 backend This patch moves ThinLTOBitcodeWriter/module-asm.ll test case into x86 directory to avoid a test failure when x86 backend is not enabled. llvm-svn: 324316	2018-02-06 07:03:09 +00:00
Peter Collingbourne	29c6f4833c	ThinLTOBitcodeWriter: Do not include module-level inline asm in the merged module. If the inline asm provides the definition of a symbol, this can result in duplicate symbol errors. Differential Revision: https://reviews.llvm.org/D42944 llvm-svn: 324313	2018-02-06 03:29:18 +00:00
Teresa Johnson	791c98e4c8	[ThinLTO] Remove dead and dropped symbol declarations when possible Summary: Removing the dropped symbols will prevent indirect call promotion in the ThinLTO Backend from adding a new reference to a symbol, which can result in linker unsats. This can happen when we compile with a sample profile collected from one binary by used for another, which may have profiled targets that aren't used in the new binary. Note that until dropDeadSymbols handles variables and aliases (in progress), we may not be able to remove the declaration and can still have an issue. Reviewers: grimar, davidxl Subscribers: mehdi_amini, inglorion, llvm-commits, eraman Differential Revision: https://reviews.llvm.org/D42816 llvm-svn: 324299	2018-02-06 00:43:39 +00:00
Sanjay Patel	d7c702b451	[LoopStrengthReduce, x86] don't add cost for a cmp that will be macro-fused (PR35681) In the motivating case from PR35681 and represented by the macro-fuse-cmp test: https://bugs.llvm.org/show_bug.cgi?id=35681 ...there's a 37 -> 31 byte size win for the loop because we eliminate the big base address offsets. SPEC2017 on Ryzen shows no significant perf difference. Differential Revision: https://reviews.llvm.org/D42607 llvm-svn: 324289	2018-02-05 23:43:05 +00:00
Sanjay Patel	49aafec2e6	[InstCombine] don't try to evaluate instructions with >1 use (revert r324014) This example causes a compile-time explosion: define i16 @foo(i16 %in) { %x = zext i16 %in to i32 %a1 = mul i32 %x, %x %a2 = mul i32 %a1, %a1 %a3 = mul i32 %a2, %a2 %a4 = mul i32 %a3, %a3 %a5 = mul i32 %a4, %a4 %a6 = mul i32 %a5, %a5 %a7 = mul i32 %a6, %a6 %a8 = mul i32 %a7, %a7 %a9 = mul i32 %a8, %a8 %a10 = mul i32 %a9, %a9 %a11 = mul i32 %a10, %a10 %a12 = mul i32 %a11, %a11 %a13 = mul i32 %a12, %a12 %a14 = mul i32 %a13, %a13 %a15 = mul i32 %a14, %a14 %a16 = mul i32 %a15, %a15 %a17 = mul i32 %a16, %a16 %a18 = mul i32 %a17, %a17 %a19 = mul i32 %a18, %a18 %a20 = mul i32 %a19, %a19 %a21 = mul i32 %a20, %a20 %a22 = mul i32 %a21, %a21 %a23 = mul i32 %a22, %a22 %a24 = mul i32 %a23, %a23 %T = trunc i32 %a24 to i16 ret i16 %T } llvm-svn: 324276	2018-02-05 21:50:32 +00:00
Sanjay Patel	1c84dd9a8f	[InstCombine] add test corresponding to r324252 (PR36225); NFC As PR36225 shows, we definitely don't want to enable the canEvaluate* logic with phis. There's still a question of whether we should just revert r324014 completely because it exposes a compile-time sinkhole (although that problem might exist independently). llvm-svn: 324266	2018-02-05 19:59:52 +00:00
Sanjay Patel	e9a153f414	[InstCombine] add unsigned saturation subtraction canonicalizations This is the instcombine part of unsigned saturation canonicalization. Backend patches already commited: https://reviews.llvm.org/D37510 https://reviews.llvm.org/D37534 It converts unsigned saturated subtraction patterns to forms recognized by the backend: (a > b) ? a - b : 0 -> ((a > b) ? a : b) - b) (b < a) ? a - b : 0 -> ((a > b) ? a : b) - b) (b > a) ? 0 : a - b -> ((a > b) ? a : b) - b) (a < b) ? 0 : a - b -> ((a > b) ? a : b) - b) ((a > b) ? b - a : 0) -> - ((a > b) ? a : b) - b) ((b < a) ? b - a : 0) -> - ((a > b) ? a : b) - b) ((b > a) ? 0 : b - a) -> - ((a > b) ? a : b) - b) ((a < b) ? 0 : b - a) -> - ((a > b) ? a : b) - b) Patch by Yulia Koval! Differential Revision: https://reviews.llvm.org/D41480 llvm-svn: 324255	2018-02-05 17:53:29 +00:00
Hans Wennborg	22db17cf43	Revert r323472 "[Debug] Add dbg.value intrinsics for PHIs created during LCSSA." This broke the Chromium build; see PR36238. > This patch is an enhancement to propagate dbg.value information when > Phis are created on behalf of LCSSA. I noticed a case where a value > carried across a loop was reported as <optimized out>. > > Specifically this case: > > int bar(int x, int y) { > return x + y; > } > > int foo(int size) { > int val = 0; > for (int i = 0; i < size; ++i) { > val = bar(val, i); // Both val and i are correct > } > return val; // <optimized out> > } > > In the above case, after all of the interesting computation completes > our value is reported as "optimized out." This change will add a > dbg.value to correct this. > > This patch also moves the dbg.value insertion routine from > LoopRotation.cpp into Local.cpp, so that we can share it in both places > (LoopRotation and LCSSA). > > Patch by Matt Davis! > > Differential Revision: https://reviews.llvm.org/D42551 llvm-svn: 324247	2018-02-05 16:10:42 +00:00
Serguei Katkov	276b32bb14	Revert [SimplifyCFG] Relax restriction for folding unconditional branches The patch causes the failure of the test compiler-rt/test/profile/Linux/counter_promo_nest.c To unblock buildbot, revert the patch while investigation is in progress. Differential Revision: https://reviews.llvm.org/D42691 llvm-svn: 324214	2018-02-05 09:05:43 +00:00
Max Kazantsev	f7667483c1	[NFC] Add tests for PR35743 llvm-svn: 324209	2018-02-05 08:09:49 +00:00
Serguei Katkov	6e93980e82	[SimplifyCFG] Relax restriction for folding unconditional branches The commit rL308422 introduces a restriction for folding unconditional branches. Specifically if empty block with unconditional branch leads to header of the loop then elimination of this basic block is prohibited. However it seems this condition is redundantly strict. If elimination of this basic block does not introduce more back edges then we can eliminate this block. The patch implements this relax of restriction. Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl Reviewed By: pacxx Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42691 llvm-svn: 324208	2018-02-05 07:56:43 +00:00
Serguei Katkov	ec7029c286	Re-apply [SCEV] Fix isLoopEntryGuardedByCond usage ScalarEvolution::isKnownPredicate invokes isLoopEntryGuardedByCond without check that SCEV is available at entry point of the loop. It is incorrect and fixed by patch. To bugs additionally fixed: assert is moved after the check whether loop is not a nullptr. Usage of isLoopEntryGuardedByCond in ScalarEvolution::isImpliedCondOperandsViaNoOverflow is guarded by isAvailableAtLoopEntry. Reviewers: sanjoy, mkazantsev, anna, dorit, reames Reviewed By: mkazantsev Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42417 llvm-svn: 324204	2018-02-05 05:49:47 +00:00
Florian Hahn	642637aab4	[PartialInliner] Update test (NFC). llvm-svn: 324199	2018-02-04 18:40:24 +00:00
Florian Hahn	8f804fc07d	[InlineFunction] Set arg attrs even if there only are VarArg attrs. When using the partial inliner, we might have attributes for forwarded varargs, but the CodeExtractor does not create an empty argument attribute set for regular arguments in that case, because it does not know of the additional arguments. So in case we have attributes for VarArgs, we also have to make sure we create (empty) attributes for all regular arguments. This fixes PR36210. llvm-svn: 324197	2018-02-04 18:27:47 +00:00
Chad Rosier	a097bc69df	[LV] Use Demanded Bits and ValueTracking for reduction type-shrinking The type-shrinking logic in reduction detection, although narrow in scope, is also rather ad-hoc, which has led to bugs (e.g., PR35734). This patch modifies the approach to rely on the demanded bits and value tracking analyses, if available. We currently perform type-shrinking separately for reductions and other instructions in the loop. Long-term, we should probably think about computing minimal bit widths in a more complete way for the loops we want to vectorize. PR35734 Differential Revision: https://reviews.llvm.org/D42309 llvm-svn: 324195	2018-02-04 15:42:24 +00:00
David Green	9688ed61fe	Remove unneeded -debug argument from new test llvm-svn: 324176	2018-02-03 17:33:50 +00:00
David Green	7174023f57	[InstCombine] Allow common type conversions to i8/i16/i32 This, in instcombine, allows conversions to i8/i16/i32 (very common cases) even if the resulting type is not legal according to the data layout. This can often open up extra combine opportunities. Differential Revision: https://reviews.llvm.org/D42424 llvm-svn: 324174	2018-02-03 16:51:03 +00:00
Sanjay Patel	a767ee5af0	[InstCombine] make sure tests are providing coverage for the stated pattern; NFC Without extra instructions and uses, swapMayExposeCSEOpportunities() would change the icmp (as seen in the check lines), so we were not actually testing patterns that should be handled by D41480. llvm-svn: 324143	2018-02-02 21:40:54 +00:00
Sanjay Patel	5b8cb26bcc	[InstCombine] add baseline tests for unsigned saturated sub (D41480); NFC llvm-svn: 324109	2018-02-02 17:43:16 +00:00
Yaxun Liu	2a22c5deff	[AMDGPU] Switch to the new addr space mapping by default This requires corresponding clang change. Differential Revision: https://reviews.llvm.org/D40955 llvm-svn: 324101	2018-02-02 16:07:16 +00:00
Sanjay Patel	3343fcef86	[InstCombine] allow multi-use values in canEvaluate* if all uses are in 1 inst This is the enhancement suggested in D42536 to fix a shortcoming in regular InstCombine's canEvaluate* functionality. When we have multiple uses of a value, but they're all in one instruction, we can allow that expression to be narrowed or widened for the same cost as a single-use value. AFAICT, this can only matter for multiply: sub/and/or/xor/select would be simplified away if the operands are the same value; add becomes shl; shifts with a variable shift amount aren't handled. Differential Revision: https://reviews.llvm.org/D42739 llvm-svn: 324014	2018-02-01 21:55:53 +00:00
David Green	184df0c35d	Revert commit rL323951 Looks like it's causing timeouts out on at least ppc64le buildbots. llvm-svn: 323959	2018-02-01 13:05:25 +00:00
David Green	e11f0545db	[InstCombine] Allow common type conversions to i8/i16/i32 This, in instcombine, allows conversions to i8/i16/i32 (very common cases) even if the resulting type is not legal according to the data layout. This can often open up extra combine opportunities. Differential Revision: https://reviews.llvm.org/D42424 llvm-svn: 323951	2018-02-01 11:06:18 +00:00
Mikael Holmen	6d06976e74	[LSR] Don't force bases of foldable formulae to the final type. Summary: Before emitting code for scaled registers, we prevent SCEVExpander from hoisting any scaled addressing mode by emitting all the bases first. However, these bases are being forced to the final type, resulting in some odd code. For example, if the type of the base is an integer and the final type is a pointer, we will emit an inttoptr for the base, a ptrtoint for the scale, and then a 'reverse' GEP where the GEP pointer is actually the base integer and the index is the pointer. It's more intuitive to use the pointer as a pointer and the integer as index. Patch by: Bevin Hansson Reviewers: atrick, qcolombet, sanjoy Reviewed By: qcolombet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42103 llvm-svn: 323946	2018-02-01 06:38:34 +00:00
Amjad Aboud	b86b771c02	[AggressiveInstCombine] Fixed TruncCombine class to handle TruncInst leaf node correctly. This covers the case where TruncInst leaf node is a constant expression. See PR36121 for more details. Differential Revision: https://reviews.llvm.org/D42622 llvm-svn: 323926	2018-01-31 22:39:05 +00:00
Puyan Lotfi	43e94b15ea	Followup on Proposal to move MIR physical register namespace to '$' sigil. Discussed here: http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html In preparation for adding support for named vregs we are changing the sigil for physical registers in MIR to '$' from '%'. This will prevent name clashes of named physical register with named vregs. llvm-svn: 323922	2018-01-31 22:04:26 +00:00
Marek Olsak	8f2df9d26c	[SeparateConstOffsetFromGEP] Fix up addrspace in the AMDGPU test llvm-svn: 323913	2018-01-31 20:49:19 +00:00
Marek Olsak	13e4741275	AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16} Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D41663 llvm-svn: 323908	2018-01-31 20:18:04 +00:00
Marek Olsak	8e7d149a31	[SeparateConstOffsetFromGEP] Preserve metadata when splitting GEPs Summary: !amdgpu.uniform needs to be preserved for AMDGPU, otherwise bad things happen. Reviewers: arsenm, nhaehnle, jingyue, broune, majnemer, bjarke.roune, dblaikie Subscribers: wdng, tpr, llvm-commits Differential Revision: https://reviews.llvm.org/D42744 llvm-svn: 323907	2018-01-31 20:17:52 +00:00
Daniel Neilson	be58a220e9	[CodeGenPrepare] Improve source and dest alignments of memory intrinsics independently Summary: This change is part of step five in the series of changes to remove alignment argument from memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the CodeGenPrepare pass to be more aggressive in improving the source and destination alignments of memcpy/memmove/memset by exploiting our new ability to record independent alignments for each argument. Steps: Step 1) Remove alignment parameter and create alignment parameter attributes for memcpy/memmove/memset. ( rL322965, rC322964, rL322963 ) Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. ( rL323597 ) Step 3) Update Clang to use the new IRBuilder API. ( rC323617 ) Step 4) Update Polly to use the new IRBuilder API. ( rL323618 ) Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use [get\|set]DestAlignment() and [get\|set]SourceAlignment() instead. ( rL323886 ) Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reference http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html llvm-svn: 323891	2018-01-31 17:24:53 +00:00
Sanjay Patel	fd58ade81c	[InstCombine] move related tests into the same file; NFC llvm-svn: 323882	2018-01-31 15:47:59 +00:00
Sanjay Patel	8c74a9a155	[InstCombine] add tests to show limit of canEvaluate* ; NFC llvm-svn: 323881	2018-01-31 15:28:39 +00:00
Amjad Aboud	d895bff5f2	[AggressiveInstCombine] Make TruncCombine class ignore unreachable basic blocks. Because dead code may contain non-standard IR that causes infinite looping or crashes in underlying analysis. See PR36134 for more details. Differential Revision: https://reviews.llvm.org/D42683 llvm-svn: 323862	2018-01-31 10:41:31 +00:00
Alexey Bataev	1c8f53f47d	[SLP] Add extra test for extractelement shuffle, NFC. llvm-svn: 323815	2018-01-30 21:06:06 +00:00
Sanjay Patel	ffb37a29d1	[LoopStrengthReduce] add test to show potential macro-fusion-based diff (PR35681); NFC This is the baseline output for the test proposed with D42607. llvm-svn: 323806	2018-01-30 19:17:38 +00:00
Simon Pilgrim	073f089c6e	[X86][XOP] Update isVectorShiftByScalarCheap with cases covered by XOP Similar to D42437, XOP supports variable shift for v16i8/v8i16/v4i32/v2i64 types. Differential Revision: https://reviews.llvm.org/D42526 llvm-svn: 323797	2018-01-30 18:10:21 +00:00
Petar Jovanovic	9208e8fbf6	[DeadArgumentElimination] Preserve llvm.dbg.values's first argument When removing return value Dead Argument Elimination pass clobbers first llvm.dbg.value’s argument for live arguments of that function by replacing it with nullptr. In the next pass it will be deleted, so debug location about those arguments are lost. This change fixes it. Patch by Djordje Todorovic. Differential Revision: https://reviews.llvm.org/D42541 llvm-svn: 323784	2018-01-30 16:42:04 +00:00
Zaara Syeda	1f59ae311b	Re-commit : [PowerPC] Add handling for ColdCC calling convention and a pass to mark candidates with coldcc attribute. This recommits r322721 reverted due to sanitizer memory leak build bot failures. Original commit message: This patch adds support for the coldcc calling convention for Power. This changes the set of non-volatile registers. It includes a pass to stress test the implementation by marking all static directly called functions with the coldcc attribute through the option -enable-coldcc-stress-test. It also includes an option, -ppc-enable-coldcc, to add the coldcc attribute to functions which are cold at all call sites based on BlockFrequencyInfo when the containing function does not call any non cold functions. Differential Revision: https://reviews.llvm.org/D38413 llvm-svn: 323778	2018-01-30 16:17:22 +00:00
Daniel Neilson	594f443b06	[RS4GC] Handle call/invoke instructions as base defining values of vectors Summary: There's an asymmetry in the definitions of findBaseDefiningValueOfVector() and findBaseDefiningValue() of RS4GC. The later handles call and invoke instructions, and the former does not. This appears to be simple oversight. This patch remedies the oversight by adding the call and invoke cases to findBaseDefiningValueOfVector(). Reviewers: DaniilSuchkov, anna Reviewed By: anna Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42653 llvm-svn: 323764	2018-01-30 14:43:41 +00:00
Sanjay Patel	1aef27f5cd	[DSE] make sure memory is not modified before partial store merging (PR36129) We missed a critical check in D30703. We must make sure that no intermediate store is sitting between the stores that we want to merge. This should fix: https://bugs.llvm.org/show_bug.cgi?id=36129 Differential Revision: https://reviews.llvm.org/D42663 llvm-svn: 323759	2018-01-30 13:53:59 +00:00
Sanjay Patel	83f056604c	[InstSimplify] (X * Y) / Y --> X for relaxed floating-point ops This is the FP counterpart that was mentioned in PR35709: https://bugs.llvm.org/show_bug.cgi?id=35709 Differential Revision: https://reviews.llvm.org/D42385 llvm-svn: 323716	2018-01-30 00:18:37 +00:00
Sanjay Patel	d023a9b777	[DSE] add test for PR36129; NFC We can miscompile because we're not checking is the memory might me modified between the seemingly redundant store ops. llvm-svn: 323704	2018-01-29 22:50:08 +00:00
Alexey Bataev	9c5c103283	[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle. Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counter as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323662	2018-01-29 16:08:52 +00:00
Alexey Bataev	10f5c9e765	[SLP] Add a test with extract for PR32086, NFC. llvm-svn: 323661	2018-01-29 15:56:52 +00:00
Davide Italiano	8b797a0fd2	[CVP] Don't Replace incoming values from unreachable blocks with undef. This pretty much reverts r322006, except that we keep the test, because we work around the issue exposed in a different way (a recursion limit in value tracking). There's still probably some sequence that exposes this problem, and the proper way to fix that for somebody who has time is outlined in the code review. llvm-svn: 323630	2018-01-29 05:59:55 +00:00
Hiroshi Inoue	c8e9245816	[NFC] fix trivial typos in comments and documents "to to" -> "to" llvm-svn: 323628	2018-01-29 05:17:03 +00:00
Florian Hahn	1636651e35	[InlineCost] Mark functions accessing varargs as not viable. This prevents functions accessing varargs from being inlined if they have the alwaysinline attribute. Reviewers: efriedma, rnk, davide Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D42556 llvm-svn: 323619	2018-01-28 19:11:49 +00:00
Alexey Bataev	f86be12182	Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle." This reverts commit r323530 to fix possible problems in users code. llvm-svn: 323581	2018-01-27 02:42:21 +00:00
Sanjay Patel	5bce08ddff	[x86] auto-generate complete checks; NFC llvm-svn: 323571	2018-01-26 22:06:07 +00:00
Vedant Kumar	e48597a50e	[InstCombine] Preserve debug values for eliminable casts A cast from A to B is eliminable if its result is casted to C, and if the pair of casts could just be expressed as a single cast. E.g here, %c1 is eliminable: %c1 = zext i16 %A to i32 %c2 = sext i32 %c1 to i64 InstCombine optimizes away eliminable casts. This patch teaches it to insert a dbg.value intrinsic pointing to the final result, so that local variables pointing to the eliminable result are preserved. Differential Revision: https://reviews.llvm.org/D42566 llvm-svn: 323570	2018-01-26 22:02:52 +00:00
Alexey Bataev	7ad4e31c3b	[SLP] Test for trunc vectorization, NFC. llvm-svn: 323556	2018-01-26 20:07:55 +00:00
Alexey Bataev	167003df28	[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle. Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counter as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323530	2018-01-26 14:31:09 +00:00
Florian Hahn	212afb9fd9	[CallSiteSplitting] Fix infinite loop when recording conditions. Fix infinite loop when recording conditions by correctly marking basic blocks as visited. Fixes https://bugs.llvm.org/show_bug.cgi?id=36105 llvm-svn: 323515	2018-01-26 10:36:50 +00:00
Vedant Kumar	6394df9fc4	[Debug] LCSSA: Insert dbg.value at the first available insertion point Inserting a dbg.value instruction at the start of a basic block with a landingpad instruction triggers a verifier failure. We should be OK if we insert the instruction a bit later. Speculative fix for the bot failure described here: https://reviews.llvm.org/D42551 llvm-svn: 323482	2018-01-25 23:48:29 +00:00
Vedant Kumar	60f54084bf	[Debug] Add dbg.value intrinsics for PHIs created during LCSSA. This patch is an enhancement to propagate dbg.value information when Phis are created on behalf of LCSSA. I noticed a case where a value carried across a loop was reported as <optimized out>. Specifically this case: int bar(int x, int y) { return x + y; } int foo(int size) { int val = 0; for (int i = 0; i < size; ++i) { val = bar(val, i); // Both val and i are correct } return val; // <optimized out> } In the above case, after all of the interesting computation completes our value is reported as "optimized out." This change will add a dbg.value to correct this. This patch also moves the dbg.value insertion routine from LoopRotation.cpp into Local.cpp, so that we can share it in both places (LoopRotation and LCSSA). Patch by Matt Davis! Differential Revision: https://reviews.llvm.org/D42551 llvm-svn: 323472	2018-01-25 21:37:07 +00:00
Alexey Bataev	102d4b59f9	Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle." This reverts commit r323441 to fix buildbots. llvm-svn: 323447	2018-01-25 17:28:12 +00:00
Alexey Bataev	c8cfa14b6d	[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle. Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counter as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323441	2018-01-25 16:45:18 +00:00
Sanjay Patel	1d68112c4b	[InstCombine] narrow masked zexted binops (PR35792) This is guarded by shouldChangeType(), so the tests show that we don't do the fold if the narrower type is not legal. Note that there is a proposal (D42424) that would change the results for the specific cases shown in these tests. That difference is also discussed in PR35792: https://bugs.llvm.org/show_bug.cgi?id=35792 Alive proofs for the cases handled here as well as the bitwise logic binops that we should already do better on: https://rise4fun.com/Alive/c97 https://rise4fun.com/Alive/Lc5E https://rise4fun.com/Alive/kdf llvm-svn: 323437	2018-01-25 16:34:36 +00:00
Sanjay Patel	0f95dd234d	[InstCombine] add tests for PR35792; NFC llvm-svn: 323436	2018-01-25 16:03:44 +00:00
Alexey Bataev	a0b2c78efc	Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle." This reverts commit r323430 to fix buildbots. llvm-svn: 323432	2018-01-25 15:20:29 +00:00
Alexey Bataev	ad51fe3644	[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle. Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counter as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323430	2018-01-25 15:01:36 +00:00
Amjad Aboud	f1f57a3137	Another try to commit 323321 (aggressive instruction combine). llvm-svn: 323416	2018-01-25 12:06:32 +00:00
Sanjay Patel	60c13c7712	[InstCombine] fix datalayout in test file The only part of the datalayout that should matter for these tests is the part that specifies the legal int widths ('n*'). But there was a bug - that part of the string was not correctly separated with the expected '-' character, so we were testing as if there were no legal int widths at all. Removed the leading cruft so we have some legal ints to test with. I noticed this while testing a potential change to the way we transform shifts and sexts in D42424. llvm-svn: 323377	2018-01-24 21:36:45 +00:00
Alexey Bataev	0affccc8d7	Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle." This reverts commit r323348 because of the broken buildbots. llvm-svn: 323359	2018-01-24 18:36:51 +00:00
Nicolai Haehnle	4afb64e4c6	Revert r321751, "StructurizeCFG: Fix broken backedge detection" It causes regressions in various OpenGL test suites. Keep the test cases introduced by r321751 as XFAIL, and add a test case for the regression. Change-Id: I90b4cc354f68cebe5fcef1f2422dc8fe1c6d3514 Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36015 llvm-svn: 323355	2018-01-24 18:02:05 +00:00
Alexey Bataev	4bd8e5332f	[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle. Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counter as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323348	2018-01-24 17:50:53 +00:00
Zvi Rackover	51f0d64b9c	InstSimplify: If divisor element is undef simplify to undef Summary: If any vector divisor element is undef, we can arbitrarily choose it be zero which would make the div/rem an undef value by definition. Reviewers: spatel, reames Reviewed By: spatel Subscribers: magabari, llvm-commits Differential Revision: https://reviews.llvm.org/D42485 llvm-svn: 323343	2018-01-24 17:22:00 +00:00
Simon Pilgrim	f15886eb30	Regenerate shuffle sink test llvm-svn: 323328	2018-01-24 14:59:02 +00:00
Amjad Aboud	d53504e379	Reverted 323321. llvm-svn: 323326	2018-01-24 14:48:49 +00:00
Amjad Aboud	e4453233d7	[InstCombine] Introducing Aggressive Instruction Combine pass (-aggressive-instcombine). Combine expression patterns to form expressions with fewer, simple instructions. This pass does not modify the CFG. For example, this pass reduce width of expressions post-dominated by TruncInst into smaller width when applicable. It differs from instcombine pass in that it contains pattern optimization that requires higher complexity than the O(1), thus, it should run fewer times than instcombine pass. Differential Revision: https://reviews.llvm.org/D38313 llvm-svn: 323321	2018-01-24 12:42:42 +00:00
Max Kazantsev	0f720e1296	[NFC] Remove overconfident assert from IRCE This patch removes assert that SCEV is able to prove that a value is non-negative. In fact, SCEV can sometimes be unable to do this because its cache does not update properly. This assert will be returned once this problem is resolved. llvm-svn: 323309	2018-01-24 07:51:41 +00:00
Hiroshi Inoue	501931b117	[NFC] fix trivial typos in comments "the the" -> "the" llvm-svn: 323302	2018-01-24 05:04:35 +00:00
Zvi Rackover	b5447b1e7c	X86: Update isVectorShiftByScalarCheap with cases covered by AVX512BW Summary: AVX512BW adds support for variable shift amount for 16-bit element vectors. Reviewers: craig.topper, RKSimon, spatel Reviewed By: RKSimon Subscribers: rengolin, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D42437 llvm-svn: 323292	2018-01-24 01:36:40 +00:00
Sanjay Patel	c4ed9ed276	[SLPVectorizer] add test for PR13837; NFC This was probably fixed long ago, but I don't see a test that lines up with the example and target in the bug report: https://bugs.llvm.org/show_bug.cgi?id=13837 ...so adding it here. llvm-svn: 323269	2018-01-23 22:04:17 +00:00
Simon Pilgrim	67b21313ae	Add bdver shuffle sink tests. llvm-svn: 323268	2018-01-23 22:03:57 +00:00
Volkan Keles	dc40be75f8	[llvm-extract] Support extracting basic blocks Summary: Currently, there is no way to extract a basic block from a function easily. This patch extends llvm-extract to extract the specified basic block(s). Reviewers: loladiro, rafael, bogner Reviewed By: bogner Subscribers: hintonda, mgorny, qcolombet, llvm-commits Differential Revision: https://reviews.llvm.org/D41638 llvm-svn: 323266	2018-01-23 21:51:34 +00:00
Simon Pilgrim	bf3c39877e	Regenerate select test. NFCI. llvm-svn: 323265	2018-01-23 21:50:46 +00:00
Simon Pilgrim	a075ce04d8	Regenerate shuffle sink test. NFCI. llvm-svn: 323264	2018-01-23 21:50:11 +00:00
Alexey Bataev	4f74a31c0e	Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle." This reverts commit r323246 because of the broken buildbots. llvm-svn: 323252	2018-01-23 20:11:27 +00:00
Alexey Bataev	6719e2418c	[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle. Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counter as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323246	2018-01-23 19:30:26 +00:00
Zvi Rackover	87937e8ed2	X86 Tests: Add AVX512BW config to CodeGenPrepare test. NFC Case points out that we don't consider shifts supported by AVX512BW in isVectorShiftByScalarCheap() llvm-svn: 323242	2018-01-23 19:20:39 +00:00
Serguei Katkov	17e5794f11	[CGP] Fix the GV handling in complex addressing mode If in complex addressing mode the difference is in GV then base reg should not be installed because we plan to use base reg as a merge point of different GVs. This is a fix for PR35980. Reviewers: reames, john.brawn, santosh Reviewed By: john.brawn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42230 llvm-svn: 323192	2018-01-23 12:07:49 +00:00
Anton Bikineev	82f61151b3	[InstSimplify] (X << Y) % X -> 0 llvm-svn: 323182	2018-01-23 09:27:47 +00:00
Chandler Carruth	c58f2166ab	Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre.. Summary: First, we need to explain the core of the vulnerability. Note that this is a very incomplete description, please see the Project Zero blog post for details: https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html The basis for branch target injection is to direct speculative execution of the processor to some "gadget" of executable code by poisoning the prediction of indirect branches with the address of that gadget. The gadget in turn contains an operation that provides a side channel for reading data. Most commonly, this will look like a load of secret data followed by a branch on the loaded value and then a load of some predictable cache line. The attacker then uses timing of the processors cache to determine which direction the branch took in the speculative execution, and in turn what one bit of the loaded value was. Due to the nature of these timing side channels and the branch predictor on Intel processors, this allows an attacker to leak data only accessible to a privileged domain (like the kernel) back into an unprivileged domain. The goal is simple: avoid generating code which contains an indirect branch that could have its prediction poisoned by an attacker. In many cases, the compiler can simply use directed conditional branches and a small search tree. LLVM already has support for lowering switches in this way and the first step of this patch is to disable jump-table lowering of switches and introduce a pass to rewrite explicit indirectbr sequences into a switch over integers. However, there is no fully general alternative to indirect calls. We introduce a new construct we call a "retpoline" to implement indirect calls in a non-speculatable way. It can be thought of loosely as a trampoline for indirect calls which uses the RET instruction on x86. Further, we arrange for a specific call->ret sequence which ensures the processor predicts the return to go to a controlled, known location. The retpoline then "smashes" the return address pushed onto the stack by the call with the desired target of the original indirect call. The result is a predicted return to the next instruction after a call (which can be used to trap speculative execution within an infinite loop) and an actual indirect branch to an arbitrary address. On 64-bit x86 ABIs, this is especially easily done in the compiler by using a guaranteed scratch register to pass the target into this device. For 32-bit ABIs there isn't a guaranteed scratch register and so several different retpoline variants are introduced to use a scratch register if one is available in the calling convention and to otherwise use direct stack push/pop sequences to pass the target address. This "retpoline" mitigation is fully described in the following blog post: https://support.google.com/faqs/answer/7625886 We also support a target feature that disables emission of the retpoline thunk by the compiler to allow for custom thunks if users want them. These are particularly useful in environments like kernels that routinely do hot-patching on boot and want to hot-patch their thunk to different code sequences. They can write this custom thunk and use `-mretpoline-external-thunk` in addition to `-mretpoline`. In this case, on x86-64 thu thunk names must be: ``` __llvm_external_retpoline_r11 ``` or on 32-bit: ``` __llvm_external_retpoline_eax __llvm_external_retpoline_ecx __llvm_external_retpoline_edx __llvm_external_retpoline_push ``` And the target of the retpoline is passed in the named register, or in the case of the `push` suffix on the top of the stack via a `pushl` instruction. There is one other important source of indirect branches in x86 ELF binaries: the PLT. These patches also include support for LLD to generate PLT entries that perform a retpoline-style indirection. The only other indirect branches remaining that we are aware of are from precompiled runtimes (such as crt0.o and similar). The ones we have found are not really attackable, and so we have not focused on them here, but eventually these runtimes should also be replicated for retpoline-ed configurations for completeness. For kernels or other freestanding or fully static executables, the compiler switch `-mretpoline` is sufficient to fully mitigate this particular attack. For dynamic executables, you must compile all libraries with `-mretpoline` and additionally link the dynamic executable and all shared libraries with LLD and pass `-z retpolineplt` (or use similar functionality from some other linker). We strongly recommend also using `-z now` as non-lazy binding allows the retpoline-mitigated PLT to be substantially smaller. When manually apply similar transformations to `-mretpoline` to the Linux kernel we observed very small performance hits to applications running typical workloads, and relatively minor hits (approximately 2%) even for extremely syscall-heavy applications. This is largely due to the small number of indirect branches that occur in performance sensitive paths of the kernel. When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch, indirect-, or virtual-call heavy we have seen overheads ranging from 10% to 50%. However, real-world workloads exhibit substantially lower performance impact. Notably, techniques such as PGO and ThinLTO dramatically reduce the impact of hot indirect calls (by speculatively promoting them to direct calls) and allow optimized search trees to be used to lower switches. If you need to deploy these techniques in C++ applications, we strongly recommend that you ensure all hot call targets are statically linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well tuned servers using all of these techniques saw 5% - 10% overhead from the use of retpoline. We will add detailed documentation covering these components in subsequent patches, but wanted to make the core functionality available as soon as possible. Happy for more code review, but we'd really like to get these patches landed and backported ASAP for obvious reasons. We're planning to backport this to both 6.0 and 5.0 release streams and get a 5.0 release with just this cherry picked ASAP for distros and vendors. This patch is the work of a number of people over the past month: Eric, Reid, Rui, and myself. I'm mailing it out as a single commit due to the time sensitive nature of landing this and the need to backport it. Huge thanks to everyone who helped out here, and everyone at Intel who helped out in discussions about how to craft this. Also, credit goes to Paul Turner (at Google, but not an LLVM contributor) for much of the underlying retpoline design. Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D41723 llvm-svn: 323155	2018-01-22 22:05:25 +00:00
Serguei Katkov	f38041dc3e	Revert [SCEV] Fix isLoopEntryGuardedByCond usage It causes buildbot failures. New added assert is fired. It seems not all usages of isLoopEntryGuardedByCond are fixed. llvm-svn: 323079	2018-01-22 07:47:02 +00:00
Serguei Katkov	50714a1cbc	[SCEV] Fix isLoopEntryGuardedByCond usage ScalarEvolution::isKnownPredicate invokes isLoopEntryGuardedByCond without check that SCEV is available at entry point of the loop. It is incorrect and fixed by patch. Reviewers: sanjoy, mkazantsev, anna, dorit Reviewed By: mkazantsev Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42165 llvm-svn: 323077	2018-01-22 07:31:41 +00:00
Sanjay Patel	9530f18864	[InstCombine] (X << Y) / X -> 1 << Y ...when the shift is known to not overflow with the matching signed-ness of the division. This closes an optimization gap caused by canonicalizing mul by power-of-2 to shl as shown in PR35709: https://bugs.llvm.org/show_bug.cgi?id=35709 Patch by Anton Bikineev! Differential Revision: https://reviews.llvm.org/D42032 llvm-svn: 323068	2018-01-21 16:14:51 +00:00
Sanjay Patel	7a44e4d594	[InstSimplify] add baseline tests for (X << Y) % X -> 0; NFC This is the 'rem' counterpart to D42032 and would be folded by D42341. Patch by Anton Bikineev. Differential Revision: https://reviews.llvm.org/D42342 llvm-svn: 323067	2018-01-21 15:36:15 +00:00
Sanjay Patel	439132185d	[InstCombine] add baseline tests for (X << Y) / X -> 1 << Y; NFC This fold is proposed in D42032. llvm-svn: 323043	2018-01-20 16:13:40 +00:00
Craig Topper	0d797a34d8	[X86] Add support for passing 'prefer-vector-width' function attribute into X86Subtarget and exposing via X86's getRegisterWidth TTI interface. This will cause the vectorizers to do some limiting of the vector widths they create. This is not a strict limit. There are reasons I know of that the loop vectorizer will generate larger vectors for. I've written this in such a way that the interface will only return a properly supported width(0/128/256/512) even if the attribute says something funny like 384 or 10. This has been split from D41895 with the remainder in a follow up commit. llvm-svn: 323015	2018-01-20 00:26:08 +00:00
Akira Hatanaka	73ceb50d85	[ObjCARC] Do not turn a call to @objc_autoreleaseReturnValue into a call to @objc_autorelease if its operand is a PHI and the PHI has an equivalent value that is used by a return instruction. For example, ARC optimizer shouldn't replace the call in the following example, as doing so breaks the AutoreleaseRV/RetainRV optimization: %v1 = bitcast i32* %v0 to i8* br label %bb3 bb2: %v3 = bitcast i32* %v2 to i8* br label %bb3 bb3: %p = phi i8* [ %v1, %bb1 ], [ %v3, %bb2 ] %retval = phi i32* [ %v0, %bb1 ], [ %v2, %bb2 ] ; equivalent to %p %v4 = tail call i8* @objc_autoreleaseReturnValue(i8* %p) ret i32* %retval Also, make sure ObjCARCContract replaces @objc_autoreleaseReturnValue's operand uses with its value so that the call gets tail-called. rdar://problem/15894705 llvm-svn: 323009	2018-01-19 23:51:13 +00:00
Jakub Kuderski	d2e371f046	[Dominators] Visit affected node candidates found at different root levels Summary: This patch attempts to fix the DomTree incremental insertion bug found here [[ https://bugs.llvm.org/show_bug.cgi?id=35969 \| PR35969 ]] . When performing an insertion into a piece of unreachable CFG, we may find the same not at different levels. When this happens, the node can turn out to be affected when we find it starting from a node with a lower level in the tree. The level at which we start visitation affects if we consider a node affected or not. This patch tracks the lowest level at which each node was visited during insertion and allows it to be visited multiple times, if it can cause it to be considered affected. Reviewers: brzycki, davide, dberlin, grosser Reviewed By: brzycki Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42231 llvm-svn: 322993	2018-01-19 21:27:24 +00:00
Daniel Neilson	1e68724d24	Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1) Summary: This is a resurrection of work first proposed and discussed in Aug 2015: http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html and initially landed (but then backed out) in Nov 2015: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument which is required to be a constant integer. It represents the alignment of the dest (and source), and so must be the minimum of the actual alignment of the two. This change is the first in a series that allows source and dest to each have their own alignments by using the alignment attribute on their arguments. In this change we: 1) Remove the alignment argument. 2) Add alignment attributes to the source & dest arguments. We, temporarily, require that the alignments for source & dest be equal. For example, code which used to read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false) will now read call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false) Downstream users may have to update their lit tests that check for @llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script may help with updating the majority of your tests, but it does not catch all possible patterns so some manual checking and updating will be required. s~declare void @llvm\.mem(set\|cpy\|move)\.p([^(])$(.), i32, i1$~declare void @llvm.mem\1.p\2(\3, i1)~g s~call void @llvm\.memset\.p([^(])i8$i8([^])\ (.), i8 (.), i8 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i16$i8([^])\ (.), i8 (.), i16 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i32$i8([^])\ (.), i8 (.), i32 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i64$i8([^])\ (.), i8 (.), i64 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i128$i8([^])\ (.), i8 (.), i128 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i8$i8([^])\ (.), i8 (.), i8 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i8(i8\2 align \6 \3, i8 \4, i8 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i16$i8([^])\ (.), i8 (.), i16 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i16(i8\2 align \6 \3, i8 \4, i16 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i32$i8([^])\ (.), i8 (.), i32 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i32(i8\2 align \6 \3, i8 \4, i32 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i64$i8([^])\ (.), i8 (.), i64 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i64(i8\2 align \6 \3, i8 \4, i64 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i128$i8([^])\ (.), i8 (.), i128 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i128(i8\2 align \6 \3, i8 \4, i128 \5, i1 \7)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i8$i8([^])\ (.), i8([^])\ (.), i8 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i8(i8\3 \4, i8\5* \6, i8 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i16$i8([^])\ (.), i8([^])\ (.), i16 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i16(i8\3 \4, i8\5* \6, i16 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i32$i8([^])\ (.), i8([^])\ (.), i32 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i32(i8\3 \4, i8\5* \6, i32 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i64$i8([^])\ (.), i8([^])\ (.), i64 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i64(i8\3 \4, i8\5* \6, i64 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i128$i8([^])\ (.), i8([^])\ (.), i128 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i128(i8\3 \4, i8\5* \6, i128 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i8$i8([^])\ (.), i8([^])\ (.), i8 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i16$i8([^])\ (.), i8([^])\ (.), i16 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i32$i8([^])\ (.), i8([^])\ (.), i32 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i64$i8([^])\ (.), i8([^])\ (.), i64 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i128$i8([^])\ (.), i8([^])\ (.), i128 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g The remaining changes in the series will: Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. Step 3) Update Clang to use the new IRBuilder API. Step 4) Update Polly to use the new IRBuilder API. Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use getDestAlignment() and getSourceAlignment() instead. Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reviewers: pete, hfinkel, lhames, reames, bollu Reviewed By: reames Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits Differential Revision: https://reviews.llvm.org/D41675 llvm-svn: 322965	2018-01-19 17:13:12 +00:00
Alexey Bataev	fa80c47c6a	[SLP] Fix vectorization for tree with trunc to minimum required bit width. Summary: If the vectorized tree has truncate to minimum required bit width and the vector type of the cast operation after the truncation is the same as the vector type of the cast operands, count cost of the vector cast operation as 0, because this cast will be later removed. Also, if the vectorization tree root operations are integer cast operations, do not consider them as candidates for truncation. It will just create extra number of the same vector/scalar operations, which will be removed by instcombiner. Reviewers: RKSimon, spatel, mkuper, hfinkel, mssimpso Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41948 llvm-svn: 322946	2018-01-19 14:40:13 +00:00
John Brawn	2867bd72c0	[InstCombine] Make foldSelectOpOp able to handle two-operand getelementptr Three (or more) operand getelementptrs could plausibly also be handled, but handling only two-operand fits in easily with the existing BinaryOperator handling. Differential Revision: https://reviews.llvm.org/D39958 llvm-svn: 322930	2018-01-19 10:05:15 +00:00
Sanjay Patel	a19b748f6d	[InstSimplify] regenerate checks and add tests for commutes; NFC llvm-svn: 322907	2018-01-18 23:11:24 +00:00
Alexey Bataev	a2d6fe4ab4	[SLP] Fix test checks, NFC. llvm-svn: 322865	2018-01-18 17:34:27 +00:00
Craig Topper	83b0a98902	[X86] Use vmovdqu64/vmovdqa64 for unmasked integer vector stores for consistency with loads. Previously we used 64 for vXi64 stores and 32 for everything else. This change uses 64 for everything just like do for loads. llvm-svn: 322820	2018-01-18 07:44:09 +00:00
Rafael Espindola	9fbc040599	Make GlobalValues with non-default visibilility dso_local. This is similar to r322317, but for visibility. It is not as neat because we have to special case extern_weak. The idea is the same as the previous change, make the transition to explicit dso_local easier for the frontends. With this they only have to add dso_local to symbols where we need some external information to decide if it is dso_local (like it being part of an ELF executable). llvm-svn: 322806	2018-01-18 02:08:23 +00:00
Zaara Syeda	c9dc7b451b	Revert [PowerPC] This reverts commit rL322721 Failing build bots. Revert the commit now. llvm-svn: 322748	2018-01-17 20:00:15 +00:00
Sanjay Patel	218a0b51dd	[InstCombine] add baseline tests for D39958; NFC llvm-svn: 322733	2018-01-17 19:04:18 +00:00
Zaara Syeda	8e951fd2f6	[PowerPC] Add handling for ColdCC calling convention and a pass to mark candidates with coldcc attribute. This patch adds support for the coldcc calling convention for Power. This changes the set of non-volatile registers. It includes a pass to stress test the implementation by marking all static directly called functions with the coldcc attribute through the option -enable-coldcc-stress-test. It also includes an option, -ppc-enable-coldcc, to add the coldcc attribute to functions which are cold at all call sites based on BlockFrequencyInfo when the containing function does not call any non cold functions. Differential Revision: https://reviews.llvm.org/D38413 llvm-svn: 322721	2018-01-17 18:22:55 +00:00
Sanjay Patel	aa766efd09	[InstCombine] fix demanded-bits propagation for zext/trunc I was comparing the demanded-bits implementations between InstCombine and TargetLowering as part of investigating questions in D42088 and noticed that this was wrong in IR. We were losing all of the prior known bits when we got back to the 'zext'. llvm-svn: 322662	2018-01-17 14:39:28 +00:00
Sanjay Patel	178deccb63	[InstCombine] add test to show hole in demanded bits; NFC llvm-svn: 322660	2018-01-17 14:27:35 +00:00
Ivan A. Kosarev	4d0ff0c74d	[Transforms] Support making mutable versions of new-format TBAA access tags Differential Revision: https://reviews.llvm.org/D41565 llvm-svn: 322650	2018-01-17 13:29:54 +00:00
Florian Hahn	c6c89bffdc	[CallSiteSplitting] Pass list of (BB, Conditions) pairs to splitCallSite. This removes some duplication from splitCallSite and makes it easier to add additional code dealing with each predecessor. It also allows us to split for more than 2 predecessors, although that is not enabled for now. Reviewers: junbuml, mcrosier, davidxl, davide Reviewed By: junbuml Differential Revision: https://reviews.llvm.org/D41858 llvm-svn: 322599	2018-01-16 22:13:15 +00:00
Alexey Bataev	6977dbcc7b	[SLP] Fix for PR32164: Improve vectorization of reverse order of extract operations. Summary: Sometimes vectorization of insertelement instructions with extractelement operands may produce an extra shuffle operation, if these operands are in the reverse order. Patch tries to improve this situation by the reordering of the operands to remove this extra shuffle operation. Reviewers: mkuper, hfinkel, RKSimon, spatel Subscribers: mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D33954 llvm-svn: 322579	2018-01-16 18:17:01 +00:00
Hiroshi Inoue	99a8faa615	[SROA] fix assetion failure This patch fixes the assertion failure in SROA reported in PR35657. PR35657 reports the assertion failure due to r319522 (splitting for non-whole-alloca slices), but this problem can happen even without r319522. The problem exists in a check for reusing an existing alloca when rewriting partitions. As the original comment said, we can reuse the existing alloca if the new alloca has the same type and offset with the existing one. But the code checks only type of the alloca and then check the offset using an assert. In a corner case with out-of-bounds access (e.g. @PR35657 function added in unit test), it is possible that the two allocas have the same type but different offsets. This patch makes the check of the offset in the if condition, and re-enables the splitting for non-whole-alloca slices. Differential Revision: https://reviews.llvm.org/D41981 llvm-svn: 322533	2018-01-16 06:23:05 +00:00
Andrei Elovikov	7457aa0bce	[LV] Don't call recordVectorLoopValueForInductionCast for newly-created IV from a trunc. Summary: This method is supposed to be called for IVs that have casts in their use-def chains that are completely ignored after vectorization under PSE. However, for truncates of such IVs the same InductionDescriptor is used during creation/widening of both original IV based on PHINode and new IV based on TruncInst. This leads to unintended second call to recordVectorLoopValueForInductionCast with a VectorLoopVal set to the newly created IV for a trunc and causes an assert due to attempt to store new information for already existing entry in the map. This is wrong and should not be done. Fixes PR35773. Reviewers: dorit, Ayal, mssimpso Reviewed By: dorit Subscribers: RKSimon, dim, dcaballe, hsaito, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D41913 llvm-svn: 322473	2018-01-15 10:56:07 +00:00
Sanjay Patel	4158eff0f8	[InstSimplify] fold implied null ptr check (PR35790) This extends rL322327 to handle the pointer cast and should solve: https://bugs.llvm.org/show_bug.cgi?id=35790 Name: or_eq_zero %isnull = icmp eq i64* %p, null %x = ptrtoint i64* %p to i64 %somebits = and i64 %x, %y %somebits_are_zero = icmp eq i64 %somebits, 0 %or = or i1 %somebits_are_zero, %isnull => %or = %somebits_are_zero Name: and_ne_zero %isnotnull = icmp ne i64* %p, null %x = ptrtoint i64* %p to i64 %somebits = and i64 %x, %y %somebits_are_not_zero = icmp ne i64 %somebits, 0 %and = and i1 %somebits_are_not_zero, %isnotnull => %and = %somebits_are_not_zero https://rise4fun.com/Alive/CQ3 llvm-svn: 322439	2018-01-13 15:44:44 +00:00
Sanjay Patel	6691e40980	[InstSimplify] add tests for implied ptr cmp with null (PR35790); NFC llvm-svn: 322411	2018-01-12 22:16:26 +00:00
Brian M. Rzycki	9b7ae23256	[JumpThreading] Preservation of DT and LVI across the pass Summary: See D37528 for a previous (non-deferred) version of this patch and its description. Preserves dominance in a deferred manner using a new class DeferredDominance. This reduces the performance impact of updating the DominatorTree at every edge insertion and deletion. A user may call DDT->flush() within JumpThreading for an up-to-date DT. This patch currently has one flush() at the end of runImpl() to ensure DT is preserved across the pass. LVI is also preserved to help subsequent passes such as CorrelatedValuePropagation. LVI is simpler to maintain and is done immediately (not deferred). The code to perform the preversation was minimally altered and simply marked as preserved for the PassManager to be informed. This extends the analysis available to JumpThreading for future enhancements such as threading across loop headers. Reviewers: dberlin, kuhar, sebpop Reviewed By: kuhar, sebpop Subscribers: mgorny, dmgreen, kuba, rnk, rsmith, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D40146 llvm-svn: 322401	2018-01-12 21:06:48 +00:00
Max Kazantsev	ef0576000c	[IRCE][NFC] Make range check's End a non-null SCEV Currently, IRC contains `Begin` and `Step` as SCEVs and `End` as value. Aside from that, `End` can also be `nullptr` which can be later conditionally converted into a non-null SCEV. To make this logic more transparent, this patch makes `End` a SCEV and calculates it early, so that it is never a null. Differential Revision: https://reviews.llvm.org/D39590 llvm-svn: 322364	2018-01-12 10:00:26 +00:00
Serguei Katkov	a757d65cec	[LoopDeletion] Handle users in unreachable block This is a fix for PR35884. When we want to delete dead loop we must clean uses in unreachable blocks otherwise we'll get an assert during deletion of instructions from the loop. Reviewers: anna, davide Reviewed By: anna Subscribers: llvm-commits, lebedev.ri Differential Revision: https://reviews.llvm.org/D41943 llvm-svn: 322357	2018-01-12 07:24:43 +00:00
Sanjay Patel	6ef6aa987c	[InstSimplify] fold implied cmp with zero (PR35790) This doesn't handle the more complicated case in the bug report yet: https://bugs.llvm.org/show_bug.cgi?id=35790 For that, we have to match / look through a cast. llvm-svn: 322327	2018-01-11 23:27:37 +00:00
Sanjay Patel	ac0edcb3f3	[InstSimplify] add tests for implied cmp with zero (PR35790); NFC llvm-svn: 322323	2018-01-11 22:48:07 +00:00
Rafael Espindola	e4b0231c63	Make internal/private GVs implicitly dso_local. While updating clang tests for having clang set dso_local I noticed that: - There are a lot of tests to update. - Many of the updates are redundant. They are redundant because a GV is "obviously dso_local". This patch starts formalizing that a bit by requiring that internal and private GVs be dso_local too. Since they all are, we don't have to print dso_local to the textual representation, making it a bit more compact and easier to read. llvm-svn: 322317	2018-01-11 22:15:05 +00:00
Fiona Glaser	efe6a84e5b	[Sink] Really really fix predicate in legality check LoadInst isn't enough; we need to include intrinsics that perform loads too. All side-effecting intrinsics and such are already covered by the isSafe check, so we just need to care about things that read from memory. D41960, originally from D33179. llvm-svn: 322311	2018-01-11 21:28:57 +00:00
Benjamin Kramer	738e6e7cb0	[InstCombine] Apply the fix from r322284 for sin / cos -> tan too llvm-svn: 322285	2018-01-11 15:33:21 +00:00
Benjamin Kramer	44993ede60	[InstCombine] For cos/sin -> tan copy attributes from cos instead of the parent function Ideally we should merge the attributes from the functions somehow, but this is obviously an improvement over taking random attributes from the caller which will trip up the verifier if they're nonsensical for an unary intrinsic call. llvm-svn: 322284	2018-01-11 15:19:02 +00:00
Sanjay Patel	e63d8dda5a	[ValueTracking] recognize min/max-of-min/max with notted ops (PR35875) This was originally planned as the fix for: https://bugs.llvm.org/show_bug.cgi?id=35834 ...but simpler transforms handled that case, so I implemented a lesser solution. It turns out we need to handle the case with 'not' ops too because the real code example that we are trying to solve: https://bugs.llvm.org/show_bug.cgi?id=35875 ...has extra uses of the intermediate values, so we can't rely on smaller canonicalizations to get us to the goal. As with rL321672, I've tried to show every possibility in the codegen tests because that's the simplest way to prove we're doing the right thing in the wide variety of permutations of this pattern. We can also show an InstCombine win because we added a fold for this case in: rL321998 / D41603 An Alive proof for one variant of the pattern to show that the InstCombine and codegen results are correct: https://rise4fun.com/Alive/vd1 Name: min3_nots %nx = xor i8 %x, -1 %ny = xor i8 %y, -1 %nz = xor i8 %z, -1 %cmpxz = icmp slt i8 %nx, %nz %minxz = select i1 %cmpxz, i8 %nx, i8 %nz %cmpyz = icmp slt i8 %ny, %nz %minyz = select i1 %cmpyz, i8 %ny, i8 %nz %cmpyx = icmp slt i8 %y, %x %r = select i1 %cmpyx, i8 %minxz, i8 %minyz => %cmpxyz = icmp slt i8 %minxz, %ny %r = select i1 %cmpxyz, i8 %minxz, i8 %ny Name: min3_nots_alt %nx = xor i8 %x, -1 %ny = xor i8 %y, -1 %nz = xor i8 %z, -1 %cmpxz = icmp slt i8 %nx, %nz %minxz = select i1 %cmpxz, i8 %nx, i8 %nz %cmpyz = icmp slt i8 %ny, %nz %minyz = select i1 %cmpyz, i8 %ny, i8 %nz %cmpyx = icmp slt i8 %y, %x %r = select i1 %cmpyx, i8 %minxz, i8 %minyz => %xz = icmp sgt i8 %x, %z %maxxz = select i1 %xz, i8 %x, i8 %z %xyz = icmp sgt i8 %maxxz, %y %maxxyz = select i1 %xyz, i8 %maxxz, i8 %y %r = xor i8 %maxxyz, -1 llvm-svn: 322283	2018-01-11 15:13:47 +00:00
Sanjay Patel	e0df4650f8	[InstCombine] add min3-with-nots test (PR35875); NFC llvm-svn: 322281	2018-01-11 14:53:45 +00:00
Dmitry Venikov	e5fbf591a7	[InstCombine] Missed optimization in math expression: sin(x) / cos(x) => tan(x) Summary: This patch enables folding sin(x) / cos(x) -> tan(x), cos(x) / sin(x) -> 1 / tan(x) under -ffast-math flag Reviewers: hfinkel, spatel Reviewed By: spatel Subscribers: andrew.w.kaylor, efriedma, scanon, llvm-commits Differential Revision: https://reviews.llvm.org/D41286 llvm-svn: 322255	2018-01-11 06:33:00 +00:00
Alexey Bataev	90e29b81d6	[SLP] Add/update tests for SLP vectorizer, NFC. llvm-svn: 322225	2018-01-10 21:29:18 +00:00
Sanjay Patel	d04026ea43	[InstCombine] add test to show missed bswap; NFC D41353 / D41233 are proposing to alter the shl/and canonicalization, but I think that would just move an existing pattern-matching hole to a different place. llvm-svn: 322206	2018-01-10 18:47:21 +00:00
Bjorn Pettersson	3851496e6e	Avoid inlining if there is byval arguments with non-alloca address space Summary: After teaching InlineCost more about address spaces () another fault was detected in the inliner. If an argument has the byval attribute the parameter might be copied to an alloca. That part seems to work fine even if the argument has a different address space than the alloca address space. However, if the address spaces differ, then the inlined function still might refer to the parameter using the original address space (the inliner does not handle that situation very well). This patch avoids the problem by simply disallowing inlining when there are byval arguments with address space that differs from the alloca address space. I'm not really sure how to transform the code if we want to get inlining for this situation. I assume that it never has been working, and that the fixes in r321809 just exposed an old problem. Fault found by skatkov (Serguei Katkov). It is mentioned in follow up comments to https://reviews.llvm.org/D40455. Reviewers: skatkov Reviewed By: skatkov Subscribers: uabelho, eraman, llvm-commits, haicheng Differential Revision: https://reviews.llvm.org/D41898 llvm-svn: 322181	2018-01-10 13:01:18 +00:00
Vlad Tsyrklevich	cdec22ef9a	LowerTypeTests: Add limited support for aliases Summary: LowerTypeTests moves some function definitions from individual object files to the merged module, leaving a stub to be called in the merged module's jump table. If an alias was pointing to such a function definition LowerTypeTests would fail because the alias would be left without a definition to point to. This change 1) emits information about aliases to the ThinLTO summary, 2) replaces aliases pointing to function definitions that are moved to the merged module with function declarations, and 3) re-emits those aliases in the merged module pointing to the correct function definitions. The patch does not correctly fix all possible mis-uses of aliases in LowerTypeTests. For example, it does not handle aliases with a different type from the pointed to function. The addition of alias data increases the size of Chrome build artifacts by less than 1%. Reviewers: pcc Reviewed By: pcc Subscribers: mehdi_amini, eraman, mgrang, llvm-commits, eugenis, kcc Differential Revision: https://reviews.llvm.org/D41741 llvm-svn: 322139	2018-01-10 00:00:51 +00:00
Michael Zolotukhin	1f562176e9	[LoopRotate] Detect loops with indirect branches better (we're giving up on them). llvm-svn: 322137	2018-01-09 23:54:35 +00:00
Chris Bieneman	abdea268c1	[IPSCCP] Remove calls without side effects Summary: When performing constant propagation for call instructions we have historically replaced all uses of the return from a call, but not removed the call itself. This is required for correctness if the calls have side effects, however the compiler should be able to safely remove calls that don't have side effects. This allows the compiler to completely fold away calls to functions that have no side effects if the inputs are constant and the output can be determined at compile time. Reviewers: davide, sanjoy, bruno, dberlin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38856 llvm-svn: 322125	2018-01-09 21:58:46 +00:00
Daniel Berlin	56cca7437c	NewGVN: Fix PR/33367, which was causing us to delete non-copy intrinsics accidentally in some rare cases llvm-svn: 322115	2018-01-09 20:12:42 +00:00
Hubert Tong	55662a8e9f	Profiling tests: Endianess XFAIL for powerpc- (32-bit) Add powerpc- (32-bit) as XFAIL for tests that are documented either in- line or via commit messages as expected to fail on big-endian systems. Tests not documented in-line are documented in commit messages as follows: r211172 - test/tools/llvm-cov/llvm-cov.test r247920 - test/Transforms/SampleProfile/gcc-simple.ll llvm-svn: 322114	2018-01-09 20:09:23 +00:00
Easwaran Raman	bdf20261d8	Add a pass to generate synthetic function entry counts. Summary: This pass synthesizes function entry counts by traversing the callgraph and using the relative block frequencies of the callsites. The intended use of these counts is in inlining to determine hot/cold callsites in the absence of profile information. The pass is split into two files with the code that propagates the counts in a callgraph in a Utils file. I plan to add support for propagation in the thinlto link phase and the propagation code will be shared and hence this split. I did not add support to the old PM since hot callsite determination in inlining is not possible in old PM (although we could use hot callee heuristic with synthetic counts in the old PM it is not worth the effort tuning it) Reviewers: davidxl, silvas Subscribers: mgorny, mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D41604 llvm-svn: 322110	2018-01-09 19:39:35 +00:00
Alexey Bataev	771ec9f399	[COST]Fix PR35865: Fix cost model evaluation for shuffle on X86. Summary: If the vector type is transformed to non-vector single type, the compile may crash trying to get vector information about non-vector type. Reviewers: RKSimon, spatel, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41862 llvm-svn: 322106	2018-01-09 19:08:22 +00:00
Sanjay Patel	6fb1357c35	[InstCombine] weaken assertions for icmp folds (PR35846) Because of potential UB (known bits conflicts with an llvm.assume), we have to check rather than assert here because InstSimplify doesn't kill the compare: https://bugs.llvm.org/show_bug.cgi?id=35846 llvm-svn: 322104	2018-01-09 18:56:03 +00:00
Petar Jovanovic	1d26c7e4ff	[EarlyCSE] Salvage debug info during DCE EarlyCSE did not try to salvage debug info during erasing of instructions. This change fixes it. Patch by Djordje Todorovic. Differential Revision: https://reviews.llvm.org/D41496 llvm-svn: 322083	2018-01-09 15:08:37 +00:00
Simon Pilgrim	5d909be91b	[InstCombine] Check for out of range ashr values using APInt before calling getZExtValue Reduced from oss-fuzz #5032 test case llvm-svn: 322078	2018-01-09 14:23:46 +00:00
Simon Pilgrim	94357afd26	[InstCombine] Add pow2 mul -> shl tests for vectors with uniform/non-uniform constants llvm-svn: 322072	2018-01-09 11:55:27 +00:00
Serguei Katkov	6a7a4c6a55	[SCEV] Do not cache S -> V if S is not equivalent of V SCEV tracks the correspondence of created SCEV to original instruction. However during creation of SCEV it is possible that nuw/nsw/exact flags are lost. As a result during expansion of the SCEV the instruction with nuw/nsw/exact will be used where it was expected and we produce poison incorreclty. Reviewers: sanjoy, mkazantsev, sebpop, jbhateja Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41578 llvm-svn: 322058	2018-01-09 06:47:14 +00:00
Serguei Katkov	4d1dd6b53a	[CGP] Fix Complex addressing mode for offset If the offset is differ in two addressing mode we can continue only if ScaleReg is not set due to we will use it as merge of different offsets. It should fix PR35799 and PR35805. Reviewers: john.brawn, reames Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41227 llvm-svn: 322056	2018-01-09 04:37:06 +00:00
Sanjay Patel	7dfe96ad16	[ValueTracking] remove overzealous assert The test is derived from a failing fuzz test: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=5008 Credit to @rksimon for pointing out the problem. llvm-svn: 322016	2018-01-08 18:31:13 +00:00
Sanjay Patel	52149f0305	[TargetLibraryInfo] fix finite mathlib function availability This patch was part of: https://reviews.llvm.org/D41338 ...but we can expose the bug in IR via constant propagation as shown in the test. Unless the triple includes 'linux', we should not fold these because the functions don't exist on other platforms (yet?). llvm-svn: 322010	2018-01-08 17:38:09 +00:00
Davide Italiano	9a60d2c157	[CVP] Replace incoming values from unreachable blocks with undef. This is an attempt of fixing PR35807. Due to the non-standard definition of dominance in LLVM, where uses in unreachable blocks are dominated by anything, you can have, in an unreachable block: %patatino = OP1 %patatino, CONSTANT When `SimplifyInstruction` receives a PHI where an incoming value is of the aforementioned form, in some cases, loops indefinitely. What I propose here instead is keeping track of the incoming values from unreachable blocks, and replacing them with undef. It fixes this case, and it seems to be good regardless (even if we can't prove that the value is constant, as it's coming from an unreachable block, we can ignore it). Differential Revision: https://reviews.llvm.org/D41812 llvm-svn: 322006	2018-01-08 16:34:06 +00:00
Sanjay Patel	31b4b76f99	[InstCombine] fold min/max tree with common operand (PR35717) There is precedence for factorization transforms in instcombine for FP ops with fast-math. We also have similar logic in foldSPFofSPF(). It would take more work to add this to reassociate because that's specialized for binops, and min/max are not binops (or even single instructions). Also, I don't have evidence that larger min/max trees than this exist in real code, but if we find that's true, we might want to reorganize where/how we do this optimization. In the motivating example from https://bugs.llvm.org/show_bug.cgi?id=35717 , we have: int test(int xc, int xm, int xy) { int xk; if (xc < xm) xk = xc < xy ? xc : xy; else xk = xm < xy ? xm : xy; return xk; } This patch solves that problem because we recognize more min/max patterns after rL321672 https://rise4fun.com/Alive/Qjne https://rise4fun.com/Alive/3yg Differential Revision: https://reviews.llvm.org/D41603 llvm-svn: 321998	2018-01-08 15:05:34 +00:00
Alexey Bataev	5b9a77d4ea	[SLP] Fix PR35777: Incorrect handling of aggregate values. Summary: Fixes the bug with incorrect handling of InsertValue\|InsertElement instrucions in SLP vectorizer. Currently, we may use incorrect ExtractElement instructions as the operands of the original InsertValue\|InsertElement instructions. Reviewers: mkuper, hfinkel, RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41767 llvm-svn: 321994	2018-01-08 14:43:06 +00:00
Alexey Bataev	118a0a2c38	[SLP] Fix PR35628: Count external uses on extra reduction arguments. Summary: If the vectorized value is marked as extra reduction argument, its users are not considered as external users. Patch fixes this. Reviewers: mkuper, hfinkel, RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41786 llvm-svn: 321993	2018-01-08 14:33:11 +00:00
Davide Italiano	4c39758a38	[SLPVectorizer] Reintroduce std::stable_sort(properlyDominates()). The approach was never discussed, I wasn't able to reproduce this non-determinism, and the original author went AWOL. After a discussion on the ML, Philip suggested to revert this. llvm-svn: 321974	2018-01-07 22:06:24 +00:00
Florian Hahn	55be37e7d4	[CodeExtractor] Use subset of function attributes for extracted function. In addition to target-dependent attributes, we can also preserve a white-listed subset of target independent function attributes. The white-list excludes problematic attributes, most prominently: * attributes related to memory accesses, as alloca instructions could be moved in/out of the extracted block * control-flow dependent attributes, like no_return or thunk, as the relerelevant instructions might or might not get extracted. Thanks @efriedma and @aemerson for providing a set of attributes that cannot be propagated. Reviewers: efriedma, davidxl, davide, silvas Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D41334 llvm-svn: 321961	2018-01-07 11:22:25 +00:00
Florian Hahn	a82eef2363	[InlineFunction] Preserve calling convention when forwarding VarArgs. Reviewers: efriedma, rnk, davide Reviewed By: rnk, davide Differential Revision: https://reviews.llvm.org/D41556 llvm-svn: 321943	2018-01-06 20:56:27 +00:00
Florian Hahn	de10e6e064	[InlineFunction] Preserve attributes when forwarding VarArgs. Reviewers: rnk, efriedma Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D41555 llvm-svn: 321942	2018-01-06 20:46:00 +00:00
Florian Hahn	80788d8088	[InlineFunction] Inline vararg functions that do not access varargs. If the varargs are not accessed by a function, we can inline the function. Reviewers: dblaikie, chandlerc, davide, efriedma, rnk, hfinkel Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D41335 llvm-svn: 321940	2018-01-06 19:45:40 +00:00
Sanjay Patel	26a6fcde83	[InstCombine] relax use constraint for min/max (~a, ~b) --> ~min/max(a, b) In the minimal case, this won't remove instructions, but it still improves uses of existing values. In the motivating example from PR35834, it does remove instructions, and sets that case up to be optimized by something like D41603: https://reviews.llvm.org/D41603 llvm-svn: 321936	2018-01-06 17:34:22 +00:00
Sanjay Patel	f7e775291e	[InstCombine] add more tests for max(~a, ~b) and PR35834; NFC llvm-svn: 321935	2018-01-06 17:14:46 +00:00
Sanjay Patel	5a48aef3f0	[x86, MemCmpExpansion] allow 2 pairs of loads per block (PR33325) This is the last step needed to fix PR33325: https://bugs.llvm.org/show_bug.cgi?id=33325 We're trading branch and compares for loads and logic ops. This makes the code smaller and hopefully faster in most cases. The 24-byte test shows an interesting construct: we load the trailing scalar elements into vector registers and generate the same pcmpeq+movmsk code that we expected for a pair of full vector elements (see the 32- and 64-byte tests). Differential Revision: https://reviews.llvm.org/D41714 llvm-svn: 321934	2018-01-06 16:16:04 +00:00
Vedant Kumar	1f6f5f1df9	[Debugify] Handled unsized types llvm-svn: 321918	2018-01-06 00:37:01 +00:00
Sanjay Patel	5b6aacf2c1	[InstCombine] add folds for min(~a, b) --> ~max(a, b) Besides the bug of omitting the inverse transform of max(~a, ~b) --> ~min(a, b), the use checking and operand creation were off. We were potentially creating repeated identical instructions of existing values. This led to infinite looping after I added the extra folds. By using the simpler m_Not matcher and not creating new 'not' ops for a and b, we avoid that problem. It's possible that not using IsFreeToInvert() here is more limiting than the simpler matcher, but there are no tests for anything more exotic. It's also possible that we should relax the use checking further to handle a case like PR35834: https://bugs.llvm.org/show_bug.cgi?id=35834 ...but we can make that a follow-up if it is needed. llvm-svn: 321882	2018-01-05 19:01:17 +00:00
Alexey Bataev	fa13848da8	[SLP] Update more test checks, NFC. llvm-svn: 321872	2018-01-05 16:15:17 +00:00
Alexey Bataev	e565ebcdad	[SLP] Update test checks, NFC. llvm-svn: 321870	2018-01-05 15:20:40 +00:00
Alexey Bataev	988db0bd50	[SLP] Update tests checks, NFC. llvm-svn: 321869	2018-01-05 14:40:04 +00:00
Reid Kleckner	cd78ddc119	Revert "[JumpThreading] Preservation of DT and LVI across the pass" This reverts r321825, it causes crashes in Chromium. Reproducer forthcoming. llvm-svn: 321832	2018-01-04 23:23:46 +00:00
Brian M. Rzycki	cdad6c0b60	[JumpThreading] Preservation of DT and LVI across the pass Summary: See D37528 for a previous (non-deferred) version of this patch and its description. Preserves dominance in a deferred manner using a new class DeferredDominance. This reduces the performance impact of updating the DominatorTree at every edge insertion and deletion. A user may call DDT->flush() within JumpThreading for an up-to-date DT. This patch currently has one flush() at the end of runImpl() to ensure DT is preserved across the pass. LVI is also preserved to help subsequent passes such as CorrelatedValuePropagation. LVI is simpler to maintain and is done immediately (not deferred). The code to perfom the preversation was minimally altered and was simply marked as preserved for the PassManager to be informed. This extends the analysis available to JumpThreading for future enhancements. One example is loop boundary threading. Reviewers: dberlin, kuhar, sebpop Reviewed By: kuhar, sebpop Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D40146 llvm-svn: 321825	2018-01-04 21:57:32 +00:00
Bjorn Pettersson	77f3299415	Teach InlineCost about address spaces Summary: I basically copied this patch from here: https://reviews.llvm.org/D1251 But I skipped some of the refactoring to make the patch more clean. The new outer3/inner3 test case in ptr-diff.ll triggers the following assert without this patch: lib/IR/Constants.cpp:1834: static llvm::Constant llvm::ConstantExpr::getCompare(unsigned short, llvm::Constant , llvm::Constant *, bool): Assertion `C1->getType() == C2->getType() && "Op types should be identical!"' failed. The other new test cases makes sure that there is code coverage for all modifications in InlineCost.cpp (getting different values due to not fetching sizes for address space zero). I only guarantee code coverage for those tests. The tests are not written in a way that they would break if not having the corrections in InlineCost.cpp. I found it quite hard to fine tune the tests into getting different results based on the pointer sizes (except for the test case where we hit an assert if not teaching InlineCost about address spaces). Reviewers: chandlerc, arsenm, haicheng Reviewed By: arsenm Subscribers: wdng, eraman, llvm-commits, haicheng Differential Revision: https://reviews.llvm.org/D40455 llvm-svn: 321809	2018-01-04 18:23:40 +00:00
Matt Arsenault	35c4244aea	StructurizeCFG: xfail one of the testcases from r321751 It fails with -verify-region-info. This seems to be a issue with RegionInfo itself which existed before. llvm-svn: 321806	2018-01-04 17:23:24 +00:00
Sanjay Patel	c63f9014d6	[InstCombine] safely create a constant of the right type (PR35794) llvm-svn: 321801	2018-01-04 14:31:56 +00:00
Aditya Kumar	1f90cae80f	[GVNHoist] Fix: PR35222 gvn-hoist incorrectly erases load in case of a loop Reviewers: dberlin sebpop eli.friedman Differential Revision: https://reviews.llvm.org/D41453 llvm-svn: 321789	2018-01-04 07:47:24 +00:00
Philip Reames	cf524a408a	[PRE] Add a bunch of test cases for LICM-like PRE patterns These were inspired by a very old review I'm about to abandon (https://reviews.llvm.org/D7061). Several of the test cases from that worked without modification and expanding test coverage of such cases is always worthwhile. llvm-svn: 321764	2018-01-03 22:28:26 +00:00
Matt Arsenault	8070882b4e	StructurizeCFG: Fix broken backedge detection The work order was changed in r228186 from SCC order to RPO with an arbitrary sorting function. The sorting function attempted to move inner loop nodes earlier. This was was apparently relying on an assumption that every block in a given loop / the same loop depth would be seen before visiting another loop. In the broken testcase, a block outside of the loop was encountered before moving onto another block in the same loop. The testcase would then structurize such that one blocks unconditional successor could never be reached. Revert to plain RPO for the analysis phase. This fixes detecting edges as backedges that aren't really. The processing phase does use another visited set, and I'm unclear on whether the order there is as important. An arbitrary order doesn't work, and triggers some infinite loops. The reversed RPO list seems to work and is closer to the order that was used before, minus the arbitary custom sorting. A few of the changed tests now produce smaller code, and a few are slightly worse looking. llvm-svn: 321751	2018-01-03 18:45:37 +00:00
Simon Pilgrim	3bf2d64589	[InstCombine] Check for out of range shift values using APInt before calling getZExtValue Reduced from oss-fuzz #4871 test case llvm-svn: 321748	2018-01-03 18:28:20 +00:00
Dmitry Venikov	3d8cd34a5d	[InstSimplify] Missed optimization in math expression: squashing exp(log), log(exp) Summary: This patch enables folding following expressions under -ffast-math flag: exp(log(x)) -> x, exp2(log2(x)) -> x, log(exp(x)) -> x, log2(exp2(x)) -> x Reviewers: spatel, hfinkel, davide Reviewed By: spatel, hfinkel, davide Subscribers: scanon, llvm-commits Differential Revision: https://reviews.llvm.org/D41381 llvm-svn: 321710	2018-01-03 14:37:42 +00:00
Florian Hahn	dcc0ba9bbb	[InstCombine] Add test to remove VarArg casts (NFC) llvm-svn: 321706	2018-01-03 13:35:43 +00:00
Anna Thomas	bdb9430917	[BasicBlockUtils] Check for unreachable preds before updating LI in UpdateAnalysisInformation Summary: We are incorrectly updating the LI when loop-simplify generates dedicated exit blocks for a loop. The issue is that there's an implicit assumption that the Preds passed into UpdateAnalysisInformation are reachable. However, this is not true and breaks LI by incorrectly updating the header of a loop. One such case is when we generate dedicated exits when the exit block is a landing pad (through SplitLandingPadPredecessors). There maybe other cases as well, since we do not guarantee that Preds passed in are reachable basic blocks. The added test case shows how loop-simplify breaks LI for the outer loop (and DT in turn) after we try to generate the LoopSimplifyForm. Reviewers: davide, chandlerc, sanjoy Reviewed By: davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41519 llvm-svn: 321653	2018-01-02 16:25:50 +00:00
Dmitry Venikov	a58d8deb3a	[InstCombine] Missed optimization in math expression: squashing sqrt functions Summary: This patch enables folding under -ffast-math flag sqrt(a) * sqrt(b) -> sqrt(a*b) Reviewers: hfinkel, spatel, davide Reviewed By: spatel, davide Subscribers: davide, llvm-commits Differential Revision: https://reviews.llvm.org/D41322 llvm-svn: 321637	2018-01-02 05:58:11 +00:00
Simon Pilgrim	6720726d27	[ValueTracking] Don't assume shift values are in range Reduced (as best I could...) from oss-fuzz #4857 test case llvm-svn: 321634	2018-01-01 22:44:59 +00:00
Simon Pilgrim	af35f5ec1d	[InstCombine] Regenerate udiv tests. llvm-svn: 321633	2018-01-01 22:27:49 +00:00
Davide Italiano	9f074fe915	[SimplifyCFG] Stop hoisting musttail calls incorrectly. PR35774. llvm-svn: 321603	2017-12-31 16:47:16 +00:00
Philip Reames	e499bc3042	[instsimplify] consistently handle undef and out of bound indices for insertelement and extractelement In one case, we were handling out of bounds, but not undef indices. In the other, we were handling undef (with the comment making the analogy to out of bounds), but not out of bounds. Be consistent and treat both undef and constant out of bounds indices as producing undefined results. As a side effect, this also protects instcombine from having to handle large constant indices as we always simplify first. llvm-svn: 321575	2017-12-30 05:54:22 +00:00
Philip Reames	8e1abe4a7d	Add another test case for r321489 Went to reduce another fuzzer failure to find it's already been fixed, but the test case is slightly different so it's worth adding anyways. Reduced from oss-fuzz #4768 test case llvm-svn: 321573	2017-12-30 04:10:48 +00:00
Philip Reames	3e9c671923	Move tests associated with transforms moved in r321467 llvm-svn: 321572	2017-12-30 03:13:00 +00:00
Fedor Sergeev	02e7f0247b	[PM] pass -debug-pass-manager flag into FunctionToLoopPassAdaptor's canonicalization PM Summary: New pass manager driver passes DebugPM (-debug-pass-manager) flag into individual PassManager constructors in order to enable debug logging. FunctionToLoopPassAdaptor has its own internal LoopCanonicalizationPM which never gets its debug logging enabled and that means canonicalization passes like LoopSimplify are never present in -debug-pass-manager output. Extending FunctionToLoopPassAdaptor's constructor and createFunctionToLoopPassAdaptor wrapper with an optional boolean DebugLogging argument. Passing debug-logging flags there as appropriate. Reviewers: chandlerc, davide Reviewed By: davide Subscribers: mehdi_amini, eraman, llvm-commits, JDevlieghere Differential Revision: https://reviews.llvm.org/D41586 llvm-svn: 321548	2017-12-29 08:16:06 +00:00
Guozhi Wei	29697c13bc	Revert r321377, it causes regression to https://reviews.llvm.org/P8055 . llvm-svn: 321528	2017-12-28 17:02:34 +00:00
Max Kazantsev	a13e163a27	[RewriteStatepoints] Fix incorrect assertion `RewriteStatepointsForGC` iterates over function blocks and their predecessors in order of declaration. One of outcomes of this is that callsites are placed in arbitrary order which has nothing to do with travelsar order. On the other hand, function `recomputeLiveInValues` asserts that bases are added to `Info.PointerToBase` before their deried pointers are updated. But if call sites are processed in order different from RPOT, this is not necessarily true. We cannot guarantee that the base was placed there before every pointer derived from it. All we can guarantee is that this base was marked as known base by this point. This patch replaces the fact that we assert from checking that the base was added to the map with assert that the base was marked as known base. Differential Revision: https://reviews.llvm.org/D41593 llvm-svn: 321517	2017-12-28 12:03:12 +00:00
Simon Pilgrim	472689a159	[InstCombine] Check for isa<Instruction> before using cast<> Protects against casts from constexpr etc. Reduced from oss-fuzz #4788 test case llvm-svn: 321515	2017-12-28 09:35:35 +00:00
Reid Kleckner	6d31001cd6	Revert "[memcpyopt] Teach memcpyopt to optimize across basic blocks" This reverts r321138. It seems there are still underlying issues with memdep. PR35519 seems to still be present if debug info is enabled. We end up losing a memcpy. Somehow during store to memset merging, we insert the memset after the memcpy or fail to update the memdep analysis to account for the newly inserted memset of a pair. Reduced test case: #include <assert.h> #include <stdio.h> #include <string> #include <utility> #include <vector> void do_push_back( std::vector<std::pair<std::string, std::vector<std::string>>>* crls) { crls->push_back(std::make_pair(std::string(), std::vector<std::string>())); } int __attribute__((optnone)) main() { // Put some data in the vector and then remove it so we take the push_back // fast path. std::vector<std::pair<std::string, std::vector<std::string>>> crl_set; crl_set.push_back({"asdf", {}}); crl_set.pop_back(); printf("first word in vector storage: %p\n", (void)crl_set.data()); // Do the push_back which may fail to initialize the data. do_push_back(&crl_set); auto first = &crl_set.back().first; printf("first word in vector storage (should be zero): %p\n", (void*)crl_set.data()); assert(first->empty()); puts("ok"); } Compile with libc++, enable optimizations, and enable debug info: $ clang++ -stdlib=libc++ -g -O2 t.cpp -o t.exe -Wl,-rpath=llvm/build/lib This program will assert with this change. llvm-svn: 321510	2017-12-28 05:10:33 +00:00
Sanjay Patel	84d54c3d8f	[InstCombine] add tests for min/max folds (PR35717); NFC llvm-svn: 321500	2017-12-27 21:55:06 +00:00
Simon Pilgrim	e7d032f1d8	[InstCombine] Gracefully handle out of range extractelement indices InstSimplify is responsible for handling these, but we shouldn't just assert here. Reduced from oss-fuzz #4808 test case llvm-svn: 321489	2017-12-27 12:00:18 +00:00
Philip Reames	cd13a66381	[instcombine] add powi(x, 2) -> x * x llvm-svn: 321468	2017-12-27 01:30:12 +00:00
Zhaoshi Zheng	8af1e1cb78	[Unroll][DebugInfo] Propagate loop body's debug location to epilog preheader NewExit and epilog PreHeader should has the same debug loc as the original loop body, instead of original loop exit. llvm-svn: 321465	2017-12-26 23:31:21 +00:00
Sanjay Patel	14adbacd8a	[InstCombine] fix miscompile of frem with 0.0 operand (PR34870) We might want to select NAN here or do this transform with fast-math, but this should at least fix the miscompile. llvm-svn: 321461	2017-12-26 22:12:20 +00:00
Sanjay Patel	546c43fd1a	[InstCombine] add test for frem with 0.0 (PR34870); NFC llvm-svn: 321460	2017-12-26 22:06:57 +00:00
Sanjay Patel	9a39979dd2	[ValueTracking] ignore FP signed-zero when detecting a casted-to-integer fmin/fmax pattern This is a preliminary step for the patch discussed in D41136 (and denoted here with the FIXME comment). When we match an FP min/max that is cast to integer, any intermediate difference between +0.0 or -0.0 should be muted in the result by the conversion (either fptosi or fptoui) of the result. Thus, we can enable 'nsz' for the purpose of matching fmin/fmax. Note that there's probably room to generalize this more, possibly by fixing the current calls to the weak version of isKnownNonZero() in matchSelectPattern() to the more powerful recursive version. Differential Revision: https://reviews.llvm.org/D41333 llvm-svn: 321456	2017-12-26 15:09:19 +00:00
Simon Pilgrim	79c2c2f08c	[InstSimplify] Check for in range extraction index before calling APInt::getZExtValue() Reduced from oss-fuzz #4768 test case llvm-svn: 321454	2017-12-26 11:42:39 +00:00
Florian Hahn	7e9328906b	[CallSiteSplitting] Remove isOrHeader restriction. By following the single predecessors of the predecessors of the call site, we do not need to restrict the control flow. Reviewed By: junbuml, davide Differential Revision: https://reviews.llvm.org/D40729 llvm-svn: 321413	2017-12-23 20:02:26 +00:00
Guozhi Wei	33250340f4	[SimplifyCFG] Don't do if-conversion if there is a long dependence chain If after if-conversion, most of the instructions in this new BB construct a long and slow dependence chain, it may be slower than cmp/branch, even if the branch has a high miss rate, because the control dependence is transformed into data dependence, and control dependence can be speculated, and thus, the second part can execute in parallel with the first part on modern OOO processor. This patch checks for the long dependence chain, and give up if-conversion if find one. Differential Revision: https://reviews.llvm.org/D39352 llvm-svn: 321377	2017-12-22 18:54:04 +00:00
Haicheng Wu	6d14dfe8f3	[InlineCost] Find more free binary operations Currently, inline cost model considers a binary operator as free only if both its operands are constants. Some simple cases are missing such as a + 0, a - a, etc. This patch modifies visitBinaryOperator() to call SimplifyBinOp() without going through simplifyInstruction() to get rid of the constant restriction. Thus, visitAnd() and visitOr() are not needed. Differential Revision: https://reviews.llvm.org/D41494 llvm-svn: 321366	2017-12-22 17:09:09 +00:00
Eli Friedman	07d94d59a4	inline-fp.ll was moved in r321332; delete it properly. llvm-svn: 321333	2017-12-22 02:10:40 +00:00
Eli Friedman	39ed9a602b	[Inliner] Restrict soft-float inlining penalty. The penalty is currently getting applied in a bunch of places where it doesn't make sense, like bitcasts (which are free) and calls (which were getting the call penalty applied twice). Instead, just apply the penalty to binary operators and floating-point casts. While I'm here, also fix getFPOpCost() to do the right thing in more cases, so we don't have to dig into function attributes. Differential Revision: https://reviews.llvm.org/D41522 llvm-svn: 321332	2017-12-22 02:08:08 +00:00
Matthew Simpson	cb35c5d5c2	[ICP] Expose unconditional call promotion interface This patch modifies the indirect call promotion utilities by exposing and using an unconditional call promotion interface. The unconditional promotion interface (i.e., call promotion without creating an if-then-else) can be used if it's known that an indirect call has only one possible callee. The existing conditional promotion interface uses this unconditional interface to promote an indirect call after it has been versioned and placed within the "then" block. A consequence of unconditional promotion is that the fix-up operations for phi nodes in the normal destination of invoke instructions are changed. This is necessary because the existing implementation assumed that an invoke had been versioned, creating a "merge" block where a return value bitcast could be placed. In the new implementation, the edge between a promoted invoke's parent block and its normal destination is split if needed to add a bitcast for the return value. If the invoke is also versioned, the phi node merging the return value of the promoted and original invoke instructions is placed in the "merge" block. Differential Revision: https://reviews.llvm.org/D40751 llvm-svn: 321210	2017-12-20 19:26:37 +00:00
Teresa Johnson	a4ce3bfdda	[PGO] Function section hotness prefix should look at all blocks Summary: The function section prefix for PGO based layout (e.g. hot/unlikely) should look at the hotness of all blocks not just the entry BB. A function with a cold entry but a very hot loop should be placed in the hot section, for example, so that it is located close to other hot functions it may call. For SamplePGO it was already looking at the branch weights on calls, and I made that code conditional on whether this is SamplePGO since it was essentially a noop for instrumentation PGO anyway. Reviewers: davidxl Subscribers: eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D41395 llvm-svn: 321197	2017-12-20 17:53:10 +00:00
Florian Hahn	012c8f97b2	[InstCombine] Add debug location to new caller. Reviewers: rnk, aprantl, majnemer Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D414 llvm-svn: 321191	2017-12-20 17:16:59 +00:00
Mohammad Shahid	3a934d6ab9	Revert r320548:[SLP] Vectorize jumbled memory loads llvm-svn: 321181	2017-12-20 15:26:59 +00:00
Florian Hahn	467abe3e4f	[LV] Remove unnecessary DoExtraAnalysis guard (silent bug) canVectorize is only checking if the loop has a normalized pre-header if DoExtraAnalysis is true. This doesn't make sense to me because reporting analysis information shouldn't alter legality checks. This is probably the result of a last minute minor change before committing (?). Patch by Diego Caballero. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D40973 llvm-svn: 321172	2017-12-20 13:28:38 +00:00
Dan Gohman	aa3922819e	[memcpyopt] Teach memcpyopt to optimize across basic blocks This teaches memcpyopt to make a non-local memdep query when a local query indicates that the dependency is non-local. This notably allows it to eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%. This is r319482 and r319483, along with fixes for PR35519: fix the optimization that merges stores into memsets to preserve cached memdep info, and fix memdep's non-local caching strategy to not assume that larger queries are always more conservative than smaller ones. Fixes PR28958 and PR35519. Differential Revision: https://reviews.llvm.org/D40802 llvm-svn: 321138	2017-12-20 01:36:25 +00:00
Haicheng Wu	b3689cabda	[InlineCost] Skip volatile loads when looking for repeated loads This is a follow-up fix of r320814. A test case is also added. llvm-svn: 321075	2017-12-19 13:42:58 +00:00
Max Kazantsev	fd95ee0c9a	[JumpThreading] Restrict PRE across instructions that don't pass control to successors PRE in JumpThreading should not be able to hoist copy of non-speculable loads across instructions that don't always transfer execution to their successors, otherwise they may introduce an unsafe load which otherwise would not be executed. The same problem for GVN was fixed as rL316975. Differential Revision: https://reviews.llvm.org/D40347 llvm-svn: 321063	2017-12-19 09:10:21 +00:00
Ivan A. Kosarev	a80c79b5bf	[Analysis] Generate more precise TBAA tags when one access encloses the other There are cases when two tags with different base types denote accesses to the same direct or indirect member of a structure type. Currently, merging of such tags results in a tag that represents an access to an object that has the type of that member. This patch changes this so that if one of the accesses encloses the other, then the generic tag is the one of the enclosed access. Differential Revision: https://reviews.llvm.org/D39557 llvm-svn: 321019	2017-12-18 20:05:20 +00:00
Teresa Johnson	915897e21b	[PGO] Fix handling of cold entry count for instrumented PGO Summary: In r277849, getEntryCount was changed to return None when the entry count was 0, specifically for SamplePGO where it means no samples were recorded. However, for instrumentation PGO a 0 entry count should be returned directly, since it does mean that the function was completely cold. Otherwise we end up treating these functions conservatively in isFunctionEntryCold() and isColdBB(). Instead, for SamplePGO use -1 when there are no samples, and change getEntryCount to return None when the value is -1. Reviewers: danielcdh, davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41307 llvm-svn: 321018	2017-12-18 20:02:43 +00:00
Xinliang David Li	19fb5b467b	[PGO] add MST min edge selection heuristic to ensure non-zero entry count Differential Revision: http://reviews.llvm.org/D41059 llvm-svn: 320998	2017-12-18 17:56:19 +00:00
Igor Laevsky	7bd3fb15e1	[TargetLibraryInfo] Discard library functions with incorrectly sized integers Differential Revision: https://reviews.llvm.org/D41184 llvm-svn: 320964	2017-12-18 10:31:58 +00:00
Hiroshi Inoue	c6faf15459	[SROA] Disable non-whole-alloca splits by default This patch introduce a switch to control splitting of non-whole-alloca slices with default off. The switch will be default on again after fixing an issue reported in PR35657. llvm-svn: 320958	2017-12-18 06:47:37 +00:00
Serguei Katkov	b0b67a8d38	[CGP] Fix the handling select inst in complex addressing mode When we put the value in select placeholder we must pass the value through simplification tracker due to the value might be already simplified and erased. This is a fix for PR35658. Reviewers: john.brawn, uabelho Reviewed By: john.brawn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41251 llvm-svn: 320956	2017-12-18 04:25:07 +00:00
Simon Pilgrim	5f022d278b	[InstCombine] Regenerate FMUL/FMA combine tests with update_test_checks.py llvm-svn: 320922	2017-12-16 17:18:15 +00:00
Sanjay Patel	5a0cdac174	[InstCombine] canonicalize shifty abs(): ashr+add+xor --> cmp+neg+sel We want to do this for 2 reasons: 1. Value tracking does not recognize the ashr variant, so it would fail to match for cases like D39766. 2. DAGCombiner does better at producing optimal codegen when we have the cmp+sel pattern. More detail about what happens in the backend: 1. DAGCombiner has a generic transform for all targets to convert the scalar cmp+sel variant of abs into the shift variant. That is the opposite of this IR canonicalization. 2. DAGCombiner has a generic transform for all targets to convert the vector cmp+sel variant of abs into either an ABS node or the shift variant. That is again the opposite of this IR canonicalization. 3. DAGCombiner has a generic transform for all targets to convert the exact shift variants produced by #1 or #2 into an ISD::ABS node. Note: It would be an efficiency improvement if we had #1 go directly to an ABS node when that's legal/custom. 4. The pattern matching above is incomplete, so it is possible to escape the intended/optimal codegen in a variety of ways. a. For #2, the vector path is missing the case for setlt with a '1' constant. b. For #3, we are missing a match for commuted versions of the shift variants. 5. Therefore, this IR canonicalization can only help get us to the optimal codegen. The version of cmp+sel produced by this patch will be recognized in the DAG and converted to an ABS node when possible or the shift sequence when not. 6. In the following examples with this patch applied, we may get conditional moves rather than the shift produced by the generic DAGCombiner transforms. The conditional move is created using a target-specific decision for any given target. Whether it is optimal or not for a particular subtarget may be up for debate. define i32 @abs_shifty(i32 %x) { %signbit = ashr i32 %x, 31 %add = add i32 %signbit, %x %abs = xor i32 %signbit, %add ret i32 %abs } define i32 @abs_cmpsubsel(i32 %x) { %cmp = icmp slt i32 %x, zeroinitializer %sub = sub i32 zeroinitializer, %x %abs = select i1 %cmp, i32 %sub, i32 %x ret i32 %abs } define <4 x i32> @abs_shifty_vec(<4 x i32> %x) { %signbit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31> %add = add <4 x i32> %signbit, %x %abs = xor <4 x i32> %signbit, %add ret <4 x i32> %abs } define <4 x i32> @abs_cmpsubsel_vec(<4 x i32> %x) { %cmp = icmp slt <4 x i32> %x, zeroinitializer %sub = sub <4 x i32> zeroinitializer, %x %abs = select <4 x i1> %cmp, <4 x i32> %sub, <4 x i32> %x ret <4 x i32> %abs } > $ ./opt -instcombine shiftyabs.ll -S \| ./llc -o - -mtriple=x86_64 -mattr=avx > abs_shifty: > movl %edi, %eax > negl %eax > cmovll %edi, %eax > retq > > abs_cmpsubsel: > movl %edi, %eax > negl %eax > cmovll %edi, %eax > retq > > abs_shifty_vec: > vpabsd %xmm0, %xmm0 > retq > > abs_cmpsubsel_vec: > vpabsd %xmm0, %xmm0 > retq > > $ ./opt -instcombine shiftyabs.ll -S \| ./llc -o - -mtriple=aarch64 > abs_shifty: > cmp w0, #0 // =0 > cneg w0, w0, mi > ret > > abs_cmpsubsel: > cmp w0, #0 // =0 > cneg w0, w0, mi > ret > > abs_shifty_vec: > abs v0.4s, v0.4s > ret > > abs_cmpsubsel_vec: > abs v0.4s, v0.4s > ret > > $ ./opt -instcombine shiftyabs.ll -S \| ./llc -o - -mtriple=powerpc64le > abs_shifty: > srawi 4, 3, 31 > add 3, 3, 4 > xor 3, 3, 4 > blr > > abs_cmpsubsel: > srawi 4, 3, 31 > add 3, 3, 4 > xor 3, 3, 4 > blr > > abs_shifty_vec: > vspltisw 3, -16 > vspltisw 4, 15 > vsubuwm 3, 4, 3 > vsraw 3, 2, 3 > vadduwm 2, 2, 3 > xxlxor 34, 34, 35 > blr > > abs_cmpsubsel_vec: > vspltisw 3, -16 > vspltisw 4, 15 > vsubuwm 3, 4, 3 > vsraw 3, 2, 3 > vadduwm 2, 2, 3 > xxlxor 34, 34, 35 > blr > Differential Revision: https://reviews.llvm.org/D40984 llvm-svn: 320921	2017-12-16 16:41:17 +00:00
Hal Finkel	92ea8acbcd	Move Transforms/LoopVectorize/consecutive-ptr-cg-bug.ll into the X86 subdirectory This test depends on X86's TTI; move into the X86 subdirectory. llvm-svn: 320914	2017-12-16 05:10:20 +00:00
Hal Finkel	5444f40965	[LV] Extend InstWidening with CM_Widen_Recursive Changes to the original scalar loop during LV code gen cause the return value of Legal->isConsecutivePtr() to be inconsistent with the return value during legal/cost phases (further analysis and information of the bug is in D39346). This patch is an alternative fix to PR34965 following the CM_Widen approach proposed by Ayal and Gil in D39346. It extends InstWidening enum with CM_Widen_Reverse to properly record the widening decision for consecutive reverse memory accesses and, consequently, get rid of the Legal->isConsetuviePtr() call in LV code gen. I think this is a simpler/cleaner solution to PR34965 than the one in D39346. Fixes PR34965. Patch by Diego Caballero, thanks! Differential Revision: https://reviews.llvm.org/D40742 llvm-svn: 320913	2017-12-16 02:55:24 +00:00
Vitaly Buka	a5376f393e	[LTO] Make processing of combined module more consistent Summary: 1. Use stream 0 only for combined module. Previously if combined module was not processes ThinLTO used the stream for own output. However small changes in input, could trigger combined module and shuffle outputs making life of llvm::LTO harder. 2. Always process combined module and write output to stream 0. Processing empty combined module is cheap and allows llvm::LTO users to avoid implementing processing which is already done in llvm::LTO. Subscribers: mehdi_amini, inglorion, eraman, hiraditya Differential Revision: https://reviews.llvm.org/D41267 llvm-svn: 320905	2017-12-16 02:10:00 +00:00
Hal Finkel	2ff24731bb	[SimplifyLibCalls] Inline calls to cabs when it's safe to do so When unsafe algerbra is allowed calls to cabs(r) can be replaced by: sqrt(creal(r)creal(r) + cimag(r)cimag(r)) Patch by Paul Walker, thanks! Differential Revision: https://reviews.llvm.org/D40069 llvm-svn: 320901	2017-12-16 01:26:25 +00:00
Teresa Johnson	81bbf74265	[ThinLTO] Enable importing of aliases as copy of aliasee Summary: This implements a missing feature to allow importing of aliases, which was previously disabled because alias cannot be available_externally. We instead import an alias as a copy of its aliasee. Some additional work was required in the IndexBitcodeWriter for the distributed build case, to ensure that the aliasee has a value id in the distributed index file (i.e. even when it is not being imported directly). This is a performance win in codes that have many aliases, e.g. C++ applications that have many constructor and destructor aliases. Reviewers: pcc Subscribers: mehdi_amini, inglorion, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D40747 llvm-svn: 320895	2017-12-16 00:18:12 +00:00
Jun Bum Lim	44c58d35c1	Re-commit : [LICM] Allow sinking when foldable in loop This recommits r320823 reverted due to the test failure in sink-foldable.ll and an unused variable. Added "REQUIRES: aarch64-registered-target" in the test and removed unused variable. Original commit message: Continue trying to sink an instruction if its users in the loop is foldable. This will allow the instruction to be folded in the loop by decoupling it from the user outside of the loop. Reviewers: hfinkel, majnemer, davidxl, efriedma, danielcdh, bmakam, mcrosier Reviewed By: hfinkel Subscribers: javed.absar, bmakam, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D37076 llvm-svn: 320858	2017-12-15 20:33:24 +00:00
Jun Bum Lim	5efd4d8b5e	Revert "Re-commit : [LICM] Allow sinking when foldable in loop" This reverts commit r320833. llvm-svn: 320836	2017-12-15 18:12:49 +00:00
Jun Bum Lim	83ccad6684	Re-commit : [LICM] Allow sinking when foldable in loop This recommit r320823 after fixing a test failure. Original commit message: Continue trying to sink an instruction if its users in the loop is foldable. This will allow the instruction to be folded in the loop by decoupling it from the user outside of the loop. Reviewers: hfinkel, majnemer, davidxl, efriedma, danielcdh, bmakam, mcrosier Reviewed By: hfinkel Subscribers: javed.absar, bmakam, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D37076 llvm-svn: 320833	2017-12-15 17:58:59 +00:00
Jun Bum Lim	6136d87f5d	Revert "[LICM] Allow sinking when foldable in loop" This reverts commit r320823. llvm-svn: 320828	2017-12-15 16:35:09 +00:00
Jun Bum Lim	22855c26a5	[LICM] Allow sinking when foldable in loop Summary: Continue trying to sink an instruction if its users in the loop is foldable. This will allow the instruction to be folded in the loop by decoupling it from the user outside of the loop. Reviewers: hfinkel, majnemer, davidxl, efriedma, danielcdh, bmakam, mcrosier Reviewed By: hfinkel Subscribers: javed.absar, bmakam, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D37076 llvm-svn: 320823	2017-12-15 16:09:54 +00:00
Haicheng Wu	a446151552	[InlineCost] Find repeated loads in the callee SROA analysis of InlineCost can figure out that some stores can be removed after inlining and then the repeated loads clobbered by these stores are also free. This patch finds these clobbered loads and adjust the inline cost accordingly. Differential Revision: https://reviews.llvm.org/D33946 llvm-svn: 320814	2017-12-15 14:34:41 +00:00
Fedor Sergeev	4b86d79048	[PM] port Rewrite Statepoints For GC to the new pass manager. Summary: The port is nearly straightforward. The only complication is related to the analyses handling, since one of the analyses used in this module pass is domtree, which is a function analysis. That requires asking for the results of each function and disallows a single interface for run-on-module pass action. Decided to copy-paste the main body of this pass. Most of its code is requesting analyses anyway, so not that much of a copy-paste. The rest of the code movement is to transform all the implementation helper functions like stripNonValidData into non-member statics. Extended all the related LLVM tests with new-pass-manager use. No failures. Reviewers: sanjoy, anna, reames Reviewed By: anna Subscribers: skatkov, llvm-commits Differential Revision: https://reviews.llvm.org/D41162 llvm-svn: 320796	2017-12-15 09:32:11 +00:00
Serguei Katkov	67da7696a0	[SCEV] Fix the movement of insertion point in expander. PR35406. We cannot move the insertion point to header if SCEV contains div/rem operations due to they may go over check for zero denominator. Reviewers: sanjoy, mkazantsev, sebpop Reviewed By: sebpop Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41229 llvm-svn: 320789	2017-12-15 05:24:42 +00:00
Sanjay Patel	0ab0c1a201	[SimplifyCFG] don't sink common insts too soon (PR34603) This should solve: https://bugs.llvm.org/show_bug.cgi?id=34603 ...by preventing SimplifyCFG from altering redundant instructions before early-cse has a chance to run. It changes the default (canonical-forming) behavior of SimplifyCFG, so we're only doing the sinking transform later in the optimization pipeline. Differential Revision: https://reviews.llvm.org/D38566 llvm-svn: 320749	2017-12-14 22:05:20 +00:00
Guozhi Wei	d22d1b953d	[SLPVectorizer] Don't ignore scalar extraction instructions of aggregate value In SLPVectorizer, the vector build instructions (insertvalue for aggregate type) is passed to BoUpSLP.buildTree, it is treated as UserIgnoreList, so later in cost estimation, the cost of these instructions are not counted. For aggregate value, later usage are more likely to be done in scalar registers, either used as individual scalars or used as a whole for function call or return value. Ignore scalar extraction instructions may cause too aggressive vectorization for aggregate values, and slow down performance. So for vectorization of aggregate value, the scalar extraction instructions are required in cost estimation. Differential Revision: https://reviews.llvm.org/D41139 llvm-svn: 320736	2017-12-14 19:35:43 +00:00
Bjorn Pettersson	33c9d5535f	[ScalarEvolution] Fix base condition in isNormalAddRecPHI. Summary: The function is meant to recurse until it comes upon the phi it's looking for. However, with the current condition, it will recurse until it finds anything _but_ the phi. The function will even fail for simple cases like: %i = phi i32 [ %inc, %loop ], ... ... %inc = add i32 %i, 1 because the base condition will not happen when the phi is recursed to, and the recursion will end with a 'false' result since the previous instruction is a phi. Reviewers: sanjoy, atrick Reviewed By: sanjoy Subscribers: Ka-Ka, bjope, llvm-commits Committing on behalf of: Bevin Hansson (bevinh) Differential Revision: https://reviews.llvm.org/D40946 llvm-svn: 320700	2017-12-14 14:47:52 +00:00
Haicheng Wu	3739e14ab4	[InlineCost] Tracking Values through PHI Nodes This patch fix this FIXME in visitPHI() FIXME: We should potentially be tracking values through phi nodes, especially when they collapse to a single value due to deleted CFG edges during inlining. Differential Revision: https://reviews.llvm.org/D38594 llvm-svn: 320699	2017-12-14 14:36:18 +00:00
Omer Paparo Bivas	bcd4318881	Inserting several lit tests to reflect current behaviour Change-Id: I1b8188dc3c6c7c0f455715364ece7d35ef485f2f llvm-svn: 320692	2017-12-14 12:00:04 +00:00

... 6 7 8 9 10 ...

10693 Commits