llvm-project

Commit Graph

Author	SHA1	Message	Date
Guillaume Chatelet	17380227e8	[Alignment][NFC] Remove LoadInst::setAlignment(unsigned) Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, jdoerfert Subscribers: hiraditya, asbirlea, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D68142 llvm-svn: 373195	2019-09-30 09:37:05 +00:00
Aditya Kumar	a6d9d31279	[LLVM-C][Ocaml] Add MergeFunctions and DCE pass MergeFunctions and DCE pass are missing from OCaml/C-api. This patch adds them. Differential Revision: https://reviews.llvm.org/D65071 Reviewers: whitequark, hiraditya, deadalnix Reviewed By: whitequark Subscribers: llvm-commits Tags: #llvm Authored by: kren1 llvm-svn: 373170	2019-09-29 16:06:22 +00:00
Roman Lebedev	d30093bb8a	[DivRemPairs] Don't assert that we won't ever get expanded-form rem pairs in different BB's (PR43500) If we happen to have the same div in two basic blocks, and in one of those we also happen to have the rem part, we'd match the div-rem pair, but the wrong ones. So let's drop overly-ambiguous assert. Fixes https://bugs.llvm.org/show_bug.cgi?id=43500 llvm-svn: 373167	2019-09-29 15:25:24 +00:00
Alexey Bataev	8b1eeafb91	[SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!") Initially SLP vectorizer replaced all going-to-be-vectorized instructions with Undef values. It may break ScalarEvaluation and may cause a crash. Reworked SLP vectorizer so that it does not replace vectorized instructions by UndefValue anymore. Instead vectorized instructions are marked for deletion inside if BoUpSLP class and deleted upon class destruction. Reviewers: mzolotukhin, mkuper, hfinkel, RKSimon, davide, spatel Subscribers: RKSimon, Gerolf, anemet, hans, majnemer, llvm-commits, sanjoy Differential Revision: https://reviews.llvm.org/D29641 llvm-svn: 373166	2019-09-29 14:18:06 +00:00
Aditya Kumar	2adae76cc6	[NFC] Move hot cold splitting class to header file Summary: This is to facilitate unittests Reviewers: compnerd, vsk, tejohnson, sebpop, brzycki, SirishP Reviewed By: tejohnson Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68079 llvm-svn: 373151	2019-09-28 18:13:33 +00:00
Wei Mi	f0c4e70e95	[SampleFDO] Create a separate flag profile-accurate-for-symsinlist to handle profile symbol list. Currently many existing users using profile-sample-accurate want to reduce code size as much as possible. Their use cases are different from the scenario profile symbol list tries to handle -- the major motivation of adding profile symbol list is to get the major memory/code size saving without introduce performance regression. So to keep the behavior of profile-sample-accurate unchanged, we think decoupling these two things and using a new flag to control the handling of profile symbol list may be better. When profile-sample-accurate and the new flag profile-accurate-for-symsinlist are both present, since profile-sample-accurate is a user assertion we let it have a higher precedence. Differential Revision: https://reviews.llvm.org/D68047 llvm-svn: 373133	2019-09-27 22:33:59 +00:00
Roman Lebedev	269f1bea0d	[InstCombine] Simplify shift-by-sext to shift-by-zext Summary: This is valid for any `sext` bitwidth pair: ``` Processing /tmp/opt.ll.. ---------------------------------------- %signed = sext %y %r = shl %x, %signed ret %r => %unsigned = zext %y %r = shl %x, %unsigned ret %r %signed = sext %y Done: 2016 Optimization is correct! ``` (This isn't so for funnel shifts, there it's illegal for e.g. i6->i7.) Main motivation is the C++ semantics: ``` int shl(int a, char b) { return a << b; } ``` ends as ``` %3 = sext i8 %1 to i32 %4 = shl i32 %0, %3 ``` https://godbolt.org/z/0jgqUq which is, as this shows, too pessimistic. There is another problem here - we can only do the fold if sext is one-use. But we can trivially have cases where several shifts have the same sext shift amount. This should be resolved, later. Reviewers: spatel, nikic, RKSimon Reviewed By: spatel Subscribers: efriedma, hiraditya, nlopes, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68103 llvm-svn: 373106	2019-09-27 18:12:15 +00:00
Simon Pilgrim	2e0de86808	ModuleUtils - silence static analyzer dyn_cast<> null dereference warning. NFCI. The static analyzer is warning about a potential null dereference, but we should be able to use cast<> directly and if not assert will fire for us. llvm-svn: 373099	2019-09-27 16:55:49 +00:00
Simon Pilgrim	f71f23d14d	FunctionImportGlobalProcessing::processGlobalForThinLTO - silence static analyzer dyn_cast<FunctionSummary> null dereference warning. NFCI. The static analyzer is warning about a potential null dereference, but we should be able to use cast<FunctionSummary> directly and if not assert will fire for us. llvm-svn: 373097	2019-09-27 15:49:19 +00:00
Simon Pilgrim	623b0e6963	SCCP - silence static analyzer dyn_cast<StructType> null dereference warning. NFCI. The static analyzer is warning about a potential null dereference, but we should be able to use cast<StructType> directly and if not assert will fire for us. llvm-svn: 373095	2019-09-27 15:49:10 +00:00
Guillaume Chatelet	18f805a7ea	[Alignment][NFC] Remove unneeded llvm:: scoping on Align types llvm-svn: 373081	2019-09-27 12:54:21 +00:00
Guillaume Chatelet	d886f391af	[Alignment][NFC] MaybeAlign in GVNExpression Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67922 llvm-svn: 373054	2019-09-27 08:56:43 +00:00
Peter Collingbourne	c336557f02	hwasan: Compatibility fixes for short granules. We can't use short granules with stack instrumentation when targeting older API levels because the rest of the system won't understand the short granule tags stored in shadow memory. Moreover, we need to be able to let old binaries (which won't understand short granule tags) run on a new system that supports short granule tags. Such binaries will call the __hwasan_tag_mismatch function when their outlined checks fail. We can compensate for the binary's lack of support for short granules by implementing the short granule part of the check in the __hwasan_tag_mismatch function. Unfortunately we can't do anything about inline checks, but I don't believe that we can generate these by default on aarch64, nor did we do so when the ABI was fixed. A new function, __hwasan_tag_mismatch_v2, is introduced that lets code targeting the new runtime avoid redoing the short granule check. Because tag mismatches are rare this isn't important from a performance perspective; the main benefit is that it introduces a symbol dependency that prevents binaries targeting the new runtime from running on older (i.e. incompatible) runtimes. Differential Revision: https://reviews.llvm.org/D68059 llvm-svn: 373035	2019-09-27 01:02:10 +00:00
Jordan Rupprecht	f98d2c099a	Revert [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!") This reverts r372626 (git commit `6a278d9073`) llvm-svn: 373019	2019-09-26 22:09:17 +00:00
Kit Barton	50bc610460	[LoopFusion] Add ability to fuse guarded loops Summary: This patch extends the current capabilities in loop fusion to fuse guarded loops (as defined in https://reviews.llvm.org/D63885). The patch adds the necessary safety checks to ensure that it safe to fuse the guarded loops (control flow equivalent, no intervening code, and same guard conditions). It also provides an alternative method to perform the actual fusion of guarded loops. The mechanics to fuse guarded loops are slightly different then fusing non-guarded loops, so I opted to keep them separate methods. I will be cleaning this up in later patches, and hope to converge on a single method to fuse both guarded and non-guarded loops, but for now I think the review will be easier to keep them separate. Reviewers: jdoerfert, Meinersbur, dmgreen, etiotto, Whitney Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65464 llvm-svn: 373018	2019-09-26 21:42:45 +00:00
Zhaoshi Zheng	1128fa0924	[Unroll] Do NOT unroll a loop with small runtime upperbound For a runtime loop if we can compute its trip count upperbound: Don't unroll if: 1. loop is not guaranteed to run either zero or upperbound iterations; and 2. trip count upperbound is less than UnrollMaxUpperBound Unless user or TTI asked to do so. If unrolling, limit unroll factor to loop's trip count upperbound. Differential Revision: https://reviews.llvm.org/D62989 Change-Id: I6083c46a9d98b2e22cd855e60523fdc5a4929c73 llvm-svn: 373017	2019-09-26 21:40:27 +00:00
Craig Topper	46721bb7f5	[InstCombine] Use m_Zero instead of isNullValue() when checking if a GEP index is all zeroes to prevent an infinite loop. The test case here previously infinite looped. Only one element from the GEP is used so SimplifyDemandedVectorElts would replace the other lanes in each index with undef leading to the first index being <0, undef, undef, undef>. But there's a GEP transform that tries to replace an index into a 0 sized type with a zero index. But the zero index check only works on ConstantInt 0 or ConstantAggregateZero so it would turn the index back to zeroinitializer. Resulting in a loop. The fix is to use m_Zero() to allow a vector of zeroes and undefs. Differential Revision: https://reviews.llvm.org/D67977 llvm-svn: 373000	2019-09-26 17:20:50 +00:00
Jakub Kuderski	d98cb81cd1	Handle successor's PHI node correctly when flattening CFG merges two if-regions Summary: FlattenCFG merges two 'if' basicblocks by inserting one basicblock to another basicblock. The inserted basicblock can have a successor that contains a PHI node whoes incoming basicblock is the inserted basicblock. Since the existing code does not handle it, it becomes a badref. if (cond1) statement if (cond2) statement successor - contains PHI node whose predecessor is cond2 --> if (cond1 \|\| cond2) statement (BB for cond2 was deleted) successor - contains PHI node whose predecessor is cond2 --> bad ref! Author: Jaebaek Seo Reviewers: asbirlea, kuhar, tstellar, chandlerc, davide, dexonsmith Reviewed By: kuhar Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68032 llvm-svn: 372989	2019-09-26 15:20:17 +00:00
Simon Pilgrim	c15cd009ac	[FlattenCFG] Silence static analyzer dyn_cast<BranchInst> null dereference warnings. NFCI. The static analyzer is warning about a potential null dereferences, but we should be able to use cast<BranchInst> directly and if not assert will fire for us. llvm-svn: 372977	2019-09-26 13:33:15 +00:00
Bjorn Pettersson	163c54d288	[InstCombine] Don't assume CmpInst has been visited in getFlippedStrictnessPredicateAndConstant Summary: Removing an assumption (assert) that the CmpInst already has been simplified in getFlippedStrictnessPredicateAndConstant. Solution is to simply bail out instead of hitting the assertion. Instead we assume that any profitable rewrite will happen in the next iteration of InstCombine. The reason why we can't assume that the CmpInst already has been simplified is that the worklist does not guarantee such an ordering. Solves https://bugs.llvm.org/show_bug.cgi?id=43376 Reviewers: spatel, lebedev.ri Reviewed By: lebedev.ri Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68022 llvm-svn: 372972	2019-09-26 12:16:01 +00:00
Simon Pilgrim	6b794dfd3d	MemorySanitizer - silence static analyzer dyn_cast<> null dereference warnings. NFCI. The static analyzer is warning about a potential null dereferences, but we should be able to use cast<> directly and if not assert will fire for us. llvm-svn: 372960	2019-09-26 10:56:14 +00:00
Simon Pilgrim	faa5b39e4e	PGOMemOPSizeOpt - silence static analyzer dyn_cast<MemIntrinsic> null dereference warning. NFCI. The static analyzer is warning about a potential null dereference, but we should be able to use cast<MemIntrinsic> directly and if not assert will fire for us. llvm-svn: 372959	2019-09-26 10:56:07 +00:00
Roman Lebedev	a2fa03af3a	[InstCombine] foldUnsignedUnderflowCheck(): one last pattern with 'sub' (PR43251) https://rise4fun.com/Alive/0j9 llvm-svn: 372930	2019-09-25 22:59:59 +00:00
Eli Friedman	69dddfe268	[LICM] Don't verify domtree/loopinfo unless EXPENSIVE_CHECKS is enabled. For large functions, verifying the whole function after each loop takes non-linear time. Differential Revision: https://reviews.llvm.org/D67571 llvm-svn: 372924	2019-09-25 22:35:47 +00:00
Roman Lebedev	23646952e2	[InstCombine] Fold (A - B) u>=/u< A --> B u>/u<= A iff B != 0 https://rise4fun.com/Alive/KtL This also shows that the fold added in D67412 / r372257 was too specific, and the new fold allows those test cases to be handled more generically, therefore i delete now-dead code. This is yet again motivated by D67122 "[UBSan][clang][compiler-rt] Applying non-zero offset to nullptr is undefined behaviour" llvm-svn: 372912	2019-09-25 19:06:40 +00:00
Florian Hahn	f3ab99dcf8	[InstCombine] Limit FMul constant folding for fma simplifications. As @reames pointed out post-commit, rL371518 adds additional rounding in some cases, when doing constant folding of the multiplication. This breaks a guarantee llvm.fma makes and must be avoided. This patch reapplies rL371518, but splits off the simplifications not requiring rounding from SimplifFMulInst as SimplifyFMAFMul. Reviewers: spatel, lebedev.ri, reames, scanon Reviewed By: reames Differential Revision: https://reviews.llvm.org/D67434 llvm-svn: 372899	2019-09-25 17:03:20 +00:00
Florian Hahn	5c3bc3c930	[PatternMatch] Make m_Br more flexible, add matchers for BB values. Currently m_Br only takes references to BasicBlock*, which limits its flexibility. For example, you have to declare a variable, even if you ignore the result or you have to have additional checks to make sure the matched BB matches an expected one. This patch adds m_BasicBlock and m_SpecificBB matchers, which can be used like the existing matchers for constants or values. I also had a look at the existing uses and updated a few. IMO it makes the code a bit more explicit. Reviewers: spatel, craig.topper, RKSimon, majnemer, lebedev.ri Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D68013 llvm-svn: 372885	2019-09-25 15:05:08 +00:00
Philip Reames	d9629b88ff	[GCRelocate] Add a peephole to canonicalize base pointer relocation If we generate the gc.relocate, and then later prove two arguments to the statepoint are equivalent, we should canonicalize the gc.relocate to the form we would have produced if this had been known before rewriting. llvm-svn: 372771	2019-09-24 17:24:16 +00:00
Roman Lebedev	45fd1e9d50	[InstCombine] (a+b) < a && (a+b) != 0 -> (0-b) < a iff a/b != 0 (PR43259) Summary: This is again motivated by D67122 sanitizer check enhancement. That patch seemingly worsens `-fsanitize=pointer-overflow` overhead from 25% to 50%, which strongly implies missing folds. For ``` #include <cassert> char* test(char& base, signed long offset) { __builtin_assume(offset < 0); return &base + offset; } ``` We produce https://godbolt.org/z/r40U47 and again those two icmp's can be merged: ``` Name: 0 Pre: C != 0 %adjusted = add i8 %base, C %not_null = icmp ne i8 %adjusted, 0 %no_underflow = icmp ult i8 %adjusted, %base %r = and i1 %not_null, %no_underflow => %neg_offset = sub i8 0, C %r = icmp ugt i8 %base, %neg_offset ``` https://rise4fun.com/Alive/ALap https://rise4fun.com/Alive/slnN There are 3 other variants of this pattern, i believe they all will go into InstSimplify. https://bugs.llvm.org/show_bug.cgi?id=43259 Reviewers: spatel, xbolva00, nikic Reviewed By: spatel Subscribers: efriedma, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67849 llvm-svn: 372768	2019-09-24 16:10:50 +00:00
Roman Lebedev	5b881f356c	[InstCombine] (a+b) <= a && (a+b) != 0 -> (0-b) < a (PR43259) Summary: This is again motivated by D67122 sanitizer check enhancement. That patch seemingly worsens `-fsanitize=pointer-overflow` overhead from 25% to 50%, which strongly implies missing folds. This pattern isn't exactly what we get there (strict vs. non-strict predicate), but this pattern does not require known-bits analysis, so it is best to handle it first. ``` Name: 0 %adjusted = add i8 %base, %offset %not_null = icmp ne i8 %adjusted, 0 %no_underflow = icmp ule i8 %adjusted, %base %r = and i1 %not_null, %no_underflow => %neg_offset = sub i8 0, %offset %r = icmp ugt i8 %base, %neg_offset ``` https://rise4fun.com/Alive/knp There are 3 other variants of this pattern, they all will go into InstSimplify: https://rise4fun.com/Alive/bIDZ https://bugs.llvm.org/show_bug.cgi?id=43259 Reviewers: spatel, xbolva00, nikic Reviewed By: spatel Subscribers: hiraditya, majnemer, vsk, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67846 llvm-svn: 372767	2019-09-24 16:10:38 +00:00
Simon Pilgrim	934f18144d	LoopVectorize - silence static analyzer dyn_cast<CmpInst> null dereference warning. NFCI. The static analyzer is warning about a potential null dereference, but we should be able to use cast<CmpInst> directly and if not assert will fire for us. llvm-svn: 372732	2019-09-24 11:27:38 +00:00
Simon Pilgrim	b6d11def37	[SimplifyCFG] FoldTwoEntryPHINode - silence static analyzer null dereference warning. NFCI. Assert that we've found the DomBlock. llvm-svn: 372728	2019-09-24 11:17:20 +00:00
Simon Pilgrim	9e8076b219	SimplifyCFG - silence static analyzer dyn_cast<LandingPadInst> null dereference warning. NFCI. The static analyzer is warning about a potential null dereference, but we should be able to use cast<LandingPadInst> directly and if not assert will fire for us. llvm-svn: 372727	2019-09-24 11:17:13 +00:00
Simon Pilgrim	bc58230e29	SimplifyCFG - silence static analyzer dyn_cast<Instruction> null dereference warning. NFCI. The static analyzer is warning about a potential null dereference, but we should be able to use cast<Instruction> directly and if not assert will fire for us. llvm-svn: 372726	2019-09-24 11:17:06 +00:00
Alexey Lapshin	49f3c2b604	[Debuginfo] dbg.value points to undef value after Induction Variable Simplification. Induction Variable Simplification pass does not update dbg.value intrinsic. Before: %add = add nuw nsw i32 %ArgIndex.06, 1 call void @llvm.dbg.value(metadata i32 %add, metadata !17, metadata !DIExpression()) After: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 call void @llvm.dbg.value(metadata i64 undef, metadata !17, metadata !DIExpression()) There should be: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 call void @llvm.dbg.value(metadata i64 %indvars.iv.next, metadata !17, metadata !DIExpression()) Differential Revision: https://reviews.llvm.org/D67770 llvm-svn: 372703	2019-09-24 08:47:03 +00:00
Sjoerd Meijer	0fcb3afb40	[LV] Forced vectorization with runtime checks and OptForSize When vectorisation is forced with a pragma, we optimise for min size, and we need to emit runtime memory checks, then allow this code growth and don't run in an assert like we currently do. This is the result of D65197 and D66803, and was a use-case not really considered before. If this now happens, we emit an optimisation remark warning about the code-size expansion, which can be avoided by not forcing vectorisation or possibly source-code modifications. Differential Revision: https://reviews.llvm.org/D67764 llvm-svn: 372694	2019-09-24 08:03:34 +00:00
Huihui Zhang	a4dd98f2e9	[InstCombine] Fold a shifty implementation of clamp-to-allones. Summary: Fold or(ashr(subNSW(Y, X), ScalarSizeInBits(Y)-1), X) into X s> Y ? -1 : X https://rise4fun.com/Alive/d8Ab clamp255 is a common operator in image processing, can be implemented in a shifty way "(255 - X) >> 31 \| X & 255". Fold shift into select enables more optimization, e.g., vmin generation for ARM target. Reviewers: lebedev.ri, efriedma, spatel, kparzysz, bcahoon Reviewed By: lebedev.ri Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67800 llvm-svn: 372678	2019-09-24 00:30:09 +00:00
Huihui Zhang	8952199715	[InstCombine] Fold a shifty implementation of clamp-to-zero. Summary: Fold and(ashr(subNSW(Y, X), ScalarSizeInBits(Y)-1), X) into X s> Y ? X : 0 https://rise4fun.com/Alive/lFH Fold shift into select enables more optimization, e.g., vmax generation for ARM target. Reviewers: lebedev.ri, efriedma, spatel, kparzysz, bcahoon Reviewed By: lebedev.ri Subscribers: xbolva00, andreadb, craig.topper, RKSimon, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67799 llvm-svn: 372676	2019-09-24 00:15:03 +00:00
Saleem Abdulrasool	082f895b1a	HotColdSplitting: invalidate the AssumptionCache on split When a cold path is outlined, the value tracking in the assumption cache may be invalidated due to the code motion. We would previously trip an assertion in subsequent passes (but required the passes to happen in a single run as the assumption cache is shared across the passes). Invalidating the cache ensures that we get the correct information when needed with the legacy pass manager as well. llvm-svn: 372667	2019-09-23 22:23:01 +00:00
Wei Mi	22fd88530b	[SampleFDO] Treat names in profile as not cold only when profile symbol list is available In rL372232, we treated names showing up in profile as not cold when profile-sample-accurate is enabled. This caused 70k size regression in Chrome/Android. The patch put a guard and only enable the change when profile symbol list is available, i.e., keep the old behavior when profile symbol list is not available. Differential Revision: https://reviews.llvm.org/D67931 llvm-svn: 372665	2019-09-23 22:11:35 +00:00
Roman Lebedev	23aac95a32	[InstCombine] foldOrOfICmps(): Acquire SimplifyQuery with set CxtI Extracted from https://reviews.llvm.org/D67849#inline-610377 llvm-svn: 372654	2019-09-23 20:40:47 +00:00
Roman Lebedev	595cfda059	[InstCombine] foldAndOfICmps(): Acquire SimplifyQuery with set CxtI Extracted from https://reviews.llvm.org/D67849#inline-610377 llvm-svn: 372653	2019-09-23 20:40:40 +00:00
David Bolvansky	48db0272d6	[InstCombine] Annotate strndup calls with dereferenceable_or_null "Implementations are free to malloc() a buffer containing either (size + 1) bytes or (strnlen(s, size) + 1) bytes. Applications should not assume that strndup() will allocate (size + 1) bytes when strlen(s) is smaller than size." llvm-svn: 372647	2019-09-23 19:55:45 +00:00
Roman Lebedev	47e1ce4abe	[IR] Add getExtendedType() to IntegerType and Type (dispatching to IntegerType or VectorType) llvm-svn: 372638	2019-09-23 18:21:33 +00:00
Roman Lebedev	1972327d63	[InstCombine] dropRedundantMaskingOfLeftShiftInput(): improve comment llvm-svn: 372637	2019-09-23 18:21:14 +00:00
David Bolvansky	8d52016155	[SLC] Convert some strndup calls to strdup calls Summary: Motivation: - If we can fold it to strdup, we should (strndup does more things than strdup). - Annotation mechanism. (Works for strdup well). strdup and strndup are part of C 20 (currently posix fns), so we should optimize them. Reviewers: efriedma, jdoerfert Reviewed By: jdoerfert Subscribers: lebedev.ri, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67679 llvm-svn: 372636	2019-09-23 18:20:01 +00:00
Roman Lebedev	0a51e1f66d	[InstCombine] dropRedundantMaskingOfLeftShiftInput(): pat. c/d/e with mask (PR42563) Summary: If we have a pattern `(x & (-1 >> maskNbits)) << shiftNbits`, we already know (have a fold) that will drop the `& (-1 >> maskNbits)` mask iff `(shiftNbits-maskNbits) s>= 0` (i.e. `shiftNbits u>= maskNbits`). So even if `(shiftNbits-maskNbits) s< 0`, we can still fold, we will just need to apply a constant mask afterwards: ``` Name: c, normal+mask %t0 = lshr i32 -1, C1 %t1 = and i32 %t0, %x %r = shl i32 %t1, C2 => %n0 = shl i32 %x, C2 %n1 = i32 ((-(C2-C1))+32) %n2 = zext i32 %n1 to i64 %n3 = lshr i64 -1, %n2 %n4 = trunc i64 %n3 to i32 %r = and i32 %n0, %n4 ``` https://rise4fun.com/Alive/gslRa Naturally, old `%masked` will have to be one-use. This is not valid for pattern f - where "masking" is done via `ashr`. https://bugs.llvm.org/show_bug.cgi?id=42563 Reviewers: spatel, nikic, xbolva00 Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67725 llvm-svn: 372630	2019-09-23 17:04:28 +00:00
Roman Lebedev	b4a1d8a84c	[InstCombine] dropRedundantMaskingOfLeftShiftInput(): pat. a/b with mask (PR42563) Summary: And this is finally the interesting part of that fold! If we have a pattern `(x & (~(-1 << maskNbits))) << shiftNbits`, we already know (have a fold) that will drop the `& (~(-1 << maskNbits))` mask iff `(maskNbits+shiftNbits) u>= bitwidth(x)`. But that is actually ignorant, there's more general fold here: In this pattern, `(maskNbits+shiftNbits)` actually correlates with the number of low bits that will remain in the final value. So even if `(maskNbits+shiftNbits) u< bitwidth(x)`, we can still fold, we will just need to apply a constant mask afterwards: ``` Name: a, normal+mask %onebit = shl i32 -1, C1 %mask = xor i32 %onebit, -1 %masked = and i32 %mask, %x %r = shl i32 %masked, C2 => %n0 = shl i32 %x, C2 %n1 = add i32 C1, C2 %n2 = zext i32 %n1 to i64 %n3 = shl i64 -1, %n2 %n4 = xor i64 %n3, -1 %n5 = trunc i64 %n4 to i32 %r = and i32 %n0, %n5 ``` https://rise4fun.com/Alive/F5R Naturally, old `%masked` will have to be one-use. Similar fold exists for patterns c,d,e, will post patch later. https://bugs.llvm.org/show_bug.cgi?id=42563 Reviewers: spatel, nikic, xbolva00 Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67677 llvm-svn: 372629	2019-09-23 17:04:14 +00:00
Alexey Bataev	6a278d9073	[SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!") Summary: Initially SLP vectorizer replaced all going-to-be-vectorized instructions with Undef values. It may break ScalarEvaluation and may cause a crash. Reworked SLP vectorizer so that it does not replace vectorized instructions by UndefValue anymore. Instead vectorized instructions are marked for deletion inside if BoUpSLP class and deleted upon class destruction. Reviewers: mzolotukhin, mkuper, hfinkel, RKSimon, davide, spatel Subscribers: RKSimon, Gerolf, anemet, hans, majnemer, llvm-commits, sanjoy Differential Revision: https://reviews.llvm.org/D29641 llvm-svn: 372626	2019-09-23 16:25:03 +00:00
Roman Lebedev	01ac23ca62	[InstCombine] foldUnsignedUnderflowCheck(): s/Subtracted/ZeroCmpOp/ llvm-svn: 372625	2019-09-23 16:04:32 +00:00
David Bolvansky	d90fd41f7e	[FunctionAttrs] Enable nonnull arg propagation Enable flag introduced in rL294998. Security concerns are no longer valid, since function signatures for mentioned libc functions has no nonnull attribute (Clang does not generate them? I see no nonnull attr in LLVM IR for these functions) and since rL372091 we carefully annotate the callsites where we know that size is static, non zero. So let's enable this flag again.. llvm-svn: 372573	2019-09-23 09:58:02 +00:00
Simon Pilgrim	2441455bc8	[LSR] Silence static analyzer null dereference warnings with assertions. NFCI. Add assertions to make it clear that GenerateIVChain / NarrowSearchSpaceByPickingWinnerRegs should succeed in finding non-null values llvm-svn: 372518	2019-09-22 17:59:24 +00:00
Simon Pilgrim	db05a482bc	ConstantHoisting - Silence static analyzer dyn_cast<PointerType> null dereference warning. NFCI. llvm-svn: 372517	2019-09-22 17:45:05 +00:00
Sanjay Patel	eb8d39e113	[InstCombine] allow icmp+binop folds before min/max bailout (PR43310) This has the potential to uncover missed analysis/folds as shown in the min/max code comment/test, but fewer restrictions on icmp folds should be better in general to solve cases like: https://bugs.llvm.org/show_bug.cgi?id=43310 llvm-svn: 372510	2019-09-22 14:31:53 +00:00
Simon Pilgrim	a56bd6c51e	[VPlan] Silence static analyzer dyn_cast null dereference warning. NFCI. llvm-svn: 372502	2019-09-22 13:02:00 +00:00
Suyog Sarda	cd629ea0a8	SROA: Check Total Bits of vector type While Promoting alloca instruction of Vector Type, Check total size in bits of its slices too. If they don't match, don't promote the alloca instruction. Bug : https://bugs.llvm.org/show_bug.cgi?id=42585 llvm-svn: 372480	2019-09-21 18:16:37 +00:00
Suyog Sarda	c62136e674	Test mail. NFC. Testing commit acces. NFC. llvm-svn: 372479	2019-09-21 18:03:30 +00:00
Hideto Ueno	63f6066b53	[Attributor] Implement "norecurse" function attribute deduction Summary: This patch introduces `norecurse` function attribute deduction. `norecurse` will be deduced if the following conditions hold: * The size of SCC in which the function belongs equals to 1. * The function doesn't have self-recursion. * We have `norecurse` for all call site. To avoid a large change, SCC is calculated using scc_iterator in InfoCache initialization for now. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67751 llvm-svn: 372475	2019-09-21 15:13:19 +00:00
Simon Pilgrim	2e0c95edfe	[AddressSanitizer] Don't dereference dyn_cast<ConstantInt> results. NFCI. The static analyzer is warning about potential null dereference, but we can use cast<ConstantInt> directly and if not assert will fire for us. llvm-svn: 372429	2019-09-20 20:52:21 +00:00
Akira Hatanaka	75fbb171c3	[ObjC][ARC] Skip debug instructions when computing the insert point of objc_release calls This fixes a bug where the presence of debug instructions would cause ARC optimizer to change the order of retain and release calls. rdar://problem/55319419 llvm-svn: 372352	2019-09-19 20:58:51 +00:00
Jakub Kuderski	e6b2164723	Don't use invalidated iterators in FlattenCFGPass Summary: FlattenCFG may erase unnecessary blocks, which also invalidates iterators to those erased blocks. Before this patch, `iterativelyFlattenCFG` could try to increment a BB iterator after that BB has been removed and crash. This patch makes FlattenCFGPass use `WeakVH` to skip over erased blocks. Reviewers: dblaikie, tstellar, davide, sanjoy, asbirlea, grosser Reviewed By: asbirlea Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67672 llvm-svn: 372347	2019-09-19 19:39:42 +00:00
Roman Lebedev	7a67ed5795	[InstCombine] Simplify @llvm.usub.with.overflow+non-zero check (PR43251) Summary: This is again motivated by D67122 sanitizer check enhancement. That patch seemingly worsens `-fsanitize=pointer-overflow` overhead from 25% to 50%, which strongly implies missing folds. In this particular case, given ``` char* test(char& base, unsigned long offset) { return &base - offset; } ``` it will end up producing something like https://godbolt.org/z/luGEju which after optimizations reduces down to roughly ``` declare void @use64(i64) define i1 @test(i8* dereferenceable(1) %base, i64 %offset) { %base_int = ptrtoint i8* %base to i64 %adjusted = sub i64 %base_int, %offset call void @use64(i64 %adjusted) %not_null = icmp ne i64 %adjusted, 0 %no_underflow = icmp ule i64 %adjusted, %base_int %no_underflow_and_not_null = and i1 %not_null, %no_underflow ret i1 %no_underflow_and_not_null } ``` Without D67122 there was no `%not_null`, and in this particular case we can "get rid of it", by merging two checks: Here we are checking: `Base u>= Offset && (Base u- Offset) != 0`, but that is simply `Base u> Offset` Alive proofs: https://rise4fun.com/Alive/QOs The `@llvm.usub.with.overflow` pattern itself is not handled here because this is the main pattern, that we currently consider canonical. https://bugs.llvm.org/show_bug.cgi?id=43251 Reviewers: spatel, nikic, xbolva00, majnemer Reviewed By: xbolva00, majnemer Subscribers: vsk, majnemer, xbolva00, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67356 llvm-svn: 372341	2019-09-19 17:25:19 +00:00
Sanjay Patel	13e71ce693	[Float2Int] avoid crashing on unreachable code (PR38502) In the example from: https://bugs.llvm.org/show_bug.cgi?id=38502 ...we hit infinite looping/crashing because we have non-standard IR - an instruction operand is used before defined. This and other unusual constructs are allowed in unreachable blocks, so avoid the problem by using DominatorTree to step around landmines. Differential Revision: https://reviews.llvm.org/D67766 llvm-svn: 372339	2019-09-19 16:31:17 +00:00
Serguei Katkov	a44768858c	[Unroll] Add an option to control complete unrolling Add an ability to specify the max full unroll count for LoopUnrollPass pass in pass options. Reviewers: fhahn, fedor.sergeev Reviewed By: fedor.sergeev Subscribers: hiraditya, zzheng, dmgreen, llvm-commits Differential Revision: https://reviews.llvm.org/D67701 llvm-svn: 372305	2019-09-19 06:57:29 +00:00
Roman Lebedev	feea722cf3	[SimplifyCFG] mergeConditionalStoreToAddress(): try to pacify MSAN MSAN bot complains that there is use-of-uninitialized-value of this FreeStores later in IsWorthwhile(). Perhaps FreeStores needs to be stored in a vector? llvm-svn: 372262	2019-09-18 21:04:39 +00:00
Roman Lebedev	b646dd92c2	[InstCombine] foldUnsignedUnderflowCheck(): handle last few cases (PR43251) Summary: I don't have a direct motivational case for this, but it would be good to have this for completeness/symmetry. This pattern is basically the motivational pattern from https://bugs.llvm.org/show_bug.cgi?id=43251 but with different predicate that requires that the offset is non-zero. The completeness bit comes from the fact that a similar pattern (offset != zero) will be needed for https://bugs.llvm.org/show_bug.cgi?id=43259, so it'd seem to be good to not overlook very similar patterns.. Proofs: https://rise4fun.com/Alive/21b Also, there is something odd with `isKnownNonZero()`, if the non-zero knowledge was specified as an assumption, it didn't pick it up (PR43267) With this, i see no other missing folds for https://bugs.llvm.org/show_bug.cgi?id=43251 Reviewers: spatel, nikic, xbolva00 Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67412 llvm-svn: 372257	2019-09-18 20:10:07 +00:00
Roman Lebedev	dd0170ab24	[SimplifyCFG] mergeConditionalStoreToAddress(): consider cost, not instruction count Summary: As it can be see in the changed test, while `div` is really costly, we were speculating it. This does not seem correct. Also, the old code would run for every single insturuction in BB, instead of eagerly bailing out as soon as there are too many instructions. This function still has a problem that `PHINodeFoldingThreshold` is per-basic-block, while it should be for all the basic blocks. Reviewers: efriedma, craig.topper, dmgreen, jmolloy Reviewed By: jmolloy Subscribers: xbolva00, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67315 llvm-svn: 372255	2019-09-18 19:46:57 +00:00
Roman Lebedev	ba4cad9039	[InstCombine] dropRedundantMaskingOfLeftShiftInput(): some cleanup before upcoming patch llvm-svn: 372245	2019-09-18 18:38:40 +00:00
Wei Mi	5f7e822dc7	[SampleFDO] Minimize performance impact when profile-sample-accurate is enabled. We can save memory and reduce binary size significantly by enabling ProfileSampleAccurate. However when ProfileSampleAccurate is true, function without sample will be regarded as cold and this could potentially cause performance regression. To minimize the potential negative performance impact, we want to be a little conservative here saying if a function shows up in the profile, no matter as outline instance, inline instance or call targets, treat the function as not being cold. This will handle the cases such as most callsites of a function are inlined in sampled binary (thus outline copy don't get any sample) but not inlined in current build (because of source code drift, imprecise debug information, or the callsites are all cold individually but not cold accumulatively...), so that the outline function showing up as cold in sampled binary will actually not be cold after current build. After the change, such function will be treated as not cold even profile-sample-accurate is enabled. At the same time we lower the hot criteria of callsiteIsHot check when profile-sample-accurate is enabled. callsiteIsHot is used to determined whether a callsite is hot and qualified for early inlining. When profile-sample-accurate is enabled, functions without profile will be regarded as cold and much less inlining will happen in CGSCC inlining pass, so we can worry less about size increase and be aggressive to allow more early inlining to happen for warm callsites and it is helpful for performance overall. Differential Revision: https://reviews.llvm.org/D67561 llvm-svn: 372232	2019-09-18 16:06:28 +00:00
Sanjay Patel	d46bf63fbb	[SimplifyLibCalls] fix crash with empty function name (PR43347) ...and improve some variable names while here. https://bugs.llvm.org/show_bug.cgi?id=43347 llvm-svn: 372227	2019-09-18 14:33:40 +00:00
Teresa Johnson	fd2044f299	[PGO] Change hardcoded thresholds for cold/inlinehint to use summary Summary: The PGO counter reading will add cold and inlinehint (hot) attributes to functions that are very cold or hot. This was using hardcoded thresholds, instead of the profile summary cutoffs which are used in other hot/cold detection and are more dynamic and adaptable. Switch to using the summary-based cold/hot detection. The hardcoded limits were causing some code that had a medium level of hotness (per the summary) to be incorrectly marked with a cold attribute, blocking inlining. Reviewers: davidxl Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67673 llvm-svn: 372189	2019-09-17 23:12:13 +00:00
Reid Kleckner	23e872a3d0	[PGO] Don't use comdat groups for counters & data on COFF For COFF, a comdat group is really a symbol marked IMAGE_COMDAT_SELECT_ANY and zero or more other symbols marked IMAGE_COMDAT_SELECT_ASSOCIATIVE. Typically the associative symbols in the group are not external and are not referenced by other TUs, they are things like debug info, C++ dynamic initializers, or other section registration schemes. The Visual C++ linker reports a duplicate symbol error for symbols marked IMAGE_COMDAT_SELECT_ASSOCIATIVE even if they would be discarded after handling the leader symbol. Fixes coverage-inline.cpp in check-profile after r372020. llvm-svn: 372182	2019-09-17 21:10:49 +00:00
Roman Lebedev	97bc5ae993	[NFC][InstCombine] dropRedundantMaskingOfLeftShiftInput(): some NFC diff shaving llvm-svn: 372171	2019-09-17 19:32:26 +00:00
David Bolvansky	0c0de794f1	Reland "[SLC] Preserve attrs for strncpy(x, "", y) -> memset(align 1 x, '\0', y)" llvm-svn: 372142	2019-09-17 17:12:24 +00:00
Krasimir Georgiev	bdff164e0e	Revert "[SLC] Preserve attrs for strncpy(x, "", y) -> memset(align 1 x, '\0', y)" Summary: This reverts commit r372101. Causes ASAN build bot failures: http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/14176 From http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/14176/steps/64-bit%20check-asan/logs/stdio: ``` [ RUN ] AddressSanitizer.StrNCatOOBTest /home/buildbots/ppc64be-sanitizer/sanitizer-ppc64be/build/llvm-project/compiler-rt/lib/asan/tests/asan_str_test.cpp:462: Failure Death test: strncat(to - 1, from, 0) Result: failed to die. ``` Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67658 llvm-svn: 372125	2019-09-17 14:15:23 +00:00
Simon Pilgrim	a2719f38c1	[LoopVectorize] Don't dereference a dyn_cast result. NFCI. The static analyzer is warning about potential null dereferences of dyn_cast<> results, we can use cast<> directly as we know that these cases should all be CastInst, which is why its working atm and anyway cast<> will assert if they aren't. llvm-svn: 372116	2019-09-17 13:24:54 +00:00
Benjamin Kramer	df4b9a3f4f	Hide implementation details in namespaces. llvm-svn: 372113	2019-09-17 12:56:29 +00:00
Johannes Doerfert	3ab9e8b818	[Attributor][Fix] Initialize the cache prior to using it Summary: There were segfaults as we modified and iterated the instruction maps in the cache at the same time. This was happening because we created new instructions while we populated the cache. This fix changes the order in which we perform these actions. First, the caches for the whole module are created, then we start to create abstract attributes. I don't have a unit test but the LLVM test suite exposes this problem. Reviewers: uenoku, sstefan1 Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67232 llvm-svn: 372105	2019-09-17 10:52:41 +00:00
David Bolvansky	ded48e93e6	[SLC] Preserve attrs for strncpy(x, "", y) -> memset(align 1 x, '\0', y) llvm-svn: 372101	2019-09-17 10:25:38 +00:00
David Bolvansky	be2487a2ba	[InstCombine] Annotate strdup with deref_or_null llvm-svn: 372098	2019-09-17 10:12:48 +00:00
David Bolvansky	3a3dddd9d7	[NFCI] Fixed buildbots llvm-svn: 372097	2019-09-17 10:03:45 +00:00
Fangrui Song	8351763709	[SimplifyLibCalls] Fix -Wunused-result after D53342/r372091 llvm-svn: 372096	2019-09-17 09:56:55 +00:00
David Bolvansky	e80fcf0340	[SimplifyLibCalls] Mark known arguments with nonnull Reviewers: efriedma, jdoerfert Reviewed By: jdoerfert Subscribers: ychen, rsmith, joerg, aaron.ballman, lebedev.ri, uenoku, jdoerfert, hfinkel, javed.absar, spatel, dmgreen, llvm-commits Differential Revision: https://reviews.llvm.org/D53342 llvm-svn: 372091	2019-09-17 09:32:52 +00:00
Florian Hahn	1bd58870e5	[LoopUnroll] Use LoopSize+1 as threshold, to allow unrolling loops matching LoopSize. We use `< UP.Threshold` later on, so we should use LoopSize + 1, to allow unrolling if the result won't exceed to loop size. Fixes PR43305. Reviewers: efriedma, dmgreen, paquette Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D67594 llvm-svn: 372084	2019-09-17 09:02:48 +00:00
Hideto Ueno	30d86f1858	[Attributor] Use Alias Analysis in noalias callsite argument deduction Summary: This patch adds a check of alias analysis in `noalias` callsite argument deduction. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67604 llvm-svn: 372075	2019-09-17 06:53:27 +00:00
Hideto Ueno	3bb5cbc20b	[Attributor] Create helper struct for handling analysis getters Summary: This patch introduces a helper struct `AnalysisGetter` to put together analysis getters. In this patch, a getter for `AAResult` is also added for `noalias`. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67603 llvm-svn: 372072	2019-09-17 05:45:18 +00:00
Reid Kleckner	32837a0c93	[PGO] Use linkonce_odr linkage for __profd_ variables in comdat groups This fixes relocations against __profd_ symbols in discarded sections, which is PR41380. In general, instrumentation happens very early, and optimization and inlining happens afterwards. The counters for a function are calculated early, and after inlining, counters for an inlined function may be widely referenced by other functions. For C++ inline functions of all kinds (linkonce_odr & available_externally mainly), instr profiling wants to deduplicate these __profc_ and __profd_ globals. Otherwise the binary would be quite large. I made __profd_ and __profc_ comdat in r355044, but I chose to make __profd_ internal. At the time, I was only dealing with coverage, and in that case, none of the instrumentation needs to reference __profd_. However, if you use PGO, then instrumentation passes add calls to __llvm_profile_instrument_range which reference __profd_ globals. The solution is to make these globals externally visible by using linkonce_odr linkage for data as was done for counters. This is safe because PGO adds a CFG hash to the names of the data and counter globals, so if different TUs have different globals, they will get different data and counter arrays. Reviewers: xur, hans Differential Revision: https://reviews.llvm.org/D67579 llvm-svn: 372020	2019-09-16 18:49:09 +00:00
Roman Lebedev	10151f6618	[SimplifyCFG] FoldTwoEntryPHINode(): consider total speculation cost, not per-BB cost Summary: Previously, if the threshold was 2, we were willing to speculatively execute 2 cheap instructions in both basic blocks (thus we were willing to speculatively execute cost = 4), but weren't willing to speculate when one BB had 3 instructions and other one had no instructions, even thought that would have total cost of 3. This looks inconsistent to me. I don't think `cmov`-like instructions will start executing until both of it's inputs are available: https://godbolt.org/z/zgHePf So i don't see why the existing behavior is the correct one. Also, let's add it's own `cl::opt` for this threshold, with default=4, so it is not stricter than the previous threshold: will allow to fold when there are 2 BB's each with cost=2. And since the logic has changed, it will also allow to fold when one BB has cost=3 and other cost=1, or there is only one BB with cost=4. This is an alternative solution to D65148: This fix is mainly motivated by `signbit-like-value-extension.ll` test. That pattern comes up in JPEG decoding, see e.g. `Figure F.12 – Extending the sign bit of a decoded value in V` of `ITU T.81` (JPEG specification). That branch is not predictable, and it is within the innermost loop, so the fact that that pattern ends up being stuck with a branch instead of `select` (i.e. `CMOV` for x86) is unlikely to be beneficial. This has great results on the final assembly (vanilla test-suite + RawSpeed): (metric pass - D67240) \| metric \| old \| new \| delta \| % \| \| x86-mi-counting.NumMachineFunctions \| 37720 \| 37721 \| 1 \| 0.00% \| \| x86-mi-counting.NumMachineBasicBlocks \| 773545 \| 771181 \| -2364 \| -0.31% \| \| x86-mi-counting.NumMachineInstructions \| 7488843 \| 7486442 \| -2401 \| -0.03% \| \| x86-mi-counting.NumUncondBR \| 135770 \| 135543 \| -227 \| -0.17% \| \| x86-mi-counting.NumCondBR \| 423753 \| 422187 \| -1566 \| -0.37% \| \| x86-mi-counting.NumCMOV \| 24815 \| 25731 \| 916 \| 3.69% \| \| x86-mi-counting.NumVecBlend \| 17 \| 17 \| 0 \| 0.00% \| We significantly decrease basic block count, notably decrease instruction count, significantly decrease branch count and very significantly increase `cmov` count. Performance-wise, unsurprisingly, this has great effect on target RawSpeed benchmark. I'm seeing 5 major improvements: ``` Benchmark Time CPU Time Old Time New CPU Old CPU New ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49 Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_mean -0.3064 -0.3064 226.9913 157.4452 226.9800 157.4384 Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_median -0.3057 -0.3057 226.8407 157.4926 226.8282 157.4828 Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_stddev -0.4985 -0.4954 0.3051 0.1530 0.3040 0.1534 Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49 Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_mean -0.1747 -0.1747 80.4787 66.4227 80.4771 66.4146 Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_median -0.1742 -0.1743 80.4686 66.4542 80.4690 66.4436 Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_stddev +0.6089 +0.5797 0.0670 0.1078 0.0673 0.1062 Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49 Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_mean -0.1598 -0.1598 171.6996 144.2575 171.6915 144.2538 Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_median -0.1598 -0.1597 171.7109 144.2755 171.7018 144.2766 Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_stddev +0.4024 +0.3850 0.0847 0.1187 0.0848 0.1175 Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49 Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_mean -0.0550 -0.0551 280.3046 264.8800 280.3017 264.8559 Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_median -0.0554 -0.0554 280.2628 264.7360 280.2574 264.7297 Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_stddev +0.7005 +0.7041 0.2779 0.4725 0.2775 0.4729 Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49 Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_mean -0.0354 -0.0355 316.7396 305.5208 316.7342 305.4890 Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_median -0.0354 -0.0356 316.6969 305.4798 316.6917 305.4324 Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_stddev +0.0493 +0.0330 0.3562 0.3737 0.3563 0.3681 ``` That being said, it's always best-effort, so there will likely be cases where this worsens things. Reviewers: efriedma, craig.topper, dmgreen, jmolloy, fhahn, Carrot, hfinkel, chandlerc Reviewed By: jmolloy Subscribers: xbolva00, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67318 llvm-svn: 372009	2019-09-16 16:18:24 +00:00
Sanjay Patel	3961a143e1	[InstCombine] remove unneeded one-use checks for icmp fold Related folds were added in: rL125734 ...the code comment about register pressure is discussed in more detail in: https://bugs.llvm.org/show_bug.cgi?id=2698 But 10 years later, perf testing bzip2 with this change now shows a slight (0.2% average) improvement on Haswell although that's probably within test noise. Given that this is IR canonicalization, we shouldn't be worried about register pressure though; the backend should be able to adjust for that as needed. This is part of solving PR43310 the theoretically right way: https://bugs.llvm.org/show_bug.cgi?id=43310 ...ie, if we don't cripple basic transforms, then we won't need to add special-case code to detect larger patterns. rL371940 and rL371981 are related patches in this series. llvm-svn: 372007	2019-09-16 16:15:25 +00:00
Sanjay Patel	c5cd808156	[InstCombine] remove unneeded one-use checks for icmp fold This fold and several others were added in: rL125734 <https://reviews.llvm.org/rL125734> ...with no explanation for the one-use checks other than the code comments about register pressure. Given that this is IR canonicalization, we shouldn't be worried about register pressure though; the backend should be able to adjust for that as needed. This is part of solving PR43310 the theoretically right way: https://bugs.llvm.org/show_bug.cgi?id=43310 ...ie, if we don't cripple basic transforms, then we won't need to add special-case code to detect larger patterns. rL371940 is a related patch in this series. llvm-svn: 371981	2019-09-16 12:54:34 +00:00
Sanjay Patel	91c2cd0691	[InstCombine] fix comments to match code; NFC This blob was written before match() existed, so it could probably be reduced significantly. But I suspect it isn't well tested, so tests would have to be added to reduce risk from logic changes. llvm-svn: 371978	2019-09-16 12:12:05 +00:00
Simon Pilgrim	1aaefbca24	[VPlanSLP] Don't dereference a cast_or_null<VPInstruction> result. NFCI. The static analyzer is warning about a potential null dereference of the cast_or_null result, I've split the cast_or_null check from the ->getUnderlyingInstr() call to avoid this, but it appears that we weren't seeing any null pointers in the dumped bundles in the first place. llvm-svn: 371975	2019-09-16 11:22:44 +00:00
Simon Pilgrim	bfe6b35c70	[SLPVectorizer] Assert that we find a LastInst to silence analyzer null dereference warning. NFCI. llvm-svn: 371974	2019-09-16 10:48:16 +00:00
Simon Pilgrim	ae625d70cd	[SLPVectorizer] Don't dereference a dyn_cast result. NFCI. The static analyzer is warning about potential null dereferences of dyn_cast<> results - in these cases we can safely use cast<> directly as we know that these cases should all be the correct type, which is why its working atm and anyway cast<> will assert if they aren't. llvm-svn: 371973	2019-09-16 10:35:09 +00:00
Stefan Stipanovic	431141c5cc	[Attributor] Heap-To-Stack Conversion D53362 gives a prototype heap-to-stack conversion pass. With addition of new attributes in the attributor, this can now be revisted and improved. This will place it in the Attributor to make it easier to use new attributes (eg. nofree, nosync, willreturn, etc.) and other attributor features. Reviewers: jdoerfert, uenoku, hfinkel, efriedma Subscribers: lebedev.ri, xbolva00, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D65408 llvm-svn: 371942	2019-09-15 21:47:41 +00:00
Sanjay Patel	3daf168fa9	[InstCombine] remove unneeded one-use checks for icmp fold This fold and several others were added in: rL125734 ...with no explanation for the one-use checks other than the code comments about register pressure. Given that this is IR canonicalization, we shouldn't be worried about register pressure though; the backend should be able to adjust for that as needed. There are similar checks as noted with the TODO comments. I'm hoping to remove those restrictions too, but if any of these does cause a regression, it should be easier to correct by making small, individual commits. This is part of solving PR43310 the theoretically right way: https://bugs.llvm.org/show_bug.cgi?id=43310 ...ie, if we don't cripple basic transforms, then we won't need to add special-case code to detect larger patterns. llvm-svn: 371940	2019-09-15 20:56:34 +00:00
Simon Pilgrim	4e46ea3946	[LoadStoreVectorizer] vectorizeLoadChain - ensure we find a valid Type down the load chain. NFCI. Silence static analyzer uninitialized variable warning by setting the LoadTy to null and then asserting we find a real value. llvm-svn: 371936	2019-09-15 16:44:35 +00:00
Sanjay Patel	b6a0faaa0c	[SLP] limit vectorization of Constant subclasses (PR33958) This is a fix for: https://bugs.llvm.org/show_bug.cgi?id=33958 It seems universally true that we would not want to transform this kind of sequence on any target, but if that's not correct, then we could view this as a target-specific cost model problem. We could also white-list ConstantInt, ConstantFP, etc. rather than blacklist Global and ConstantExpr. Differential Revision: https://reviews.llvm.org/D67362 llvm-svn: 371931	2019-09-15 13:03:24 +00:00
Johannes Doerfert	e7c6f97039	[Attributor][Fix] Use right type to replace expressions Summary: This should be obsolete once the functionality in D66967 is integrated. Reviewers: uenoku, sstefan1 Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67231 llvm-svn: 371915	2019-09-14 02:57:50 +00:00
Florian Hahn	cde8343d85	[BasicBlockUtils] Add optional BBName argument, in line with BB:splitBasicBlock Reviewers: spatel, asbirlea, craig.topper Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D67521 llvm-svn: 371819	2019-09-13 08:03:32 +00:00
Alina Sbirlea	6943472d45	[MemorySSA] Pass (for update) MSSAU when hoisting instructions. Summary: Pass MSSAU to makeLoopInvariant in order to properly update MSSA. Reviewers: george.burgess.iv Subscribers: Prazek, sanjoy.google, uabelho, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67470 llvm-svn: 371748	2019-09-12 17:12:51 +00:00
Sanjay Patel	2bfb955c51	[InstCombine] rename variable for readability; NFC There's more that can be done here, but "OpI" doesn't convey that we casted to BinaryOperator. llvm-svn: 371682	2019-09-11 22:31:34 +00:00
Eli Friedman	403e08d4cf	[ConstantHoisting] Fix non-determinism. Differential Revision: https://reviews.llvm.org/D66114 llvm-svn: 371644	2019-09-11 18:55:00 +00:00
Petr Hosek	7bdad08429	Reland "clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM" This patch contains the basic functionality for reporting potentially incorrect usage of __builtin_expect() by comparing the developer's annotation against a collected PGO profile. A more detailed proposal and discussion appears on the CFE-dev mailing list (http://lists.llvm.org/pipermail/cfe-dev/2019-July/062971.html) and a prototype of the initial frontend changes appear here in D65300 We revised the work in D65300 by moving the misexpect check into the LLVM backend, and adding support for IR and sampling based profiles, in addition to frontend instrumentation. We add new misexpect metadata tags to those instructions directly influenced by the llvm.expect intrinsic (branch, switch, and select) when lowering the intrinsics. The misexpect metadata contains information about the expected target of the intrinsic so that we can check against the correct PGO counter when emitting diagnostics, and the compiler's values for the LikelyBranchWeight and UnlikelyBranchWeight. We use these branch weight values to determine when to emit the diagnostic to the user. A future patch should address the comment at the top of LowerExpectIntrisic.cpp to hoist the LikelyBranchWeight and UnlikelyBranchWeight values into a shared space that can be accessed outside of the LowerExpectIntrinsic pass. Once that is done, the misexpect metadata can be updated to be smaller. In the long term, it is possible to reconstruct portions of the misexpect metadata from the existing profile data. However, we have avoided this to keep the code simple, and because some kind of metadata tag will be required to identify which branch/switch/select instructions are influenced by the use of llvm.expect Patch By: paulkirth Differential Revision: https://reviews.llvm.org/D66324 llvm-svn: 371635	2019-09-11 16:19:50 +00:00
Florian Hahn	51de22c8ee	Revert [InstCombine] Use SimplifyFMulInst to simplify multiply in fma. This introduces additional rounding error in some cases. See D67434. This reverts r371518 (git commit `18a1f0818b`) llvm-svn: 371634	2019-09-11 16:17:03 +00:00
Whitney Tsang	1ccba7c1a1	LLVM: Optimization Pass: Remove conflicting attribute, if any, before adding new read attribute to an argument Summary: Update optimization pass to prevent adding read-attribute to an argument without removing its conflicting attribute. A read attribute, based on the result of the attribute deduction process, might be added to an argument. The attribute might be in conflict with other read/write attribute currently associated with the argument. To ensure the compatibility of attributes, conflicting attribute, if any, must be removed before a new one is added. The following snippet shows the current behavior of the compiler, where the compilation process is aborted due to incompatible attributes. $ cat x.ll ; ModuleID = 'x.bc' %_type_of_d-ccc = type <{ i8, i8, i8, i8, i8 }> @d-ccc = internal global %_type_of_d-ccc <{ i8 null, i8 1, i8 13, i8 0, i8 -127 }>, align 8 define void @foo(i32* writeonly %.aaa) { foo_entry: %_param_.aaa = alloca i32, align 8 store i32 %.aaa, i32** %_param_.aaa, align 8 store i8 0, i8* getelementptr inbounds (%_type_of_d-ccc, %_type_of_d-ccc* @d-ccc, i32 0, i32 3) ret void } $ opt -O3 x.ll Attributes 'readnone and writeonly' are incompatible! void (i32) @foo in function foo LLVM ERROR: Broken function found, compilation aborted! The purpose of this changeset is to fix the above error. This fix is based on a suggestion from Johannes @jdoerfert (many thanks!!!) Authored By: anhtuyen Reviewer: nicholas, rnk, chandlerc, jdoerfert Reviewed By: rnk Subscribers: hiraditya, jdoerfert, llvm-commits, anhtuyen, LLVM Tag: LLVM Differential Revision: https://reviews.llvm.org/D58694 llvm-svn: 371622	2019-09-11 14:26:22 +00:00
Sanjay Patel	80bea345d1	[InstCombine] fold sign-bit compares of srem (srem X, pow2C) sgt/slt 0 can be reduced using bit hacks by masking off the sign bit and the module (low) bits: https://rise4fun.com/Alive/jSO A '2' divisor allows slightly more folding: https://rise4fun.com/Alive/tDBM Any chance to remove an 'srem' use is probably worthwhile, but this is limited to the one-use improvement case because doing more may expose other missing folds. That means it does nothing for PR21929 yet: https://bugs.llvm.org/show_bug.cgi?id=21929 Differential Revision: https://reviews.llvm.org/D67334 llvm-svn: 371610	2019-09-11 12:04:26 +00:00
David Bolvansky	4dae283cd3	[InstCombine] Fixed handling of isOpNewLike (PR11748) llvm-svn: 371602	2019-09-11 10:37:03 +00:00
Florian Hahn	e79381c3f7	[LoopInterchange] Drop unused splitInnerLoopHeader declaration. llvm-svn: 371601	2019-09-11 10:32:15 +00:00
Dmitri Gribenko	57256af307	Revert "clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM" This reverts commit r371584. It introduced a dependency from compiler-rt to llvm/include/ADT, which is problematic for multiple reasons. One is that it is a novel dependency edge, which needs cross-compliation machinery for llvm/include/ADT (yes, it is true that right now compiler-rt included only header-only libraries, however, if we allow compiler-rt to depend on anything from ADT, other libraries will eventually get used). Secondly, depending on ADT from compiler-rt exposes ADT symbols from compiler-rt, which would cause ODR violations when Clang is built with the profile library. llvm-svn: 371598	2019-09-11 09:16:17 +00:00
Florian Hahn	e4961218fd	[LoopInterchange] Properly move condition, induction increment and ops to latch. Currently we only rely on the induction increment to come before the condition to ensure the required instructions get moved to the new latch. This patch duplicates and moves the required instructions to the newly created latch. We move the condition to the end of the new block, then process its operands. We stop at operands that are defined outside the loop, or are the induction PHI. We duplicate the instructions and update the uses in the moved instructions, to ensure other users remain intact. See the added test2 for such an example. Reviewers: efriedma, mcrosier Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D67367 llvm-svn: 371595	2019-09-11 08:23:23 +00:00
Hideto Ueno	1d68ed8c24	[Attributor] Implement "noalias" callsite argument deduction Summary: Now, `nocapture` is deduced in Attributor therefore, this patch introduces deduction for `noalias` callsite argument using `nocapture`. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: lebedev.ri, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67286 llvm-svn: 371590	2019-09-11 07:00:33 +00:00
Hideto Ueno	3736764657	[Attributor][Fix] Manifest nocapture only in CSArgument or Argument Summary: We can query to Attributor whether the value is captured in the scope or not on the following way: ``` const auto & NoCapAA = A.getAAFor<AANoCapture>(*this, IRPosition::value(V)); ``` And if V is CallSiteReturned then `getDeducedAttribute` will add `nocatpure` to the callsite returned value. It is not valid. This patch checks the position is an argument or call site argument. This is tested in D67286. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67342 llvm-svn: 371589	2019-09-11 06:52:11 +00:00
Alexey Lapshin	6b1c6c1287	[Debuginfo][Instcombiner] Do not clone dbg.declare. TryToSinkInstruction() has a bug: While updating debug info for sunk instruction, it could clone dbg.declare intrinsic. That is wrong. There could be only one dbg.declare. The fix is to not clone dbg.declare intrinsic and to update it`s arguments, to not to point to sunk instruction. Differential Revision: https://reviews.llvm.org/D67217 llvm-svn: 371587	2019-09-11 06:07:16 +00:00
Petr Hosek	394a8ed8f1	clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM This patch contains the basic functionality for reporting potentially incorrect usage of __builtin_expect() by comparing the developer's annotation against a collected PGO profile. A more detailed proposal and discussion appears on the CFE-dev mailing list (http://lists.llvm.org/pipermail/cfe-dev/2019-July/062971.html) and a prototype of the initial frontend changes appear here in D65300 We revised the work in D65300 by moving the misexpect check into the LLVM backend, and adding support for IR and sampling based profiles, in addition to frontend instrumentation. We add new misexpect metadata tags to those instructions directly influenced by the llvm.expect intrinsic (branch, switch, and select) when lowering the intrinsics. The misexpect metadata contains information about the expected target of the intrinsic so that we can check against the correct PGO counter when emitting diagnostics, and the compiler's values for the LikelyBranchWeight and UnlikelyBranchWeight. We use these branch weight values to determine when to emit the diagnostic to the user. A future patch should address the comment at the top of LowerExpectIntrisic.cpp to hoist the LikelyBranchWeight and UnlikelyBranchWeight values into a shared space that can be accessed outside of the LowerExpectIntrinsic pass. Once that is done, the misexpect metadata can be updated to be smaller. In the long term, it is possible to reconstruct portions of the misexpect metadata from the existing profile data. However, we have avoided this to keep the code simple, and because some kind of metadata tag will be required to identify which branch/switch/select instructions are influenced by the use of llvm.expect Patch By: paulkirth Differential Revision: https://reviews.llvm.org/D66324 llvm-svn: 371584	2019-09-11 01:09:16 +00:00
Alina Sbirlea	f9cc0393b3	[MemorySSA] MemorySSA should not model debuginfo, and need not update it. Reverts the change in r371084, but keeps the test. After r371565, debuginfo cannot be modelled in MemorySSA, even with a non-standard AA pipeline. llvm-svn: 371573	2019-09-10 23:36:43 +00:00
Philip Reames	cffa630c80	[Loads] Move generic code out of vectorizer into a location it might be reused [NFC] llvm-svn: 371558	2019-09-10 21:33:53 +00:00
Philip Reames	1e1db80048	[ValueTracking] Factor our common speculation suppression logic [NFC] Expose a utility function so that all places which want to suppress speculation (when otherwise legal) due to ordering and/or sanitizer interaction can do so. llvm-svn: 371556	2019-09-10 21:12:29 +00:00
Florian Hahn	18a1f0818b	[InstCombine] Use SimplifyFMulInst to simplify multiply in fma. This allows us to fold fma's that multiply with 0.0. Also, the multiply by 1.0 case is handled there as well. The fneg/fabs cases are not handled by SimplifyFMulInst, so we need to keep them. Reviewers: spatel, anemet, lebedev.ri Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D67351 llvm-svn: 371518	2019-09-10 13:10:28 +00:00
Dmitri Gribenko	2bf8d77453	Revert "Reland "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline."" This reverts commit r371502, it broke tests (clang/test/CodeGenCXX/auto-var-init.cpp). llvm-svn: 371507	2019-09-10 10:39:09 +00:00
Clement Courbet	612c260ec3	Reland "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline." With a fix for sanitizer breakage (see explanation in D60318). llvm-svn: 371502	2019-09-10 09:18:00 +00:00
Petr Hosek	7d1757aba8	Revert "clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM" This reverts commit r371484: this broke sanitizer-x86_64-linux-fast bot. llvm-svn: 371488	2019-09-10 06:25:13 +00:00
Petr Hosek	a10802fd73	clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM This patch contains the basic functionality for reporting potentially incorrect usage of __builtin_expect() by comparing the developer's annotation against a collected PGO profile. A more detailed proposal and discussion appears on the CFE-dev mailing list (http://lists.llvm.org/pipermail/cfe-dev/2019-July/062971.html) and a prototype of the initial frontend changes appear here in D65300 We revised the work in D65300 by moving the misexpect check into the LLVM backend, and adding support for IR and sampling based profiles, in addition to frontend instrumentation. We add new misexpect metadata tags to those instructions directly influenced by the llvm.expect intrinsic (branch, switch, and select) when lowering the intrinsics. The misexpect metadata contains information about the expected target of the intrinsic so that we can check against the correct PGO counter when emitting diagnostics, and the compiler's values for the LikelyBranchWeight and UnlikelyBranchWeight. We use these branch weight values to determine when to emit the diagnostic to the user. A future patch should address the comment at the top of LowerExpectIntrisic.cpp to hoist the LikelyBranchWeight and UnlikelyBranchWeight values into a shared space that can be accessed outside of the LowerExpectIntrinsic pass. Once that is done, the misexpect metadata can be updated to be smaller. In the long term, it is possible to reconstruct portions of the misexpect metadata from the existing profile data. However, we have avoided this to keep the code simple, and because some kind of metadata tag will be required to identify which branch/switch/select instructions are influenced by the use of llvm.expect Patch By: paulkirth Differential Revision: https://reviews.llvm.org/D66324 llvm-svn: 371484	2019-09-10 03:11:39 +00:00
Philip Reames	7403569be7	[LoopVectorize] Leverage speculation safety to avoid masked.loads If we're vectorizing a load in a predicated block, check to see if the load can be speculated rather than predicated. This allows us to generate a normal vector load instead of a masked.load. To do so, we must prove that all bytes accessed on any iteration of the original loop are dereferenceable, and that all loads (across all iterations) are properly aligned. This is equivelent to proving that hoisting the load into the loop header in the original scalar loop is safe. Note: There are a couple of code motion todos in the code. My intention is to wait about a day - to be sure this sticks - and then perform the NFC motion without furthe review. Differential Revision: https://reviews.llvm.org/D66688 llvm-svn: 371452	2019-09-09 20:54:13 +00:00
Sanjay Patel	aff5bee35f	[InstCombine] fold extract+insert into identity shuffle This is similar to the existing fold for splats added with: rL365379 If we can adjust the shuffle mask to include another element in an identity mask (if it changes vector length, that's an extract/insert subvector operation in the backend), then that can eliminate extractelement/insertelement pairs in IR. All targets are expected to lower shuffles with identity masks efficiently. llvm-svn: 371340	2019-09-08 19:03:01 +00:00
Simon Pilgrim	879ed20bde	Fix typo. NFCI llvm-svn: 371317	2019-09-07 18:09:09 +00:00
Roman Lebedev	45ba26599b	[SimplifyCFG] SpeculativelyExecuteBB(): It's SpeculatedInstructions, not SpeculationCost It counts the number of instructions we are ok speculating (at most 1 there), not their cost, so rename accordingly. llvm-svn: 371294	2019-09-07 09:06:06 +00:00
Hideto Ueno	f2b9dc4758	[Attributor] ValueSimplify Abstract Attribute Summary: This patch introduces initial `AAValueSimplify` which simplifies a value in a context. example - (for function returned) If all the return values are the same and constant, then we can replace callsite returned with the constant. - If an internal function takes the same value(constant) as an argument in the callsite, then we can replace the argument with that constant. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66967 llvm-svn: 371291	2019-09-07 07:03:05 +00:00
Teresa Johnson	9c27b59cec	Change TargetLibraryInfo analysis passes to always require Function Summary: This is the first change to enable the TLI to be built per-function so that -fno-builtin* handling can be migrated to use function attributes. See discussion on D61634 for background. This is an enabler for fixing handling of these options for LTO, for example. This change should not affect behavior, as the provided function is not yet used to build a specifically per-function TLI, but rather enables that migration. Most of the changes were very mechanical, e.g. passing a Function to the legacy analysis pass's getTLI interface, or in Module level cases, adding a callback. This is similar to the way the per-function TTI analysis works. There was one place where we were looking for builtins but not in the context of a specific function. See FindCXAAtExit in lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround could provide the wrong behavior in some corner cases. Suggestions welcome. Reviewers: chandlerc, hfinkel Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66428 llvm-svn: 371284	2019-09-07 03:09:36 +00:00
Evandro Menezes	7d677adf2d	[InstCombine] Refactor substitution of instruction in the parent BB (NFC) Add the new method `LibCallSimplifier::substituteInParent()` that calls `LibCallSimplifier::replaceAllUsesWith()' and `LibCallSimplifier::eraseFromParent()` back to back, simplifying the resulting code. llvm-svn: 371264	2019-09-06 22:07:11 +00:00
Sanjay Patel	4f0e429acc	[SimplifyLibCalls] handle pow(x,-0.0) before it can assert (PR43233) https://bugs.llvm.org/show_bug.cgi?id=43233 llvm-svn: 371221	2019-09-06 16:10:18 +00:00
Matt Arsenault	524a9d5774	InstCombine: Fix crash on icmp of gep with addrspacecasted null llvm-svn: 371146	2019-09-05 23:39:21 +00:00
Vitaly Buka	9020f11377	[SimplifyCFG] Don't SimplifyBranchOnICmpChain with ExtraCase Summary: Here we try to avoid issues with "explicit branch" with SimplifyBranchOnICmpChain which can check on undef. Msan by design reports branches on uninitialized memory and undefs, so we have false report here. In general msan does not like when we convert ``` // If at least one of them is true we can MSAN is ok if another is undefs if (a \|\| b) return; ``` into ``` // If 'a' is undef MSAN will complain even if 'b' is true if (a) return; if (b) return; ``` Example Before optimization we had something like this: ``` while (true) { bool maybe_undef = doStuff(); while (true) { char c = getChar(); if (c != 10 && c != 13) continue break; } // we know that c == 10 \|\| c == 13 if we get here, // so msan know that branch is not affected by maybe_undef if (maybe_undef \|\| c == 10 \|\| c == 13) continue; return; } ``` SimplifyBranchOnICmpChain will convert that into ``` while (true) { bool maybe_undef = doStuff(); while (true) { char c = getChar(); if (c != 10 && c != 13) continue; break; } // however msan will complain here: if (maybe_undef) continue; // we know that c == 10 \|\| c == 13, so either way we will get continue switch(c) { case 10: continue; case 13: continue; } return; } ``` Reviewers: eugenis, efriedma Reviewed By: eugenis, efriedma Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67205 llvm-svn: 371138	2019-09-05 22:49:34 +00:00
Roman Lebedev	8360c42e25	[InstCombine] foldICmpBinOp(): consider inverted check in 'unsigned sub overflow' check A follow-up for r329011. This may be changed to produce @llvm.sub.with.overflow in a later patch, but for now just make things more consistent overall. A few observations stem from this: * There does not seem to be a similar one-instruction fold for uadd-overflow * I'm not sure we'll want to canonicalize `B u> A` as `usub.with.overflow`, so since the `icmp` here no longer refers to `sub`, reconstructing `usub.with.overflow` will be problematic, and will likely require standalone pass (similar to DivRemPairs). https://rise4fun.com/Alive/Zqs Name: (A - B) u> A --> B u> A %t0 = sub i8 %A, %B %r = icmp ugt i8 %t0, %A => %r = icmp ugt i8 %B, %A Name: (A - B) u<= A --> B u<= A %t0 = sub i8 %A, %B %r = icmp ule i8 %t0, %A => %r = icmp ule i8 %B, %A Name: C u< (C - D) --> C u< D %t0 = sub i8 %C, %D %r = icmp ult i8 %C, %t0 => %r = icmp ult i8 %C, %D Name: C u>= (C - D) --> C u>= D %t0 = sub i8 %C, %D %r = icmp uge i8 %C, %t0 => %r = icmp uge i8 %C, %D llvm-svn: 371101	2019-09-05 17:41:02 +00:00
Roman Lebedev	ecb7ea1ae7	[InstCombine] foldICmpBinOp(): consider inverted check in 'unsigned add overflow' check A follow-up for r342004. This will be changed to produce @llvm.add.with.overflow in a later patch, but for now just make things more consistent overall. https://rise4fun.com/Alive/qxE Name: (Op1 + X) u< Op1 --> ~Op1 u< X %t0 = add i8 %Op1, %X %r = icmp ult i8 %t0, %Op1 => %n = xor i8 %Op1, -1 %r = icmp ult i8 %n, %X Name: (Op1 + X) u>= Op1 --> ~Op1 u>= X %t0 = add i8 %Op1, %X %r = icmp uge i8 %t0, %Op1 => %n = xor i8 %Op1, -1 %r = icmp uge i8 %n, %X ;------------------------------------------------------------------------------- Name: Op0 u> (Op0 + X) --> X u> ~Op0 %t0 = add i8 %Op0, %X %r = icmp ugt i8 %Op0, %t0 => %n = xor i8 %Op0, -1 %r = icmp ugt i8 %X, %n Name: Op0 u<= (Op0 + X) --> X u<= ~Op0 %t0 = add i8 %Op0, %X %r = icmp ule i8 %Op0, %t0 => %n = xor i8 %Op0, -1 %r = icmp ule i8 %X, %n llvm-svn: 371100	2019-09-05 17:40:49 +00:00
Denis Bakhvalov	58f172f05a	[MergedLoadStoreMotion] Sink stores to BB with more than 2 predecessors If we have: bb5: br i1 %arg3, label %bb6, label %bb7 bb6: %tmp = getelementptr inbounds i32, i32* %arg1, i64 2 store i32 3, i32* %tmp, align 4 br label %bb9 bb7: %tmp8 = getelementptr inbounds i32, i32* %arg1, i64 2 store i32 3, i32* %tmp8, align 4 br label %bb9 bb9: ; preds = %bb4, %bb6, %bb7 ... We can't sink stores directly into bb9. This patch creates new BB that is successor of %bb6 and %bb7 and sinks stores into that block. SplitFooterBB is the parameter to the pass that controls that behavior. Change-Id: I7fdf50a772b84633e4b1b860e905bf7e3e29940f Differential: https://reviews.llvm.org/D66234 llvm-svn: 371089	2019-09-05 17:00:32 +00:00
Alina Sbirlea	2ac69aadb5	[MemorySSA] Verify MSSAUpdater exists. llvm-svn: 371087	2019-09-05 16:58:15 +00:00
Hiroshi Yamauchi	d842f2eec4	[PGO][CHR] Speed up following long, interlinked use-def chains. Summary: Avoid visiting an instruction more than once by using a map. This is similar to https://reviews.llvm.org/rL361416. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67198 llvm-svn: 371086	2019-09-05 16:56:55 +00:00
Alina Sbirlea	ae900d3882	[MemorySSA] Update MemorySSA when removing debug.value calls. llvm-svn: 371084	2019-09-05 16:25:24 +00:00
Guillaume Chatelet	33671ceffa	[LLVM][Alignment] Convert isLegalNTStore/isLegalNTLoad to llvm::Align Summary: This is patch is part of a serie to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67223 llvm-svn: 371063	2019-09-05 13:09:42 +00:00
Johannes Doerfert	56e9b608ad	[Attributor][Stats] Use the right statistics macro llvm-svn: 370976	2019-09-04 20:34:57 +00:00
Johannes Doerfert	7ab5253704	[Attributor][Fix] Make sure we do not delete live code Summary: Liveness needs to mark edges, not blocks as dead. Reviewers: sstefan1, uenoku Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67191 llvm-svn: 370975	2019-09-04 20:34:52 +00:00
Leonard Chan	eca01b031d	[NewPM][Sancov] Make Sancov a Module Pass instead of 2 Passes This patch merges the sancov module and funciton passes into one module pass. The reason for this is because we ran into an out of memory error when attempting to run asan fuzzer on some protobufs (pc.cc files). I traced the OOM error to the destructor of SanitizerCoverage where we only call appendTo[Compiler]Used which calls appendToUsedList. I'm not sure where precisely in appendToUsedList causes the OOM, but I am able to confirm that it's calling this function repeatedly that causes the OOM. (I hacked sancov a bit such that I can still create and destroy a new sancov on every function run, but only call appendToUsedList after all functions in the module have finished. This passes, but when I make it such that appendToUsedList is called on every sancov destruction, we hit OOM.) I don't think the OOM is from just adding to the SmallSet and SmallVector inside appendToUsedList since in either case for a given module, they'll have the same max size. I suspect that when the existing llvm.compiler.used global is erased, the memory behind it isn't freed. I could be wrong on this though. This patch works around the OOM issue by just calling appendToUsedList at the end of every module run instead of function run. The same amount of constants still get added to llvm.compiler.used, abd we make the pass usage and logic simpler by not having any inter-pass dependencies. Differential Revision: https://reviews.llvm.org/D66988 llvm-svn: 370971	2019-09-04 20:30:29 +00:00
Alina Sbirlea	6da79ce1fe	[MemorySSA] Re-enable MemorySSA use. Differential Revision: https://reviews.llvm.org/D58311 llvm-svn: 370957	2019-09-04 19:16:04 +00:00
Johannes Doerfert	a7a3b3aa43	[Attributor][Fix] Ensure the attribute names are created properly The names of the attributes were not always created properly which caused problems with the yaml output. llvm-svn: 370956	2019-09-04 19:01:08 +00:00
Philip Reames	4228245e41	[NFC] Switch last couple of invariant_load checks to use hasMetadata llvm-svn: 370948	2019-09-04 18:27:31 +00:00
David Bolvansky	420cbb6190	[InstCombine] sub(xor(x, y), or(x, y)) -> neg(and(x, y)) Summary: ``` Name: sub(xor(x, y), or(x, y)) -> neg(and(x, y)) %or = or i32 %y, %x %xor = xor i32 %x, %y %sub = sub i32 %xor, %or => %sub1 = and i32 %x, %y %sub = sub i32 0, %sub1 Optimization: sub(xor(x, y), or(x, y)) -> neg(and(x, y)) Done: 1 Optimization is correct! ``` https://rise4fun.com/Alive/8OI Reviewers: lebedev.ri Reviewed By: lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67188 llvm-svn: 370945	2019-09-04 18:03:21 +00:00
David Bolvansky	0e07248704	[InstCombine] Fold sub (and A, B) (or A, B)) to neg (xor A, B) Summary: ``` Name: sub(and(x, y), or(x, y)) -> neg(xor(x, y)) %or = or i32 %y, %x %and = and i32 %x, %y %sub = sub i32 %and, %or => %sub1 = xor i32 %x, %y %sub = sub i32 0, %sub1 Optimization: sub(and(x, y), or(x, y)) -> neg(xor(x, y)) Done: 1 Optimization is correct! ``` https://rise4fun.com/Alive/VI6 Found by @lebedev.ri. Also author of the proof. Reviewers: lebedev.ri, spatel Reviewed By: lebedev.ri Subscribers: llvm-commits, lebedev.ri Tags: #llvm Differential Revision: https://reviews.llvm.org/D67155 llvm-svn: 370934	2019-09-04 17:30:53 +00:00
Philip Reames	27820f9909	[Instruction] Add hasMetadata(Kind) helper [NFC] It's a common idiom, so let's add the obvious wrapper for metadata kinds which are basically booleans. llvm-svn: 370933	2019-09-04 17:28:48 +00:00
Johannes Doerfert	2f6220633c	[Attributor] Look at internal functions only on-demand Summary: Instead of building attributes for internal functions which we do not update as long as we assume they are dead, we now do not create attributes until we assume the internal function to be live. This improves the number of required iterations, as well as the number of required updates, in real code. On our tests, the results are mixed. Reviewers: sstefan1, uenoku Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66914 llvm-svn: 370924	2019-09-04 16:35:20 +00:00

1 2 3 4 5 ...

22597 Commits