This is the last JumpThreading patch for getting the performance numbers shown at
https://reviews.llvm.org/D84940#2184653 .
This patch makes ProcessBlock call ProcessBranchOnPHI when the branch condition
is freeze(phi) as well (originally it calls the function when the condition is
phi only).
Since what ProcessBranchOnPHI does is to duplicate the basic block into
predecessors if profitable, it is still valid when the condition is freeze(phi)
too.
```
p = phi [a, pred1] [b, pred2]
p.fr = freeze p
br p.fr, ...
=>
pred1:
p.fr = freeze a
br p.fr, ...
pred2:
p.fr2 = freeze b
br p.fr2, ...
```
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85029
We were erroneously only doing that for old-style abs/nabs,
but we have no such legality check on the condition of the select.
https://rise4fun.com/Alive/xBHS
MSan removes readnone/readonly and similar attributes from callees,
because after MSan instrumentation those attributes no longer apply.
This change removes the attributes from call sites, as well.
Failing to do this may cause DSE of paramTLS stores before calls to
readonly/readnone functions.
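As a rough sketch of the idea (not the exact MSan change), the call-site cleanup could use the standard CallBase attribute API along these lines; the attribute kinds listed are assumptions about which ones matter here:
```
#include "llvm/IR/InstrTypes.h"
using namespace llvm;

// Simplified sketch: drop attributes that claim the callee does not write
// memory, so paramTLS stores before the call cannot be dead-store-eliminated.
static void removeReadAttrsFromCallSite(CallBase &CB) {
  CB.removeAttribute(AttributeList::FunctionIndex, Attribute::ReadOnly);
  CB.removeAttribute(AttributeList::FunctionIndex, Attribute::ReadNone);
  CB.removeAttribute(AttributeList::FunctionIndex, Attribute::WriteOnly);
  CB.removeAttribute(AttributeList::FunctionIndex, Attribute::ArgMemOnly);
}
```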
Differential Revision: https://reviews.llvm.org/D85259
This reverts commit e9761688e4. It breaks the build:
```
~/src/llvm-project/llvm/lib/Analysis/IVDescriptors.cpp:868:10: error: no viable conversion from returned value of type 'SmallVector<[...], 8>' to function return type 'SmallVector<[...], 4>'
return ReductionOperations;
```
Arm MVE has multiple instructions such as VMLAVA.s8, which (in this
case) can take two 128bit vectors, sign extend the inputs to i32,
multiply them together and sum the result into a 32bit general
purpose register. So taking 16 i8's as inputs, they can multiply and
accumulate the result into a single i32 without any rounding/truncating
along the way. There are also reduction instructions for plain integer
add and min/max, and operations that sum into a pair of 32bit registers
together treated as a 64bit integer (even though MVE does not have a
plain 64bit addition instruction). So giving the vectorizer the ability
to use these instructions both enables us to vectorize at higher
bitwidths, and to vectorize things we previously could not.
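For illustration, the kind of scalar loop this targets might look like the following (a hypothetical example, not taken from the patch): an i8 multiply-accumulate whose sum is widened to i32, which maps onto an in-loop VMLAVA.s8 reduction on MVE.
```
#include <cstdint>

// Dot product with 8-bit inputs and a 32-bit accumulator.
int32_t dotProduct(const int8_t *A, const int8_t *B, int N) {
  int32_t Sum = 0;
  for (int i = 0; i < N; ++i)
    Sum += static_cast<int32_t>(A[i]) * static_cast<int32_t>(B[i]);
  return Sum;
}
```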
In order to do that we need a way to represent that the reduction
operation, specified with an llvm.experimental.vector.reduce intrinsic when
vectorizing for Arm, occurs inside the loop, not after it like most
reductions. This patch attempts to do that, teaching the vectorizer
about in-loop reductions. It does this through a vplan recipe
representing the reductions that the original chain of reduction
operations is replaced by. Cost modelling is currently just done through
a prefersInloopReduction TTI hook (which follows in a later patch).
Differential Revision: https://reviews.llvm.org/D75069
This was the most obvious regression in
f5df5cd5586ae9cfb2d9e53704dfc76f47aff149.
We really don't want to do this if the original/outermost subtraction
isn't a negation, and therefore doesn't go away - just sinking negation
isn't a win. We actually appear to be missing some folds here, so hoist it.
https://rise4fun.com/Alive/tiVe
This reverts commit ac70b37a00
which reverted commit 8aeb2fe13a
because codegen tests got broken and i needed time to investigate.
This shows some regressions in tests, but they are all around GEP's,
so i'm not really sure how important those are.
https://rise4fun.com/Alive/1Gn
This is a simple patch that makes freeze as a zero-cost instruction, as bitcast already is.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85023
It is technically legal for optimizations to create an alloca that is
used by more than one dbg.declare, if one or both of them are inlined
instances of aliasing variables.
Differential Revision: https://reviews.llvm.org/D85172
This is the last remaining use of ConstantProp, migrate it to InstSimplify in the goal of removing ConstantProp.
Add -hexagon-instsimplify option to enable skipping of instsimplify in
tests that can't handle the extra optimization.
Differential Revision: https://reviews.llvm.org/D85047
If a section is supposed to hold elements of type T, then the
corresponding CreateSecStartEnd()'s Ty parameter represents T*.
Forwarding it to GlobalVariable constructor causes the resulting
GlobalVariable's type to be T*, and its SSA value type to be T**, which
is one indirection too many. This issue is mostly masked by pointer
casts, however, the global variable still gets an incorrect alignment,
which causes SystemZ to choose wrong instructions to access the
section.
This patch tries to improve readability and maintenance
of createVectorizedLoopSkeleton by reorganizing some lines,
updating some of the comments and breaking it up into
smaller logical units.
Reviewed By: pjeeva01
Differential Revision: https://reviews.llvm.org/D83824
Teach SCCP to create notconstant lattice values from inequality
assumes and nonnull metadata, and update getConstant() to make
use of them. Additionally isOverdefined() needs to be changed to
consider notconstant an overdefined value.
Handling inequality branches is delayed until our branch on undef
story in other passes has been improved.
Differential Revision: https://reviews.llvm.org/D83643
As discussed in D84949, this removes the constraint to cast since it does not
cause compile time degradation.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D85188
Compared to the optimized code with branch conditions never frozen,
limiting the type of freeze's operand causes generation of suboptimal code in
some cases.
I would like to suggest removing the constraint, as this patch does.
If the number of freeze instructions becomes significant, this can be revisited.
Differential Revision: https://reviews.llvm.org/D84949
D68041 placed `__profc_`, `__profd_` and (if exists) `__profvp_` in different comdat groups.
There are some issues:
* Cost: one or two additional section headers (`.group` section(s)): 64 or 128 bytes on ELF64.
* `__profc_`, `__profd_` and (if exists) `__profvp_` should be retained or
discarded together. Placing them into separate comdat groups is conceptually inferior.
* If the prevailing group does not include `__profvp_` (value profiling not
used) but a non-prevailing group from another translation unit has `__profvp_`
(the function is inlined into another and triggers value profiling), there
will be a stray `__profvp_` if --gc-sections is not enabled.
This has been fixed by 3d6f53018f.
Actually, we can reuse an existing symbol (we choose `__profd_`) as the group
signature to avoid a string in the string table (the sole reason that D68041
could improve code size is that `__profv_` was an otherwise unused symbol which
wasted string table space). This saves one or two section headers.
For a -DCMAKE_BUILD_TYPE=Release -DLLVM_BUILD_INSTRUMENTED=IR build, `ninja
clang lld`, the patch has saved 10.5MiB (2.2%) for the total .o size.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D84723
We might want this if we find out that using MustExecute analysis is too expensive.
By default we do the analysis because its complexity does not exceed the complexity
of whole loop copying in unswitching. Follow-up for D84925.
Differential Revision: https://reviews.llvm.org/D85001
Reviewed By: asbirlea
Currently, ArgPromotion may leave metadata uses of promoted values,
which will end up in the wrong function, creating invalid IR.
PR33641 fixed this for dead arguments, but it can also be triggered by
arguments with users that are promoted (see the updated test case).
We also have to drop the metadata uses after promoting the arguments. We need to do
this after dealing with the non-metadata uses, so I also moved the empty
use case to the loop that deals with updating the arguments of the new
function.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D85127
Extend the memop value profile buckets to be more flexible (could accommodate a
mix of individual values and ranges) and to cover more value ranges (from 11 to
22 buckets).
Disabled behind a flag (to be enabled separately) and the existing code to be
removed later.
Differential Revision: https://reviews.llvm.org/D81682
Freeze always returns a defined value. This also prevents msan from
checking the input shadow, which happened because freeze wasn't
explicitly visited.
Differential Revision: https://reviews.llvm.org/D85040
The 1st try at this (rG2265d01f2a5b) exposed what looks like
unspecified behavior in C/C++ resulting in test variations.
The arguments to BinaryOperator::CreateAnd() were both IRBuilder
function calls, and the order in which they execute determines
the order of the new instructions in the IR. But the order of
function arg evaluation is not fixed by the rules of C/C++, so
depending on compiler config, the test would fail because the
test expected a single fixed ordering of instructions.
Original commit message:
I tried to use m_Deferred() on this, but didn't find
a clean way to do that.
http://bugs.llvm.org/PR46955
https://alive2.llvm.org/ce/z/2h6QTq
No widening decisions will be computed for instructions outside the
loop. Do not try to get a widening decision. The load/store will just be
a scalar load, so treating it as normal should be fine, I think.
Fixes PR46950.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D85087
This patch makes it possible to handle nonnull attribute violation at callsites in AAUndefinedBehavior.
If null pointer is passed to callee at a callsite and the corresponding argument of callee has nonnull attribute, the behavior of the callee is undefined.
In this patch, only violations of argument nonnull attributes are handled.
Violations of nonnull attributes on returned values can be handled similarly; I will implement that in a follow-up patch.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D84733
This is a split patch of D80991.
This patch introduces AAPotentialValues and its interface only.
For more detail of AAPotentialValues abstract attribute, see the original patch.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D83283
As mentioned on D70376, LVI can currently cause performance issues
when running under NewPM. The problem is that, unlike the legacy
pass manager, NewPM will not immediately discard the LVI analysis
if the following pass does not need it. This is a problem, because
LVI has a high memory requirement, and mass invalidation of LVI
values is very inefficient. LVI should only be alive during passes
that actively interact with it.
This patch addresses the issue by explicitly abandoning LVI after CVP,
which gets us back to the LegacyPM behavior.
Differential Revision: https://reviews.llvm.org/D84959
Negating the input doesn't matter. I left a FIXME to copy the nsw flag if it's present on the neg but not on the abs.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D85055
formLCSSAForInstructions is used by SCEVExpander, which tracks all
inserted instructions including LCSSA phis using asserting value
handles. This means cleanup needs to happen in the caller.
Extend formLCSSAForInstructions to take an optional pointer to a
vector. If this argument is non-nullptr, instead of directly deleting
the phis, add them to the vector, so the caller can process them.
This should address various PPC buildbot failures, including
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt/builds/40567
Use IRBuilder instead of PHINode::Create. This should not impact the
generated code, but IRBuilder provides a way to register callbacks for
inserted instructions, which is convenient for some users.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D85037
Querying getSCEV() for incomplete phis leads to a wrong cache value in `ExprToIVMap`,
because incomplete phis may be simplified to the same value before their SCEV expressions are computed.
Reviewed By: lebedev.ri, mkazantsev
Differential Revision: https://reviews.llvm.org/D77560
Summary: This patch separates the Loop Peeling Utilities from Loop Unrolling.
The reason for this change is that Loop Peeling is no longer only being used by
loop unrolling; Patch D82927 introduces loop peeling with fusion, such that
loops can be modified to have the same trip count, making them legal to be
fused.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D83056
I found that propagateAttributes was ~23% of a thin link's run time
(almost 4x higher than the second hottest function). The main reason is
that it re-examines a global var each time it is referenced. This
becomes unnecessary once it is marked both non read only and non write
only. I added a set to avoid doing redundant work, which dropped the
runtime of that thin link by almost 15%.
I made a smaller efficiency improvement (no measurable impact) to skip
all summaries for a VI if the first copy is dead. I added an assert to
ensure that all copies are dead if any is. The code in
computeDeadSymbols marks all summaries for a VI as live. There is one
corner case where it was skipping marking an alias as live, that I
fixed. However, since the code earlier marked all copies of a preserved
GUID's VI as live, and each 'visit' marks all copies live, the only case
where this could make a difference is summaries that were marked live
when they were built initially, and that is only a few special compiler
generated symbols and inline assembly symbols, so it likely is never
provoked in practice.
Differential Revision: https://reviews.llvm.org/D84985
A function call can be replicated by optimizations like loop unroll and jump threading and the replicates end up sharing the sample nested callee profile. Therefore when it comes to merging samples for uninlined callees in the sample profile inliner, a callee profile can be merged multiple times which will cause an assert to fire.
This change avoids merging same callee profile for duplicate callsites by filtering out callee profiles with a non-zero head sample count.
Reviewed By: wenlei, wmi
Differential Revision: https://reviews.llvm.org/D84997
This patch allows SimplifyPartiallyRedundantLoad work when
the branch condition was frozen.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D84944
We can preserve make.implicit metadata in the split block if it is
guaranteed that after following the branch we always reach the block
where processing of null case happens, which is equivalent to
"initial condition must execute if the loop is entered".
Differential Revision: https://reviews.llvm.org/D84925
Reviewed By: asbirlea
Non-trivial unswitching simply moves the terminator being unswitched from the loop
up to the switch block. It also preserves all metadata that was there. It might not
be a correct thing to do for `make.implicit` metadata. Consider case:
```
for (...) {
cond = // computed in loop
if (cond) return X;
if (p == null) throw_npe(); !make implicit
}
```
Before the unswitching, if `p` is null and we reach this check, we are guaranteed
to go to `throw_npe()` block. Now we unswitch on `p == null` condition:
```
if (p == null) !make implicit {
for (...) {
if (cond) return X;
throw_npe()
}
} else {
for (...) {
if (cond) return X;
}
}
```
Now, following `true` branch of `p == null` does not always lead us to
`throw_npe()` because the loop has side exit. Now, if we run ImplicitNullCheck
pass on this code, it may end up making the unswitch condition implicit. This may
lead us to turning normal path to `return X` into signal-throwing path, which is
not efficient.
Note that this does not happen during trivial unswitch: it guarantees that we do not
have side exits before the condition being unswitched.
This patch fixes this situation by unconditional dropping of `make.implicit` metadata
when we perform non-trivial unswitch. We could preserve it if we could prove that the
condition always executes. This can be done as a follow-up.
Differential Revision: https://reviews.llvm.org/D84916
Reviewed By: asbirlea
is enabled.
When -sample-profile-merge-inlinee is enabled, new FunctionSamples may be
created during profile merge without GUIDToFuncNameMap being initialized.
That will occasionally cause compiler crash. The patch fixes it.
Differential Revision: https://reviews.llvm.org/D84994
findAllocaForValue uses AllocaForValue to cache resolved values.
The function is used only to resolve arguments of lifetime
intrinsics, which usually are not far from allocas, so result reuse
is likely unnoticeable.
In followup patches I'd like to replace the function with
GetUnderlyingObjects.
Depends on D84616.
Differential Revision: https://reviews.llvm.org/D84617
This patch adds time trace functionality to have a better understanding
of the analysis times.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D84980
Determine whether switch edges are feasible based on range information,
and remove non-feasible edges later on.
This does not try to determine whether the default edge is dead,
as we'd have to determine that the range is fully covered by the
cases for that.
Another limitation here is that we don't remove dead cases that
have the same successor as a live case. I'm not handling this
because I wanted to keep the edge removal based on feasible edges
only, rather than inspecting ranges again there -- this does not
seem like a particularly useful case to handle.
Differential Revision: https://reviews.llvm.org/D84270
Problem:
Right now, our "Running pass" is not accurate when passes are wrapped in adaptor because adaptor is never skipped and a pass could be skipped. The other problem is that "Running pass" for a adaptor is before any "Running pass" of passes/analyses it depends on. (for example, FunctionToLoopPassAdaptor). So the order of printing is not the actual order.
Solution:
Doing things like PassManager::DebugLogging is very intrusive because we need to specify DebugLogging whenever an adaptor is created. (Actually, right now we're not specifying DebugLogging for some sub-PassManagers. Check PassBuilder.)
This patch moves debug logging for passes into a PassInstrumentation callback. This way we can be sure that all running passes are logged and in the correct order.
This could also be used to implement hierarchical pass logging in the legacy PM. We could also move the logging of pass managers to this if we want.
The test fixes look messy. They include these changes:
- Remove PassInstrumentationAnalysis
- Remove PassAdaptor
- If a PassAdaptor is for a real pass, the pass is added
- Pass reorder (to the correct order), related to PassAdaptor
- Add missing passes (due to Debuglogging not passed down)
Reviewed By: asbirlea, aeubanks
Differential Revision: https://reviews.llvm.org/D84774
This removes some unneeded block masks when we don't have any
reductions. It should not have any effect on codegen as the values
created are dead anyway.
Differential Revision: https://reviews.llvm.org/D81415
As far as I know, ipconstprop has not been used in years and ipsccp has
been used instead. This has the potential for confusion and sometimes
leads people to spend time finding & reporting bugs as well as
updating it to work with the latest API changes.
This patch moves the tests over to SCCP. There's one functional difference
I am aware of: ipconstprop propagates for each call-site individually, so
for functions that are called with different constant arguments it can sometimes
produce better results than ipsccp (at a much higher compile-time cost). But
IPSCCP can be thought of as doing the same for internal functions, and as mentioned
earlier, the pass seems unused in practice (and there are no plans to work
towards enabling it anytime soon).
Also discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-July/143773.html
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D84447
This patch makes JumpThreading fold br(freeze(undef)) if the freeze instruction
is only used by the branch.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D84818
This reverts the revert commit dc28675768.
It includes a fix for Polly, which uses SCEVExpander on IR that is not
in LCSSA form. Set PreserveLCSSA = false in that case, to ensure we do
not introduce LCSSA phis where there were none before.
Adds the -fast-16-labels flag, which enables efficient instrumentation
for DFSan when the user needs <=16 labels. The instrumentation
eliminates most branches and most calls to __dfsan_union or
__dfsan_union_load.
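A rough model of why 16 or fewer labels make this cheap (an assumed sketch, not the actual DFSan runtime): each label occupies one bit of a 16-bit shadow, so the union of two shadows is a plain bitwise OR and no runtime call is needed.
```
#include <cstdint>

using dfsan_label = uint16_t; // one bit per label in fast-16-labels mode

// Replaces a call to __dfsan_union: combining taint is just an OR.
inline dfsan_label unionShadows(dfsan_label A, dfsan_label B) {
  return A | B;
}
```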
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D84371
This reverts commit 99166fd4fb, because it
breaks the polly builders.
polly/test/Isl/CodeGen/invariant_load_escaping_second_scop.ll fails
because an apparently unnecessary LCSSA phi node is introduced.
Make the bots green again, while I take a closer look.
I've been looking at missed vectorizations in one codebase.
One particular thing that stands out is that some of the loops
reach the vectorizer in a rather mangled form, with weird PHI's,
and some of the loops aren't even in a rotated form.
After taking a more detailed look, that happened because
the loop's headers were too big by then. It is evident that
SimplifyCFG's common code hoisting transform is at fault there,
because the pattern it handles is precisely the unrotated
loop basic block structure.
Surprisingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled
by default, and is always run, unlike its friend, the common code sinking
transform, `SinkCommonCodeFromPredecessors()`, which is not enabled
by default and is only run once very late in the pipeline.
I'm proposing to harmonize this, and disable common code hoisting
until //late// in pipeline. Definition of //late// may vary,
here currently i've picked the same one as for code sinking,
but i suppose we could enable it as soon as right after
loop rotation happens.
Experimentation shows that this does indeed unsurprisingly help,
more loops got rotated, although other issues remain elsewhere.
Now, this undoubtedly seriously shakes phase ordering.
This will undoubtedly be a mixed bag in terms of both compile- and
run- time performance, codesize. Since we no longer aggressively
hoist+deduplicate common code, we don't pay the price of said hoisting
(which wasn't big). That may allow more loops to be rotated,
so we pay that price. That, in turn, may enable all the transforms
that require canonical (rotated) loop form, including but not limited to
vectorization, so we pay that too. And in general, no deduplication means
more [duplicate] instructions going through the optimizations. But there's still
late hoisting, so some of them will be caught late.
As per benchmarks i've run {F12360204}, this is mostly within the noise,
there are some small improvements, some small regressions.
One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure
this will expose many more pre-existing missed optimizations, as usual :S
llvm-compile-time-tracker.com thoughts on this:
http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions
* this does regress compile-time by +0.5% geomean (unsurprisingly)
* size impact varies; for ThinLTO it's actually an improvement
The largest fallout appears to be in GVN's load partial redundancy
elimination, it spends *much* more time in
`MemoryDependenceResults::getNonLocalPointerDependency()`.
Non-local `MemoryDependenceResults` is widely-known to be, uh, costly.
There does not appear to be a proper solution to this issue,
other than silencing the compile-time performance regression
by tuning cut-off thresholds in `MemoryDependenceResults`,
at the cost of potentially regressing run-time performance.
D84609 attempts to move in that direction, but the path is unclear
and is going to take some time.
If we look at stats before/after diffs, some excerpts:
* RawSpeed (the target) {F12360200}
* -14 (-73.68%) loops not rotated due to the header size (yay)
* -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer
* -3937 (-64.19%) common instructions hoisted
* +561 (+0.06%) x86 asm instructions
* -2 basic blocks
* +2418 (+0.11%) IR instructions
* vanilla test-suite + RawSpeed + darktable {F12360201}
* -36396 (-65.29%) common instructions hoisted
* +1676 (+0.02%) x86 asm instructions
* +662 (+0.06%) basic blocks
* +4395 (+0.04%) IR instructions
This is likely to be sub-optimal when optimizing for code size,
so one might want to tune the pipeline by enabling sinking/hoisting
when optimizing for size.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D84108
In vectorizeChainsInBlock we try to collect chains of PHI nodes
that have the same element type, but the code is relying upon
the implicit conversion from TypeSize -> uint64_t. For now, I have
modified the code to ignore PHI nodes with scalable types.
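A minimal sketch of the check, assuming it happens while collecting the PHI chains (the helper name is made up):
```
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Bail out on scalable vector PHIs instead of implicitly converting a
// TypeSize to uint64_t, which is not a meaningful fixed size here.
static bool isCandidatePHIType(const PHINode &PN) {
  return !isa<ScalableVectorType>(PN.getType());
}
```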
Differential Revision: https://reviews.llvm.org/D83542
This patch teaches SCEVExpander to directly preserve LCSSA.
As it is currently, SCEV does not look through PHI nodes in loops,
as it might break LCSSA form. Once SCEVExpander can preserve
LCSSA form, it should be safe for SCEV to look through PHIs.
To preserve LCSSA form, this patch uses formLCSSAForInstructions
on operands of newly created instructions, if the definition is inside
a different loop than the new instruction.
The final value we return from expandCodeFor may also need LCSSA
phis, depending on the insert point. As no user for it exists there yet,
create a temporary instruction at the insert point, which can be passed
to formLCSSAForInstructions. This temporary instruction is removed
after LCSSA construction.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D71538
Currently, getCastInstrCost has limited information about the cast it's
rating, often just the opcode and types. Sometimes there is a context
instruction as well, but it isn't trustworthy: for instance, when the
vectorizer is rating a plan, it calls getCastInstrCost with the old
instructions when, in fact, it's trying to evaluate the cost of the
instruction post-vectorization. Thus, the current system can get the
cost of certain casts incorrect as the correct cost can vary greatly
based on the context in which it's used.
For example, if the vectorizer queries getCastInstrCost to evaluate the
cost of a sext(load) with tail predication enabled, getCastInstrCost
will think it's free most of the time, but it's not always free. On ARM
MVE, a VLD2 group cannot be extended like a normal VLDR can. Similar
situations can come up with how masked loads can be extended when being
split.
To fix that, this patch adds a new parameter to getCastInstrCost to give
it a hint about the context of the cast. It adds a CastContextHint enum
which contains the type of the load/store being created by the
vectorizer - one for each of the types it can produce.
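A sketch of the shape of the hint (the value names are assumed; see the patch for the authoritative list):
```
// The vectorizer tells the target how the load/store feeding or consuming
// the cast will actually be emitted, so the cast cost can reflect it.
enum class CastContextHint : uint8_t {
  None,          // the cast is not associated with a load/store
  Normal,        // a plain vector load/store
  Masked,        // a masked load/store
  GatherScatter, // a gather or scatter
  Interleave,    // part of an interleaved access group
  Reversed,      // a reversed vector access
};
```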
Original patch by Pierre van Houtryve
Differential Revision: https://reviews.llvm.org/D79162
In addition to removing phi nodes this patch removes any
landing pad that the dead exit block might have. Without
this fix, the Verifier complains about a new switch instruction
jumping to a block with a landing pad.
Differential Revision: https://reviews.llvm.org/D84320
This patch adds a basic support for freeze instruction to JumpThreading
by making ComputeValueKnownInPredecessorsImpl look into its operand.
Reviewed By: efriedma, nikic
Differential Revision: https://reviews.llvm.org/D84598
To match NewPM pass name, and also for readability.
Also rename rpo-functionattrs -> rpo-function-attrs while we're here.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D84694
This patch added a dependency graph to the Attributor so that we can dump the dependencies between AAs more easily. We can also apply general graph algorithms to the graph, making it easier for us to create deep wrappers.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D78861
While this doesn't appear to help with the perf issue being exposed by
D84108, the function as-is is very weird, convoluted, and what's worse,
recursive.
There was no need for `SpeculativelyAvailableAndUsedForSpeculation`;
a tri-state choice is enough. We never even check for that state.
The basic idea here is that we need to perform a depth-first traversal
of the predecessors of the basic block in question, either finding a
preexisting state for the block in a map, or inserting a "placeholder"
`SpeculativelyAvailable` state.
If we encounter an `Unavailable` block, then we need to give up the search
and back-propagate the `Unavailable` state to each successor of
said block, more specifically to each `SpeculativelyAvailable` entry
we've just created.
However, if we have traversed the entirety of the predecessors and have not
encountered an `Unavailable` block, then it must mean the value is fully
available. We could update each inserted `SpeculativelyAvailable` into
an `Available`, but we don't need to (an assertion exercises this),
because we can assume that if we see a `SpeculativelyAvailable` entry,
it is actually `Available`: at the time we produced it,
had we found that it has an `Unavailable` predecessor,
we would have updated its successors, including this block,
to `Unavailable`.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D84181
PGO profile is usually more precise than sample profile. However, PGO profile
needs to be collected from a loadtest and the loadtest may not be representative
enough of the production workload. Sample profile collected from production
can be used as a supplement -- for functions cold in loadtest but warm/hot
in production, we can scale up the related function in PGO profile if the
function is warm or hot in sample profile.
The implementation contains changes in compiler side and llvm-profdata side.
Given an instr profile and a sample profile, for a function cold in PGO
profile but warm/hot in sample profile, llvm-profdata will either mark
all the counters in the profile to be -1 or scale up the max count in the
function to be above hot threshold, depending on the zero counter ratio in
the profile. The assumption is if there are too many counters being zero
in the function profile, the profile is more likely to cause harm than good,
then llvm-profdata will mark all the counters to be -1 indicating the
function is hot but the profile is unaccountable. In compiler side, if a
function profile with all -1 counters is seen, the function entry count will
be set to be above hot threshold but its internal profile will be dropped.
In the long run, it may be useful to let compiler support using PGO profile
and sample profile at the same time, but that requires more careful design
and more substantial changes to make two profiles work seamlessly. The patch
here serves as a simple intermediate solution.
Differential Revision: https://reviews.llvm.org/D81981
Summary:
This seems obvious in hindsight, but the result is surprising.
I've measured compile-time of `-openmpopt` pass standalone
on RawSpeed unity build, and while there is some OpenMP stuff,
most is not OpenMP. But nonetheless the pass does a lot of costly
preparations before ever trying to look for OpenMP stuff in SCC.
Numbers (n=25): 0.094624s -> 0.005976s, an -93.68% improvement, or ~16x
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: yaxunl, hiraditya, guansong, llvm-commits, sstefan1
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D84689
SplitBlockPredecessors() can not split blocks that have such terminators,
and in two other places we already ensure that we don't end up calling
SplitBlockPredecessors() on such blocks. Do so in one more place.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46857
This patch adds folding of freeze into phi if it has only one operand to target.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D84601
We can happily turn function definitions into declarations,
thus obscuring their argument from being elided by this pass.
I don't believe there is a good reason to just ignore declarations.
likely even proper llvm intrinsics ones,
at worst the input becomes uninteresting.
The other question here is that all these transforms are all-or-nothing.
In some cases, should we be treating each use separately?
The main blocker here seemed to be that llvm::CloneFunctionInto()
does `&OldFunc->front()`, which inserts a nullptr into a densemap,
which is not happy about it and asserts.
Reapply with DTU update moved after CFG update, which is a
requirement of the API.
-----
Non-feasible control-flow edges are currently removed by replacing
the branch condition with a constant and then calling
ConstantFoldTerminator. This happens in a rather roundabout manner,
by inspecting the users (effectively: predecessors) of unreachable
blocks, and further complicated by the need to explicitly materialize
the condition for "forced" edges. I would like to extend SCCP to
discard switch conditions that are non-feasible based on range
information, but this is incompatible with the current approach
(as there is no single constant we could use.)
Instead, this patch explicitly removes non-feasible edges. It
currently only needs to handle the case where there is a single
feasible edge. The llvm_unreachable() branch will need to be
implemented for the aforementioned switch improvement.
Differential Revision: https://reviews.llvm.org/D84264
This patch updates IPSCCP to drop argmemonly and
inaccessiblemem_or_argmemonly if it replaces a pointer argument.
Fixes PR46717.
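A simplified sketch of the fix: once a pointer argument has been replaced by a constant, accesses through it are no longer "argument memory", so the corresponding attributes have to go.
```
#include "llvm/IR/Function.h"
using namespace llvm;

// Drop attributes that restrict memory access to argument-pointed-to memory.
static void dropArgMemOnlyAttrs(Function &F) {
  F.removeFnAttr(Attribute::ArgMemOnly);
  F.removeFnAttr(Attribute::InaccessibleMemOrArgMemOnly);
}
```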
Reviewers: efriedma, davide, nikic, jdoerfert
Reviewed By: efriedma, jdoerfert
Differential Revision: https://reviews.llvm.org/D84432
Function entry count might be zero after the profile counts reset and
before reentry to the function.
Zero profile entry count is very bad as the profile count from BFI will
be wrong.
A simple fix is to set the profile entry count to 1 if there are
non-zero profile counts in this function.
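A minimal sketch of the fix, where hasNonZeroCounts stands in for scanning the function's profile counts (it is a hypothetical parameter, not an existing helper):
```
#include "llvm/IR/Function.h"
using namespace llvm;

static void fixupZeroEntryCount(Function &F, bool hasNonZeroCounts) {
  Function::ProfileCount PC = F.getEntryCount();
  // A real entry count of zero with non-zero internal counts is inconsistent;
  // bump it to 1 so BFI-derived counts are not wildly wrong.
  if (PC.hasValue() && PC.getCount() == 0 && hasNonZeroCounts)
    F.setEntryCount(1);
}
```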
Differential Revision: https://reviews.llvm.org/D84378
Skip profile count promotion if any of the ExitBlocks contains a ret
instruction. This is to prevent dumping of an incomplete profile -- if the
loop is a long-running loop and dump is called in the middle
of the loop, the result profile is incomplete.
ExitBlocks containing a ret instruction is an indication of a long running
loop -- early exit to error handling code.
Differential Revision: https://reviews.llvm.org/D84379
Skip profile count promotion if any of the ExitBlocks contains a ret
instruction. This is to prevent dumping of an incomplete profile -- if the
loop is a long-running loop and dump is called in the middle
of the loop, the result profile is incomplete.
ExitBlocks containing a ret instruction is an indication of a long running
loop -- early exit to error handling code.
Differential Revision: https://reviews.llvm.org/D84379
This is the second of two patches to address PR46753. We basically allow
SROA to promote allocas that are used in droppable instructions, for
now that means `llvm.assume`. The (transitive) uses are replaced by
`undef` in the droppable instructions.
See also D83976.
Reviewed By: Tyker
Differential Revision: https://reviews.llvm.org/D83978
This is the first of two patches to address PR46753. We basically allow
mem2reg to promote allocas that are used in droppable instructions, for
now that means `llvm.assume`. The uses of the alloca (or a bitcast or
zero offset GEP from there) are replaced by `undef` in the droppable
instructions.
Reviewed By: Tyker
Differential Revision: https://reviews.llvm.org/D83976
SROA knows that it can look through addrspacecast but
PromoteMemoryToRegister did not handle them. This caused an assertion
error for the test case, exposed while running
`Transforms/PhaseOrdering/inlining-alignment-assumptions.ll` with D83978
applied.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D84085
PassManager.h is one of the top headers in the ClangBuildAnalyzer frontend worst offenders list.
This exposes a large number of implicit dependencies on various forward declarations/includes in other headers that need addressing.
It breaks stage-2 build. Clang crashed when compiling
llvm/lib/Target/Hexagon/HexagonFrameLowering.cpp
llvm/Support/GenericDomTree.h eraseNode: Node is not a leaf node
This patch adds the ability to peel off iterations of the first loop in loop
fusion. This can allow for both loops to have the same trip count, making it
legal for them to be fused together.
Here is a simple scenario peeling can be used in loop fusion:
for (i = 0; i < 10; ++i)
a[i] = a[i] + 3;
for (j = 1; j < 10; ++j)
b[j] = b[j] + 5;
Here we can make use of peeling, and then fuse the two loops together. We
can peel off the 0th iteration of loop i, and then combine loops i and j for
i = 1 to 10.
a[0] = a[0] + 3;
for (i = 1; i < 10; ++i) {
a[i] = a[i] + 3;
b[i] = b[i] + 5;
}
Currently peeling with loop fusion is only supported for loops with constant
trip counts and a single exit point. Both unguarded and guarded loops are
supported.
Reviewed By: bmahjour (Bardia Mahjour), MaskRay (Fangrui Song)
Differential Revision: https://reviews.llvm.org/D82927
Non-feasible control-flow edges are currently removed by replacing
the branch condition with a constant and then calling
ConstantFoldTerminator. This happens in a rather roundabout manner,
by inspecting the users (effectively: predecessors) of unreachable
blocks, and further complicated by the need to explicitly materialize
the condition for "forced" edges. I would like to extend SCCP to
discard switch conditions that are non-feasible based on range
information, but this is incompatible with the current approach
(as there is no single constant we could use.)
Instead, this patch explicitly removes non-feasible edges. It
currently only needs to handle the case where there is a single
feasible edge. The llvm_unreachable() branch will need to be
implemented for the aforementioned switch improvement.
Differential Revision: https://reviews.llvm.org/D84264
As long as RenamedOp is not guaranteed to be accurate, we cannot
assert here and should just return false. This was already done
for the other conditions in this function.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46814.
This patch clarifies the failing point of having input or output vectors
of differing types. Before, lowering would fail elsewhere (e.g. in
`fmul` creation), which may not have been immediately clear.
As a side effect, the `getElementType` and `getVectorTy` functions
required the `const` qualifier to be added.
Reviewers: fhahn
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D84374
Currently there are plenty of instructions that SCEVExpander creates but
does not track as created. IRBuilder allows specifying a callback
whenever an instruction is inserted. Use this to call
rememberInstruction automatically for each created instruction.
There are still a few rememberInstruction calls remaining, because in
some cases Inst::Create functions are used to construct instructions.
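A minimal sketch of the mechanism, assuming LLVM's IRBuilderCallbackInserter: every instruction created through the builder also triggers the callback, which is how rememberInstruction can be invoked automatically.
```
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

static void buildWithTracking(BasicBlock *BB,
                              SmallVectorImpl<Instruction *> &Tracked) {
  IRBuilder<ConstantFolder, IRBuilderCallbackInserter> Builder(
      BB->getContext(), ConstantFolder(),
      IRBuilderCallbackInserter([&](Instruction *I) { Tracked.push_back(I); }));
  Builder.SetInsertPoint(BB);
  // Instructions created via Builder.Create* from here on are recorded in
  // Tracked as a side effect of the inserter callback.
}
```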
Suggested by @lebedev.ri in D75980.
Reviewers: mkazantsev, reames, sanjoy.google, lebedev.ri
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D84326
Summary:
This is the next patch of [[ https://reviews.llvm.org/D76210 | D76210 ]].
This patch made a map in `InformationCache` for caching results.
Reviewers: jdoerfert, sstefan1, uenoku, homerdin, baziotis
Reviewed By: jdoerfert
Subscribers: hiraditya, uenoku, kuter, bbn, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83246
The revert was a misfire.
Remove the temporary flag PGSOIRPassOrTestOnly and the guard code which was used
for the staged rollout. This is a cleanup (NFC) as it's now false by default.
Differential Revision: https://reviews.llvm.org/D84057
This reverts commit 4a539faf74.
There is a __llvm_profile_instrument_range related crash in PGO-instrumented clang:
```
(gdb) bt
llvm::ConstantRange const&, llvm::APInt const&, unsigned int, bool) ()
llvm::ScalarEvolution::getRangeForAffineAR(llvm::SCEV const*, llvm::SCEV
const*, llvm::SCEV const*, unsigned int) ()
```
(The body of __llvm_profile_instrument_range is inlined, so we can only find __llvm_profile_instrument_target in the trace)
```
23│ 0x000055555dba0961 <+65>: nopw %cs:0x0(%rax,%rax,1)
24│ 0x000055555dba096b <+75>: nopl 0x0(%rax,%rax,1)
25│ 0x000055555dba0970 <+80>: mov %rsi,%rbx
26│ 0x000055555dba0973 <+83>: mov 0x8(%rsi),%rsi # %rsi=-1 -> SIGSEGV
27│ 0x000055555dba0977 <+87>: cmp %r15,(%rbx)
28│ 0x000055555dba097a <+90>: je 0x55555dba0a76 <__llvm_profile_instrument_target+342>
```
This patch includes the supporting code that enables always
instrumenting the function entry block.
This patch will NOT change the default behavior.
It adds a variant bit in the profile version, adds new directives in
text profile format, and changes llvm-profdata tool accordingly.
This patch is a split of D83024 (https://reviews.llvm.org/D83024)
Many test changes from D83024 are also included.
Differential Revision: https://reviews.llvm.org/D84261
This reverts commit e64afefdf8. It caused
a PGO bootstrapped clang to crash on many source files.
`__llvm_profile_instrument_range` seems to trigger a null pointer dereference.
Call stack:
__llvm_profile_instrument_range
llvm::APInt::udiv(llvm::APInt const&) const
getRangeForAffineARHelper
`__llvm_profile_instrument_memop` transitively calls calloc, thus calloc
should not be instrumented.
I saw a
`calloc -> __llvm_profile_instrument_memop -> calloc -> __llvm_profile_instrument_memop -> ...`
infinite loop leading to stack overflow
when the malloc implementation (e.g. tcmalloc) is built and instrumented along with the application.
We should figure out the library calls which may be instrumented and disable
their instrumentation before rolling out this change.
Reviewed By: yamauchi
Differential Revision: https://reviews.llvm.org/D84358
These calls are neither intercepted by compiler-rt nor is libatomic.a
naturally instrumented.
This patch uses the existing libcall mechanism to detect a call
to atomic_load or atomic_store, and instruments them much like
the preexisting instrumentation for atomics.
Calls to _load are modified to have at least Acquire ordering, and
calls to _store at least Release ordering. Because this needs to be
converted at runtime, msan injects a LUT (implemented as a vector
with extractelement).
Differential Revision: https://reviews.llvm.org/D83337
For a long time, the InstCombine pass handled target specific
intrinsics. Having target specific code in general passes was noted as
an area for improvement for a long time.
D81728 moves most target specific code out of the InstCombine pass.
Applying the target specific combinations in an extra pass would
probably result in inferior optimizations compared to the current
fixed-point iteration, therefore the InstCombine pass resorts to newly
introduced functions in the TargetTransformInfo when it encounters
unknown intrinsics.
The patch should not have any effect on generated code (under the
assumption that code never uses intrinsics from a foreign target).
This introduces three new functions:
TargetTransformInfo::instCombineIntrinsic
TargetTransformInfo::simplifyDemandedUseBitsIntrinsic
TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic
A few target specific parts are left in the InstCombine folder, where
it makes sense to share code. The largest left-over part in
InstCombineCalls.cpp is the code shared between arm and aarch64.
This allows moving about 3000 lines out of InstCombine to the targets.
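A rough sketch of the deferral (simplified; the real call sites live inside InstCombine), assuming the Optional-returning hook named above:
```
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/Transforms/InstCombine/InstCombiner.h"
using namespace llvm;

// Hand an unknown target intrinsic to the target, which may fold it
// (returning a replacement) or decline (returning no value).
static Instruction *tryTargetIntrinsicCombine(InstCombiner &IC,
                                              IntrinsicInst &II,
                                              const TargetTransformInfo &TTI) {
  if (Optional<Instruction *> V = TTI.instCombineIntrinsic(IC, II))
    return *V;
  return nullptr; // no target-specific combine applies
}
```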
Differential Revision: https://reviews.llvm.org/D81728
v3i16 and v3f16 currently cannot be legalized and lowered so they should
not be emitted by inst combining.
Moved the check down to still allow extracting 1 or 2 elements via the dmask.
Fixes image intrinsics being combined to return v3x16.
Differential Revision: https://reviews.llvm.org/D84223
The LowerMatrixIntrinsics pass wasn't yet running under the new pass
manager; this adds LowerMatrixIntrinsics to the pipeline (to the
same place as where it is running in the old PM).
Differential Revision: https://reviews.llvm.org/D84180
We do not thread blocks with convergent calls, but this check was missing
when we decide to insert PR Phis into it (which we only do for threading).
Differential Revision: https://reviews.llvm.org/D83936
Reviewed By: nikic
This reverts commit bb8850d34d.
It broke 3 check-llvm-transforms-loopfusion tests in an ASAN build.
LoopFuse.cpp `for (BasicBlock *Pred : predecessors(BB)) {` may operate on a deleted BB.
Summary:
This patch adds the ability to peel off iterations of the first loop in loop
fusion. This can allow for both loops to have the same trip count, making it
legal for them to be fused together.
Here is a simple scenario peeling can be used in loop fusion:
for (i = 0; i < 10; ++i)
a[i] = a[i] + 3;
for (j = 1; j < 10; ++j)
b[j] = b[j] + 5;
Here we can make use of peeling, and then fuse the two loops together. We can
peel off the 0th iteration of loop i, and then combine loops i and j for
i = 1 to 10.
a[0] = a[0] + 3;
for (i = 1; i < 10; ++i) {
a[i] = a[i] + 3;
b[i] = b[i] + 5;
}
Currently peeling with loop fusion is only supported for loops with constant
trip counts and a single exit point. Both unguarded and guarded loops are
supported.
Author: sidbav (Sidharth Baveja)
Reviewers: kbarton, Meinersbur, bkramer, Whitney, skatkov, ashlykov, fhahn, bmahjour
Reviewed By: bmahjour
Subscribers: bmahjour, mgorny, hiraditya, zzheng
Tags: LLVM
Differential Revision: https://reviews.llvm.org/D82927
If we inferred a range for the function return value, we can add !range
at all call-sites of the function, if the range does not include undef.
Reviewers: efriedma, davide, nikic
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D83952
This patch uses the TileInfo introduced in D77550 to generate a loop
nest for tiled matrix multiplication, instead of generating the
unrolled code for the whole multiplication. This makes code-generation
more scalable for larger matrices.
Initially loops are only used if both the number of rows and columns are
divisible by the tile size. Other cases will be added as follow-up.
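For intuition, here is a hand-written C++ analogue (tile size TS is an assumed constant) of the kind of tiled loop nest the lowering aims to produce for C += A * B; the three outer loops form the tiling structure.
```
constexpr int TS = 4; // assumed tile size

void tiledMatMul(const double *A, const double *B, double *C,
                 int M, int K, int N) {
  // The outer 3-level nest iterates over tiles; all dimensions are assumed
  // to be divisible by TS, matching the initial restriction described above.
  for (int i = 0; i < M; i += TS)
    for (int j = 0; j < N; j += TS)
      for (int k = 0; k < K; k += TS)
        // Multiply one TS x TS tile of A with one TS x TS tile of B.
        for (int ii = i; ii < i + TS; ++ii)
          for (int jj = j; jj < j + TS; ++jj)
            for (int kk = k; kk < k + TS; ++kk)
              C[ii * N + jj] += A[ii * K + kk] * B[kk * N + jj];
}
```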
Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, nicolasvasilache
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D81308
Remove the temporary flag PGSOIRPassOrTestOnly and the guard code which was used
for the staged rollout. This is a cleanup (NFC) as it's now false by default.
Differential Revision: https://reviews.llvm.org/D84057
This patch adds a TileInfo abstraction and utilities to
create a 3-level loop nest for tiling.
Reviewers: anemet
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D77550
This allows tracking the in-memory type of a pointer argument to a
function for ABI purposes. This is essentially a stripped down version
of byval to remove some of the stack-copy implications in its
definition.
This includes the base IR changes, and some tests for places where it
should be treated similarly to byval. Codegen support will be in a
future patch.
My original attempt at solving some of these problems was to repurpose
byval with a different address space from the stack. However, it is
technically permitted for the callee to introduce a write to the
argument, although nothing does this in reality. There is also talk of
removing and replacing the byval attribute, so a new attribute would
need to take its place anyway.
This is intended to avoid some optimization issues with the current
handling of aggregate arguments, as well as to fix inflexibility in how
frontends can specify the kernel ABI. The most honest representation
of the amdgpu_kernel convention is to expose all kernel arguments as
loads from constant memory. Today, these are raw, SSA Argument values
and codegen is responsible for turning these into loads.
Background:
There currently isn't a satisfactory way to represent how arguments
for the amdgpu_kernel calling convention are passed. In reality,
arguments are passed in a single, flat, constant memory buffer
implicitly passed to the function. It is also illegal to call this
function in the IR, and this is only ever invoked by a driver of some
kind.
It does not make sense to have a stack passed parameter in this
context as is implied by byval. It is never valid to write to the
kernel arguments, as this would corrupt the inputs seen by other
dispatches of the kernel. These arguments are also not in the same
address space as the stack, so a copy is needed to an alloca. From a
source C-like language, the kernel parameters are invisible.
Semantically, a copy is always required from the constant argument
memory to a mutable variable.
The current clang calling convention lowering emits raw values,
including aggregates into the function argument list, since using
byval would not make sense. This has some unfortunate consequences for
the optimizer. In the aggregate case, we end up with an aggregate
store to alloca, which both SROA and instcombine turn into a store of
each aggregate field. The optimizer never pieces this back together to
see that this is really just a copy from constant memory, so we end up
stuck with expensive stack usage.
This also means the backend dictates the alignment of arguments, and
arbitrarily picks the LLVM IR ABI type alignment. By allowing an
explicit alignment, frontends can make better decisions. For example,
there's really no advantage to an alignment higher than 4, so a frontend
could choose to compact the argument layout. Similarly, there is a
high penalty to using an alignment lower than 4, so a frontend could
opt into more padding for small arguments.
Another design consideration is when it is appropriate to expose the
fact that these arguments are all really passed in adjacent
memory. Currently we have a late IR optimization pass in codegen to
rewrite the kernel argument values into explicit loads to enable
vectorization. In most programs, unrelated argument loads can be
merged together. However, exposing this property directly from the
frontend has some disadvantages. We still need a way to track the
original argument sizes and alignments to report to the driver. I find
using some side-channel, metadata mechanism to track this
unappealing. If the kernel arguments were exposed as a single buffer
to begin with, alias analysis would be unaware that the padding bits
between arguments are meaningless. Another family of problems is there
are still some gaps in replacing all of the available parameter
attributes with metadata equivalents once lowered to loads.
The immediate plan is to start using this new attribute to handle all
aggregate arguments for kernels. Long term, it makes sense to migrate
all kernel arguments, including scalars, to be passed indirectly in
the same manner.
Additional context is in D79744.
This patch adds a new variant of the matrix lowering pass that only does
a minimal lowering and only depends on TTI. The main purpose of this pass
is to have a pass with minimal dependencies to run as part of the backend
pipeline.
At the moment, the only difference to the regular lowering pass is that it
does not support remarks. But subsequent patches will add support for tiling
to the lowering pass, which will require more analysis, which we do not want
to run in the backend, as the lowering should happen in the middle-end in
practice and running it in the backend is mostly for convenience when
running llc.
Reviewers: anemet, Gerolf, efriedma, hfinkel
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D76867
Common code sinking is already guarded with a (with default-off!) flag,
so add a flag for hoisting, too.
D84108 will hopefully make hoisting off-by-default too.
This patch
- adds `canCreateUndefOrPoison`
- refactors `canCreatePoison` so it can deal with constantexprs
`canCreateUndefOrPoison` will be used at D83926.
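A usage sketch (assumed, not from this patch) of how a transform might consult the new query:
```
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Operator.h"
using namespace llvm;

// Ask whether duplicating/speculating I could introduce undef or poison
// that was not already present in its operands.
static bool mayIntroduceUndefOrPoison(const Instruction *I) {
  return canCreateUndefOrPoison(cast<Operator>(I));
}
```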
Reviewed By: nikic, jdoerfert
Differential Revision: https://reviews.llvm.org/D84007
Summary:
This change added a new inline advisor that takes optimization remarks from previous inlining as input, and provides the decision as advice so current inlining can replay inline decisions of a different compilation. Dwarf inline stack with line and discriminator is used as anchor for call sites including call context. The change can be useful for Inliner tuning as it provides a channel to allow external input for tweaking inline decisions. Existing alternatives like the alwaysinline attribute are per-function, not per-callsite. A per-callsite inline intrinsic can be another solution (not yet existing), but it's intrusive to implement and also does not differentiate call context.
A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inline advisor with SampleProfileLoader's inline decision for replay. Since SampleProfileLoader does top-down inlining, inline decision can be specialized for each call context, hence we should be able to replay inlining accurately. However with a bottom-up inliner like CGSCC inlining, the replay can be limited due to lack of specialization for different call context. Apart from that limitation, the new inline advisor can still be used by regular CGSCC inliner later if needed for tuning purpose.
Subscribers: mgorny, aprantl, hiraditya, llvm-commits
Tags: #llvm
Resubmit for https://reviews.llvm.org/D84086
This reverts commit 4500db8c59,
which was reverted because lower thresholds exposed a new issue (PR46680).
Now that it was resolved by d12ec0f752,
we can reinstate lower limits and wait for a new bugreport before
reverting this again...
Both users of PredicateInfo (NewGVN and SCCP) are interested in
getting a cmp constraint on the predicated value. They currently
implement separate logic for this. This patch adds a common method
for this in PredicateBase.
This enables a missing bit of PredicateInfo handling in SCCP: Now
the predicate on the condition itself is also used. For switches
it means we know that the switched-on value is the same as the case
value. For assumes/branches we know that the condition is true or
false.
Differential Revision: https://reviews.llvm.org/D83640
Fixes https://bugs.llvm.org/show_bug.cgi?id=46680.
Just like insertions through IRBuilder, InsertNewInstBefore()
should be using the deferred worklist mechanism, so that processing
of newly added instructions is prioritized.
There's one side-effect of the worklist order change which could be
classified as a regression. An add op gets pushed through a select
that at the time is not a umax. We could add a reverse transform
that tries to push adds in the reverse direction to restore a min/max,
but that seems like a sure way of getting infinite loops... Seems
like something that should best wait on min/max intrinsics.
Differential Revision: https://reviews.llvm.org/D84109
GCC r187297 (2012-05) introduced `__gcov_dump` and `__gcov_reset`.
`__gcov_flush = __gcov_dump + __gcov_reset`
The resolution to https://gcc.gnu.org/PR93623 ("No need to dump gcdas when forking", targeting GCC 11.0) removed the unuseful and undocumented __gcov_flush.
Close PR38064.
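A small usage sketch of the documented replacement pair for the removed __gcov_flush:
```
extern "C" void __gcov_dump(void);
extern "C" void __gcov_reset(void);

void flushCoverage() {
  __gcov_dump();  // write the counters accumulated so far to the .gcda files
  __gcov_reset(); // zero the counters so later dumps cover only new execution
}
```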
Reviewed By: calixte, serge-sans-paille
Differential Revision: https://reviews.llvm.org/D83149
Each concrete instance of a predicate has a condition (also noted in the
original PredicateBase comment) and to me it seems like there is no
clear benefit of having both PredicateBase and PredicateWithCondition
and they can be folded together.
Reviewers: nikic, efriedma
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D84089
Yes, if operands are non-positive this comes at the extra cost
of two extra negations. But a. division is already just
ridiculously costly, two more subtractions can't hurt much :)
and b. we have better/more analyses/folds for an unsigned division,
we could end up narrowing its bitwidth, converting it to lshr, etc.
This is essentially a take two on 0fdcca07ad,
which didn't fix the potential regression i was seeing,
because ValueTracking's computeKnownBits() doesn't make use
of dominating conditions in its analysis.
While i could teach it that, this seems like the more general fix.
This big hammer actually does catch said potential regression.
Over vanilla test-suite + RawSpeed + darktable
(10M IR instrs, 1M IR BB, 1M X86 ASM instrs), this fires/converts 5 more
(+2%) SDiv's, the total instruction count at the end of middle-end pipeline
is only +6, so out of +10 extra negations, ~half are folded away,
and asm instr count is only +1, so practically speaking all extra
negations are folded away and are therefore free.
Sadly, all these new UDiv's remained, none folded away.
But there are two less basic blocks.
https://rise4fun.com/Alive/VS6
Name: v0
Pre: C0 >= 0 && C1 >= 0
%r = sdiv i8 C0, C1
=>
%r = udiv i8 C0, C1
Name: v1
Pre: C0 <= 0 && C1 >= 0
%r = sdiv i8 C0, C1
=>
%t0 = udiv i8 -C0, C1
%r = sub i8 0, %t0
Name: v2
Pre: C0 >= 0 && C1 <= 0
%r = sdiv i8 C0, C1
=>
%t0 = udiv i8 C0, -C1
%r = sub i8 0, %t0
Name: v3
Pre: C0 <= 0 && C1 <= 0
%r = sdiv i8 C0, C1
=>
%r = udiv i8 -C0, -C1
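For illustration, here is a hypothetical case (a sketch, not taken from the patch's tests) where the fold only becomes visible through a dominating condition, which is exactly what computeKnownBits() misses and LVI-based reasoning catches:
```
define i8 @sketch(i8 %x, i8 %y) {
entry:
  %x.nonneg = icmp sge i8 %x, 0
  %y.nonneg = icmp sge i8 %y, 0
  %both = and i1 %x.nonneg, %y.nonneg
  br i1 %both, label %then, label %else
then:
  ; on this path both operands are known non-negative only via the
  ; dominating branch, so the sdiv can be rewritten as a udiv
  %d = sdiv i8 %x, %y
  ret i8 %d
else:
  ret i8 0
}
```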
This is needed because macOS on Apple Silicon has some reserved pages inside the "regular" shadow memory location, and mapping over that location fails.
Differential Revision: https://reviews.llvm.org/D82912
This reverts commit d76e62fdb7.
Reverting since this can lead to linker errors:
```
ld.lld: error: undefined hidden symbol: __start_asan_globals
```
when using --gc-sections. The linker can discard __start_asan_globals
once there are no more `asan_globals` sections left, which can lead to
this error if we have external linkages to them.
Summary:
This change adds a new inline advisor that takes optimization remarks from previous inlining as input and provides the decisions as advice, so the current inlining can replay the inline decisions of a different compilation. The Dwarf inline stack with line and discriminator is used as the anchor for call sites. The change can be useful for inliner tuning.
A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inline advisor with SampleProfileLoader's inline decision for replay. The new inline advisor can also be used by the regular CGSCC inliner later if needed.
Reviewers: davidxl, mtrofin, wmi, hoy
Subscribers: aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83743
This is the one I'm seeing as a missed optimization,
although there are likely other possibilities, as usual.
There are 4 variants of a general sdiv->udiv fold:
https://rise4fun.com/Alive/VS6
Name: v0
Pre: C0 >= 0 && C1 >= 0
%r = sdiv i8 C0, C1
=>
%r = udiv i8 C0, C1
Name: v1
Pre: C0 <= 0 && C1 >= 0
%r = sdiv i8 C0, C1
=>
%t0 = udiv i8 -C0, C1
%r = sub i8 0, %t0
Name: v2
Pre: C0 >= 0 && C1 <= 0
%r = sdiv i8 C0, C1
=>
%t0 = udiv i8 C0, -C1
%r = sub i8 0, %t0
Name: v3
Pre: C0 <= 0 && C1 <= 0
%r = sdiv i8 C0, C1
=>
%r = udiv i8 -C0, -C1
If we really don't like sdiv (more than udiv, that is),
and are okay with increasing the instruction count (2 new negations),
and we ensure that we don't undo the fold,
then we could just implement these.
I've got a report that clang 11 issues a signed/unsigned mismatch
warning here. For some reason only clang 11 seems to issue
this warning.
Differential Revision: https://reviews.llvm.org/D83916
Summary:
This patch resolves an issue where the metadata of a loop is not added to the
new loop latch and not removed from the old one. The issue occurs in
the SplitBlockPredecessors function, which adds a new block to a loop; in
the case that the block passed into this function is the header of the loop,
the loop can be modified such that its latch is replaced.
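For context, a minimal sketch (hypothetical IR) of where the loop metadata lives: it is attached to the latch's terminator, so it has to travel with whichever block ends up being the latch:
```
define void @sketch(i64 %n) {
entry:
  br label %header
header:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
  br label %latch
latch:
  %iv.next = add i64 %iv, 1
  %cond = icmp ult i64 %iv.next, %n
  ; the !llvm.loop attachment lives on the latch's terminator; if
  ; SplitBlockPredecessors replaces the latch, it must move along
  br i1 %cond, label %header, label %exit, !llvm.loop !0
exit:
  ret void
}

!0 = distinct !{!0, !1}
!1 = !{!"llvm.loop.unroll.disable"}
```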
This patch applies to the Loop Simplify pass since it ensures that each loop
has exit blocks which only have predecessors that are inside of the loop. In
the case that this is not true, the pass will create a new exit block for the
loop. This guarantees that the loop preheader/header will dominate the exit blocks.
Author: sidbav (Sidharth Baveja)
Reviewers: asbirlea (Alina Sbirlea), chandlerc (Chandler Carruth), Whitney (Whitney Tsang), bmahjour (Bardia Mahjour)
Reviewed By: asbirlea (Alina Sbirlea)
Subscribers: hiraditya (Aditya Kumar), llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D83869
This patch enables the LoopVectorizer to build a phi of pointer
type and provide the vector loads and stores with vector-typed
getelementptrs built from the pointer induction variable, which
produces far fewer instructions than the previous approach of
creating scalar getelementptrs and gluing them together into a
vector.
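A rough sketch of the kind of IR this enables (hypothetical names and types, not taken from the patch's tests), where a single pointer-typed phi feeds one vector GEP instead of a series of scalar GEPs and insertelements:
```
declare <4 x float> @llvm.masked.gather.v4f32.v4p0f32(<4 x float*>, i32, <4 x i1>, <4 x float>)

define void @sketch(float* %src, float* %dst, i64 %n) {
entry:
  br label %vector.body
vector.body:
  %index = phi i64 [ 0, %entry ], [ %index.next, %vector.body ]
  ; the pointer induction variable stays a single pointer-typed phi
  %ptr.phi = phi float* [ %src, %entry ], [ %ptr.next, %vector.body ]
  ; one vector GEP of type <4 x float*> feeds the memory access directly
  %vec.gep = getelementptr float, float* %ptr.phi, <4 x i64> <i64 0, i64 1, i64 2, i64 3>
  %vals = call <4 x float> @llvm.masked.gather.v4f32.v4p0f32(<4 x float*> %vec.gep, i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x float> undef)
  %out = getelementptr float, float* %dst, i64 %index
  %out.vec = bitcast float* %out to <4 x float>*
  store <4 x float> %vals, <4 x float>* %out.vec, align 4
  %ptr.next = getelementptr float, float* %ptr.phi, i64 4
  %index.next = add i64 %index, 4
  %done = icmp uge i64 %index.next, %n
  br i1 %done, label %exit, label %vector.body
exit:
  ret void
}
```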
Differential Revision: https://reviews.llvm.org/D81267
This adds option -tsan-compound-read-before-write to emit different
instrumentation for the write if the read before that write is omitted
from instrumentation. The default TSan runtime currently does not
support the different instrumentation, and the option is disabled by
default.
Alternative runtimes, such as the Kernel Concurrency Sanitizer (KCSAN),
can make use of the feature. Indeed, the initial motivation is use
in KCSAN: because the Linux kernel has a large number of unaddressed
data races, it makes sense to improve performance and reporting by
distinguishing compound operations. For example, the compound
instrumentation is typically emitted for compound operations such as
++, +=, |=, etc. By emitting different reports, such data races can
easily be noticed and also automatically bucketed differently by CI
systems.
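For illustration, a minimal sketch (hypothetical IR, not from the patch's tests) of the kind of read-modify-write sequence that would get the compound treatment; the exact runtime entry points emitted are not shown here:
```
; roughly what `x |= 1` looks like at the IR level: a load whose value
; feeds a store back to the same address
define void @sketch(i32* %p) {
entry:
  %old = load i32, i32* %p, align 4
  %new = or i32 %old, 1
  store i32 %new, i32* %p, align 4
  ret void
}
```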
Reviewed By: dvyukov, glider
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83867
```
define i32 @test(i1 %cond) {
entry:
br i1 %cond, label %exit, label %exit
exit:
%result = select i1 %cond, i32 123, i32 456
ret i32 %result
}
```
In this test, after applying the transformation that replaces a select with
phis, the result will be:
```
define i32 @test(i1 %cond) {
entry:
br i1 %cond, label %exit, label %exit
exit:
%result = phi i32 [ 123, %exit ], [ 123, %exit ]
ret i32 %result
}
```
That is, the select is transformed into an invalid Phi, which will then be
reduced to 123, and the second value will be lost. It is worth
noting that this problem arises only if the select comes before the branch
in the InstCombine worklist. Otherwise, InstCombine will replace
the branch condition with false and the transformation will not be applied.
The fix is to check whether the branch's target labels are equal.
Patch By: Kirill Polushin
Differential Revision: https://reviews.llvm.org/D84003
Reviewed By: mkazantsev
SimplifyCFG was incorrectly reporting to the pass manager that it had not made
changes after folding away a PHI. This is detected in the EXPENSIVE_CHECKS
build when the function's hash changes.
Differential Revision: https://reviews.llvm.org/D83985
When the byref attribute is added, there will need to be two similar
functions: one for the existing cases, which have an associated value copy,
and one for byref, which does not. Most, but not all, of the existing uses
will use the existing version.
The associated size function added by D82679 also needs to
contextually differ, and will help eliminate a few places still
relying on pointee element types.
There is no need to add functions with void return types to the set of
tracked return values. This does not change functionality, because
such functions do not have return values and we never update or access
them.
This reverts commit 1067d3e176,
which reverted commit b2018198c3,
because it introduced a Dependency Cycle between Transforms/Scalar and
Transforms/Utils.
So let's just move SimplifyCFGOptions.h into Utils/, thus avoiding
the cycle.
Currently the backends cannot lower the matrix intrinsics directly and
rely on the lowering to vector instructions happening in the middle-end.
At the moment, this means the backend crashes when matrix types
extension code is compiled with -O0, e.g.
http://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-aarch64-O0-g/7902/
This patch also runs the lowering with -O0 in the middle-end as
a temporary solution. Long term, a lightweight version of the lowering
should run in the backend, on demand.
This reverts commit b2018198c3.
This commit introduced a Dependency Cycle between Transforms/Scalar and
Transforms/Utils. Transforms/Scalar already depends on Transforms/Utils,
so if SimplifyCFGOptions.h is moved to Scalar, and Utils/Local.h still
depends on it, we have a cycle.
This patch adds a simplification for the following pattern:
```
if (cond)
/ \
... ...
\ /
p = phi [true] [false]
...
br p, succ_1, succ_2
```
If we can prove that the top block's branches dominate the respective
inputs of a block that has a Phi with constant inputs, we can
use the branch condition (possibly inverted) instead of the Phi.
This makes proofs of implication for further jump threading
more transparent; see the sketch below.
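A minimal IR sketch of the situation (hypothetical names, not from the patch's tests):
```
define void @sketch(i1 %cond) {
entry:
  br i1 %cond, label %left, label %right
left:
  br label %merge
right:
  br label %merge
merge:
  ; %p is just %cond in disguise; branching on %cond directly makes the
  ; implication from entry's branch obvious to jump threading
  %p = phi i1 [ true, %left ], [ false, %right ]
  br i1 %p, label %succ_1, label %succ_2
succ_1:
  ret void
succ_2:
  ret void
}
```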
Differential Revision: https://reviews.llvm.org/D81375
Reviewed By: xbolva00
Taking so many parameters is simply unmaintainable.
We don't want to include the entire llvm/Transforms/Utils/Local.h into
llvm/Transforms/Scalar.h, so I've split SimplifyCFGOptions into
its own header.