llvm-project

Commit Graph

Author	SHA1	Message	Date
Arthur Eubanks	7167e5203a	Port -print-memderefs to NPM There is lots of code duplication, but hopefully it won't matter soon. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D91683	2020-11-23 11:56:22 -08:00
Arthur Eubanks	14a68b4aa9	[CGSCC] Detect devirtualization in more cases The devirtualization wrapper misses cases where if it wraps a pass manager, an individual pass may devirtualize an indirect call created by a previous pass. For example, inlining may create a new indirect call which is devirtualized by instcombine. Currently the devirtualization wrapper will not see that because it only checks cgscc edges at the very beginning and end of the pass (manager) it wraps. This fixes some tests testing this exact behavior in the legacy PM. Instead of checking WeakTrackingVHs for CallBases at the very beginning and end of the pass it wraps, check every time updateCGAndAnalysisManagerForPass() is called. check-llvm and check-clang with -abort-on-max-devirt-iterations-reached on by default doesn't show any failures outside of tests specifically testing it so it doesn't needlessly rerun passes more than necessary. (The NPM -O2/3 pipeline run the inliner/function simplification pipeline under a devirtualization repeater pass up to 4 times by default). http://llvm-compile-time-tracker.com/?config=O3&stat=instructions&remote=aeubanks shows that 7zip has ~1% compile time regression. I looked at it and saw that there indeed was devirtualization happening that was not previously caught, so now it reruns the CGSCC pipeline on some SCCs, which is WAI. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D89587	2020-11-23 11:55:20 -08:00
Jay Foad	000400ca0a	Fix speling in comments. NFC.	2020-11-23 14:43:24 +00:00
Mikael Holmen	faf848ac32	[Inline] Fix in handling of ptrtoint in InlineCost ConstantOffsetPtrs contains mappings from a Value to a base pointer and an offset. The offset is typed and has a size, and at least when dealing with ptrtoint, it could happen that we had a mapping from a ptrtoint with type i32 to an offset with type i16. This could later cause problems, showing up in PR 47969 and PR 38500. In PR 47969 we ended up in an assert complaining that trunc i16 to i16 is invalid and in PR 38500 that a cmp on an i32 and i16 value isn't valid. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D90610	2020-11-23 14:33:06 +01:00
Max Kazantsev	48d7cc6ae2	[SCEV] Fix incorrect treatment of max taken count. PR48225 SCEV makes a logical mistake when handling EitherMayExit in the case when both conditions must be met to exit the loop. The mistake is as follows: "if condition `A` fails within at most `X` first iterations, and `B` fails within at most `Y` first iterations, then `A & B` fails at most within `min (X, Y)` first iterations". This is wrong, because both of them must fail at the same time. A simple example illustrating this is the following: we have an IV with step 1, condition `A` = "IV is even", condition `B` = "IV is odd". Both `A` and `B` will fail within the first two iterations. But it doesn't mean that both of them will fail within the first two iterations at the same time, which would mean that IV is neither even nor odd at the same time within the first 2 iterations. We can only do so for known exact BE counts, but not for max. Differential Revision: https://reviews.llvm.org/D91942 Reviewed By: nikic	2020-11-23 16:52:39 +07:00
Max Kazantsev	47e31d1b5e	[NFC] Reduce code duplication in binop processing in computeExitLimitFromCondCached Handling of `and` and `or` vastly uses copy-paste. Factored out into a helper function as preparation step for further fix (see PR48225). Differential Revision: https://reviews.llvm.org/D91864 Reviewed By: nikic	2020-11-23 13:18:12 +07:00
Nikita Popov	6f5ef648a5	[BasicAA] Avoid unnecessary cache update (NFC) If the final recursive query returns MayAlias as well, there is no need to update the cache (which already stores MayAlias).	2020-11-22 20:10:45 +01:00
Sanjay Patel	c5a4d80fd4	[ValueTracking][MemCpyOpt] avoid crash on inttoptr with vector pointer type (PR48075)	2020-11-22 12:54:18 -05:00
Simon Pilgrim	24d6e60488	[Analysis] Remove unused system header includes Cleanup unused system headers and fix an implicit dependency	2020-11-22 10:32:37 +00:00
Nikita Popov	ded5928866	[BasicAA] Remove unnecessary sextOrSelf (NFC) We are doing a sextOrTrunc directly afterwards, so this seems useless. There is a multiplication in between, but truncating before or after the multiplication should not make a difference.	2020-11-21 21:32:56 +01:00
Nikita Popov	0d114f56d7	[BasicAA] Return DecomposedGEP (NFC) Instead of requiring the caller to initialize the DecomposedGEP structure and then passing it in by reference, make DecomposeGEPExpression() responsible for initializing and returning the structure.	2020-11-21 21:05:26 +01:00
Nikita Popov	f4412c5ae4	[BasicAA] Remove some intermediate variables (NFC) Use DecompGEP1.Offset instead of GEP1BaseOffset, etc. I found the asymmetry of modifying DecompGEP1.VarIndices, but not modifying DecompGEP1.Offset odd here.	2020-11-21 20:36:25 +01:00
Nikita Popov	913a99c474	[BasicAA] Remove stale FIXME (NFC) If aliasGEP returns MayAlias, the code does fall through to aliasPHI etc, so this FIXME is no longer applicable.	2020-11-21 20:07:26 +01:00
Kazu Hirata	226beb494c	[Analysis] Use llvm::is_contained (NFC)	2020-11-20 18:08:05 -08:00
Hongtao Yu	f3c445697d	[CSSPGO] IR intrinsic for pseudo-probe block instrumentation This change introduces a new IR intrinsic named `llvm.pseudoprobe` for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story. A pseudo probe is used to collect the execution count of the block where the probe is instrumented. This requires a pseudo probe to be persisting. The LLVM PGO instrumentation also instruments in similar places by placing a counter in the form of atomic read/write operations or runtime helper calls. While these operations are very persisting or optimization-resilient, in theory we can borrow the atomic read/write implementation from PGO counters and cut it off at the end of compilation with all the atomics converted into binary data. This was our initial design and we’ve seen promising sample correlation quality with it. However, the atomics approach has a couple issues: 1. IR Optimizations are blocked unexpectedly. Those atomic instructions are not going to be physically present in the binary code, but since they are on the IR till very end of compilation, they can still prevent certain IR optimizations and result in lower code quality. 2. The counter atomics may not be fully cleaned up from the code stream eventually. 3. Extra work is needed for re-targeting. We choose to implement pseudo probes based on a special LLVM intrinsic, which is expected to have most of the semantics that comes with an atomic operation but does not block desired optimizations as much as possible. More specifically the semantics associated with the new intrinsic enforces a pseudo probe to be virtually executed exactly the same number of times before and after an IR optimization. The intrinsic also comes with certain flags that are carefully chosen so that the places they are probing are not going to be messed up by the optimizer while most of the IR optimizations still work. 
The core flag given to the special intrinsic is `IntrInaccessibleMemOnly`, which means the intrinsic accesses memory and does have a side effect so that it is not removable, but it does not access memory locations that are accessible by any original instructions. This way the intrinsic does not alias with any original instruction and thus it does not block optimizations as much as an atomic operation does. We also assign a function GUID and a block index to an intrinsic so that they are uniquely identified and not merged in order to achieve good correlation quality. Let's now look at an example. Given the following LLVM IR: ``` define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 { bb0: %cmp = icmp eq i32 %x, 0 br i1 %cmp, label %bb1, label %bb2 bb1: br label %bb3 bb2: br label %bb3 bb3: ret void } ``` The instrumented IR will look like below. Note that each `llvm.pseudoprobe` intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID. ``` define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 { bb0: %cmp = icmp eq i32 %x, 0 call void @llvm.pseudoprobe(i64 837061429793323041, i64 1) br i1 %cmp, label %bb1, label %bb2 bb1: call void @llvm.pseudoprobe(i64 837061429793323041, i64 2) br label %bb3 bb2: call void @llvm.pseudoprobe(i64 837061429793323041, i64 3) br label %bb3 bb3: call void @llvm.pseudoprobe(i64 837061429793323041, i64 4) ret void } ``` Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D86490	2020-11-20 10:39:24 -08:00
Nikita Popov	e8dc6e9a32	[MemLoc] Use hasValue() method more (NFC) Followup to `7de7c40898`. I previously removed a number of == comparisons to LocationSize::unknown(), but missed these != comparisons.	2020-11-19 22:29:44 +01:00
Nikita Popov	7de7c40898	[MemLoc] Use hasValue() method (NFC) Instead of comparing to LocationSize::unknown(), prefer calling the hasValue() method instead, which is less reliant on implementation details.	2020-11-19 21:53:50 +01:00
Nikita Popov	393b9e9db3	[MemLoc] Require LocationSize argument (NFC) When constructing a MemoryLocation by hand, require that a LocationSize is explicitly specified. D91649 will split up LocationSize::unknown() into two different states, and callers should make an explicit choice regarding the kind of MemoryLocation they want to have.	2020-11-19 21:45:52 +01:00
Artur Pilipenko	887c7660bd	[BasicAA] Deoptimize intrinsics don't modify memory Similarly to assumes and guards deoptimize intrinsics are marked as writing to ensure proper control dependencies but they never modify any particular memory location. Differential Revision: https://reviews.llvm.org/D91658	2020-11-19 12:08:33 -08:00
Nikita Popov	22ec72f803	[Lint] Use MemoryLocation Instead of separately passing pointer and size, make use of MemoryLocation. This allows us to also reuse all the existing logic for determining the MemoryLocation corresponding to an instruction or call argument. Not quite NFC because used locations may be more precise in some cases.	2020-11-19 20:55:25 +01:00
Leonard Chan	a97f62837f	[llvm][IR] Add dso_local_equivalent Constant The `dso_local_equivalent` constant is a wrapper for functions that represents a value which is functionally equivalent to the global passed to this. That is, if this accepts a function, calling this constant should have the same effects as calling the function directly. This could be a direct reference to the function, the `@plt` modifier on X86/AArch64, a thunk, or anything that's equivalent to the resolved function as a call target. When lowered, the returned address must have a constant offset at link time from some other symbol defined within the same binary. The address of this value is also insignificant. The name is leveraged from `dso_local` where use of a function or variable is resolved to a symbol in the same linkage unit. In this patch: - Addition of `dso_local_equivalent` and handling it - Update Constant::needsRelocation() to strip constant inbound GEPs and take advantage of `dso_local_equivalent` for relative references This is useful for the [Relative VTables C++ ABI](https://reviews.llvm.org/D72959) which makes vtables readonly. This works by replacing the dynamic relocations for function pointers in them with static relocations that represent the offset between the vtable and virtual functions. If a function is externally defined, `dso_local_equivalent` can be used as a generic wrapper for the function to still allow for this static offset calculation to be done. See [RFC](http://lists.llvm.org/pipermail/llvm-dev/2020-August/144469.html) for more details. Differential Revision: https://reviews.llvm.org/D77248	2020-11-19 10:26:17 -08:00
Simon Pilgrim	fceaff41d6	[ValueTracking] computeKnownBitsFromShiftOperator - move shift amount analysis to top of the function. NFCI. These are all lightweight to compute and helps avoid issues with Known being used to hold both the shift amount and then the shifted result. Minor cleanup for D90479.	2020-11-19 13:50:49 +00:00
Mircea Trofin	8ab2353a4c	[NFC][TFUtils] also include output specs lookup logic in loadOutputSpecs The lookup logic is also reusable. Also refactored the API to return the loaded vector - this makes it more clear what state it is in in the case of error (as it won't be returned). Differential Revision: https://reviews.llvm.org/D91759	2020-11-18 21:20:21 -08:00
Mircea Trofin	b51e844f7a	[NFC][TFUtils] Extract out the output spec loader It's generic for the 'development mode', not specific to the inliner case. Differential Revision: https://reviews.llvm.org/D91751	2020-11-18 20:03:20 -08:00
Nikita Popov	cd3c22c47e	[BasicAA] Generalize base offset modulus handling The GEP aliasing implementation currently has two pieces of code that solve two different subsets of the same basic problem: If you have GEPs with offsets 4x + 0 and 4y + 1 (assuming access size 1), then they do not alias regardless of whether x and y are the same. One implementation is in aliasSameBasePointerGEPs(), which looks at this in a limited structural way. It requires both GEP base pointers to be exactly the same, then (optionally) a number of equal indexes, then an unknown index, then a non-equal index into a struct. This set of limitations works, but it's overly restrictive and hides the core property we're trying to exploit. The second implementation is part of aliasGEP() itself and tries to find a common modulus in the scales, so it can then check that the constant offset doesn't overlap under modular arithmetic. The second implementation has the right idea of what the general problem is, but effectively only considers power of two factors in the scales (while aliasSameBasePointerGEPs also works with non-pow2 struct sizes.) What this patch does is to adjust the aliasGEP() implementation to instead find the largest common factor in all the scales (i.e. the GCD) and use that as the modulus. Differential Revision: https://reviews.llvm.org/D91027	2020-11-18 21:48:49 +01:00
Nikita Popov	85ccdcaa50	[BasicAA] Remove assert in AA evaluator As reported in https://reviews.llvm.org/D91383#2401825, this assert breaks external -aa-eval tests. We'll have to fix this case before re-enabling it.	2020-11-18 20:04:38 +01:00
Simon Pilgrim	eef203dbdf	[Analysis] CGSCCPassManager.cpp - fix Wshadow warnings. NFCI.	2020-11-18 09:59:31 +00:00
Wei Wang	3279347da0	[BPI] Look through bitcasts in calcZeroHeuristic Constant hoisting may hide the constant value behind bitcast for And's operand. Track down the constant to make the BFI result consistent regardless of hoisting. Differential Revision: https://reviews.llvm.org/D91450	2020-11-17 09:33:05 -08:00
Nikita Popov	cb4fc25c91	[BasicAA] Make alias GEP positive offset handling symmetric aliasGEP() currently implements some special handling for the case where all variable offsets are positive, in which case the constant offset can be taken as the minimal offset. However, it does not perform the same handling for the all-negative case. This means that the alias-analysis result between two GEPs is asymmetric: If GEP1 - GEP2 is all-positive, then GEP2 - GEP1 is all-negative, and the first will result in NoAlias, while the second will result in MayAlias. Apart from producing sub-optimal results for one order, this also violates our caching assumption. In particular, if BatchAA is used, the cached result depends on the order of the GEPs in the first query. This results in an inconsistency in BatchAA and AA results, which is how I noticed this issue in the first place. Differential Revision: https://reviews.llvm.org/D91383	2020-11-17 18:05:34 +01:00
Sander de Smalen	f571fe6df5	Reland [LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost. This relands https://reviews.llvm.org/D91059 and reverts commit `30fded75b4`. GetRegUsage now returns 0 when Ty is not a valid vector element type.	2020-11-17 13:45:10 +00:00
Philip Reames	0f41a2fe83	test commit for new client	2020-11-16 17:26:52 -08:00
Michael Liao	f375885ab8	[InferAddrSpace] Teach to handle assumed address space. - In certain cases, a generic pointer could be assumed as a pointer to the global memory space or other spaces. With a dedicated target hook to query that address space from a given value, infer-address-space pass could infer and propagate that to all its users. Differential Revision: https://reviews.llvm.org/D91121	2020-11-16 17:06:33 -05:00
Philip Reames	257d33c815	[SCEV] Factor out part of wrap flag detection logic [NFC](try 2) This is a cut down version of 1ec6e1 which was reverted due to a compile time issue. The key changes made from that patch: 1) only infer the flags needed along each path, 2) be careful to preserve order of checks, and 3) avoid computing NW flags at all since we need to prove the stronger property (does not cross 0) in the caller anyways. Assuming this doesn't trip regressions, I'm going to try weakening (1). My end objective is to move flag inference into addrec construction. If I can't weaken (1) without compile time impact, I'll have a problem.	2020-11-16 12:07:21 -08:00
Kazu Hirata	147ccc848a	[JumpThreading] Call eraseBlock when folding a conditional branch This patch teaches the jump threading pass to call BPI->eraseBlock when it folds a conditional branch. Without this patch, BranchProbabilityInfo could end up with stale edge probabilities for the basic block containing the conditional branch -- one edge probability with less than 1.0 and the other for a removed edge. This patch is one of the steps before we can safely re-apply D91017. Differential Revision: https://reviews.llvm.org/D91511	2020-11-15 22:29:30 -08:00
Kazu Hirata	c5cc2d8b94	[BranchProbabilityInfo] Use predecessors(BB) and successors(BB) (NFC)	2020-11-15 19:26:38 -08:00
Nikita Popov	3b7f84d97f	[AA] Add missing AAQI parameter This alias() call did not pass on the AAQueryInfo.	2020-11-15 20:29:53 +01:00
Nikita Popov	9ace4b337f	Revert "[SCEV] Factor out part of wrap flag detection logic [NFC-ish]" This reverts commit `1ec6e1eb8a`. This change causes a significant compile-time regression: https://llvm-compile-time-tracker.com/compare.php?from=dd0b8b94d0796bd895cc998dd163b4fbebceb0b8&to=1ec6e1eb8a084bffae8a40236eb9925d8026dd07&stat=instructions I assume that this is due to the non-NFC part of the change, which now performs expensive nowrap inference even for nowrap flags that are not used by the particular code.	2020-11-15 10:19:44 +01:00
Philip Reames	1ec6e1eb8a	[SCEV] Factor out part of wrap flag detection logic [NFC-ish] In an effort to make code around flag determination more readable, and (possibly) prepare for a follow up change, factor out some of the flag detection logic. In the process, reduce the number of locations we mutate wrap flags by a couple. Note that this isn't NFC. The old code tried for NSW xor (NUW || NW). That is, two different paths computed different sets of wrap flags. The new code will try for all three. The result is that some expressions end up with a few extra flags set.	2020-11-14 19:21:05 -08:00
Nikita Popov	0b72444211	[BasicAA] Remove unnecessary size limitation We're dropping a common offset from both GEPs here. It's not necessary for the access sizes to be the same as well.	2020-11-14 16:51:31 +01:00
Nikita Popov	9a85643cd3	[KnownBits] Combine abs() implementations ValueTracking was using a more powerful abs() implementation. Roll it into KnownBits::abs(). Also add an exhaustive test for abs(), in both the poisoning and non-poisoning variants.	2020-11-13 22:23:50 +01:00
Nikita Popov	f3124a46c1	[SCEV] Fix nsw flags for GEP expressions The SCEV code for constructing GEP expressions currently assumes that the addition of the base and all the offsets is nsw if the GEP is inbounds. While the addition of the offsets is indeed nsw, the addition to the base address is not, as the base address is interpreted as an unsigned value. Fix the GEP expression code to not assume nsw for the base+offset calculation. However, do assume nuw if we know that the offset is non-negative. With this, we use the same behavior as the construction of GEP addrecs does. (Modulo the fact that we disregard SCEV unification, as the pre-existing FIXME points out). Differential Revision: https://reviews.llvm.org/D90648	2020-11-13 18:19:32 +01:00
Nikita Popov	92b708902e	[ValueTracking] Don't set nsw flag for inbounds addition When computing the known bits for a GEP, don't set the nsw flag when adding an offset to an address. The nsw flag only applies to pure offset additions (see also D90708). The nsw flag is only used in a very minor way by the code, to the point that I was not able to come up with a test case where it makes a difference. Differential Revision: https://reviews.llvm.org/D90637	2020-11-13 17:58:21 +01:00
Piotr Sobczak	47dec5aa60	[DivergenceAnalysis] Use addRequiredTransitive For querying divergence the chained analysis passes are required to be alive, for instance LoopInfoWrapperPass. Ensure that by using addRequiredTransitive. Differential Revision: https://reviews.llvm.org/D91335	2020-11-13 14:40:00 +01:00
Simon Pilgrim	49623fa77a	[ValueTracking] computeKnownBitsFromShiftOperator use KnownBits direct for constant shift amounts. Let KnownBits shift handlers deal with out-of-range shift amounts.	2020-11-13 10:54:35 +00:00
serge-sans-paille	9218ff50f9	llvmbuildectomy - replace llvm-build by plain cmake No longer rely on an external tool to build the llvm component layout. Instead, leverage the existing `add_llvm_componentlibrary` cmake function and introduce `add_llvm_component_group` to accurately describe component behavior. These functions store extra properties in the created targets. These properties are processed once all components are defined to resolve library dependencies and produce the header expected by llvm-config. Differential Revision: https://reviews.llvm.org/D90848	2020-11-13 10:35:24 +01:00
Max Kazantsev	0a1d394bf3	[NFC] Refactor loop-invariant getters to return Optional	2020-11-13 15:03:10 +07:00
Nikita Popov	c00545dc32	[BasicAA] Remove checks for GEP decomposition limit reached The GEP aliasing code currently checks for the GEP decomposition limit being reached (i.e., we did not reach the "final" underlying object). As far as I can see, these checks are not necessary. It is perfectly fine to work with a GEP whose base can still be further decomposed. Looking back through the commit history, these checks were originally introduced in `1a444489e9`. However, I believe that the problem this was intended to address was later properly fixed with `1726fc698c`, and the checks are no longer necessary since then (and were not the right fix in the first place). Differential Revision: https://reviews.llvm.org/D91010	2020-11-12 20:43:38 +01:00
Jamie Schmeiser	5f672fefeb	Reland: Introduce -dot-cfg-mssa option which creates dot-cfg style file with mssa comments included in source Summary: Expand the print-memoryssa and print<memoryssa> passes with a new hidden option -dot-cfg-mssa that names a file. When set, a dot-cfg style file will be generated into the named file with the memoryssa comments retained and those blocks containing them shown in light pink. The option does nothing in isolation. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: asbirlea (Alina Sbirlea), dblaikie (David Blaikie) Differential Revision: https://reviews.llvm.org/D90638	2020-11-12 17:39:14 +00:00
Simon Pilgrim	f72d350bfb	[ValueTracking] Update computeKnownBitsFromShiftOperator callbacks to take KnownBits shift amount. NFCI. We were creating this internally, but will need to support general KnownBits amounts as part of D90479.	2020-11-12 16:56:55 +00:00
Simon Pilgrim	8996742741	[KnownBits] Add KnownBits::makeConstant helper. NFCI. Helper for cases where we need to create a KnownBits from a (fully known) constant value.	2020-11-12 16:16:04 +00:00
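
The modulus idea described in `cd3c22c47e` ([BasicAA] Generalize base offset modulus handling) can be illustrated with a small Python sketch. This is not the LLVM implementation: the function name `no_alias_by_modulus` and its parameters are hypothetical, and the model assumes access 2 starts at offset 0 while access 1 starts at a constant offset plus a sum of scaled unknown indices. The GCD of the scales is used as the modulus, as the commit message describes.

```python
from functools import reduce
from math import gcd


def no_alias_by_modulus(scales, const_offset, size1, size2):
    """Return True if the two accesses provably cannot overlap.

    Hypothetical model (an assumption of this sketch, not LLVM's API):
    access 1 starts at const_offset + sum(scale_i * x_i) for unknown
    integers x_i and covers size1 bytes; access 2 starts at 0 and
    covers size2 bytes.
    """
    # The largest common factor of all scales serves as the modulus.
    modulus = reduce(gcd, (abs(s) for s in scales), 0)
    if modulus == 0:
        return False  # no variable part, so this check does not apply

    # Every possible start of access 1 is congruent to this value
    # modulo the GCD (Python's % is non-negative for a positive modulus).
    mod_offset = const_offset % modulus

    # Access 1 must begin at or after the end of access 2, and must end
    # before the next multiple of the modulus.
    return mod_offset >= size2 and modulus - mod_offset >= size1
```

For the example from the commit message, offsets `4x + 0` and `4y + 1` with access size 1 correspond to `no_alias_by_modulus([4], 1, 1, 1)`, which returns `True`. Scales such as `[6, 9]` (GCD 3) show that the check is not limited to power-of-two factors.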

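The even/odd example from `48d7cc6ae2` ([SCEV] Fix incorrect treatment of max taken count) can be checked by brute force. The sketch below is only an illustration in Python, not SCEV code: `first_failure` and the iteration limit are hypothetical, and the IV is assumed to start at 0 with step 1. It reads "both conditions must be met to exit" as a loop that keeps running while at least one of `A` and `B` holds.

```python
def first_failure(cond, limit):
    """Return the first IV value in [0, limit) where cond is false, else None."""
    for iv in range(limit):
        if not cond(iv):
            return iv
    return None


def is_even(iv):
    return iv % 2 == 0  # condition A


def is_odd(iv):
    return iv % 2 == 1  # condition B


LIMIT = 1000  # arbitrary bound for the brute-force search

fail_a = first_failure(is_even, LIMIT)  # A alone fails within two iterations
fail_b = first_failure(is_odd, LIMIT)   # B alone fails within two iterations

# The loop exits only when A and B fail in the same iteration.
both_fail = first_failure(lambda iv: is_even(iv) or is_odd(iv), LIMIT)
```

Here `fail_a` is 1 and `fail_b` is 0, yet `both_fail` is `None`: taking the minimum of the two individual bounds would wrongly claim that the loop exits within two iterations.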

9960 Commits