Commit Graph

9991 Commits

Author SHA1 Message Date
Fangrui Song a5309438fe static const char *const foo => const char foo[]
By default, a non-template variable of non-volatile const-qualified type
having namespace-scope has internal linkage, so no need for `static`.
2020-12-01 10:33:18 -08:00
Caroline Concatto 4b0ef2b075 [NFC][CostModel]Extend class IntrinsicCostAttributes to use ElementCount Type
This patch replaces the attribute  `unsigned VF`  in the class
IntrinsicCostAttributes by `ElementCount VF`.
This is a non-functional change to help upcoming patches to compute the cost
model for scalable vector inside this class.

Differential Revision: https://reviews.llvm.org/D91532
2020-12-01 11:12:51 +00:00
Wei Wang 93dc1b5b8c [Remarks][2/2] Expand remarks hotness threshold option support in more tools
This is the #2 of 2 changes that make remarks hotness threshold option
available in more tools. The changes also allow the threshold to sync with
hotness threshold from profile summary with special value 'auto'.

This change expands remarks hotness threshold option
-fdiagnostics-hotness-threshold in clang and *-remarks-hotness-threshold in
other tools to utilize hotness threshold from profile summary.

Remarks hotness filtering relies on several driver options. Table below lists
how different options are correlated and affect final remarks outputs:

| profile | hotness | threshold | remarks printed |
|---------|---------|-----------|-----------------|
| No      | No      | No        | All             |
| No      | No      | Yes       | None            |
| No      | Yes     | No        | All             |
| No      | Yes     | Yes       | None            |
| Yes     | No      | No        | All             |
| Yes     | No      | Yes       | None            |
| Yes     | Yes     | No        | All             |
| Yes     | Yes     | Yes       | >=threshold     |

In the presence of profile summary, it is often more desirable to directly use
the hotness threshold from profile summary. The new argument value 'auto'
indicates threshold will be synced with hotness threshold from profile summary
during compilation. The "auto" threshold relies on the availability of profile
summary. In case of missing such information, no remarks will be generated.

Differential Revision: https://reviews.llvm.org/D85808
2020-11-30 21:55:50 -08:00
Nick Desaulniers 91aff1d8ba [InlineCost] prefer range-for. NFC
Prefer range-for over iterators when such methods exist. Precommitted
from https://reviews.llvm.org/D91816.

Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D92350
2020-11-30 16:07:40 -08:00
Mircea Trofin 5fe10263ab [llvm][inliner] Reuse the inliner pass to implement 'always inliner'
Enable performing mandatory inlinings upfront, by reusing the same logic
as the full inliner, instead of the AlwaysInliner. This has the
following benefits:
- reduce code duplication - one inliner codebase
- open the opportunity to help the full inliner by performing additional
function passes after the mandatory inlinings, but before th full
inliner. Performing the mandatory inlinings first simplifies the problem
the full inliner needs to solve: less call sites, more contextualization, and,
depending on the additional function optimization passes run between the
2 inliners, higher accuracy of cost models / decision policies.

Note that this patch does not yet enable much in terms of post-always
inline function optimization.

Differential Revision: https://reviews.llvm.org/D91567
2020-11-30 12:03:39 -08:00
Nikita Popov e987fbdd85 [BasicAA] Generalize recursive phi alias analysis
For recursive phis, we skip the recursive operands and check that
the remaining operands are NoAlias with an unknown size. Currently,
this is limited to inbounds GEPs with positive offsets, to
guarantee that the recursion only ever increases the pointer.

Make this more general by only requiring that the underlying object
of the phi operand is the phi itself, i.e. it it based on itself in
some way. To compensate, we need to use a beforeOrAfterPointer()
location size, as we no longer have the guarantee that the pointer
is strictly increasing.

This allows us to handle some additional cases like negative geps,
geps with dynamic offsets or geps that aren't inbounds.

Differential Revision: https://reviews.llvm.org/D91914
2020-11-29 10:25:23 +01:00
Nikita Popov 1dea8ed8b7 [BasicAA] Remove unnecessary known size requirement
The size requirement on V2 was present because it was not clear
whether an unknown size would allow an access before the start of
V2, which could then overlap. This is clarified since D91649: In
this part of BasicAA, all accesses can occur only after the base
pointer, even if they have unknown size.

This makes the positive and negative offset cases symmetric.

Differential Revision: https://reviews.llvm.org/D91482
2020-11-28 10:17:12 +01:00
Nikita Popov 8351f9b5ce [ValueTracking] Fix assert on shufflevector of pointers
In this case getScalarSizeInBits() is not well-defined. Use the
existing TyBits variable that handles vectors of pointers correctly.
2020-11-27 21:19:31 +01:00
Martin Storsjö fa10383664 Revert "[BasicAA] Fix BatchAA results for phi-phi assumptions"
This reverts commit 8166ed1a7a,
as it caused some compilations to hang/loop indefinitely, see
https://reviews.llvm.org/D91936 for details.
2020-11-27 21:50:59 +02:00
Cullen Rhodes 7b8d50b141 [InstSimplify] Clarify use of FixedVectorType in SimplifySelectInst
Folding a select of vector constants that include undef elements only
applies to fixed vectors, but there's no earlier check the type is not
scalable so it crashes for scalable vectors. This adds a check so this
optimization is only attempted for fixed vectors.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D92046
2020-11-27 09:55:29 +00:00
Kazu Hirata 60e749aa23 [InlineCost] Fix indentation (NFC) 2020-11-26 18:00:55 -08:00
Nikita Popov 8166ed1a7a [BasicAA] Fix BatchAA results for phi-phi assumptions
Add a flag that disables caching when computing aliasing results
potentially based on a phi-phi NoAlias assumption. We'll still
insert cache entries temporarily to catch infinite recursion,
but will drop them afterwards, so they won't persist in BatchAA.

Differential Revision: https://reviews.llvm.org/D91936
2020-11-26 21:43:50 +01:00
Nikita Popov 4df8efce80 [AA] Split up LocationSize::unknown()
Currently, we have some confusion in the codebase regarding the
meaning of LocationSize::unknown(): Some parts (including most of
BasicAA) assume that LocationSize::unknown() only allows accesses
after the base pointer. Some parts (various callers of AA) assume
that LocationSize::unknown() allows accesses both before and after
the base pointer (but within the underlying object).

This patch splits up LocationSize::unknown() into
LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer()
to make this completely unambiguous. I tried my best to determine
which one is appropriate for all the existing uses.

The test changes in cs-cs.ll in particular illustrate a previously
clearly incorrect AA result: We were effectively assuming that
argmemonly functions were only allowed to access their arguments
after the passed pointer, but not before it. I'm pretty sure that
this was not intentional, and it's certainly not specified by
LangRef that way.

Differential Revision: https://reviews.llvm.org/D91649
2020-11-26 18:39:55 +01:00
Max Kazantsev 035955f925 Revert "Return "[SCEV] Use isBasicBlockEntryGuardedByCond in isLoopBackedgeGuardedByCond", 2nd try"
This reverts commit f690986f31.

Compile time then and again...
2020-11-26 18:12:51 +07:00
Max Kazantsev f690986f31 Return "[SCEV] Use isBasicBlockEntryGuardedByCond in isLoopBackedgeGuardedByCond", 2nd try
Reverted because the compile time impact is still too high.

isKnownViaNonRecursiveReasoning is used twice, we can do it just once.

Differential Revision: https://reviews.llvm.org/D92152
2020-11-26 17:45:13 +07:00
Max Kazantsev 91d6b6b5fb Revert "[SCEV] Use isBasicBlockEntryGuardedByCond in isLoopBackedgeGuardedByCond"
This reverts commit 3d4c0460ec.

Compile time impact is still high. Need to understand why.

Differential Revision: https://reviews.llvm.org/D92153
2020-11-26 17:28:30 +07:00
Max Kazantsev 3d4c0460ec [SCEV] Use isBasicBlockEntryGuardedByCond in isLoopBackedgeGuardedByCond
Previously we tried to using isKnownPredicateAt, but it makes an
extra query to isKnownPredicate, which has negative impact on compile
time. Let's try to use more lightweight isBasicBlockEntryGuardedByCond.

Differential Revision: https://reviews.llvm.org/D92152
2020-11-26 17:08:38 +07:00
Max Kazantsev 3b6481eae2 Revert "[SCEV] Use isKnownPredicateAt in isLoopBackedgeGuardedByCond"
This reverts commit 14f2ad0e3c.

Reverting to investigate compile time drop.

Differential Revision: https://reviews.llvm.org/D92152
2020-11-26 16:42:43 +07:00
Max Kazantsev 14f2ad0e3c [SCEV] Use isKnownPredicateAt in isLoopBackedgeGuardedByCond
A piece of code in `isLoopBackedgeGuardedByCond` basically duplicates
the dominators traversal from `isBlockEntryGuardedByCond` called from
`isKnownPredicateAt`, but it's less powerful because it does not give context
to `isImpliedCond`. This patch reuses the `isKnownPredicateAt `function there,
reducing the amount of code duplication and making it more powerful.

Differential Revision: https://reviews.llvm.org/D92152
Reviewed By: skatkov
2020-11-26 13:20:02 +07:00
Max Kazantsev f10500e220 [IndVars] Use isLoopBackedgeGuardedByCond for last iteration check
Use more context to prove contextual facts about the last iteration. It is
only executed when the backedge is taken, so we can use `isLoopBackedgeGuardedByCond`
to make this check.

Differential Revision: https://reviews.llvm.org/D91535
Reviewed By: skatkov
2020-11-26 12:37:21 +07:00
Joe Ellis 06654a5348 [SVE] Fix TypeSize warning in RuntimePointerChecking::insert
The TypeSize warning would occur because RuntimePointerChecking::insert
was not scalable vector aware. The fix is to use
ScalarEvolution::getSizeOfExpr to grab the size of types.

Differential Revision: https://reviews.llvm.org/D90171
2020-11-25 16:59:03 +00:00
Cullen Rhodes 1ba4b82f67 [LAA] NFC: Rename [get]MaxSafeRegisterWidth -> [get]MaxSafeVectorWidthInBits
MaxSafeRegisterWidth is a misnomer since it actually returns the maximum
safe vector width. Register suggests it relates directly to a physical
register where it could be a vector spanning one or more physical
registers.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D91727
2020-11-25 13:06:26 +00:00
Max Kazantsev 9130651126 Revert "[SCEV] Generalize no-self-wrap check in isLoopInvariantExitCondDuringFirstIterations"
This reverts commit 7dcc889917.

This patch introduced a logical error that breaks whole logic of this analysis.
All checks we are making are supposed to be loop-independent, so that we could
safely remove the range check. The 'nw' fact is loop-dependent, so we can remove
the check basing on facts from this very check.

Motivating examples will follow-up.
2020-11-25 13:26:17 +07:00
Philip Reames 10ddb927c1 [SCEV] Use isa<> pattern for testing for CouldNotCompute [NFC]
Some older code - and code copied from older code - still directly tested against the singelton result of SE::getCouldNotCompute.  Using the isa<SCEVCouldNotCompute> form is both shorter, and more readable.
2020-11-24 18:47:49 -08:00
Philip Reames b3a8a15343 [LAA] Minor code style tweaks [NFC] 2020-11-24 15:49:27 -08:00
Janek van Oirschot 42eaf4fe0a [HardwareLoops] Change order of SCEV expression construction for InitLoopCount.
Putting the +1 before the zero-extend will allow scalar evolution to fold the expression in some cases such as the one shown in PowerPC's `shrink-wrap.ll` test.

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D91724
2020-11-24 18:01:42 +00:00
Max Kazantsev 02fdbc3567 Revert "[NFC][SCEV] Generalize monotonicity check for full and limited iteration space"
This reverts commit 2734a9ebf4.

This patch appeared to not be a NFC. It introduced an execution path where
monotonicity check on limited space started relying in existing nsw/nuw
flags, which is illegal. The motivating test will follow-up.
2020-11-24 17:56:59 +07:00
Arthur Eubanks aff058b1a9 Reland [CGSCC] Detect devirtualization in more cases
The devirtualization wrapper misses cases where if it wraps a pass
manager, an individual pass may devirtualize an indirect call created by
a previous pass. For example, inlining may create a new indirect call
which is devirtualized by instcombine. Currently the devirtualization
wrapper will not see that because it only checks cgscc edges at the very
beginning and end of the pass (manager) it wraps.

This fixes some tests testing this exact behavior in the legacy PM.

Instead of checking WeakTrackingVHs for CallBases at the very beginning
and end of the pass it wraps, check every time
updateCGAndAnalysisManagerForPass() is called.

check-llvm and check-clang with -abort-on-max-devirt-iterations-reached
on by default doesn't show any failures outside of tests specifically
testing it so it doesn't needlessly rerun passes more than necessary.
(The NPM -O2/3 pipeline run the inliner/function simplification pipeline
under a devirtualization repeater pass up to 4 times by default).

http://llvm-compile-time-tracker.com/?config=O3&stat=instructions&remote=aeubanks
shows that 7zip has ~1% compile time regression. I looked at it and saw
that there indeed was devirtualization happening that was not previously
caught, so now it reruns the CGSCC pipeline on some SCCs, which is WAI.

The initial land assumed CallBase WeakTrackingVHs would always be
CallBases, but they can be RAUW'd with undef.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89587
2020-11-23 21:28:59 -08:00
Yichao Yu 4bc88a0e9a Enable support for floating-point division reductions
Similar to fsub, fdiv can also be vectorized using fmul.

Also http://llvm.org/viewvc/llvm-project?view=revision&revision=215200

Differential Revision: https://reviews.llvm.org/D34078

Co-authored-by: Jameson Nash <jameson@juliacomputing.com>
2020-11-23 20:00:58 -05:00
Arthur Eubanks 6a2799cf8e Revert "[CGSCC] Detect devirtualization in more cases"
This reverts commit 14a68b4aa9.

Causes building self hosted clang to crash when using NPM.
2020-11-23 13:21:05 -08:00
Arthur Eubanks 3c811ce4f3 [NPM] Share pass building options with legacy PM
We should share options when possible.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91741
2020-11-23 13:04:05 -08:00
Arthur Eubanks 7167e5203a Port -print-memderefs to NPM
There is lots of code duplication, but hopefully it won't matter soon.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D91683
2020-11-23 11:56:22 -08:00
Arthur Eubanks 14a68b4aa9 [CGSCC] Detect devirtualization in more cases
The devirtualization wrapper misses cases where if it wraps a pass
manager, an individual pass may devirtualize an indirect call created by
a previous pass. For example, inlining may create a new indirect call
which is devirtualized by instcombine. Currently the devirtualization
wrapper will not see that because it only checks cgscc edges at the very
beginning and end of the pass (manager) it wraps.

This fixes some tests testing this exact behavior in the legacy PM.

Instead of checking WeakTrackingVHs for CallBases at the very beginning
and end of the pass it wraps, check every time
updateCGAndAnalysisManagerForPass() is called.

check-llvm and check-clang with -abort-on-max-devirt-iterations-reached
on by default doesn't show any failures outside of tests specifically
testing it so it doesn't needlessly rerun passes more than necessary.
(The NPM -O2/3 pipeline run the inliner/function simplification pipeline
under a devirtualization repeater pass up to 4 times by default).

http://llvm-compile-time-tracker.com/?config=O3&stat=instructions&remote=aeubanks
shows that 7zip has ~1% compile time regression. I looked at it and saw
that there indeed was devirtualization happening that was not previously
caught, so now it reruns the CGSCC pipeline on some SCCs, which is WAI.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89587
2020-11-23 11:55:20 -08:00
Jay Foad 000400ca0a Fix speling in comments. NFC. 2020-11-23 14:43:24 +00:00
Mikael Holmen faf848ac32 [Inline] Fix in handling of ptrtoint in InlineCost
ConstantOffsetPtrs contains mappings from a Value to a base pointer and
an offset. The offset is typed and has a size, and at least when dealing
with ptrtoint, it could happen that we had a mapping from a ptrtoint
with type i32 to an offset with type i16. This could later cause
problems, showing up in PR 47969 and PR 38500.

In PR 47969 we ended up in an assert complaining that trunc i16 to i16
is invalid and in Pr 38500 that a cmp on an i32 and i16 value isn't
valid.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D90610
2020-11-23 14:33:06 +01:00
Max Kazantsev 48d7cc6ae2 [SCEV] Fix incorrect treatment of max taken count. PR48225
SCEV makes a logical mistake when handling EitherMayExit in
case when both conditions must be met to exit the loop. The
mistake looks like follows: "if condition `A` fails within at most `X` first
iterations, and `B` fails within at most `Y` first iterations, then `A & B`
fails at most within `min (X, Y)` first iterations". This is wrong, because
both of them must fail at the same time.

Simple example illustrating this is following: we have an IV with step 1,
condition `A` = "IV is even", condition `B` = "IV is odd". Both `A` and `B`
will fail within first two iterations. But it doesn't mean that both of them
will fail within first two first iterations at the same time, which would mean
that IV is neither even nor odd at the same time within first 2 iterations.

We can only do so for known exact BE counts, but not for max.

Differential Revision: https://reviews.llvm.org/D91942
Reviewed By: nikic
2020-11-23 16:52:39 +07:00
Max Kazantsev 47e31d1b5e [NFC] Reduce code duplication in binop processing in computeExitLimitFromCondCached
Handling of `and` and `or` vastly uses copy-paste. Factored out into
a helper function as preparation step for further fix (see PR48225).

Differential Revision: https://reviews.llvm.org/D91864
Reviewed By: nikic
2020-11-23 13:18:12 +07:00
Nikita Popov 6f5ef648a5 [BasicAA] Avoid unnecessary cache update (NFC)
If the final recursive query returns MayAlias as well, there is
no need to update the cache (which already stores MayAlias).
2020-11-22 20:10:45 +01:00
Sanjay Patel c5a4d80fd4 [ValueTracking][MemCpyOpt] avoid crash on inttoptr with vector pointer type (PR48075) 2020-11-22 12:54:18 -05:00
Simon Pilgrim 24d6e60488 [Analysis] Remove unused system header includes
Cleanup unused system headers and fix an implicit dependency
2020-11-22 10:32:37 +00:00
Nikita Popov ded5928866 [BasicAA] Remove unnecessary sextOrSelf (NFC)
We are doing a sextOrTrunc directly afterwards, so this seems
useless. There is a multiplication in between, but truncating
before or after the multiplication should not make a difference.
2020-11-21 21:32:56 +01:00
Nikita Popov 0d114f56d7 [BasicAA] Return DecomposedGEP (NFC)
Instead of requiring the caller to initialize the DecomposedGEP
structure and then passing it in by reference, make
DecomposeGEPExpression() responsible for initializing and returning
the structure.
2020-11-21 21:05:26 +01:00
Nikita Popov f4412c5ae4 [BasicAA] Remove some intermediate variables (NFC)
Use DecompGEP1.Offset instead of GEP1BaseOffset, etc. I found the
asymmetry of modifying DecompGEP1.VarIndices, but not modifying
DecompGEP1.Offset odd here.
2020-11-21 20:36:25 +01:00
Nikita Popov 913a99c474 [BasicAA] Remove stale FIXME (NFC)
If aliasGEP returns MayAlias, the code does fall through to
aliasPHI etc, so this FIXME is no longer applicable.
2020-11-21 20:07:26 +01:00
Kazu Hirata 226beb494c [Analysis] Use llvm::is_contained (NFC) 2020-11-20 18:08:05 -08:00
Hongtao Yu f3c445697d [CSSPGO] IR intrinsic for pseudo-probe block instrumentation
This change introduces a new IR intrinsic named `llvm.pseudoprobe` for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.

A pseudo probe is used to collect the execution count of the block where the probe is instrumented. This requires a pseudo probe to be persisting. The LLVM PGO instrumentation also instruments in similar places by placing a counter in the form of atomic read/write operations or runtime helper calls. While these operations are very persisting or optimization-resilient, in theory we can borrow the atomic read/write implementation from PGO counters and cut it off at the end of compilation with all the atomics converted into binary data. This was our initial design and we’ve seen promising sample correlation quality with it. However, the atomics approach has a couple issues:

1. IR Optimizations are blocked unexpectedly. Those atomic instructions are not going to be physically present in the binary code, but since they are on the IR till very end of compilation, they can still prevent certain IR optimizations and result in lower code quality.
2. The counter atomics may not be fully cleaned up from the code stream eventually.
3. Extra work is needed for re-targeting.

We choose to implement pseudo probes based on a special LLVM intrinsic, which is expected to have most of the semantics that comes with an atomic operation but does not block desired optimizations as much as possible. More specifically the semantics associated with the new intrinsic enforces a pseudo probe to be virtually executed exactly the same number of times before and after an IR optimization. The intrinsic also comes with certain flags that are carefully chosen so that the places they are probing are not going to be messed up by the optimizer while most of the IR optimizations still work. The core flags given to the special intrinsic is `IntrInaccessibleMemOnly`, which means the intrinsic accesses memory and does have a side effect so that it is not removable, but is does not access memory locations that are accessible by any original instructions. This way the intrinsic does not alias with any original instruction and thus it does not block optimizations as much as an atomic operation does. We also assign a function GUID and a block index to an intrinsic so that they are uniquely identified and not merged in order to achieve good correlation quality.

Let's now look at an example. Given the following LLVM IR:

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
  %cmp = icmp eq i32 %x, 0
   br i1 %cmp, label %bb1, label %bb2
bb1:
   br label %bb3
bb2:
   br label %bb3
bb3:
   ret void
}
```

The instrumented IR will look like below. Note that each `llvm.pseudoprobe` intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID.

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
   %cmp = icmp eq i32 %x, 0
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
   br i1 %cmp, label %bb1, label %bb2
bb1:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
   br label %bb3
bb2:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
   br label %bb3
bb3:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
   ret void
}

```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D86490
2020-11-20 10:39:24 -08:00
Nikita Popov e8dc6e9a32 [MemLoc] Use hasValue() method more (NFC)
Followup to 7de7c40898. I previously
removed a number of == comparisons to LocationSize::unknown(), but
missed these != comparisons.
2020-11-19 22:29:44 +01:00
Nikita Popov 7de7c40898 [MemLoc] Use hasValue() method (NFC)
Instead of comparing to LocationSize::unknown(), prefer calling
the hasValue() method instead, which is less reliant on
implementation details.
2020-11-19 21:53:50 +01:00
Nikita Popov 393b9e9db3 [MemLoc] Require LocationSize argument (NFC)
When constructing a MemoryLocation by hand, require that a
LocationSize is explicitly specified. D91649 will split up
LocationSize::unknown() into two different states, and callers
should make an explicit choice regarding the kind of MemoryLocation
they want to have.
2020-11-19 21:45:52 +01:00
Artur Pilipenko 887c7660bd [BasicAA] Deoptimize intrinsics don't modify memory
Similarly to assumes and guards deoptimize intrinsics are
marked as writing to ensure proper control dependencies
but they never modify any particular memory location.

Differential Revision: https://reviews.llvm.org/D91658
2020-11-19 12:08:33 -08:00