Commit Graph

2512 Commits

Author SHA1 Message Date
Nikita Popov b0ce2b72e8 [BasicAA] Add tests for non-zero var index (NFC) 2020-12-12 15:00:46 +01:00
Nikita Popov 7ea37d2f94 [BasicAA] Add extra check in phi-spec-order.ll (NFC)
The (scevgep, scevgep5) relation regressed with a patch I was
trying, but wasn't tested.
2020-12-11 21:20:51 +01:00
Nikita Popov 8b1c4e310c [BasicAA] Handle two unknown sizes for GEPs
If we have two unknown sizes and one GEP operand and one non-GEP
operand, then we currently simply return MayAlias. The comment says
we can't do anything useful ... but we can! We can still check that
the underlying objects are different (and do so for the GEP-GEP case).

To reduce the compile-time impact, this a) checks this early, before
doing the relatively expensive GEP decomposition that will not be
used and b) doesn't do the check if the other operand is a phi or
select. In that case, the phi/select will already recurse, so this
would just do two slightly different recursive walks that arrive at
the same roots.

Compile-time is still a bit of a mixed bag: https://llvm-compile-time-tracker.com/compare.php?from=624af932a808b363a888139beca49f57313d9a3b&to=845356e14adbe651a553ed11318ddb5e79a24bcd&stat=instructions
On average this is a small improvement, but sqlite with ThinLTO has
a 0.5% regression (lencod has a 1% improvement).

The BasicAA test case checks this by using two memsets with unknown
size. However, the more interesting case where this is useful is
the LoopVectorize test case, as analysis of accesses in loops tends
to always us unknown sizes.

Differential Revision: https://reviews.llvm.org/D92401
2020-12-11 18:45:53 +01:00
Philip Reames 2656885390 Teach isKnownNonEqual how to recurse through invertible multiplies
Build on the work started in 8f07629, and add the multiply case. In the process, more clearly describe the requirement for the operation we're looking through.

Differential Revision: https://reviews.llvm.org/D92726
2020-12-07 14:52:08 -08:00
Simon Pilgrim db900995ed [CostModel][X86] getGatherScatterOpCost - use default implementation for alt costkinds
Noticed while looking at D92701 - we only really handle TCK_RecipThroughput gather/scatter costs - for now drop back to the default implementation for non-legal gathers/scatters.
2020-12-06 14:08:26 +00:00
Nikita Popov 5e69e2ebad [BasicAA] Migrate "same base pointer" logic to decomposed GEPs
BasicAA has some special bit of logic for "same base pointer" GEPs
that performs a structural comparison: It only looks at two GEPs
with the same base (as opposed to two GEP chains with a MustAlias
base) and compares their indexes in a limited way. I generalized
part of this code in D91027, and this patch merges the remainder
into the normal decomposed GEP logic.

What this code ultimately wants to do is to determine that
gep %base, %idx1 and gep %base, %idx2 don't alias if %idx1 != %idx2,
and the access size fits within the stride.

We can express this in terms of a decomposed GEP expression with
two indexes scale*%idx1 + -scale*%idx2 where %idx1 != %idx2, and
some appropriate checks for sizes and offsets.

This makes the reasoning slightly more powerful, and more
importantly brings all the GEP logic under a common umbrella.

Differential Revision: https://reviews.llvm.org/D92723
2020-12-06 10:27:35 +01:00
Philip Reames 8f076291be Add recursive decomposition reasoning to isKnownNonEqual
The basic idea is that by looking through operand instructions which don't change the equality result that we can push the existing known bits comparison down past instructions which would obscure them.

We have analogous handling in InstSimplify for most - though weirdly not all - of these cases starting from an icmp root. It's a bit unfortunate to duplicate logic, but since my actual goal is to extend BasicAA, the icmp logic doesn't help. (And just makes it hard to test here.)  The BasicAA change will be posted separately for review.

Differential Revision: https://reviews.llvm.org/D92698
2020-12-05 15:58:19 -08:00
Philip Reames bfda69416c [BasicAA] Fix a bug with relational reasoning across iterations
Due to the recursion through phis basicaa does, the code needs to be extremely careful not to reason about equality between values which might represent distinct iterations. I'm generally skeptical of the correctness of the whole scheme, but this particular patch fixes one particular instance which is demonstrateable incorrect.

Interestingly, this appears to be the second attempted fix for the same issue. The former fix is incomplete and doesn't address the actual issue.

Differential Revision: https://reviews.llvm.org/D92694
2020-12-05 14:10:21 -08:00
Nikita Popov ae5e013f6e [BasicAA] Add more tests for non-equal index (NFC) 2020-12-05 21:22:57 +01:00
Nikita Popov 8925d23474 [BasicAA] Add recphi tests with nested loops (NFC) 2020-12-05 11:09:15 +01:00
Philip Reames 99f79cbf31 [test] precommit test for D92698 2020-12-04 15:17:39 -08:00
modimo 1860331932 [MemCpyOpt] Correctly merge alias scopes during call slot optimization
When MemCpyOpt performs call slot optimization it will concatenate the `alias.scope` metadata between the function call and the memcpy. However, scoped AA relies on the domains in metadata to be maintained in a caller-callee relationship. Naive concatenation breaks this assumption leading to bad AA results.

The fix is to take the intersection of domains then union the scopes within those domains.

The original bug came from a case of rust bad codegen which uses this bad aliasing to perform additional memcpy optimizations. As show in the added test case `%src` got forwarded past its lifetime leading to a dereference of garbage data.

Testing
ninja check-llvm

Reviewed By: jeroen.dobbelaere

Differential Revision: https://reviews.llvm.org/D91576
2020-12-03 09:23:37 -08:00
Nikita Popov 54eab293f5 [BasicAA] Add test for suboptimal result with unknown sizes (NFC) 2020-12-01 18:20:34 +01:00
Sanjay Patel 136f98e523 [x86] adjust cost model values for minnum/maxnum with fast-math-flags
Without FMF, we lower these intrinsics into something like this:

vmaxsd	%xmm0, %xmm1, %xmm2
vcmpunordsd	%xmm0, %xmm0, %xmm0
vblendvpd	%xmm0, %xmm1, %xmm2, %xmm0

But if we can ignore NANs, the single min/max instruction is enough
because there is no need to fix up the x86 logic that corresponds to
X > Y ? X : Y.

We probably want to make other adjustments for FP intrinsics with FMF
to account for specialized codegen (for example, FSQRT).

Differential Revision: https://reviews.llvm.org/D92337
2020-12-01 10:45:53 -05:00
Sanjay Patel 40dc535b5a [x86] add tests for maxnum/minnum with nnan; NFC 2020-11-30 14:30:28 -05:00
Sjoerd Meijer 5110ff0817 [AArch64][CostModel] Fix cost for mul <2 x i64>
This was modeled to have a cost of 1, but since we do not have a MUL.2d this is
scalarized into vector inserts/extracts and scalar muls.

Motivating precommitted test is test/Transforms/SLPVectorizer/AArch64/mul.ll,
which we don't want to SLP vectorize.

Test Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll
unfortunately needed changing, but the reason is documented in
LoopVectorize.cpp:6855:

  // The cost of executing VF copies of the scalar instruction. This opcode
  // is unknown. Assume that it is the same as 'mul'.

which I will address next as a follow up of this.

Differential Revision: https://reviews.llvm.org/D92208
2020-11-30 11:36:55 +00:00
Nikita Popov e987fbdd85 [BasicAA] Generalize recursive phi alias analysis
For recursive phis, we skip the recursive operands and check that
the remaining operands are NoAlias with an unknown size. Currently,
this is limited to inbounds GEPs with positive offsets, to
guarantee that the recursion only ever increases the pointer.

Make this more general by only requiring that the underlying object
of the phi operand is the phi itself, i.e. it it based on itself in
some way. To compensate, we need to use a beforeOrAfterPointer()
location size, as we no longer have the guarantee that the pointer
is strictly increasing.

This allows us to handle some additional cases like negative geps,
geps with dynamic offsets or geps that aren't inbounds.

Differential Revision: https://reviews.llvm.org/D91914
2020-11-29 10:25:23 +01:00
Nikita Popov b5e8de9c79 [BasicAA] Add tests for suboptimal speculation results (NFC)
While we determine that (phi1, phi2) is noalias, we don't
determine that (gep phi1 + 1, gep phi2 + 1) are also noalias.
2020-11-28 19:16:17 +01:00
Nikita Popov 1dea8ed8b7 [BasicAA] Remove unnecessary known size requirement
The size requirement on V2 was present because it was not clear
whether an unknown size would allow an access before the start of
V2, which could then overlap. This is clarified since D91649: In
this part of BasicAA, all accesses can occur only after the base
pointer, even if they have unknown size.

This makes the positive and negative offset cases symmetric.

Differential Revision: https://reviews.llvm.org/D91482
2020-11-28 10:17:12 +01:00
Simon Pilgrim 969918e177 [DAG] Legalize umin(x,y) -> sub(x,usubsat(x,y)) and umax(x,y) -> add(x,usubsat(y,x)) iff usubsat is legal
If usubsat() is legal, this is likely to result in smaller codegen expansion than the default cmp+select codegen expansion.

Allows us to move the x86-specific lowering to the generic expansion code.

Differential Revision: https://reviews.llvm.org/D92183
2020-11-27 11:18:58 +00:00
Arthur Eubanks f342ed1cd1 [test] Fix runtime-pointer-checking-insert-typesize.ll under NPM
Also clean it up a bit.
2020-11-26 12:34:32 -08:00
Nikita Popov 4df8efce80 [AA] Split up LocationSize::unknown()
Currently, we have some confusion in the codebase regarding the
meaning of LocationSize::unknown(): Some parts (including most of
BasicAA) assume that LocationSize::unknown() only allows accesses
after the base pointer. Some parts (various callers of AA) assume
that LocationSize::unknown() allows accesses both before and after
the base pointer (but within the underlying object).

This patch splits up LocationSize::unknown() into
LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer()
to make this completely unambiguous. I tried my best to determine
which one is appropriate for all the existing uses.

The test changes in cs-cs.ll in particular illustrate a previously
clearly incorrect AA result: We were effectively assuming that
argmemonly functions were only allowed to access their arguments
after the passed pointer, but not before it. I'm pretty sure that
this was not intentional, and it's certainly not specified by
LangRef that way.

Differential Revision: https://reviews.llvm.org/D91649
2020-11-26 18:39:55 +01:00
Sjoerd Meijer a3b1fcbc0c [AArch64][CostModel] Precommit some vector mul tests. NFC.
The cost-model is not getting the cost right for a mul with <2 x i64>
operands, i.e. we don't have a MUL.2d, and this is precommitting some
tests before adjusting this.
2020-11-26 13:23:11 +00:00
Florian Hahn 926681b6be
[CostModel] Add basic implementation of getGatherScatterOpCost.
Add a basic implementation of getGatherScatterOpCost to BasicTTIImpl.

The implementation estimates the cost of scalarizing the loads/stores,
the cost of packing/extracting the individual lanes and the cost of
only selecting enabled lanes.

This more accurately reflects the current cost on targets like AArch64.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D91984
2020-11-26 12:02:25 +00:00
Simon Pilgrim 385a27d6cd [CostModel][X86] Refresh ISD::ABS costs
Update costs now that D92095 and D92102 have tweaked the SSE2 implementation

The SSE42 BLENDVPD cost can actually be used on SSE41 as we don't attempt to generate PCMPGT anymore

Add scalar i16/i32/i64 costs as we can do this cheaply with CMOV
2020-11-25 18:40:19 +00:00
Joe Ellis 06654a5348 [SVE] Fix TypeSize warning in RuntimePointerChecking::insert
The TypeSize warning would occur because RuntimePointerChecking::insert
was not scalable vector aware. The fix is to use
ScalarEvolution::getSizeOfExpr to grab the size of types.

Differential Revision: https://reviews.llvm.org/D90171
2020-11-25 16:59:03 +00:00
Arthur Eubanks 7167e5203a Port -print-memderefs to NPM
There is lots of code duplication, but hopefully it won't matter soon.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D91683
2020-11-23 11:56:22 -08:00
Arthur Eubanks 8eec3959ef [test] Pin memory-dereferenceable.ll to legacy PM
-print-memderefs is only used for one test and hasn't been touched in a while.
2020-11-23 11:56:17 -08:00
Florian Hahn 14c0185bfe
[AArch64] Add scatter cost model tests. 2020-11-23 18:36:56 +00:00
Florian Hahn 3a1c6cec15
[AArch64] Add tests for masked.gather costs. 2020-11-23 17:33:27 +00:00
Max Kazantsev 48d7cc6ae2 [SCEV] Fix incorrect treatment of max taken count. PR48225
SCEV makes a logical mistake when handling EitherMayExit in
case when both conditions must be met to exit the loop. The
mistake looks like follows: "if condition `A` fails within at most `X` first
iterations, and `B` fails within at most `Y` first iterations, then `A & B`
fails at most within `min (X, Y)` first iterations". This is wrong, because
both of them must fail at the same time.

Simple example illustrating this is following: we have an IV with step 1,
condition `A` = "IV is even", condition `B` = "IV is odd". Both `A` and `B`
will fail within first two iterations. But it doesn't mean that both of them
will fail within first two first iterations at the same time, which would mean
that IV is neither even nor odd at the same time within first 2 iterations.

We can only do so for known exact BE counts, but not for max.

Differential Revision: https://reviews.llvm.org/D91942
Reviewed By: nikic
2020-11-23 16:52:39 +07:00
Sanjay Patel 2717252c92 [CostModel] add basic handling for FP maximum/minimum intrinsics
This might be a regression for some ARM targets, but that should
be changed in the target-specific overrides.

There is apparently still no default lowering for these nodes,
so I am assuming these intrinsics are not in common use.
X86, PowerPC, and RISC-V for example, just crash given the most
basic IR.
2020-11-22 13:43:53 -05:00
Sanjay Patel 3a18f26723 [CostModel] add tests for FP maximum; NFC
These min/max intrinsics are not handled in the basic
implementation and probably not handled in target-specific
overrides either.
2020-11-22 13:33:42 -05:00
Nikita Popov 221c2b8862 [BasicAA] Add more phi-phi tests (NFC)
Test a few more variations:
 * NoAlias with different strides
 * MustAlias without loop
 * MustAlias with same stride
 * MustAlias base pointers with different stride
2020-11-22 16:53:06 +01:00
Nikita Popov 072ddff3f2 [BasicAA] Add recphi test with dynamic offset (NFC)
Currently, we don't recognize that %a an %p don't alias.
2020-11-21 17:37:41 +01:00
Matt Arsenault 1d1234b2a4 OpaquePtr: Update more tests to use typed sret 2020-11-20 20:08:43 -05:00
Matt Arsenault 20c43d6bd5 OpaquePtr: Bulk update tests to use typed sret 2020-11-20 17:58:26 -05:00
Matt Arsenault 06c192d454 OpaquePtr: Bulk update tests to use typed byval
Upgrade of the IR text tests should be the only thing blocking making
typed byval mandatory. Partially done through regex and partially
manual.
2020-11-20 14:00:46 -05:00
Sanjay Patel e32bd35120 [CostModel] mostly remove cost-kind predicate for intrinsics in basic TTI implementation
This is re-applying a combination of f7eac51b9b and 8ec7ea3ddc as one patch
to avoid regressions now that we have better testing in place.

Those were reverted with 32dd5870ee because of crashing in experimental intrinsics.
That bug should be fixed with 7ae346434.

Paraphrased original commit messages:

This is the last step in removing cost-kind as a consideration in the
basic class model for intrinsics.
See D89461 for the start of that.
Subsequent commits dealt with each of the special-case intrinsics that
had customization here in the basic class. This should remove a barrier
to retrying D87188 (canonicalization to the abs intrinsic).

The ARM and x86 cost diffs seen here may be wrong because the
target-specific overrides have their own bugs, but we hope this is
less wrong - if something has a significant throughput cost, then it
should have a significant size / blended cost too by default.

The only behavioral diff in current regression tests is shown in the
x86 scatter-gather test (which is misplaced or broken because it runs
the entire -O3 pipeline) - we unrolled less, and we assume that is
a improvement.

Exception: in general, we want the *size* cost for a scalar call to be
cheap even if the other costs are expensive - we expect it to just be
a branch with some optional stack manipulation.

It is likely that we will want to carve out some
exceptions/overrides to this rule as follow-up patches for
calls that have some general and/or target-specific difference
to the expected lowering.

This was noticed as a regression in unrolling, so we have a test
for that now along with a couple of direct cost model tests.

If the assumed scalarization costs for the oversized vector
calls are not realistic, that would be another follow-up
refinement of the cost models.

Differential Revision: https://reviews.llvm.org/D90554
2020-11-20 11:21:10 -05:00
Sanjay Patel 7ae346434a [CostModel] avoid crashing while finding scalarization overhead
The constrained intrinsics have metadata arguments, so the
tests here were crashing as noted in D90554 (and that was
reverted even though this bug exists independently of that
change).
2020-11-20 10:18:29 -05:00
Sanjay Patel 1285781fc5 [CostModel] add tests for math library calls; NFC
This is a partial un-revert of 32dd5870ee (originally df09f82599 ).

I'm adding back the baseline tests first, so we don't have
to back-track as much in case there are still problems.
2020-11-20 08:24:49 -05:00
Max Kazantsev 2290daa938 [Test] Auto-update checks in a test 2020-11-20 16:53:51 +07:00
Max Kazantsev 0c101c9cbc [Test] Add tests demonstrating a bug in SCEV, PR48225
Slightly simplified version of original test reported by Congzhe Cao.
2020-11-20 15:59:22 +07:00
Eric Christopher 32dd5870ee Temporarily Revert "[CostModel] remove cost-kind predicate for intrinsics in basic TTI implementation"
as it's causing crashes in the optimizer. A reduced testcase has been posted as a follow-up.

This reverts commit f7eac51b9b.

Temporarily Revert "[CostModel] make default size cost for libcalls small (again)" as it depends upon the primary revert.

This reverts commit 8ec7ea3ddc.

Temporarily Revert "[CostModel] add tests for math library calls; NFC" as it depends upon the primary revert.

This reverts commit df09f82599.

Temporarily Revert "[LoopUnroll] add test for full unroll that is sensitive to cost-model; NFC" as it depends upon the primary revert.

This reverts commit 618d555e8d.
2020-11-19 22:10:23 -08:00
Artur Pilipenko 887c7660bd [BasicAA] Deoptimize intrinsics don't modify memory
Similarly to assumes and guards deoptimize intrinsics are
marked as writing to ensure proper control dependencies
but they never modify any particular memory location.

Differential Revision: https://reviews.llvm.org/D91658
2020-11-19 12:08:33 -08:00
Mircea Trofin f62fe0ee3b [FileCheck] Disallow unused prefixes in llvm/test/Analysis
This is achieved through a substitution of FileCheck in lit.cfg.py,
where we explicitly set -allow-unused-prefixes to false.

We also introduce a %FileCheckWithUnusedPrefixes% substitution that can
be used in those cases where we want to allow unused prefixes, even if
the folder policy is to disallow them.

Differential Revision: https://reviews.llvm.org/D91275
2020-11-19 07:56:35 -08:00
Nikita Popov cd3c22c47e [BasicAA] Generalize base offset modulus handling
The GEP aliasing implementation currently has two pieces of code
that solve two different subsets of the same basic problem: If you
have GEPs with offsets 4*x + 0 and 4*y + 1 (assuming access size 1),
then they do not alias regardless of whether x and y are the same.

One implementation is in aliasSameBasePointerGEPs(), which looks at
this in a limited structural way. It requires both GEP base pointers
to be exactly the same, then (optionally) a number of equal indexes,
then an unknown index, then a non-equal index into a struct. This
set of limitations works, but it's overly restrictive and hides the
core property we're trying to exploit.

The second implementation is part of aliasGEP() itself and tries to
find a common modulus in the scales, so it can then check that the
constant offset doesn't overlap under modular arithmetic. The second
implementation has the right idea of what the general problem is,
but effectively only considers power of two factors in the scales
(while aliasSameBasePointerGEPs also works with non-pow2 struct sizes.)

What this patch does is to adjust the aliasGEP() implementation to
instead find the largest common factor in all the scales (i.e. the GCD)
and use that as the modulus.

Differential Revision: https://reviews.llvm.org/D91027
2020-11-18 21:48:49 +01:00
Nikita Popov 85ccdcaa50 [BasicAA] Remove assert in AA evaluator
As reported in https://reviews.llvm.org/D91383#2401825, this
assert breaks external -aa-eval tests. We'll have to fix this
case before re-enabling it.
2020-11-18 20:04:38 +01:00
Craig Topper f0b0bab34d [X86] Use GF2P8AFFINEQB to implement vector bitreverse.
We can use GF2P8AFFINEQB to reverse bits in a byte. Shuffles are needed to reverse the bytes in elements larger than i8. LegalizeVectorOps takes care of inserting the shuffle for the larger element size.

We already have Custom lowering for v16i8 with SSSE3, v32i8 with AVX, and v64i8 with AVX512BW.

I think we might be able to use this for scalars too by moving into a vector and back. But I'll save that for a follow up as its a little more involved.

Reviewed By: RKSimon, pengfei

Differential Revision: https://reviews.llvm.org/D91515
2020-11-17 23:49:06 -08:00
Arthur Eubanks 9e3b4f4941 [JumpThreading] Make -print-lvi-after-jump-threading work with NPM 2020-11-17 23:15:20 -08:00
Wei Wang 3279347da0 [BPI] Look through bitcasts in calcZeroHeuristic
Constant hoisting may hide the constant value behind bitcast for And's
operand. Track down the constant to make the BFI result consistent
regardless of hoisting.

Differential Revision: https://reviews.llvm.org/D91450
2020-11-17 09:33:05 -08:00
Nikita Popov cb4fc25c91 [BasicAA] Make alias GEP positive offset handling symmetric
aliasGEP() currently implements some special handling for the case
where all variable offsets are positive, in which case the constant
offset can be taken as the minimal offset. However, it does not
perform the same handling for the all-negative case. This means that
the alias-analysis result between two GEPs is asymmetric:
If GEP1 - GEP2 is all-positive, then GEP2 - GEP1 is all-negative,
and the first will result in NoAlias, while the second will result
in MayAlias.

Apart from producing sub-optimal results for one order, this also
violates our caching assumption. In particular, if BatchAA is used,
the cached result depends on the order of the GEPs in the first query.
This results in an inconsistency in BatchAA and AA results, which
is how I noticed this issue in the first place.

Differential Revision: https://reviews.llvm.org/D91383
2020-11-17 18:05:34 +01:00
Caroline Concatto 6c4d8f4651 [AArch64] Add check for widening instruction for SVE.
This patch fixes the function isWideningInstruction for scalable vectors.
Now the cost model can check the widening pattern for SVE.

Differential Revision: https://reviews.llvm.org/D91260
2020-11-16 12:30:08 +00:00
Florian Hahn 7fa8b62920 [MemorySSA] Add pointer decrement loop clobber test case. 2020-11-15 18:00:01 +00:00
Nikita Popov 9ace4b337f Revert "[SCEV] Factor out part of wrap flag detection logic [NFC-ish]"
This reverts commit 1ec6e1eb8a.

This change causes a significant compile-time regression:
https://llvm-compile-time-tracker.com/compare.php?from=dd0b8b94d0796bd895cc998dd163b4fbebceb0b8&to=1ec6e1eb8a084bffae8a40236eb9925d8026dd07&stat=instructions

I assume that this is due to the non-NFC part of the change, which
now performs expensive nowrap inference even for nowrap flags that
are not used by the particular code.
2020-11-15 10:19:44 +01:00
Philip Reames 1ec6e1eb8a [SCEV] Factor out part of wrap flag detection logic [NFC-ish]
In an effort to make code around flag determination more readable, and (possibly) prepare for a follow up change, factor out some of the flag detection logic.  In the process, reduce the number of locations we mutate wrap flags by a couple.

Note that this isn't NFC.  The old code tried for NSW xor (NUW || NW).  This is, two different paths computed different sets of wrap flags.  The new code will try for all three.  The result is that some expressions end up with a few extra flags set.
2020-11-14 19:21:05 -08:00
Nikita Popov 0b72444211 [BasicAA] Remove unnecessary size limitation
We're dropping a common offset from both GEPs here. It's not
necessary for the access sizes to be the same as well.
2020-11-14 16:51:31 +01:00
Sanjay Patel 8ec7ea3ddc [CostModel] make default size cost for libcalls small (again)
This was changed recently with D90554 / f7eac51b9b
...because we had a regression testing blindspot for intrinsics
that are expected to be lowered to libcalls.

In general, we want the *size* cost for a scalar call to be cheap
even if the other costs are expensive - we expect it to just be
a branch with some optional stack manipulation.

It is likely that we will want to carve out some
exceptions/overrides to this rule as follow-up patches for
calls that have some general and/or target-specific difference
to the expected lowering.

This was noticed as a regression in unrolling, so we have a test
for that now along with a couple of direct cost model tests.

If the assumed scalarization costs for the oversized vector
calls are not realistic, that would be another follow-up
refinement of the cost models.
2020-11-14 08:15:35 -05:00
Sanjay Patel df09f82599 [CostModel] add tests for math library calls; NFC 2020-11-14 08:15:35 -05:00
Simon Pilgrim e11195d0a9 [CostModel][X86] Remove unused CHECK prefixes
Allows us to remove the "CHECK: {{^}}" hack and help simplify D91275
2020-11-13 17:31:48 +00:00
Nikita Popov f3124a46c1 [SCEV] Fix nsw flags for GEP expressions
The SCEV code for constructing GEP expressions currently assumes
that the addition of the base and all the offsets is nsw if the GEP
is inbounds. While the addition of the offsets is indeed nsw, the
addition to the base address is not, as the base address is
interpreted as an unsigned value.

Fix the GEP expression code to not assume nsw for the base+offset
calculation. However, do assume nuw if we know that the offset is
non-negative. With this, we use the same behavior as the
construction of GEP addrecs does. (Modulo the fact that we
disregard SCEV unification, as the pre-existing FIXME points out).

Differential Revision: https://reviews.llvm.org/D90648
2020-11-13 18:19:32 +01:00
Nikita Popov c00545dc32 [BasicAA] Remove checks for GEP decomposition limit reached
The GEP aliasing code currently checks for the GEP decomposition
limit being reached (i.e., we did not reach the "final" underlying
object). As far as I can see, these checks are not necessary. It is
perfectly fine to work with a GEP whose base can still be further
decomposed.

Looking back through the commit history, these checks were originally
introduced in 1a444489e9. However, I
believe that the problem this was intended to address was later
properly fixed with 1726fc698c, and
the checks are no longer necessary since then (and were not the
right fix in the first place).

Differential Revision: https://reviews.llvm.org/D91010
2020-11-12 20:43:38 +01:00
Jamie Schmeiser 5f672fefeb Reland: Introduce -dot-cfg-mssa option which creates dot-cfg style file with mssa comments included in source
Summary:
Expand the print-memoryssa and print<memoryssa> passes with a new hidden
option -cfg-dot-mssa that names a file. When set, a dot-cfg style file
will be generated into the named file with the memoryssa comments retained
and those blocks containing them shown in light pink. The option does
nothing in isolation.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>

Reviewed By: asbirlea (Alina Sbirlea), dblaikie (David Blaikie)

Differential Revision: https://reviews.llvm.org/D90638
2020-11-12 17:39:14 +00:00
Anh Tuyen Tran a20b3620bb Revert "Introduce -dot-cfg-mssa option which creates dot-cfg style file with mssa comments included in source"
This reverts commit 45d459e752 due to
build issue in Poly.
2020-11-12 15:48:14 +00:00
Jamie Schmeiser 45d459e752 Introduce -dot-cfg-mssa option which creates dot-cfg style file with mssa comments included in source
Summary:
Expand the print-memoryssa and print<memoryssa> passes with a new hidden
option -cfg-dot-mssa that names a file. When set, a dot-cfg style file
will be generated into the named file with the memoryssa comments retained
and those blocks containing them shown in light pink. The option does
nothing in isolation.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>

Reviewed By: asbirlea (Alina Sbirlea), dblaikie (David Blaikie)

Differential Revision: https://reviews.llvm.org/D90638
2020-11-12 15:41:16 +00:00
Caroline Concatto 37f4ccb275 [AArch64]Add memory op cost model for SVE
This patch adds/fixes memory op cost model for SVE with fixed-width
vector.

Differential Revision: https://reviews.llvm.org/D90950
2020-11-11 12:49:19 +00:00
Simon Pilgrim ca59d37e0e [ValueTacking] assume-queries-counter.ll - remove unused check prefix 2020-11-10 14:31:03 +00:00
Simon Pilgrim 87902b2ed0 [BasicAA] phi-values-usage.ll - remove unused check prefix 2020-11-10 14:31:03 +00:00
Simon Pilgrim 88fe246a34 [ScalarEvolution] Remove unused check prefixes 2020-11-10 14:31:02 +00:00
Sanjay Patel f7eac51b9b [CostModel] remove cost-kind predicate for intrinsics in basic TTI implementation
This is the last step in removing cost-kind as a consideration in the basic class model for intrinsics.
See D89461 for the start of that.
Subsequent commits dealt with each of the special-case intrinsics that had customization here in the
basic class. This should remove a barrier to retrying
D87188 (canonicalization to the abs intrinsic).

The ARM and x86 cost diffs seen here may be wrong because the target-specific overrides have their own
bugs, but we hope this is less wrong - if something has a significant throughput cost, then it should
have a significant size / blended cost too by default.

The only behavioral diff in current regression tests is shown in the x86 scatter-gather test (which is
misplaced or broken because it runs the entire -O3 pipeline) - we unrolled less, and we assume that is
a improvement.

Differential Revision: https://reviews.llvm.org/D90554
2020-11-10 08:19:31 -05:00
Simon Pilgrim 20bbe14ac8 [CostModel][ARM] Remove unused check-prefix 2020-11-10 13:10:12 +00:00
Simon Pilgrim bd2c0e2c9f [CostModel][AArch64] Remove unused check-prefix 2020-11-10 13:10:11 +00:00
Simon Pilgrim fe9403df06 [CostModel][X86] Remove unused check-prefixes 2020-11-10 12:48:35 +00:00
Max Kazantsev 6022a8b7e8 [SCEV] Drop cached ranges of AddRecs after flag update
Our range computation methods benefit from no-wrap flags. But if the ranges
were first computed before the flags were set, the cached range will be too
pessimistic.

We need to drop cached ranges whenever we sharpen AddRec's no wrap flags.

Differential Revision: https://reviews.llvm.org/D89847
Reviewed By: fhahn
2020-11-10 12:37:12 +07:00
Nikita Popov dd5b51f4fa [BasicAA] Add test for decomposition limit (NFC)
Test behavior before/at/after the GEP decomposition limit.
2020-11-09 21:31:11 +01:00
Michael Liao fa5d31f825 [GlobalsAA] Teach to handle `addrspacecast`. 2020-11-09 00:04:52 -05:00
Sanjay Patel 264a6df353 [ARM] remove cost-kind predicate for cmp/sel costs
This is the cmp/sel sibling to D90692.
Again, the reasoning is: the throughput cost is number of instructions/uops,
so size/blended costs are identical except in special cases (for example,
fdiv or other known-expensive machine instructions or things like MVE that
may require cracking into >1 uops).

We need to check for a valid (non-null) condition type parameter because
SimplifyCFG may pass nullptr for that (and so we will crash multiple
regression tests without that check). I'm not sure if passing nullptr makes
sense, but other code in the cost model does appear to check if that param
is set or not.

Differential Revision: https://reviews.llvm.org/D90781
2020-11-05 14:52:25 -05:00
Arthur Eubanks 06926e0f01 Port print-must-be-executed-contexts and print-mustexecute to NPM
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D90207
2020-11-03 21:06:46 -08:00
Sanjay Patel c40126e740 [ARM] remove cost-kind predicate for most math op costs
This is based on the same idea that I am using for the basic model implementation
and what I have partly already done for x86: throughput cost is number of
instructions/uops, so size/blended costs are identical except in special cases
(for example, fdiv or other known-expensive machine instructions or things like
MVE that may require cracking into >1 uop)).

Differential Revision: https://reviews.llvm.org/D90692
2020-11-03 17:23:46 -05:00
Sanjay Patel 3c050a597c [CostModel] fix cost calc bug for sadd/ssub with overflow
As noted in D90554, there's an opcode typo in using an easily
misused cost model API: getCmpSelInstrCost(). Beyond that, the
assumed sequence of ops is questionable, but that would be
another patch.

My guess is that the x86 test diffs show that we are probably
wrong both before and after this change, so there will be no
practical difference.
As an example, I tried this test which shows a cost of '7'
either way:

  define <4 x i32> @sadd(<4 x i32> %va, <4 x i32> %vb) {
    %V4I32  = call {<4 x i32>, <4 x i1>}  @llvm.sadd.with.overflow.v4i32(<4 x i32> %va, <4 x i32> %vb)
    %ov = extractvalue {<4 x i32>, <4 x i1>} %V4I32, 1
    %r = extractvalue {<4 x i32>, <4 x i1>} %V4I32, 0
    %z = select <4 x i1> %ov, <4 x i32> <i32 42, i32 42, i32 42, i32 42>, <4 x i32> %r
    ret <4 x i32> %z
  }

  $ llc -o - sadd.ll -mattr=avx
        vpaddd  %xmm1, %xmm0, %xmm2
        vpcmpgtd        %xmm2, %xmm0, %xmm0
        vpxor   %xmm0, %xmm1, %xmm0
        vblendvps       %xmm0, LCPI0_0(%rip), %xmm2, %xmm0a

Differential Revision: https://reviews.llvm.org/D90681
2020-11-03 11:03:47 -05:00
David Green 90131e3ecb [CostModel] Make target intrinsics cheap by default
This patch changes the intrinsics cost model to assume that by default
target intrinsics are cheap. This didn't seem to be the case for all
intrinsics, and is potentially an MVE problem due to our scalarization
overheads. Cheap seems to be a good default in general though.

Differential Revision: https://reviews.llvm.org/D90597
2020-11-03 09:58:28 +00:00
Fangrui Song 491dd2711f [LazyCallGraph] Build SCCs of the reference graph in order
```
// The legacy PM CGPassManager discovers SCCs this way:
for function in the source order
  tarjanSCC(function)

// While the new PM CGSCCPassManager does:
for function in the reversed source order [1]
  discover a reference graph SCC
  build call graph SCCs inside the reference graph SCC
```

In the common cases, reference graph ~= call graph, the new PM order is
undesired because for `a | b | c` (3 independent functions), the new PM will
process them in the reversed order: c, b, a. If `a <-> b <-> c`, we can see
that `-print-after-all` will report the sole SCC as `scc: (c, b, a)`.

This patch corrects the iteration order. The discovered SCC order will match
the legacy PM in the common cases.

For some tests (`Transforms/Inline/cgscc-*.ll` and
`unittests/Analysis/CGSCCPassManagerTest.cpp`), the behaviors are dependent on
the SCC discovery order and there are too many check lines for the particular
order.  This patch simply reverses the function order to avoid changing too many
check lines.

Differential Revision: https://reviews.llvm.org/D90566
2020-11-02 13:22:42 -08:00
David Green 5ac21f9bfe [ARM] Cost model test for target intrinsics. NFC 2020-11-02 17:46:48 +00:00
Sanjay Patel 35fa3c474f [x86] add AVX2 cost model entries for maxnum of 256-bit vectors
As noticed in D90554 ,
the AVX2 costs for 256-bit vectors did not include FMAXNUM entries,
so we fell back to AVX1 which assumes those ops will be split into
128-bit halves or something close to that.

Differential Revision: https://reviews.llvm.org/D90613
2020-11-02 12:20:17 -05:00
Florian Hahn b3b993a7ad Reland "[TTI] Add VecPred argument to getCmpSelInstrCost."
This reverts the revert commit 408c4408fa.

This version of the patch includes a fix for a crash caused by
treating ICmp/FCmp constant expressions as instructions.

Original message:

On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.

This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.

This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.

I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.
2020-11-02 15:39:29 +00:00
Fangrui Song 7979f24954 [test] Fix some unused check prefixes in test/Analysis/CostModel/X86 2020-10-31 23:29:57 -07:00
David Green 30ad742644 [ARM] Fix crash for gather of pointer costs.
If the elt size is unknown due to it being a pointer, a comparison
against 0 will cause an assert. Make sure the elt size is large enough
before comparing and for the moment just return the scalar cost.
2020-10-31 13:10:14 +00:00
Arthur Eubanks 5c31b8b94f Revert "Use uint64_t for branch weights instead of uint32_t"
This reverts commit 10f2a0d662.

More uint64_t overflows.
2020-10-31 00:25:32 -07:00
Florian Hahn 408c4408fa Revert "[TTI] Add VecPred argument to getCmpSelInstrCost."
This reverts commit 73f01e3df5.

This appears to break
http://lab.llvm.org:8011/#/builders/85/builds/383.
2020-10-30 21:26:14 +00:00
Arthur Eubanks 10f2a0d662 Use uint64_t for branch weights instead of uint32_t
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609
2020-10-30 10:03:46 -07:00
Sanjay Patel 251dd7c0f9 [x86] add cost overrides for mul with overflow
I'm assuming the standard size integer instructions for this end up as something like:
mulq %rsi
seto %al

And the 'mul' generally has reciprocal throughput of 1 on typical implementations
(higher latency, but that's not handled here).
The default costs may end up much higher than that, and that's what we see in the test diffs.

Vector types are left as a 'TODO'.

Differential Revision: https://reviews.llvm.org/D90431
2020-10-30 12:38:16 -04:00
Florian Hahn 73f01e3df5 [TTI] Add VecPred argument to getCmpSelInstrCost.
On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.

This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.

This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.

I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.

Reviewed By: dmgreen, RKSimon

Differential Revision: https://reviews.llvm.org/D90070
2020-10-30 13:49:08 +00:00
Roman Lebedev b4916918e5
[SCEV] SCEVPtrToIntExpr simplifications
If we've got an SCEVPtrToIntExpr(op), where op is not an SCEVUnknown,
we want to sink the SCEVPtrToIntExpr into an operand,
so that the operation is performed on integers,
and eventually we end up with just an `SCEVPtrToIntExpr(SCEVUnknown)`.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D89692
2020-10-30 11:13:35 +03:00
Roman Lebedev 81fc53a36a
[SCEV] Introduce SCEVPtrToIntExpr (PR46786)
And use it to model LLVM IR's `ptrtoint` cast.

This is essentially an alternative to D88806, but with no chance for
all the problems it caused due to having the cast as implicit there.
(see rG7ee6c402474a2f5fd21c403e7529f97f6362fdb3)

As we've established by now, there are at least two reasons why we want this:
* It will allow SCEV to actually model the `ptrtoint` casts
  and their operands, instead of treating them as `SCEVUnknown`
* It should help with initial problem of PR46786 - this should eventually allow us
  to not loose pointer-ness of an expression in more cases

As discussed in [[ https://bugs.llvm.org/show_bug.cgi?id=46786 | PR46786 ]], in principle,
we could just extend `SCEVUnknown` with a `is ptrtoint` cast, because `ScalarEvolution::getPtrToIntExpr()`
should sink the cast as far down into the expression as possible,
so in the end we should always end up with `SCEVPtrToIntExpr` of `SCEVUnknown`.

But i think that it isn't the best solution, because it doesn't really matter
from memory consumption side - there probably won't be *that* many `SCEVPtrToIntExpr`s
for it to matter, and it allows for much better discoverability.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D89456
2020-10-30 11:13:35 +03:00
Dávid Bolvanský 7a2abf5aca [InferAttrs] Add nocapture/writeonly to string/mem libcalls
One step closer to fix PR47644.

Differential Revision: https://reviews.llvm.org/D89645
2020-10-29 20:06:43 +01:00
Sanjay Patel d5a75e7738 [x86] add test for umul intrinsic costs; NFC 2020-10-29 12:12:52 -04:00
Sanjay Patel 7c395f31a6 [CostModel][x86] remove cost-kind predicate for intrinsic costs
We model cost as number of instructions / uops, so it does not
make sense to treat size/blended costs any differently than
throughput.
2020-10-28 14:33:37 -04:00
Sanjay Patel 9df32c9044 [CostModel] remove cost-kind predicate for funnel shift costs
Completing the series of FIXME removals for special-case intrinsics:
50dfa19cc7
f2c25c7079
c963bde015
01ea93d85d

This one looks quite different than the others. The size/blended
cost is still potentially very far off from the throughput cost,
but this is hopefully not worse on the whole. It looks like the
underlying costs for the expanded shift/logic have their own
cost-kind limitations. Also, we are not asking the target if
it has a legal funnel shift op, so we just assume that the
intrinsic gets expanded.
2020-10-28 14:02:34 -04:00
Max Kazantsev 5ef84688fb Re-enable "[SCEV] Prove implications of different type via truncation"
When we need to prove implication of expressions of different type width,
the default strategy is to widen everything to wider type and prove in this
type. This does not interact well with AddRecs with negative steps and
unsigned predicates: such AddRec will likely not have a `nuw` flag, and its
`zext` to wider type will not be an AddRec. In contraty, `trunc` of an AddRec
in some cases can easily be proved to be an `AddRec` too.

This patch introduces an alternative way to handling implications of different
type widths. If we can prove that wider type values actually fit in the narrow type,
we truncate them and prove the implication in narrow type.

The return was due to revert of underlying patch that this one depends on.

Unit test temporarily disabled because the required logic in SCEV is switched
off due to compile time reasons.

Differential Revision: https://reviews.llvm.org/D89548
2020-10-28 16:02:14 +07:00
David Green 066737fdbc [AArch64] Remove AArch64ISD::NOT, use vnot instead
vnot (xor -1) should be equivalent to the AArch64 specific AArch64ISD::NOT
node, but allow more folding thanks to all the target independent
optimizations. Specifically this allows select(icmp ne, x, y) to
become "cmeq; bsl y, x" as opposed to needing to convert the predicate
with "cmeq; mvn; bsl x, y"

Unfortunately there is a regression in a cmtst test, but the code it
selected from was already non-canonical, with instcombine preferring to
use an eq predicate instead. Plus the more common case of icmp ne is
improved.

Differential Revision: https://reviews.llvm.org/D90126
2020-10-28 08:15:37 +00:00