Commit Graph

2432 Commits

Author SHA1 Message Date
Max Kazantsev 48d7cc6ae2 [SCEV] Fix incorrect treatment of max taken count. PR48225
SCEV makes a logical mistake when handling EitherMayExit in
the case when both conditions must be met to exit the loop. The
mistake is as follows: "if condition `A` fails within at most the first `X`
iterations, and `B` fails within at most the first `Y` iterations, then `A & B`
fails within at most the first `min (X, Y)` iterations". This is wrong, because
both of them must fail at the same time.

A simple example illustrating this: we have an IV with step 1,
condition `A` = "IV is even", condition `B` = "IV is odd". Both `A` and `B`
will fail within the first two iterations. But that doesn't mean that both of
them will fail at the same time within the first two iterations, which would
mean that IV is neither even nor odd within the first 2 iterations.

We can only take the minimum for known exact BE counts, but not for max.
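
The even/odd example can be checked with a short simulation. This is a minimal Python sketch, not SCEV code: `first_failure` and `LIMIT` are hypothetical names chosen for illustration, and the loop is modeled as exiting only once both conditions fail in the same iteration.

```python
def first_failure(cond, limit):
    """Return the first IV value in [0, limit) for which cond is false, else None."""
    for iv in range(limit):
        if not cond(iv):
            return iv
    return None


def is_even(iv):
    return iv % 2 == 0


def is_odd(iv):
    return iv % 2 == 1


LIMIT = 1000  # arbitrary bound for the simulation (assumption)

# Each condition on its own fails within the first two iterations.
x = first_failure(is_even, LIMIT)
y = first_failure(is_odd, LIMIT)

# The loop exits only when both conditions fail in the same iteration,
# i.e. when "A or B" is false. That never happens for even/odd.
both = first_failure(lambda iv: is_even(iv) or is_odd(iv), LIMIT)

print(x, y, both)
```

Here `min(x, y)` is a small number, yet `both` stays `None`: the per-condition bounds say nothing about when the conditions fail together.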

Differential Revision: https://reviews.llvm.org/D91942
Reviewed By: nikic
2020-11-23 16:52:39 +07:00
Sanjay Patel 2717252c92 [CostModel] add basic handling for FP maximum/minimum intrinsics
This might be a regression for some ARM targets, but that should
be changed in the target-specific overrides.

There is apparently still no default lowering for these nodes,
so I am assuming these intrinsics are not in common use.
X86, PowerPC, and RISC-V for example, just crash given the most
basic IR.
2020-11-22 13:43:53 -05:00
Sanjay Patel 3a18f26723 [CostModel] add tests for FP maximum; NFC
These min/max intrinsics are not handled in the basic
implementation and probably not handled in target-specific
overrides either.
2020-11-22 13:33:42 -05:00
Nikita Popov 221c2b8862 [BasicAA] Add more phi-phi tests (NFC)
Test a few more variations:
 * NoAlias with different strides
 * MustAlias without loop
 * MustAlias with same stride
 * MustAlias base pointers with different stride
2020-11-22 16:53:06 +01:00
Nikita Popov 072ddff3f2 [BasicAA] Add recphi test with dynamic offset (NFC)
Currently, we don't recognize that %a and %p don't alias.
2020-11-21 17:37:41 +01:00
Matt Arsenault 1d1234b2a4 OpaquePtr: Update more tests to use typed sret 2020-11-20 20:08:43 -05:00
Matt Arsenault 20c43d6bd5 OpaquePtr: Bulk update tests to use typed sret 2020-11-20 17:58:26 -05:00
Matt Arsenault 06c192d454 OpaquePtr: Bulk update tests to use typed byval
Upgrade of the IR text tests should be the only thing blocking making
typed byval mandatory. Partially done through regex and partially
manual.
2020-11-20 14:00:46 -05:00
Sanjay Patel e32bd35120 [CostModel] mostly remove cost-kind predicate for intrinsics in basic TTI implementation
This is re-applying a combination of f7eac51b9b and 8ec7ea3ddc as one patch
to avoid regressions now that we have better testing in place.

Those were reverted with 32dd5870ee because of crashing in experimental intrinsics.
That bug should be fixed with 7ae346434.

Paraphrased original commit messages:

This is the last step in removing cost-kind as a consideration in the
basic class model for intrinsics.
See D89461 for the start of that.
Subsequent commits dealt with each of the special-case intrinsics that
had customization here in the basic class. This should remove a barrier
to retrying D87188 (canonicalization to the abs intrinsic).

The ARM and x86 cost diffs seen here may be wrong because the
target-specific overrides have their own bugs, but we hope this is
less wrong - if something has a significant throughput cost, then it
should have a significant size / blended cost too by default.

The only behavioral diff in current regression tests is shown in the
x86 scatter-gather test (which is misplaced or broken because it runs
the entire -O3 pipeline) - we unrolled less, and we assume that is
an improvement.

Exception: in general, we want the *size* cost for a scalar call to be
cheap even if the other costs are expensive - we expect it to just be
a branch with some optional stack manipulation.

It is likely that we will want to carve out some
exceptions/overrides to this rule as follow-up patches for
calls that have some general and/or target-specific difference
to the expected lowering.

This was noticed as a regression in unrolling, so we have a test
for that now along with a couple of direct cost model tests.

If the assumed scalarization costs for the oversized vector
calls are not realistic, that would be another follow-up
refinement of the cost models.

Differential Revision: https://reviews.llvm.org/D90554
2020-11-20 11:21:10 -05:00
Sanjay Patel 7ae346434a [CostModel] avoid crashing while finding scalarization overhead
The constrained intrinsics have metadata arguments, so the
tests here were crashing as noted in D90554 (and that was
reverted even though this bug exists independently of that
change).
2020-11-20 10:18:29 -05:00
Sanjay Patel 1285781fc5 [CostModel] add tests for math library calls; NFC
This is a partial un-revert of 32dd5870ee (originally df09f82599 ).

I'm adding back the baseline tests first, so we don't have
to back-track as much in case there are still problems.
2020-11-20 08:24:49 -05:00
Max Kazantsev 2290daa938 [Test] Auto-update checks in a test 2020-11-20 16:53:51 +07:00
Max Kazantsev 0c101c9cbc [Test] Add tests demonstrating a bug in SCEV, PR48225
Slightly simplified version of original test reported by Congzhe Cao.
2020-11-20 15:59:22 +07:00
Eric Christopher 32dd5870ee Temporarily Revert "[CostModel] remove cost-kind predicate for intrinsics in basic TTI implementation"
as it's causing crashes in the optimizer. A reduced testcase has been posted as a follow-up.

This reverts commit f7eac51b9b.

Temporarily Revert "[CostModel] make default size cost for libcalls small (again)" as it depends upon the primary revert.

This reverts commit 8ec7ea3ddc.

Temporarily Revert "[CostModel] add tests for math library calls; NFC" as it depends upon the primary revert.

This reverts commit df09f82599.

Temporarily Revert "[LoopUnroll] add test for full unroll that is sensitive to cost-model; NFC" as it depends upon the primary revert.

This reverts commit 618d555e8d.
2020-11-19 22:10:23 -08:00
Artur Pilipenko 887c7660bd [BasicAA] Deoptimize intrinsics don't modify memory
Similarly to assumes and guards, deoptimize intrinsics are
marked as writing to ensure proper control dependencies
but they never modify any particular memory location.

Differential Revision: https://reviews.llvm.org/D91658
2020-11-19 12:08:33 -08:00
Mircea Trofin f62fe0ee3b [FileCheck] Disallow unused prefixes in llvm/test/Analysis
This is achieved through a substitution of FileCheck in lit.cfg.py,
where we explicitly set -allow-unused-prefixes to false.

We also introduce a %FileCheckWithUnusedPrefixes% substitution that can
be used in those cases where we want to allow unused prefixes, even if
the folder policy is to disallow them.

Differential Revision: https://reviews.llvm.org/D91275
2020-11-19 07:56:35 -08:00
Nikita Popov cd3c22c47e [BasicAA] Generalize base offset modulus handling
The GEP aliasing implementation currently has two pieces of code
that solve two different subsets of the same basic problem: If you
have GEPs with offsets 4*x + 0 and 4*y + 1 (assuming access size 1),
then they do not alias regardless of whether x and y are the same.

One implementation is in aliasSameBasePointerGEPs(), which looks at
this in a limited structural way. It requires both GEP base pointers
to be exactly the same, then (optionally) a number of equal indexes,
then an unknown index, then a non-equal index into a struct. This
set of limitations works, but it's overly restrictive and hides the
core property we're trying to exploit.

The second implementation is part of aliasGEP() itself and tries to
find a common modulus in the scales, so it can then check that the
constant offset doesn't overlap under modular arithmetic. The second
implementation has the right idea of what the general problem is,
but effectively only considers power of two factors in the scales
(while aliasSameBasePointerGEPs also works with non-pow2 struct sizes.)

What this patch does is to adjust the aliasGEP() implementation to
instead find the largest common factor in all the scales (i.e. the GCD)
and use that as the modulus.
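
The modular argument can be sketched in a few lines of Python. This is an illustrative model only, not the aliasGEP() code: `no_alias_by_modulus` is a hypothetical helper, and the pointer difference is assumed to have the form `const_offset + sum(scale_i * var_i)` with unknown integer `var_i`.

```python
from functools import reduce
from math import gcd


def no_alias_by_modulus(scales, const_offset, size1, size2):
    """Model ptr1 = ptr2 + const_offset + sum(scale_i * var_i), var_i unknown integers.

    Returns True when the two accesses provably cannot overlap.
    """
    modulus = reduce(gcd, (abs(s) for s in scales), 0)
    if modulus == 0:
        return False  # no variable part; this check does not apply
    # ptr1 - ptr2 is congruent to `rem` modulo `modulus`.
    rem = const_offset % modulus  # non-negative in Python for modulus > 0
    # Access 2 must end before the nearest possible ptr1 above it, and
    # access 1 must end before the nearest possible ptr2 above it.
    return rem >= size2 and modulus - rem >= size1


# Offsets 4*x + 0 and 4*y + 1 with access size 1, as in the message above.
example_pow2 = no_alias_by_modulus([4, -4], 1, 1, 1)
# Non-power-of-two scales: the GCD of 12 and 6 is 6.
example_non_pow2 = no_alias_by_modulus([12, 6], 3, 2, 2)
# A 4-byte access at offset 1 can overlap the neighbouring 4-byte slot.
example_overlap = no_alias_by_modulus([4], 1, 4, 4)
```

In the second example a modulus restricted to the common power-of-two factor (2) would leave a remainder of 1, too small to separate two 2-byte accesses, while the GCD (6) leaves a remainder of 3 and proves them disjoint.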

Differential Revision: https://reviews.llvm.org/D91027
2020-11-18 21:48:49 +01:00
Nikita Popov 85ccdcaa50 [BasicAA] Remove assert in AA evaluator
As reported in https://reviews.llvm.org/D91383#2401825, this
assert breaks external -aa-eval tests. We'll have to fix this
case before re-enabling it.
2020-11-18 20:04:38 +01:00
Craig Topper f0b0bab34d [X86] Use GF2P8AFFINEQB to implement vector bitreverse.
We can use GF2P8AFFINEQB to reverse bits in a byte. Shuffles are needed to reverse the bytes in elements larger than i8. LegalizeVectorOps takes care of inserting the shuffle for the larger element size.

We already have Custom lowering for v16i8 with SSSE3, v32i8 with AVX, and v64i8 with AVX512BW.

I think we might be able to use this for scalars too by moving into a vector and back. But I'll save that for a follow-up as it's a little more involved.
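
The byte-level bit reversal can be modeled in Python. This sketch rests on assumptions not stated in the message above: that each result bit `i` is the parity of matrix byte `7 - i` ANDed with the source byte, XORed with bit `i` of the immediate, and that the reversing matrix is `0x8040201008040201`. The helper names are hypothetical.

```python
BIT_REVERSE_MATRIX = 0x8040201008040201  # assumed matrix constant


def parity(value):
    """Return 1 if an odd number of bits are set in value, else 0."""
    return bin(value).count("1") & 1


def gf2p8affine_byte(matrix, src, imm8=0):
    """Model one byte lane of the affine transform under the stated assumptions."""
    result = 0
    for i in range(8):
        row = (matrix >> (8 * (7 - i))) & 0xFF  # matrix byte 7 - i
        bit = parity(row & src) ^ ((imm8 >> i) & 1)
        result |= bit << i
    return result


def reverse_bits_reference(src):
    """Reverse the 8 bits of src using string manipulation, for comparison."""
    return int(f"{src:08b}"[::-1], 2)


matches = all(
    gf2p8affine_byte(BIT_REVERSE_MATRIX, x) == reverse_bits_reference(x)
    for x in range(256)
)
```

Each matrix row selects exactly one source bit, so the transform is a pure bit permutation; elements wider than i8 still need the byte shuffle mentioned above.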

Reviewed By: RKSimon, pengfei

Differential Revision: https://reviews.llvm.org/D91515
2020-11-17 23:49:06 -08:00
Arthur Eubanks 9e3b4f4941 [JumpThreading] Make -print-lvi-after-jump-threading work with NPM 2020-11-17 23:15:20 -08:00
Wei Wang 3279347da0 [BPI] Look through bitcasts in calcZeroHeuristic
Constant hoisting may hide the constant value behind bitcast for And's
operand. Track down the constant to make the BFI result consistent
regardless of hoisting.

Differential Revision: https://reviews.llvm.org/D91450
2020-11-17 09:33:05 -08:00
Nikita Popov cb4fc25c91 [BasicAA] Make alias GEP positive offset handling symmetric
aliasGEP() currently implements some special handling for the case
where all variable offsets are positive, in which case the constant
offset can be taken as the minimal offset. However, it does not
perform the same handling for the all-negative case. This means that
the alias-analysis result between two GEPs is asymmetric:
If GEP1 - GEP2 is all-positive, then GEP2 - GEP1 is all-negative,
and the first will result in NoAlias, while the second will result
in MayAlias.

Apart from producing sub-optimal results for one order, this also
violates our caching assumption. In particular, if BatchAA is used,
the cached result depends on the order of the GEPs in the first query.
This results in an inconsistency in BatchAA and AA results, which
is how I noticed this issue in the first place.
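
A small Python model shows why handling both signs makes the query symmetric. It is a sketch under simplifying assumptions, not the aliasGEP() logic: the pointer difference is `const_offset + sum(scale_i * var_i)`, every `var_i` is assumed non-negative, and both function names are hypothetical.

```python
def no_alias_by_sign(const_offset, scales, size1, size2):
    """Model ptr1 = ptr2 + const_offset + sum(scale_i * var_i), with every var_i >= 0."""
    # All-positive case: ptr1 is at least const_offset bytes past ptr2.
    if all(s > 0 for s in scales) and const_offset >= size2:
        return True
    # All-negative case: ptr2 is at least -const_offset bytes past ptr1.
    if all(s < 0 for s in scales) and -const_offset >= size1:
        return True
    return False


def swapped(const_offset, scales, size1, size2):
    """Same query with the pointers exchanged: GEP2 - GEP1 negates every term."""
    return no_alias_by_sign(-const_offset, [-s for s in scales], size2, size1)


forward = no_alias_by_sign(8, [4], 4, 4)
backward = swapped(8, [4], 4, 4)
```

Without the second branch, `forward` would be a no-alias answer while `backward` would not, which is the order dependence described above.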

Differential Revision: https://reviews.llvm.org/D91383
2020-11-17 18:05:34 +01:00
Caroline Concatto 6c4d8f4651 [AArch64] Add check for widening instruction for SVE.
This patch fixes the function isWideningInstruction for scalable vectors.
Now the cost model can check the widening pattern for SVE.

Differential Revision: https://reviews.llvm.org/D91260
2020-11-16 12:30:08 +00:00
Florian Hahn 7fa8b62920 [MemorySSA] Add pointer decrement loop clobber test case. 2020-11-15 18:00:01 +00:00
Nikita Popov 9ace4b337f Revert "[SCEV] Factor out part of wrap flag detection logic [NFC-ish]"
This reverts commit 1ec6e1eb8a.

This change causes a significant compile-time regression:
https://llvm-compile-time-tracker.com/compare.php?from=dd0b8b94d0796bd895cc998dd163b4fbebceb0b8&to=1ec6e1eb8a084bffae8a40236eb9925d8026dd07&stat=instructions

I assume that this is due to the non-NFC part of the change, which
now performs expensive nowrap inference even for nowrap flags that
are not used by the particular code.
2020-11-15 10:19:44 +01:00
Philip Reames 1ec6e1eb8a [SCEV] Factor out part of wrap flag detection logic [NFC-ish]
In an effort to make code around flag determination more readable, and (possibly) prepare for a follow up change, factor out some of the flag detection logic.  In the process, reduce the number of locations we mutate wrap flags by a couple.

Note that this isn't NFC.  The old code tried for NSW xor (NUW || NW).  That is, two different paths computed different sets of wrap flags.  The new code will try for all three.  The result is that some expressions end up with a few extra flags set.
2020-11-14 19:21:05 -08:00
Nikita Popov 0b72444211 [BasicAA] Remove unnecessary size limitation
We're dropping a common offset from both GEPs here. It's not
necessary for the access sizes to be the same as well.
2020-11-14 16:51:31 +01:00
Sanjay Patel 8ec7ea3ddc [CostModel] make default size cost for libcalls small (again)
This was changed recently with D90554 / f7eac51b9b
...because we had a regression testing blindspot for intrinsics
that are expected to be lowered to libcalls.

In general, we want the *size* cost for a scalar call to be cheap
even if the other costs are expensive - we expect it to just be
a branch with some optional stack manipulation.

It is likely that we will want to carve out some
exceptions/overrides to this rule as follow-up patches for
calls that have some general and/or target-specific difference
to the expected lowering.

This was noticed as a regression in unrolling, so we have a test
for that now along with a couple of direct cost model tests.

If the assumed scalarization costs for the oversized vector
calls are not realistic, that would be another follow-up
refinement of the cost models.
2020-11-14 08:15:35 -05:00
Sanjay Patel df09f82599 [CostModel] add tests for math library calls; NFC 2020-11-14 08:15:35 -05:00
Simon Pilgrim e11195d0a9 [CostModel][X86] Remove unused CHECK prefixes
Allows us to remove the "CHECK: {{^}}" hack and help simplify D91275
2020-11-13 17:31:48 +00:00
Nikita Popov f3124a46c1 [SCEV] Fix nsw flags for GEP expressions
The SCEV code for constructing GEP expressions currently assumes
that the addition of the base and all the offsets is nsw if the GEP
is inbounds. While the addition of the offsets is indeed nsw, the
addition to the base address is not, as the base address is
interpreted as an unsigned value.

Fix the GEP expression code to not assume nsw for the base+offset
calculation. However, do assume nuw if we know that the offset is
non-negative. With this, we use the same behavior as the
construction of GEP addrecs does. (Modulo the fact that we
disregard SCEV unification, as the pre-existing FIXME points out).
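
The difference between the two flags is easy to see with a toy pointer width. The following Python sketch uses 8-bit addresses purely for readability (an assumption), and the helper names are hypothetical.

```python
BITS = 8  # toy pointer width (assumption, for readability)
MASK = (1 << BITS) - 1


def as_signed(value):
    """Interpret the low BITS bits of value as a two's complement integer."""
    value &= MASK
    return value - (1 << BITS) if value >> (BITS - 1) else value


def add_is_nsw(a, b):
    """True if a + b does not overflow when both are read as signed."""
    total = as_signed(a) + as_signed(b)
    return -(1 << (BITS - 1)) <= total < (1 << (BITS - 1))


def add_is_nuw(a, b):
    """True if a + b does not overflow when both are read as unsigned."""
    return (a & MASK) + (b & MASK) <= MASK


base = 0x7F                  # base address just below the sign boundary
positive_offset = 1          # non-negative offset
negative_offset = -1 & MASK  # offset -1, encoded as 0xFF

crossing_nsw = add_is_nsw(base, positive_offset)
crossing_nuw = add_is_nuw(base, positive_offset)
negative_nuw = add_is_nuw(base, negative_offset)
```

Adding 1 to base 0x7F yields the valid unsigned address 0x80 but overflows as a signed addition, so nsw cannot be assumed for base+offset. The addition is nuw only because the offset is non-negative; with offset -1 the unsigned addition wraps.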

Differential Revision: https://reviews.llvm.org/D90648
2020-11-13 18:19:32 +01:00
Nikita Popov c00545dc32 [BasicAA] Remove checks for GEP decomposition limit reached
The GEP aliasing code currently checks for the GEP decomposition
limit being reached (i.e., we did not reach the "final" underlying
object). As far as I can see, these checks are not necessary. It is
perfectly fine to work with a GEP whose base can still be further
decomposed.

Looking back through the commit history, these checks were originally
introduced in 1a444489e9. However, I
believe that the problem this was intended to address was later
properly fixed with 1726fc698c, and
the checks are no longer necessary since then (and were not the
right fix in the first place).

Differential Revision: https://reviews.llvm.org/D91010
2020-11-12 20:43:38 +01:00
Jamie Schmeiser 5f672fefeb Reland: Introduce -dot-cfg-mssa option which creates dot-cfg style file with mssa comments included in source
Summary:
Expand the print-memoryssa and print<memoryssa> passes with a new hidden
option -dot-cfg-mssa that names a file. When set, a dot-cfg style file
will be generated into the named file with the memoryssa comments retained
and those blocks containing them shown in light pink. The option does
nothing in isolation.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>

Reviewed By: asbirlea (Alina Sbirlea), dblaikie (David Blaikie)

Differential Revision: https://reviews.llvm.org/D90638
2020-11-12 17:39:14 +00:00
Anh Tuyen Tran a20b3620bb Revert "Introduce -dot-cfg-mssa option which creates dot-cfg style file with mssa comments included in source"
This reverts commit 45d459e752 due to
build issue in Poly.
2020-11-12 15:48:14 +00:00
Jamie Schmeiser 45d459e752 Introduce -dot-cfg-mssa option which creates dot-cfg style file with mssa comments included in source
Summary:
Expand the print-memoryssa and print<memoryssa> passes with a new hidden
option -dot-cfg-mssa that names a file. When set, a dot-cfg style file
will be generated into the named file with the memoryssa comments retained
and those blocks containing them shown in light pink. The option does
nothing in isolation.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>

Reviewed By: asbirlea (Alina Sbirlea), dblaikie (David Blaikie)

Differential Revision: https://reviews.llvm.org/D90638
2020-11-12 15:41:16 +00:00
Caroline Concatto 37f4ccb275 [AArch64]Add memory op cost model for SVE
This patch adds/fixes the memory op cost model for SVE with fixed-width
vectors.

Differential Revision: https://reviews.llvm.org/D90950
2020-11-11 12:49:19 +00:00
Simon Pilgrim ca59d37e0e [ValueTracking] assume-queries-counter.ll - remove unused check prefix 2020-11-10 14:31:03 +00:00
Simon Pilgrim 87902b2ed0 [BasicAA] phi-values-usage.ll - remove unused check prefix 2020-11-10 14:31:03 +00:00
Simon Pilgrim 88fe246a34 [ScalarEvolution] Remove unused check prefixes 2020-11-10 14:31:02 +00:00
Sanjay Patel f7eac51b9b [CostModel] remove cost-kind predicate for intrinsics in basic TTI implementation
This is the last step in removing cost-kind as a consideration in the basic class model for intrinsics.
See D89461 for the start of that.
Subsequent commits dealt with each of the special-case intrinsics that had customization here in the
basic class. This should remove a barrier to retrying
D87188 (canonicalization to the abs intrinsic).

The ARM and x86 cost diffs seen here may be wrong because the target-specific overrides have their own
bugs, but we hope this is less wrong - if something has a significant throughput cost, then it should
have a significant size / blended cost too by default.

The only behavioral diff in current regression tests is shown in the x86 scatter-gather test (which is
misplaced or broken because it runs the entire -O3 pipeline) - we unrolled less, and we assume that is
an improvement.

Differential Revision: https://reviews.llvm.org/D90554
2020-11-10 08:19:31 -05:00
Simon Pilgrim 20bbe14ac8 [CostModel][ARM] Remove unused check-prefix 2020-11-10 13:10:12 +00:00
Simon Pilgrim bd2c0e2c9f [CostModel][AArch64] Remove unused check-prefix 2020-11-10 13:10:11 +00:00
Simon Pilgrim fe9403df06 [CostModel][X86] Remove unused check-prefixes 2020-11-10 12:48:35 +00:00
Max Kazantsev 6022a8b7e8 [SCEV] Drop cached ranges of AddRecs after flag update
Our range computation methods benefit from no-wrap flags. But if the ranges
were first computed before the flags were set, the cached range will be too
pessimistic.

We need to drop cached ranges whenever we sharpen AddRec's no wrap flags.
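
The caching hazard can be illustrated with a toy model. This Python sketch is not SCEV's data structure: `ToyAddRec`, its 8-bit unsigned range, and the `drop_cache` parameter are all hypothetical and exist only to show a stale cached range.

```python
class ToyAddRec:
    """Toy {start,+,step} recurrence over 8-bit unsigned values."""

    def __init__(self, start, step, max_iters):
        self.start = start
        self.step = step
        self.max_iters = max_iters
        self.nuw = False
        self._cached_range = None

    def _compute_range(self):
        if self.nuw:
            # No unsigned wrap: the value only grows from start.
            return (self.start, min(255, self.start + self.step * self.max_iters))
        return (0, 255)  # may wrap, so nothing better than the full range

    def unsigned_range(self):
        if self._cached_range is None:
            self._cached_range = self._compute_range()
        return self._cached_range

    def set_nuw(self, drop_cache=True):
        self.nuw = True
        if drop_cache:
            self._cached_range = None  # force recomputation with the new flag


stale = ToyAddRec(10, 2, 20)
stale.unsigned_range()           # range cached before the flag is known
stale.set_nuw(drop_cache=False)  # flag sharpened, cache kept
stale_range = stale.unsigned_range()

fresh = ToyAddRec(10, 2, 20)
fresh.unsigned_range()
fresh.set_nuw()                  # flag sharpened, cache dropped
fresh_range = fresh.unsigned_range()
```

Keeping the cache leaves the full range in place even though the flag now allows a tighter one; dropping it lets the next query benefit from the flag.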

Differential Revision: https://reviews.llvm.org/D89847
Reviewed By: fhahn
2020-11-10 12:37:12 +07:00
Nikita Popov dd5b51f4fa [BasicAA] Add test for decomposition limit (NFC)
Test behavior before/at/after the GEP decomposition limit.
2020-11-09 21:31:11 +01:00
Michael Liao fa5d31f825 [GlobalsAA] Teach to handle `addrspacecast`. 2020-11-09 00:04:52 -05:00
Sanjay Patel 264a6df353 [ARM] remove cost-kind predicate for cmp/sel costs
This is the cmp/sel sibling to D90692.
Again, the reasoning is: the throughput cost is number of instructions/uops,
so size/blended costs are identical except in special cases (for example,
fdiv or other known-expensive machine instructions or things like MVE that
may require cracking into >1 uops).

We need to check for a valid (non-null) condition type parameter because
SimplifyCFG may pass nullptr for that (and so we will crash multiple
regression tests without that check). I'm not sure if passing nullptr makes
sense, but other code in the cost model does appear to check if that param
is set or not.

Differential Revision: https://reviews.llvm.org/D90781
2020-11-05 14:52:25 -05:00
Arthur Eubanks 06926e0f01 Port print-must-be-executed-contexts and print-mustexecute to NPM
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D90207
2020-11-03 21:06:46 -08:00
Sanjay Patel c40126e740 [ARM] remove cost-kind predicate for most math op costs
This is based on the same idea that I am using for the basic model implementation
and what I have partly already done for x86: throughput cost is number of
instructions/uops, so size/blended costs are identical except in special cases
(for example, fdiv or other known-expensive machine instructions or things like
MVE that may require cracking into >1 uops).

Differential Revision: https://reviews.llvm.org/D90692
2020-11-03 17:23:46 -05:00
Sanjay Patel 3c050a597c [CostModel] fix cost calc bug for sadd/ssub with overflow
As noted in D90554, there's an opcode typo in using an easily
misused cost model API: getCmpSelInstrCost(). Beyond that, the
assumed sequence of ops is questionable, but that would be
another patch.

My guess is that the x86 test diffs show that we are probably
wrong both before and after this change, so there will be no
practical difference.
As an example, I tried this test which shows a cost of '7'
either way:

  define <4 x i32> @sadd(<4 x i32> %va, <4 x i32> %vb) {
    %V4I32  = call {<4 x i32>, <4 x i1>}  @llvm.sadd.with.overflow.v4i32(<4 x i32> %va, <4 x i32> %vb)
    %ov = extractvalue {<4 x i32>, <4 x i1>} %V4I32, 1
    %r = extractvalue {<4 x i32>, <4 x i1>} %V4I32, 0
    %z = select <4 x i1> %ov, <4 x i32> <i32 42, i32 42, i32 42, i32 42>, <4 x i32> %r
    ret <4 x i32> %z
  }

  $ llc -o - sadd.ll -mattr=avx
        vpaddd  %xmm1, %xmm0, %xmm2
        vpcmpgtd        %xmm2, %xmm0, %xmm0
        vpxor   %xmm0, %xmm1, %xmm0
        vblendvps       %xmm0, LCPI0_0(%rip), %xmm2, %xmm0

Differential Revision: https://reviews.llvm.org/D90681
2020-11-03 11:03:47 -05:00