Commit Graph

2612 Commits

Author SHA1 Message Date
Max Kazantsev 624fc63a05 [SCEV] Re-enable "Use nw flag and symbolic iteration count to sharpen ranges of AddRecs", attempt 3
We can sharpen the range of a AddRec if we know that it does not
self-wrap and know the symbolic iteration count in the loop. If we can
evaluate the value of AddRec on the last iteration and prove that at least
one its intermediate value lies between start and end, then no-wrap flag
allows us to conclude that all of them also lie between start and end. So
the estimate of range can be improved to union of ranges of start and end.

Switched off by default, can be turned on by flag.

Differential Revision: https://reviews.llvm.org/D89381
Reviewed By: lebedev.ri, nikic
2020-10-28 12:39:41 +07:00
Sanjay Patel 50dfa19cc7 [CostModel] remove cost-kind predicate for FP add/mul vector reduction costs
This was originally part of:
f2c25c7079
but that was reverted because there was an underlying bug in
processing the vector type of these intrinsics. That was
fixed with:
74ffc823ed

This is similar in spirit to 01ea93d85d (memcpy) except that
here the underlying caller assumptions were created for vectorizer
use (throughput) rather than other passes.

That meant targets could have an enormous throughput cost with no
corresponding size, latency, or blended cost increase.

Paraphrasing from the previous commits:
This may not make sense for some callers, but at least now the
costs will be consistently wrong instead of mysteriously wrong.

Targets should provide better overrides if the current modeling
is not accurate.
2020-10-27 18:00:20 -04:00
Sanjay Patel 138fda5dd2 [CostModel] add tests for FP reductions; NFC 2020-10-27 18:00:20 -04:00
Nico Weber 2a4e704c92 Revert "Use uint64_t for branch weights instead of uint32_t"
This reverts commit e5766f25c6.
Makes clang assert when building Chromium, see https://crbug.com/1142813
for a repro.
2020-10-27 09:26:21 -04:00
Shimin Cui 22e4346e05 [ValueTracking] Add tracking of the alignment assume bundle
This patch is to add the support of the value tracking of the alignment assume bundle.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D88669
2020-10-27 12:16:45 +00:00
Arthur Eubanks e5766f25c6 Use uint64_t for branch weights instead of uint32_t
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609
2020-10-26 20:24:04 -07:00
Bing1 Yu 2c08f1b4b6 [CostModel][X86] teach TTI calculate cost of chain of vector inserts/extracts more precisely and correctly:In each 128-lane, if there is at least one index is demanded and not all indices are demanded...
In each 128-lane, if there is at least one index is demanded and not all
indices are demanded and this 128-lane is not the first 128-lane of the
legalized-vector, then this 128-lane needs a extracti128;
If in each 128-lane, there is at least one index is demanded, this 128-lane
needs a inserti128.

The following cases will help you build a better understanding:
Assume we insert several elements into a v8i32 vector in avx2,
Case#1: inserting into 1th index needs vpinsrd + inserti128
Case#2: inserting into 5th index needs extracti128 + vpinsrd +
inserti128
Case#3: inserting into 4,5,6,7 index needs 4*vpinsrd + inserti128.

Reviewed By: pengfei, RKSimon

Differential Revision: https://reviews.llvm.org/D89767
2020-10-27 11:21:13 +08:00
Joe Ellis bf60bb26ec [SVE] Fix TypeSize warning in llvm::getGEPInductionOperand
We do not need to use the implicit cast here. We can instead can rely on
a comparison between two TypeSize objects instead. This algorithm will
work fine with scalable vectors.

Reviewed By: DavidTruby

Differential Revision: https://reviews.llvm.org/D90146
2020-10-26 17:40:32 +00:00
Joe Ellis 0383a1a8c2 [SVE][AArch64] Fix TypeSize warning in GEP cost analysis
The warning would fire when calling getGEPCost for analyzing the cost of
a GEP instruction. This would result in the use of the now deprecated
implicit cast of TypeSize to uint64_t through the overloaded operator.

This patch fixes the issue by using getKnownMinSize instead of the
implicit cast. This is possible because the code is already
scalable-vector aware. The semantic behaviour of the code is unchanged
by this patch.

Reviewed By: sdesmalen, fpetrogalli

Differential Revision: https://reviews.llvm.org/D89872
2020-10-26 17:40:19 +00:00
Tyker d3205bbca3 [Annotation] Allows annotation to carry some additional constant arguments.
This allows using annotation in a much more contexts than it currently has.
especially when annotation with template or constexpr.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D88645
2020-10-26 10:50:05 +01:00
Sanjay Patel f2c25c7079 [CostModel] remove cost-kind predicate for some vector reduction costs
This is a modified 2nd try of 22d10b8ab4
(reverted by 1c8371692d because it managed
to expose an existing crashing bug that should be fixed by
74ffc823 ).

Original commit message:

This is similar in spirit to 01ea93d85d (memcpy) except that
here the underlying caller assumptions were created for vectorizer
use (throughput) rather than other passes.

That meant targets could have an enormous throughput cost with no
corresponding size, latency, or blended cost increase.
The ARM costs show a small difference between throughput and
size because there's an underlying difference in cmp/sel
costs that is also predicated on cost-kind.

Paraphrasing from the previous commits:
This may not make sense for some callers, but at least now the
costs will be consistently wrong instead of mysteriously wrong.

Targets should provide better overrides if the current modeling
is not accurate.
2020-10-25 15:17:52 -04:00
Sanjay Patel 74ffc823ed [CostModel] fix operand/type accounting for fadd/fmul reductions
I'm not sure if/how this ever worked, but it must not be tested
currently because the basic tests added here were crashing as
noted in the post-review comments for 1c83716 (which reverted
another cost-model fix in 22d10b8ab4).
2020-10-25 15:01:19 -04:00
Nikita Popov ebeef022aa [SCEV] Strenthen nowrap flags after constant folding for mul exprs
Same change as 0dda633317, but for
mul expressions. We want to first fold any constant operans and
then strengthen the nowrap flags, as we can compute more precise
flags at that point.
2020-10-25 19:43:58 +01:00
Nikita Popov 1ff313f098 [SCEV] Always constant fold mul expression operands
Establish parity with the handling of add expressions, by always
constant folding mul expression operands before checking the depth
limit (this is a non-recursive simplification). The code was already
unconditionally constant folding the case where all operands were
constants, but was not folding multiple constant operands together
if there were also non-constant operands.

This requires picking out a different demonstration for depth-based
folding differences in the limit-depth.ll test.
2020-10-25 18:50:06 +01:00
Nikita Popov 0dda633317 [SCEV] Strength nowrap flags after constant folding
We should first try to constant fold the add expression and only
strengthen nowrap flags afterwards. This allows us to determine
stronger flags if e.g. only two operands are left after constant
folding (and thus "guaranteed no wrap region" code applies) or the
resulting operands are non-negative and thus nsw->nuw strengthening
applies.
2020-10-25 18:00:22 +01:00
Martin Storsjö 1c8371692d Revert "[CostModel] remove cost-kind predicate for vector reduction costs"
This reverts commit 22d10b8ab4.

This broke compilation e.g. like this:
$ cat synth.c
*a;
float *b;
c() {
  for (;;) {
    float d = -*b * *a++;
    d -= *--b * *a++;
    d -= *--b * *a;
    d -= *--b * *a;
    e(d);
  }
}
$ clang -target x86_64-linux-gnu -c -O2 -ffast-math synth.c
clang: ../include/llvm/Support/Casting.h:104: static bool llvm::isa_impl
_cl<To, const From*>::doit(const From*) [with To = llvm::PointerType; Fr
om = llvm::Type]: Assertion `Val && "isa<> used on a null pointer"' fail
ed.
2020-10-25 08:47:54 +02:00
Sanjay Patel 22d10b8ab4 [CostModel] remove cost-kind predicate for vector reduction costs
This is similar in spirit to 01ea93d85d (memcpy) except that
here the underlying caller assumptions were created for vectorizer
use (throughput) rather than other passes.

That meant targets could have an enormous throughput cost with no
corresponding size, latency, or blended cost increase.
The ARM costs show a small difference between throughput and
size because there's an underlying difference in cmp/sel
costs that is also predicated on cost-kind.

Paraphrasing from the previous commits:
This may not make sense for some callers, but at least now the
costs will be consistently wrong instead of mysteriously wrong.

Targets should provide better overrides if the current modeling
is not accurate.
2020-10-24 13:20:17 -04:00
dfukalov 9068c20965 [AMDGPU][CostModel] Refine cost model for half- and quarter-rate instructions.
1. Throughput and codesize costs estimations was separated and updated.
2. Updated fdiv cost estimation for different cases.
3. Added scalarization processing for types that are treated as !isSimple() to
improve codesize estimation in getArithmeticInstrCost() and
getArithmeticInstrCost(). The code was borrowed from TCK_RecipThroughput path
of base implementation.

Next step is unify scalarization part in base class that is currently works for
TCK_RecipThroughput path only.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D89973
2020-10-24 19:53:08 +03:00
Florian Hahn 089c1ccd6d [AArch64] Add vector compare/select cost-model tests. 2020-10-23 20:43:04 +01:00
Florian Hahn 0fcc6f7a76 [AArch64] Implement getIntrinsicInstrCost, handle min/max intrinsics.
This patch adds a specialized implementation of getIntrinsicInstrCost
and add initial cost-modeling for min/max vector intrinsics.

AArch64 NEON support umin/smin/umax/smax for vectors
<8 x i8>, <16 x i8>, <4 x i16>, <8 x i16>, <2 x i32> and <4 x i32>.
Notably, it does not support vectors with i64 elements.

This change by itself should have very little impact on codegen, but in
follow-up patches I plan to teach the vectorizers to consider using
those intrinsics on platforms where it is profitable, e.g. because there
is no general 'select'-like instruction.

The current cost returned should be better for throughput, latency and size.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D89953
2020-10-23 11:32:42 +01:00
Nikita Popov 1882568fcb [BasicAA] Only add visited phi blocks temporarily
Visited phi blocks only need to be added for the duration of the
recursive alias queries, they should not leak into following code.

Once again, while this also improves analysis precision, this is
mainly intended to clarify the applicability scope of VisitedPhiBBs.
2020-10-22 22:26:29 +02:00
Nikita Popov 2b372570ee [BasicAA] Don't track visited blocks for phi-phi alias query
We only need the VisitedPhiBBs to disambiguate comparisons of
values from two different loop iterations. If we're comparing
two phis from the same basic block in lock-step, the compared
values will always be on the same iteration.

While this also increases precision, this is mainly intended
to clarify the scope of VisitedPhiBBs.
2020-10-22 22:12:21 +02:00
Nikita Popov 17690ee79a [BasicAA] Add additional phi tests (NFC) 2020-10-22 21:53:19 +02:00
Florian Hahn c1705e0ba4 [AArch64] Add min/max cost-model tests for v2i32. 2020-10-22 16:04:13 +01:00
Florian Hahn d6efc87518 [AArch64] Add min/max cost-model tests for v4i16. 2020-10-22 15:47:50 +01:00
Florian Hahn fbb6375db0 [AArch64] Add cost model tests for min/max intrinsics. 2020-10-22 13:28:04 +01:00
Arthur Eubanks 55c4ff9860 [test] Fix tests using -analyze that fail under NPM
Many of these tests don't use the output of -analyze.
2020-10-21 21:54:30 -07:00
Arthur Eubanks 7d6c3e509a [test] Fix quadradic-exit-value.ll under NPM 2020-10-21 13:33:01 -07:00
Arthur Eubanks 1d1217c4ea [test] Fix no-wrap-symbolic-becount.ll under NPM 2020-10-21 13:15:15 -07:00
Sanjay Patel c963bde015 [CostModel] remove cost-kind predicate for scatter/gather cost
This is similar in spirit to 01ea93d85d (memcpy) except that
here the underlying caller assumptions were created for vectorizer
use (throughput) rather than other passes.

That meant ARM could have an enormous throughput cost with no
corresponding size, latency, or blended cost increase. X86 has
the same throughput restriction as the basic implementation, so
it is still unchanged.

Paraphrasing from the previous commit:
This may not make sense for some callers, but at least now the
costs will be consistently wrong instead of mysteriously wrong.

Targets should provide better overrides if the current modeling
is not accurate.
2020-10-21 14:26:05 -04:00
Sanjay Patel 729610a51a [ARM] add cost-kind tests for intrinsics; NFC
This is a copy of the x86 file to provide better coverage;
x86 may have strange overrides that mask changes in the
generic model.
2020-10-21 14:26:04 -04:00
Sanjay Patel 01ea93d85d [CostModel] remove cost-kind predicate for memcpy cost
The default implementation base returns TCC_Expensive (currently
set to '4'), so that explains the test diff. This probably does
not make sense for most callers, but at least now the costs will
be consistently wrong instead of mysteriously wrong.

The ARM target has an override that tries to model codegen expansion,
and that should likely be adapted for general usage.

This probably does not affect anything because the vectorizers are
the primary users of the throughput cost, but memcpy is not listed
as a trivially vectorizable intrinsic.
2020-10-21 08:50:44 -04:00
Max Kazantsev bed02fa8b0 Revert "[SCEV] Prove implications of different type via truncation"
This reverts commit 80852a4f2f.

Test is now broken because underlying required patch was also reverted SUDDENLY.
2020-10-21 13:03:46 +07:00
Max Kazantsev 80852a4f2f [SCEV] Prove implications of different type via truncation
When we need to prove implication of expressions of different type width,
the default strategy is to widen everything to wider type and prove in this
type. This does not interact well with AddRecs with negative steps and
unsigned predicates: such AddRec will likely not have a `nuw` flag, and its
`zext` to wider type will not be an AddRec. In contraty, `trunc` of an AddRec
in some cases can easily be proved to be an `AddRec` too.

This patch introduces an alternative way to handling implications of different
type widths. If we can prove that wider type values actually fit in the narrow type,
we truncate them and prove the implication in narrow type.

Differential Revision: https://reviews.llvm.org/D89548
Reviewed By: fhahn
2020-10-21 12:53:22 +07:00
Fangrui Song d9f91a3d14 Revert D89381 "[SCEV] Recommit "Use nw flag and symbolic iteration count to sharpen ranges of AddRecs", attempt 2"
This reverts commit a10a64e7e3.

It broke polly/test/ScopInfo/NonAffine/non-affine-loop-condition-dependent-access_3.ll
The difference suggests that this may be a serious issue.
2020-10-20 21:03:58 -07:00
Roman Lebedev d1946469d6
[NFC][SCEV] Improve/rework test coverage for ptrtoint handling 2020-10-20 14:17:56 +03:00
sstefan1 fbfb1c7909 [IR] Make nosync, nofree and willreturn default for intrinsics.
D70365 allows us to make attributes default. This is a follow up to
actually make nosync, nofree and willreturn default. The approach we
chose, for now, is to opt-in to default attributes to avoid introducing
problems to target specific intrinsics. Intrinsics with default
attributes can be created using `DefaultAttrsIntrinsic` class.
2020-10-20 11:57:19 +02:00
Max Kazantsev a10a64e7e3 [SCEV] Recommit "Use nw flag and symbolic iteration count to sharpen ranges of AddRecs", attempt 2
Fixed wrapping range case & proof methods reduced to constant range
checks to save compile time.

Differential Revision: https://reviews.llvm.org/D89381
2020-10-20 11:32:36 +07:00
Arthur Eubanks 0f0ff33037 [NPM][StackSafetyAnalysis] Pin uses of -analyze to legacy PM
Tests already have corresponding NPM RUN lines.
2020-10-19 21:24:03 -07:00
Florian Hahn 3cbdae22b9 [SCEV] Add tests where assumes can be used to improve tripe multiple.
This patch adds a set of tests where information from assumes can be
used to improve the trip multiple.

See PR47904.
2020-10-19 18:26:09 +01:00
Max Kazantsev c153d48b15 [Test] Add one more SCEV range test 2020-10-19 13:38:20 +07:00
Roman Lebedev ec54867df5
[SCEV] Model `ashr exact x, C` as `(abs(x) EXACT/u (1<<C)) * signum(x)`
It's not pretty, but probably better than modelling it
as an opaque SCEVUnknown, i guess.

It is relevant e.g. for the loop that was brought up in
https://bugs.llvm.org/show_bug.cgi?id=46786#c26
as an example of what we'd be able to better analyze
once SCEV handles `ptrtoint` (D89456).

But as it is evident, even if we deal with `ptrtoint` there,
we also fail to model such an `ashr`.
Also, modeling of mul-of-exact-shr/div could use improvement.

As per alive2:
https://alive2.llvm.org/ce/z/tnfZKd
```
define i8 @src(i8 %0) {
  %2 = ashr exact i8 %0, 4
  ret i8 %2
}

declare i8 @llvm.abs(i8, i1)
declare i8 @llvm.smin(i8, i8)
declare i8 @llvm.smax(i8, i8)

define i8 @tgt(i8 %x) {
  %abs_x = call i8 @llvm.abs(i8 %x, i1 false)
  %div = udiv exact i8 %abs_x, 16
  %t0 = call i8 @llvm.smax(i8 %x, i8 -1)
  %t1 = call i8 @llvm.smin(i8 %t0, i8 1)
  %r = mul nsw i8 %div, %t1
  ret i8 %r
}
```
Transformation seems to be correct!
2020-10-17 21:22:24 +03:00
Roman Lebedev bd6d41f52e
[NFC][SCEV] Add some more ptrtoint/PR46786 -related tests 2020-10-17 21:04:44 +03:00
David Green b93d74ac9c [ARM] Basic getArithmeticReductionCost reduction costs
This adds some basic costs for MVE reductions - currently just costing
the simple legal add vectors as a single MVE instruction. More complex
costing can be added in the future when the framework more readily
allows it.

Differential Revision: https://reviews.llvm.org/D88980
2020-10-17 10:29:00 +01:00
David Green d79ee3a807 [ARM] Add a very basic active_lane_mask cost
This adds a very basic cost for active_lane_mask under MVE - making the
assumption that they will be free and then apologizing for that in a
comment.

In reality they may either be free (by being nicely folded into a tail
predicated loop), cost the same as a VCTP or be expanded into vdup's,
adds and cmp's. It is difficult to detect the difference from a single
getIntrinsicInstrCost call, so makes the assumption that the vectorizer
is adding them, and only added them where it makes sense.

We may need to change this in the future to better model predicate costs
in the vectorizer, especially at -Os or non-tail predicated loops. The
vectorizer currently does not query the cost of these instructions but
that will change in the future and a zero cost there probably makes the
most sense at the moment.

Differential Revision: https://reviews.llvm.org/D88989
2020-10-17 10:09:42 +01:00
Alina Sbirlea dc97138123 [MemorySSA] Verify clobbering within reachable blocks.
Resolves PR45976.
2020-10-16 17:46:28 -07:00
Nikita Popov 74c8c2d903 Revert "Recommit "[SCEV] Use nw flag and symbolic iteration count to sharpen ranges of AddRecs""
This reverts commit 32b72c3165.

While better than before, this change still introduces a large
compile-time regression (>3% on mafft):
https://llvm-compile-time-tracker.com/compare.php?from=fbd62fe60fb2281ca33da35dc25ca3c87ec0bb51&to=32b72c3165bf65cca2e8e6197b59eb4c4b60392a&stat=instructions

Additionally, the logic here doesn't look quite right to me,
I will comment in more detail on the differential revision.
2020-10-16 21:36:33 +02:00
Florian Hahn f085b7cbc1 [SCEV] Add additional tests where the max BTC is limited by wrapping. 2020-10-16 20:36:02 +01:00
Max Kazantsev 32b72c3165 Recommit "[SCEV] Use nw flag and symbolic iteration count to sharpen ranges of AddRecs"
It was reverted because of negative compile time impact. In this version,
less powerful proof methods are used (non-recursive reasoning only), and
scope limited to constant End values to avoid explision of complex proofs.

Differential Revision: https://reviews.llvm.org/D89381
2020-10-16 17:35:13 +07:00
Dávid Bolvanský 28691cdd71 [MemLoc] Support memchr/memccpy in MemoryLocation::getForArgument
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D89321
2020-10-16 11:37:29 +02:00
Florian Hahn e034c3f704 [SCEV] Add a few test cases where the max BTC is limited by wrapping. 2020-10-16 09:53:32 +01:00
Florian Hahn 51ff04567b Recommit "[DSE] Switch to MemorySSA-backed DSE by default."
After investigation by @asbirlea, the issue that caused the
revert appears to be an issue in the original source, rather
than a problem with the compiler.

This patch enables MemorySSA DSE again.

This reverts commit 915310bf14.
2020-10-16 09:02:53 +01:00
Nikita Popov 7d3b475810 Revert "[SCEV] Use nw flag and symbolic iteration count to sharpen ranges of AddRecs"
This reverts commit 905101c360.

This causes a large compile-time regression:
https://llvm-compile-time-tracker.com/compare.php?from=cc175c2cc8e638462bab74e0781e06f9b6eb5017&to=905101c36025fe1c8ecdf9a20cd59db036676073&stat=instructions
2020-10-16 09:47:38 +02:00
Max Kazantsev 905101c360 [SCEV] Use nw flag and symbolic iteration count to sharpen ranges of AddRecs
We can sharpen the range of a AddRec if we know that it does not
self-wrap and know the symbolic iteration count in the loop. If we can
evaluate the value of AddRec on the last iteration and prove that at least
one its intermediate value lies between start and end, then no-wrap flag
allows us to conclude that all of them also lie between start and end. So
the estimate of range can be improved to union of ranges of start and end.

Differential Revision: https://reviews.llvm.org/D89381
Reviewed By: efriedma
2020-10-16 12:00:39 +07:00
Roman Lebedev b3d2df42f7
[NFC][SCEV] Autogenerate check lines in tests being affected by upcoming patch 2020-10-15 23:15:03 +03:00
Sanjay Patel 9f6048f83d [CostModel] remove cost-kind predicate for ctlz/cttz intrinsics in basic TTI implementation
The cost modeling for intrinsics is a patchwork based on different
expectations from the callers, so it's a mess. I'm hoping to untangle
this to allow canonicalization to the new min/max intrinsics in IR.
The general goal is to remove the cost-kind restriction here in the
basic implementation class. Ie, if some intrinsic has throughput cost
of 104, assume that it has the same size, latency, and blended costs.
Effectively, an intrinsic with cost N is composed of N simple
instructions. If that's not correct, the target should provide a more
accurate override.

The x86-64 SSE2 subtarget cost diffs require explanation:

1. The scalar ctlz/cttz are assuming "BSR+XOR+CMOV" or
   "TEST+BSF+CMOV/BRANCH", so not cheap.
2. The 128-bit SSE vector width versions assume cost of 18 or 26
   (no explanation provided in the tables, but this corresponds to a
   bunch of shift/logic/compare).
3. The 512-bit vectors in the test file are scaled up by a factor of
   4 from the legal vector width costs.
4. The plain latency cost-kind is not affected in this patch because
   that calc is diverted before we get to getIntrinsicInstrCost().

Differential Revision: https://reviews.llvm.org/D89461
2020-10-15 13:14:41 -04:00
Roman Lebedev 7ee6c40247
Revert "Reland "[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown"" and it's follow-ups
While we haven't encountered an earth-shattering problem with this yet,
by now it is pretty evident that trying to model the ptr->int cast
implicitly leads to having to update every single place that assumed
no such cast could be needed. That is of course the wrong approach.

Let's back this out, and re-attempt with some another approach,
possibly one originally suggested by Eli Friedman in
https://bugs.llvm.org/show_bug.cgi?id=46786#c20
which should hopefully spare us this pain and more.

This reverts commits 1fb6104293,
7324616660,
aaafe350bb,
e92a8e0c74.

I've kept&improved the tests though.
2020-10-14 16:09:18 +03:00
Sanjay Patel ef748583c2 [CostModel] rearrange basic intrinsic cost implementation
This is bigger/uglier than before, but it should allow fixing
all of the broken paths more easily. Test coverage added with
rGfab028b and other commits.

This is not NFC - the scalable vector test would crash
without this patch.
2020-10-13 11:52:00 -04:00
Sanjay Patel 1b94261e36 [x86] add cost model test for memcpy; NFC
This is treated as a special-case in the base class
implementation of getIntrinsicInstrCost().
2020-10-13 11:42:44 -04:00
Sanjay Patel 1c90878e60 [AArch64] fix spacing in test's RUN lines; NFC 2020-10-13 10:44:18 -04:00
Sanjay Patel fab028b914 [x86] add tests for cost model kinds of intrinsics; NFC
This provides coverage for existing special-cases and
a sampling of other intrinsics. Current output appears
to be wrong in several cases.
2020-10-13 10:39:43 -04:00
Sanjay Patel 937d782e38 [AArch64] add cost model test for scalable vector math; NFC
Testing for the various cost model "TargetCostKind" is limited,
and testing for scalable vectors is limited. The motivating
example of an intrinsic is not included here yet because that
just crashes.
2020-10-13 08:39:04 -04:00
Max Kazantsev fb2627d8d2 [Test] Add test showing that SCEV cannot compute IV's range 2020-10-13 17:52:39 +07:00
Roman Lebedev aaafe350bb
[SCEV] BuildConstantFromSCEV(): properly handle SCEVSignExtend from ptr
Much similar to the ZExt/Trunc handling.
Thanks goes to Alexander Richardson for nudging towards noticing this one proactively.

The appropriate (currently crashing) test coverage added.
2020-10-13 12:19:59 +03:00
Roman Lebedev 7324616660
[SCEV] BuildConstantFromSCEV(): properly handle SCEVZeroExtend from ptr
As being reported in https://reviews.llvm.org/D88806#2326944,
this is pretty much the sibling problem of https://reviews.llvm.org/D88806#2325340,
with root cause being that SCEV now models `ptrtoint` as trunc/zext/self of unknown.

The appropriate (currently crashing) test coverage added.
2020-10-13 11:47:44 +03:00
Roman Lebedev 1fb6104293
Reland "[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown"
This relands commit 1c021c64ca which was
reverted in commit 17cec6a11a because
an assertion was being triggered, since `BuildConstantFromSCEV()`
wasn't updated to handle the case where the constant we want to truncate
is actually a pointer. I was unsuccessful in coming up with a test case
where we'd end there with constant zext/sext of a pointer,
so i didn't handle those cases there until there is a test case.

Original commit message:

While we indeed can't treat them as no-ops, i believe we can/should
do better than just modelling them as `unknown`. `inttoptr` story
is complicated, but for `ptrtoint`, it seems straight-forward
to model it just as a zext-or-trunc of unknown.

This may be important now that we track towards
making inttoptr/ptrtoint casts not no-op,
and towards preventing folding them into loads/etc
(see D88979/D88789/D88788)

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D88806
2020-10-12 23:02:55 +03:00
Roman Lebedev 73818f450e
[NFC][ScalarEvolution] Add tests with ptrtoint in constant context in loop
Reduced from the https://reviews.llvm.org/D88806#2325340
2020-10-12 23:02:55 +03:00
Hans Wennborg 17cec6a11a Revert 1c021c64c "[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown"
> While we indeed can't treat them as no-ops, i believe we can/should
> do better than just modelling them as `unknown`. `inttoptr` story
> is complicated, but for `ptrtoint`, it seems straight-forward
> to model it just as a zext-or-trunc of unknown.
>
> This may be important now that we track towards
> making inttoptr/ptrtoint casts not no-op,
> and towards preventing folding them into loads/etc
> (see D88979/D88789/D88788)
>
> Reviewed By: mkazantsev
>
> Differential Revision: https://reviews.llvm.org/D88806

It caused the following assert during Chromium builds:

  llvm/lib/IR/Constants.cpp:1868:
  static llvm::Constant *llvm::ConstantExpr::getTrunc(llvm::Constant *, llvm::Type *, bool):
  Assertion `C->getType()->isIntOrIntVectorTy() && "Trunc operand must be integer"' failed.

See code review for a link to a reproducer.

This reverts commit 1c021c64ca.
2020-10-12 18:39:35 +02:00
Roman Lebedev 1c021c64ca
[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown
While we indeed can't treat them as no-ops, i believe we can/should
do better than just modelling them as `unknown`. `inttoptr` story
is complicated, but for `ptrtoint`, it seems straight-forward
to model it just as a zext-or-trunc of unknown.

This may be important now that we track towards
making inttoptr/ptrtoint casts not no-op,
and towards preventing folding them into loads/etc
(see D88979/D88789/D88788)

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D88806
2020-10-12 11:04:03 +03:00
Simon Pilgrim 913d7a110e [X86][SSE2] Use smarter instruction patterns for lowering UMIN/UMAX with v8i16.
This is my first LLVM patch, so please tell me if there are any process issues.

The main observation for this patch is that we can lower UMIN/UMAX with v8i16 by using unsigned saturated subtractions in a clever way. Previously this operation was lowered by turning the signbit of both inputs and the output which turns the unsigned minimum/maximum into a signed one.

We could use this trick in reverse for lowering SMIN/SMAX with v16i8 instead. In terms of latency/throughput this is the needs one large move instruction. It's just that the sign bit turning has an increased chance of being optimized further. This is particularly apparent in the "reduce" test cases. However due to the slight regression in the single use case, this patch no longer proposes this.

Unfortunately this argument also applies in reverse to the new lowering of UMIN/UMAX with v8i16 which regresses the "horizontal-reduce-umax", "horizontal-reduce-umin", "vector-reduce-umin" and "vector-reduce-umax" test cases a bit with this patch. Maybe some extra casework would be possible to avoid this. However independent of that I believe that the benefits in the common case of just 1 to 3 chained min/max instructions outweighs the downsides in that specific case.

Patch By: @TomHender (Tom Hender) ActuallyaDeviloper

Differential Revision: https://reviews.llvm.org/D87236
2020-10-11 11:21:23 +01:00
Florian Hahn d48b249b71 [SCEV] Add test cases where the max BTC is imprecise, due to step != 1.
Add a test case where we fail to compute a tight max backedge taken
count, due to the step being != 1.

This is part of the issue with PR40961.
2020-10-10 16:39:48 +01:00
Florian Hahn 2e9fd754b4 [SCEV] Handle ULE in applyLoopGuards.
Handle ULE predicate in similar fashion to ULT predicate in
applyLoopGuards.
2020-10-10 16:26:28 +01:00
Florian Hahn 2c6fc28aba [SCEV] Add a test case with ULE loop guard. 2020-10-10 15:58:26 +01:00
David Green 4c3515cd62 [ARM] Add MVE vecreduce costmodel tests. NFC
There were some existing tests that were not super useful. New ones are
added for testing MVE specific patterns.
2020-10-09 16:25:25 +01:00
Roman Lebedev 027e7a7721
Reland "[NFC][SCEV] Improve tests for ptrtoint modelling (D88806)"
I messed up runlines in the original commit.
2020-10-09 14:50:05 +03:00
Roman Lebedev 2aeae1617c
Revert "[NFC][SCEV] Improve tests for ptrtoint modelling (D88806)"
Buildbots aren't happy, need to investigate.
This reverts commit 32cc8f7998.
2020-10-09 14:10:43 +03:00
Roman Lebedev 32cc8f7998
[NFC][SCEV] Improve tests for ptrtoint modelling (D88806) 2020-10-09 13:50:30 +03:00
Amara Emerson 322d0afd87 [llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics.
This change renames the intrinsics to not have "experimental" in the name.

The autoupgrader will handle legacy intrinsics.

Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html

Differential Revision: https://reviews.llvm.org/D88787
2020-10-07 10:36:44 -07:00
Arthur Eubanks 61d4b342d1 [test][NewPM] Make dead-uses.ll work under NPM
This one is weird...

globals-aa needs to be already computed at licm, or else a function pass
can't run a module analysis and won't have access to globals-aa.
But the globals-aa result is impacted by instcombine in a way that
affects what the test is expecting. If globals-aa is computed before
instcombine, it is cached and globals-aa used in licm won't contain the
necessary info provided by instcombine.
Another catch is that if we don't invalidate AAManager, it will use the
cached AAManager that instcombine requested, which may not contain
globals-aa. So we have to invalidate<aa> so that licm can recompute
an AAManager with the globals-aa created by the require<globals-aa>.

This is essentially the problem described in https://reviews.llvm.org/D84259.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D88118
2020-10-06 07:33:02 -07:00
Roman Lebedev 80ac6da98e
[NFC][SCEV] Add a test with some patterns where we could treat inttoptr/ptrtoint as semi-transparent 2020-10-05 00:05:39 +03:00
Simon Moll 05ae04c396 [DA][SDA] SyncDependenceAnalysis re-write
This patch achieves two things:
1. It breaks up the `join_blocks` interface between the SDA to the DA to
   return two separate sets for divergent loops exits and divergent,
disjoint path joins.
2. It updates the SDA algorithm to run in O(n) time and improves the
   precision on divergent loop exits.

This fixes `https://bugs.llvm.org/show_bug.cgi?id=46372` (by virtue of
the improved `join_blocks` interface) and revealed an imprecise expected
result in the `Analysis/DivergenceAnalysis/AMDGPU/hidden_loopdiverge.ll`
test.

Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D84413
2020-09-30 17:36:26 +02:00
Florian Hahn 0ad793f321 [SCEV] Also use info from assumes in applyLoopGuards.
Similar to collecting information from branches guarding a loop, we can
also collect information from assumes dominating the loop header.

Fixes PR47247.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D87854
2020-09-28 13:14:24 +01:00
Sanjay Patel 816b0a9c9f [CostModel] add cl option to check size and latency costs; NFC
This is a setting used by SimplifyCFG, LoopUnroll, and InlineCost,
but there is apparently no direct test coverage for any of those
cost model values.
2020-09-27 09:52:56 -04:00
Florian Hahn 915310bf14 Revert "[DSE] Switch to MemorySSA-backed DSE by default."
There appears to be a mis-compile with MemorySSA-backed DSE in
combination with llvm.lifetime.end. It currently appears like
DSE is doing the right thing and the llvm.lifetime.end markers
are incorrect. The reverted patch uncovers the mis-compile.

This patch temporarily switches back to the legacy DSE
implementation, while we investigate.

This reverts commit 9d172c8e9c.
2020-09-26 18:35:27 +01:00
Florian Hahn 7d274aa9be [SCEV] Add support for `x != 0` to CollectCondition.
Add support for NE predicates with 0 constants. Those can be translated
to UMaxExpr(x, 1).
2020-09-25 18:58:55 +01:00
Florian Hahn 3a69ebf0ad [SCEV] Add another test using info from loop guards for BTC with NE. 2020-09-25 18:58:55 +01:00
Florian Hahn b5a3b901c7 [SCEV] Add support for `x == constant` to CollectCondition.
Add support for EQ predicates with constant operand. In that case, using
the constant instead of an unknown expression should always be
beneficial.
2020-09-25 16:56:49 +01:00
Florian Hahn 8858340bd3 [SCEV] Swap operands if LHS is not unknown.
Currently we only use information from guards for unknown expressions.
Swap LHS/RHS and predicate, if LHS is not unknown.
2020-09-25 15:50:01 +01:00
Florian Hahn 1fa06162c1 [SCEV] Add more tests using info from loop guards for BTC. 2020-09-25 14:18:58 +01:00
Andrew Litteken f02c4c87b4 [IRSim] Adding wrapper pass for IRSimilarityIdentfier
This introduces an analysis pass that wraps IRSimilarityIdentifier,
and adds a printer pass to examine in what function similarities are
being found.

Test for what the printer pass can find are in
test/Analysis/IRSimilarityIdentifier.

Reviewed by: paquette, jroelofs

Differential Revision: https://reviews.llvm.org/D86973
2020-09-24 14:59:41 -05:00
Florian Hahn d4ddf63fc4 [SCEV] Use loop guard info when computing the max BE taken count in howFarToZero.
For some expressions, we can use information from loop guards when
we are looking for a maximum. This patch applies information from
loop guards to the expression used to compute the maximum backedge
taken count in howFarToZero. It currently replaces an unknown
expression X with UMin(X, Y), if the loop is guarded by
X ult Y.

This patch is minimal in what conditions it applies, and there
are a few TODOs to generalize.

This partly addresses PR40961. We will also need an update to
LV to address it completely.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D67178
2020-09-24 11:06:55 +01:00
Arthur Eubanks 6700b9de16 [NewPM][MSSA] Fix failures under NPM due to -enable-mssa-loop-dependency
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D88128
2020-09-23 15:17:43 -07:00
Jonas Paulsson 370a8c8025 [SystemZ] Make sure not to call getZExtValue on a >64 bit constant.
Better use isZero() and isIntN() in SystemZTargetTransformInfo rather than
calling getZExtValue() since the immediate operand may be wider than 64 bits,
which is not allowed with getZExtValue().

Fixes https://bugs.llvm.org/show_bug.cgi?id=47600

Review: Simon Pilgrim
2020-09-23 15:36:32 +02:00
Arthur Eubanks 2d0de5f9a4 [test][NewPM] Clean up ScalarEvolution tests to work under NPM 2020-09-22 19:31:10 -07:00
Bing1 Yu ec24e50553 [CostModel][X86] add CostModel for SK_Select(v8f64, v8i64, v16f32, v16i32, v32i16, v64i8)
add CostModel for SK_Select(v8f64, v8i64, v16f32, v16i32, v32i16, v64i8)

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D87884
2020-09-23 10:29:10 +08:00
Arthur Eubanks 61ac58e10a [NewPM] Pin tests with -debug-pass to legacy PM
-debug-pass is a legacy PM only option.

Some tests checks that the pass returned that it made a change,
which is not relevant to the NPM, since passes return PreservedAnalyses.

Some tests check that passes are freed at the proper time, which is also
not relevant to the NPM.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87945
2020-09-22 17:54:25 -07:00
Arthur Eubanks e16d10b753 [test][NewPM] Pin do-nothing-intrinsic.ll to legacy PM
It tests CallGraph infra around the legacy PM which isn't relevant in NPM.
2020-09-22 11:33:38 -07:00
Arthur Eubanks a5141b83f1 [LoopInfo][NewPM] Fix tests in Analysis/LoopInfo under NPM 2020-09-22 11:31:00 -07:00
Arthur Eubanks 9db0c572c1 [Delinearization][NewPM] Port delinearization to NPM
Also make tests in Analysis/Delinearization work under NPM.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87741
2020-09-21 17:59:08 -07:00
Arthur Eubanks 84a8ca1e6c [NewPM] Pin -lazy-branch-prob and -lazy-block-freq tests to legacy PM
NPM passes just use the normal versions of these analyses instead.
Also pin any tests with -analyze to legacy PM.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87857
2020-09-21 17:51:46 -07:00
Fangrui Song 8fdac7cb7a Revert D71539 "Recommit "[SCEV] Look through single value PHIs.""
This reverts commit 11dccf8d3a.

A bootstrapped clang crashes (due to ArrayRef::front called on an empty
ArrayRef) when compiling some files.  Very strangely, this only reproduces with
modules.

```
13 0x0000564d3349e968 llvm::ArrayRef<llvm::BasicBlock*>::front() const /proc/self/cwd/llvm/include/llvm/ADT/ArrayRef.h:160:7
14 0x0000564d3349e896 llvm::LoopBase<llvm::BasicBlock, llvm::Loop>::getHeader() const /proc/self/cwd/llvm/include/llvm/Analysis/LoopInfo.h:104:50
15 0x0000564d3349fd9d llvm::LoopBase<llvm::BasicBlock, llvm::Loop>::getLoopLatch() const /proc/self/cwd/llvm/include/llvm/Analysis/LoopInfoImpl.h:210:11
16 0x0000564d33593c8a llvm::ScalarEvolution::computeBackedgeTakenCount(llvm::Loop const*, bool) /proc/self/cwd/llvm/lib/Analysis/ScalarEvolution.cpp:6933:15
17 0x0000564d33592ebc llvm::ScalarEvolution::getBackedgeTakenInfo(llvm::Loop const*) /proc/self/cwd/llvm/lib/Analysis/ScalarEvolution.cpp:0:30
18 0x0000564d33593a54 llvm::ScalarEvolution::getBackedgeTakenCount(llvm::Loop const*, llvm::ScalarEvolution::ExitCountKind) /proc/self/cwd/llvm/lib/Analysis/ScalarEvolution.cpp:6487:36
19 0x0000564d32be2402 llvm::ScalarEvolution::getConstantMaxBackedgeTakenCount(llvm::Loop const*) /proc/self/cwd/llvm/include/llvm/Analysis/ScalarEvolution.h:768:5
20 0x0000564d33590807 llvm::ScalarEvolution::getRangeRef(llvm::SCEV const*, llvm::ScalarEvolution::RangeSignHint) /proc/self/cwd/llvm/lib/Analysis/ScalarEvolution.cpp:5495:19
21 0x0000564d320abab7 llvm::ScalarEvolution::getSignedRange(llvm::SCEV const*) /proc/self/cwd/llvm/include/llvm/Analysis/ScalarEvolution.h:840:12
22 0x0000564d335a03aa llvm::ScalarEvolution::isKnownPredicateViaConstantRanges(llvm::CmpInst::Predicate, llvm::SCEV const*, llvm::SCEV const*) /proc/self/cwd/llvm/lib/Analysis/ScalarEvolution.cpp:9239:60
23 0x0000564d33586a80 llvm::ScalarEvolution::isKnownViaNonRecursiveReasoning(llvm::CmpInst::Predicate, llvm::SCEV const*, llvm::SCEV const*) /proc/self/cwd/llvm/lib/Analysis/ScalarEvolution.cpp:10284:60
```
2020-09-21 17:21:43 -07:00
Kazu Hirata ca8321574d Fix comment typos. NFC. 2020-09-21 16:12:56 -07:00
Roman Lebedev 64e2cb7e96
[SCEV] Recognize @llvm.uadd.sat as `%y + umin(%x, (-1 - %y))`
----------------------------------------
define i32 @src(i32 %x, i32 %y) {
%0:
  %r = uadd_sat i32 %x, %y
  ret i32 %r
}
=>
define i32 @tgt(i32 %x, i32 %y) {
%0:
  %t0 = sub nsw nuw i32 4294967295, %y
  %t1 = umin i32 %x, %t0
  %r = add nuw i32 %t1, %y
  ret i32 %r
}
Transformation seems to be correct!

The alternative, naive, lowering could be the following,
although i don't think it's better,
thought it will likely be needed for sadd/ssub/*shl:

----------------------------------------
define i32 @src(i32 %x, i32 %y) {
%0:
  %r = uadd_sat i32 %x, %y
  ret i32 %r
}
=>
define i32 @tgt(i32 %x, i32 %y) {
%0:
  %t0 = zext i32 %x to i33
  %t1 = zext i32 %y to i33
  %t2 = add nuw i33 %t0, %t1
  %t3 = zext i32 4294967295 to i33
  %t4 = umin i33 %t2, %t3
  %r = trunc i33 %t4 to i32
  ret i32 %r
}
Transformation seems to be correct!
2020-09-21 20:25:54 +03:00
Roman Lebedev fedc9549d5
[SCEV] Recognize @llvm.usub.sat as `%x - (umin %x, %y)`
----------------------------------------
define i32 @src(i32 %x, i32 %y) {
%0:
  %r = usub_sat i32 %x, %y
  ret i32 %r
}
=>
define i32 @tgt(i32 %x, i32 %y) {
%0:
  %t0 = umin i32 %x, %y
  %r = sub nuw i32 %x, %t0
  ret i32 %r
}
Transformation seems to be correct!
2020-09-21 20:25:54 +03:00
Roman Lebedev 0592de550f
[NFC][SCEV] Add tests for @llvm.*.sat intrinsics 2020-09-21 20:25:53 +03:00
Roman Lebedev 1bb7ab8c4a
[SCEV] Recognize @llvm.abs as smax(x, -x)
As per alive2 (ignoring undef):

----------------------------------------
define i32 @src(i32 %x, i1 %y) {
%0:
  %r = abs i32 %x, 0
  ret i32 %r
}
=>
define i32 @tgt(i32 %x, i1 %y) {
%0:
  %neg_x = mul i32 %x, 4294967295
  %r = smax i32 %x, %neg_x
  ret i32 %r
}
Transformation seems to be correct!

----------------------------------------
define i32 @src(i32 %x, i1 %y) {
%0:
  %r = abs i32 %x, 1
  ret i32 %r
}
=>
define i32 @tgt(i32 %x, i1 %y) {
%0:
  %neg_x = mul nsw i32 %x, 4294967295
  %r = smax i32 %x, %neg_x
  ret i32 %r
}
Transformation seems to be correct!
2020-09-21 20:25:53 +03:00
Roman Lebedev 83c2d10d3c
[NFC][SCEV] Add tests for @llvm.abs intrinsic 2020-09-21 20:25:53 +03:00
Florian Hahn 3cbdfe424f [SCEV] Add additional max BTC tests with loop guards. 2020-09-21 17:41:24 +01:00
Simon Pilgrim 18a3ebcd30 [CostModel][X86] Add some select shuffle costs tests for D87884 2020-09-21 16:09:05 +01:00
Florian Hahn 11dccf8d3a Recommit "[SCEV] Look through single value PHIs."
This commit was originally because it was suspected to cause a crash,
but a reproducer did not surface.

A crash that was exposed by this change was fixed in 1d8f2e5292.

This reverts the revert commit 0581c0b0ee.
2020-09-21 11:59:50 +01:00
Florian Hahn 57ae9bb932 [LSR] Preserve MSSA when using SplitCriticalEdge.
LSR claims to MemorySSA, but we also have to make sure it is preserved
when splitting critical edges. This can be done by passing MSSAU to
SplitCriticalEdge.

Fixes PR47557.
2020-09-21 09:51:26 +01:00
Dávid Bolvanský fa33235df5 [BasicAA] Regenerate test checks 2020-09-19 19:36:10 +02:00
Dávid Bolvanský d716f1608c [MemLoc] Support bcmp in MemoryLocation::getForArgument
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D87964
2020-09-19 17:12:43 +02:00
Florian Hahn 9d172c8e9c Recommit "[DSE] Switch to MemorySSA-backed DSE by default."
This switches to using DSE + MemorySSA by default again, after
fixing the issues reported after the first commit.

Notable fixes fc82006331, a0017c2bc2.

This reverts commit 3a59628f3c.
2020-09-18 11:05:00 +01:00
Florian Hahn a0017c2bc2 [MemorySSA] Be more conservative when traversing MemoryPhis.
I think we need to be even more conservative when traversing memory
phis, to make sure we catch any loop carried dependences.

This approach updates fillInCurrentPair to use unknown sizes for
locations when we walk over a phi, unless the location is guaranteed to
be loop-invariant for any possible loop. Using an unknown size for
locations should ensure we catch all memory accesses to locations after
the given memory location, which includes loop-carried dependences.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87778
2020-09-17 22:09:53 +01:00
Arthur Eubanks 179a22e807 [NewPM] Fix pr45927.ll under NPM 2020-09-17 13:57:55 -07:00
Florian Hahn 51973a607d [SCEV] Add test cases for max BTC with loop guard info.
This adds test cases for PR40961 and PR47247. They illustrate cases in
which the max backedge-taken count can be improved by information from
the loop guards.
2020-09-17 20:27:48 +01:00
Florian Hahn 9dc1e53787 [MemorySSA] Add another loop clobber test case. 2020-09-17 14:15:29 +01:00
Sjoerd Meijer 6637d72ddd [Lint] Add check for intrinsic get.active.lane.mask
As @efriedma pointed out in D86301, this "not equal to 0 check" of
get.active.lane.mask's second operand needs to live here in Lint and not the
Verifier.

Differential Revision: https://reviews.llvm.org/D87228
2020-09-17 09:22:03 +01:00
Arthur Eubanks f4ea0f9814 [NewPM] Port -print-alias-sets to NPM
Really it should be named print<alias-sets>, but for the sake of
changing fewer tests, added a TODO to rename after NPM switch and test
cleanup.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D87713
2020-09-16 18:34:56 -07:00
Alina Sbirlea 344a3d0bc0 [MemorySSA] Rename uses in blocks with Phis.
Renaming should include blocks with existing Phis.

Resolves PR45927.

Differential Revision: https://reviews.llvm.org/D87661
2020-09-16 17:24:17 -07:00
Arthur Eubanks 09c342493d [NPM] Translate alias analysis into require<> as well
'require<globals-aa>' is needed to make globals-aa work in NPM, since
globals-aa is a module analysis but function passes cannot run module
analyses on demand.
So don't skip translating alias analyses to 'require<>'.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87743
2020-09-16 08:54:09 -07:00
Alina Sbirlea d3d7603900 [MemorySSA] Report unoptimized as None, not MayAlias. 2020-09-15 23:58:53 -07:00
Alina Sbirlea fc82006331 [MemorySSA] Set MustDominate to true for PhiTranslation. 2020-09-15 23:29:57 -07:00
Arthur Eubanks 3b38062d1c [NewPM] Fix 2003-02-19-LoopInfoNestingBug.ll under NPM
Also move it to a more appropriate directory.
2020-09-15 20:21:45 -07:00
Arthur Eubanks 558e5c31b6 [Dominators][NewPM] Pin tests with -analyze to legacy PM
-analyze isn't supported in NPM. All affected tests have corresponding
NPM RUN line.
2020-09-15 11:59:00 -07:00
Arthur Eubanks d158e786cc [DemandedBits][NewPM] Pin some tests to legacy PM
All tests have corresponding NPM RUN lines.
-analyze doesn't work under NPM.
2020-09-15 11:55:58 -07:00
Arthur Eubanks 9853e84b54 [PostDominators][NewPM] Fix tests to work under NPM
Each test has a legacy PM pinned to legacy PM and a NPM RUN line.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87660
2020-09-15 11:19:01 -07:00
Arthur Eubanks 3f69b2140f [NewPM][opt] Fix -globals-aa not being recognized as alias analysis in NPM
Was missing MODULE_ALIAS_ANALYSIS, previously only FUNCTION_ALIAS_ANALYSIS was taken into account.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87664
2020-09-15 11:18:19 -07:00
Arthur Eubanks e0c7641de6 [RegionInfo][NewPM] Fix RegionInfo tests to work under NPM
Pin RUN lines with -analyze to legacy PM, add corresponding NPM RUN line if missing.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87658
2020-09-15 11:12:14 -07:00
Arthur Eubanks 6f66ad13c5 [DependenceAnalysis][NewPM] Fix tests to work under NPM
All tests had corresponding NPM lines, simply pin non-NPM lines to legacy PM.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D87665
2020-09-15 11:11:23 -07:00
Arthur Eubanks 54e1bf1154 [LoopAccessAnalysis][NewPM] Fix tests to work under NPM
Pin RUN lines with -analyze to legacy PM, add corresponding NPM RUN lines.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D87662
2020-09-15 11:06:47 -07:00
Florian Hahn 3a59628f3c Revert "[DSE] Switch to MemorySSA-backed DSE by default."
This reverts commit fb109c42d9.

Temporarily revert due to a mis-compile pointed out at D87163.
2020-09-15 18:07:56 +01:00
Florian Hahn c4f1b31441 [MemorySSA] Make sure PerformedPhiTrans is updated for each visited def.
1ce82015f6 added a fix to restrict phi optimizations after phi
translations. But the current use of performedPhiTranslation only
checked whether phi translation happened for the first iterator and
missed cases where phi translations happens at subsequent
iterators/upwards defs.

This patch changes upward_defs_iteartor to take a pointer to a bool, so
we can easily ensure the final value includes all visited defs, while
still being able to conveniently use it with make_range & co.
2020-09-14 16:11:56 +01:00
Florian Hahn f07f3c7237 [MemorySSA] Precommit test case for PR47498. 2020-09-14 16:11:56 +01:00
Florian Hahn fb109c42d9 [DSE] Switch to MemorySSA-backed DSE by default.
The tests have been updated and I plan to move them from the MSSA
directory up.

Some end-to-end tests needed small adjustments. One difference to the
legacy DSE is that legacy DSE also deletes trivially dead instructions
that are unrelated to memory operations. Because MemorySSA-backed DSE
just walks the MemorySSA, we only visit/check memory instructions. But
removing unrelated dead instructions is not really DSE's job and other
passes will clean up.

One noteworthy change is in llvm/test/Transforms/Coroutines/ArgAddr.ll,
but I think this comes down to legacy DSE not handling instructions that
may throw correctly in that case. To cover this with MemorySSA-backed
DSE, we need an update to llvm.coro.begin to treat it's return value to
belong to the same underlying object as the passed pointer.

There are some minor cases MemorySSA-backed DSE currently misses, e.g. related
to atomic operations, but I think those can be implemented after the switch.

This has been discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html

For the MultiSource/SPEC2000/SPEC2006 the number of eliminated stores
goes from ~17500 (legayc DSE) to ~26300 (MemorySSA-backed). More numbers
and details in the thread on llvm-dev.

Impact on CTMark:
```
                                     Legacy Pass Manager
                        exec instrs    size-text
O3                       + 0.60%        - 0.27%
ReleaseThinLTO           + 1.00%        - 0.42%
ReleaseLTO-g.            + 0.77%        - 0.33%
RelThinLTO (link only)   + 0.87%        - 0.42%
RelLO-g (link only)      + 0.78%        - 0.33%
```
http://llvm-compile-time-tracker.com/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions
```
                                     New Pass Manager
                       exec instrs.   size-text
O3                       + 0.95%       - 0.25%
ReleaseThinLTO           + 1.34%       - 0.41%
ReleaseLTO-g.            + 1.71%       - 0.35%
RelThinLTO (link only)   + 0.96%       - 0.41%
RelLO-g (link only)      + 2.21%       - 0.35%
```
http://195.201.131.214:8000/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions

Reviewed By: asbirlea, xbolva00, nikic

Differential Revision: https://reviews.llvm.org/D87163
2020-09-10 22:24:32 +01:00
Simon Pilgrim de25ebaac6 [CostModel][X86] Add vXi32 division by uniform constant costs (PR47476)
Other types can be handled in future patches but their uniform / non-uniform costs are more similar and don't appear to cause many vectorization issues.
2020-09-10 12:17:54 +01:00
Krzysztof Parzyszek 8b7c8f2c54 Mark masked.{store,scatter,compressstore} intrinsics as write-only 2020-09-09 17:28:21 -05:00
Sam Parker 0af4147804 [ARM][CostModel] CodeSize costs for i1 arith ops
When optimising for size, make the cost of i1 logical operations
relatively expensive so that optimisations don't try to combine
predicates.

Differential Revision: https://reviews.llvm.org/D86525
2020-09-07 09:27:18 +01:00
Florian Hahn 1ddb3a369f [LangRef] Adjust guarantee for llvm.memcpy to also allow equal arguments.
This adjusts the description of `llvm.memcpy` to also allow operands
to be equal. This is in line with what Clang currently expects.

This change is intended to be temporary and followed by re-introduce
a variant with the non-overlapping guarantee for cases where we can
actually ensure that property in the front-end.

See the links below for more details:
http://lists.llvm.org/pipermail/cfe-dev/2020-August/066614.html
and PR11763.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D86815
2020-09-05 19:18:23 +01:00
Nikita Popov ac87480bd8 [SCEV] Recognize min/max intrinsics
Recognize umin/umax/smin/smax intrinsics and convert them to the
already existing SCEV nodes of the same name.

In the future we'll want SCEVExpander to also produce the intrinsics,
but we're not ready for that yet.

Differential Revision: https://reviews.llvm.org/D87160
2020-09-05 16:30:11 +02:00
Nikita Popov 6b50ce3ac9 [SCEV] Add tests for min/max intrinsics (NFC) 2020-09-04 22:08:01 +02:00
Bryan Chan 3404add468 [EarlyCSE] Verify hash code in regression tests
As discussed in D86843, -earlycse-debug-hash should be used in more regression
tests to catch inconsistency between the hashing and the equivalence check.

Differential Revision: https://reviews.llvm.org/D86863
2020-09-04 10:40:35 -04:00
Alina Sbirlea ce66089ac6 Fix build-bots.
BasicAA can be freed (and it is not recomputed).
2020-09-01 20:24:15 -07:00
Alina Sbirlea 1ccfb52a61 [MemCpyOptimizer] Preserve analyses and replace use of lambdas to get them.
Summary:
Analyses are preserved in MemCpyOptimizer.
Get analyses before running the pass and store the pointers, instead of
using lambdas and getting them every time on demand.

Reviewers: lenary, deadalnix, mehdi_amini, nikic, efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74494
2020-09-01 17:35:40 -07:00
Max Kazantsev e7f53044e7 [Test] Move IndVars test to a proper place 2020-09-01 12:17:31 +07:00
Alina Sbirlea 63844c116a [MemorySSA] Clean up single value phis.
MemoryPhis with a single value are correct, but can lead to errors when
updating. Clean up single entry Phis newly added when cloning blocks.
Resolves PR46574.
2020-08-31 19:26:08 -07:00
Anna Welker 064981f0ce [ARM][MVE] Enable MVE gathers and scatters by default
Enable MVE gather/scatters by default, which requires some
minor adaptations in some tests.

Differential revision: https://reviews.llvm.org/D86776
2020-08-28 19:05:29 +01:00
Florian Hahn fd6ebea50d [MemLoc] Support memcmp in MemoryLocation::getForArgument.
This patch adds support for memcmp in MemoryLocation::getForArgument.
memcmp reads from the first 2 arguments up to the number of bytes of the
third argument.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D86725
2020-08-28 10:19:54 +01:00
Florian Hahn 85dacca29f [BasicAA] Add first libfunc tests with memcmp. 2020-08-28 10:02:41 +01:00
Vitaly Buka a40660551e [StackSafety] Ignore allocas with partial lifetime markers
Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D86672
2020-08-27 13:54:41 -07:00
Arthur Eubanks 486ed88533 [ConstProp] Remove ConstantPropagation
As discussed in
http://lists.llvm.org/pipermail/llvm-dev/2020-July/143801.html.

Currently no users outside of unit tests.

Replace all instances in tests of -constprop with -instsimplify.
Notable changes in tests:
* vscale.ll - @llvm.sadd.sat.nxv16i8 is evaluated by instsimplify, use a fake intrinsic instead
* InsertElement.ll - insertelement undef is removed by instsimplify in @insertelement_undef
llvm/test/Transforms/ConstProp moved to llvm/test/Transforms/InstSimplify/ConstProp

Reviewed By: lattner, nikic

Differential Revision: https://reviews.llvm.org/D85159
2020-08-26 15:51:30 -07:00
Arthur Eubanks 098d3f9827 [InstSimplify] Simplify to vector constants when possible
InstSimplify should do all transformations that ConstProp does, but
one thing that ConstProp does that InstSimplify wouldn't is inline
vector instructions that are constants, e.g. into a ret.

Previously vector instructions wouldn't be inlined in InstSimplify
because llvm::Simplify*Instruction() would return nullptr for specific
instructions, such as vector instructions that were actually constants,
if it couldn't simplify them.

This changes SimplifyInsertElementInst, SimplifyExtractElementInst, and
SimplifyShuffleVectorInst to return a vector constant when possible.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85946
2020-08-26 11:40:36 -07:00
David Green 677c1590c0 [ARM] Increase MVE gather/scatter cost by MVECostFactor.
MVE Gather scatter codegeneration is looking a lot better than it used
to, but still has some issues. The instructions we currently model as 1
cycle per element, which is a bit low for some cases. Increasing the
cost by the MVECostFactor brings them in-line with our other instruction
costs. This will have the effect of only generating then when the extra
benefit is more likely to overcome some of the issues. Notably in
running out of registers and vectorizing loops that could otherwise be
SLP vectorized.

In the short-term whilst we look at other ways of dealing with those
more directly, we can increase the costs of gathers to make them more
likely to be beneficial when created.

Differential Revision: https://reviews.llvm.org/D86444
2020-08-26 13:03:46 +01:00
Ta-Wei Tu abbd652dd6 [LoopNest] False negative of `arePerfectlyNested` with LCSSA loops
Summary: The LCSSA pass (required for all loop passes) sometimes adds
additional blocks containing LCSSA variables, and checkLoopsStructure
may return false even when the loops are perfectly nested in this case.
This is because the successor of the exit block of the inner loop now
points to the LCSSA block instead of the latch block of the outer loop.
Examples are shown in the test nests-with-lcssa.ll.

To fix the issue, the successor of the exit block of the inner loop can
now point to a block in which all instructions are LCSSA phi node
(except the terminator), and the sole successor of that block should
point to the latch block of the outer loop.

Reviewed By: Whitney, etiotto

Differential Revision: https://reviews.llvm.org/D86133
2020-08-25 16:20:52 +00:00
Sam Parker da4ada116e [NFC][ARM] arith code size cost tests
Add a run to measure the code size cost of arithmetic instructions
and add a function for i1 types.
2020-08-25 11:16:01 +01:00
David Sherwood 7b64765cd1 [SVE] Fix TypeSize related warnings with IR truncates of scalable vectors
In getCastInstrCost when the instruction is a truncate we were relying
upon the implicit TypeSize -> uint64_t cast when asking if a given type
has the same size as a legal integer. I've changed the code to only
ask the question if the type is fixed length.

I have also changed InstCombinerImpl::SimplifyDemandedUseBits to bail
out for now if the type is a scalable vector.

I've added the following new tests:

  Analysis/CostModel/AArch64/sve-trunc.ll
  Transforms/InstCombine/AArch64/sve-trunc.ll

for both of these fixes.

Differential revision: https://reviews.llvm.org/D86432
2020-08-25 09:17:56 +01:00
Christopher Tetreault 5eff21c8ff [NFC][documentation] clarify comment in test
test referenced a relative path to a file, but the path was not correct
relative to the project the test is in

Differential Revision: https://reviews.llvm.org/D86368
2020-08-21 14:30:47 -07:00
Sam Parker acf0bb41e4 [ARM][CostModel] Select instruction costs.
Modify the ARM getCmpSelInstrCost implementation for the code size
costs of selects. Now consider the legalization cost and increase
the cost of i1 because those values wouldn't live in a general purpose
register. We also make selects +1 more expensive to account for the IT
instruction.

Differential Revision: https://reviews.llvm.org/D82091
2020-08-21 08:49:56 +01:00
Chuanqi Xu f6de5306ec [NFC][StackSafety] Test that StackLifetime looks through stripPointerCasts
StackLifetime class collects lifetime marker of an `alloca` by collect
the user of `BitCast` who is the user of the `alloca`. However, either
the `alloca` itself could be used with the lifetime marker or the `BitCast`
of the `alloca` could be transformed to other instructions. (e.g.,
it may be transformed to all zero reps in `InstCombine` pass).
This patch tries to fix this process in `collectMarkers` functions.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D85399
2020-08-18 16:21:00 -07:00
Dávid Bolvanský 0f14b2e6cb Revert "[BPI] Improve static heuristics for integer comparisons"
This reverts commit 50c743fa71. Patch will be split to smaller ones.
2020-08-17 20:44:33 +02:00
Simon Pilgrim c1f6ce0c73 [DemandedBits] Improve accuracy of Add propagator
The current demand propagator for addition will mark all input bits at and right of the alive output bit as alive. But carry won't propagate beyond a bit for which both operands are zero (or one/zero in the case of subtraction) so a more accurate answer is possible given known bits.

I derived a propagator by working through truth tables and using a bit-reversed addition to make demand ripple to the right, but I'm not sure how to make a convincing argument for its correctness in the comments yet. Nevertheless, here's a minimal implementation and test to get feedback.

This would help in a situation where, for example, four bytes (<128) packed into an int are added with four others SIMD-style but only one of the four results is actually read.

Known A:     0_______0_______0_______0_______
Known B:     0_______0_______0_______0_______
AOut:        00000000001000000000000000000000
AB, current: 00000000001111111111111111111111
AB, patch:   00000000001111111000000000000000

Committed on behalf of: @rrika (Erika)

Differential Revision: https://reviews.llvm.org/D72423
2020-08-17 12:54:09 +01:00
Simon Pilgrim 79d9e2cd93 [DemandedBits] Reorder addition test checks. NFC.
As suggested on D72423 we should try to keep the same order as the original IR
2020-08-17 12:54:09 +01:00
Vitaly Buka e10e7829bf [StackSafety] Skip ambiguous lifetime analysis
If we can't identify alloca used in lifetime marker we
need to assume to worst case scenario.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D84630
2020-08-16 18:05:52 -07:00
Vitaly Buka 47552a614a [StackSafety] Change how callee searched in index
Handle other than local linkage types.
2020-08-16 04:37:19 -07:00
Simon Pilgrim 25ce634172 [DemandedBits] Add addition test case from D72423 2020-08-14 15:59:53 +01:00
Vitaly Buka 798eb71c3a [NFC][StackSafety] Dedup callees 2020-08-14 01:14:52 -07:00
Dávid Bolvanský 50c743fa71 [BPI] Improve static heuristics for integer comparisons
Similarly as for pointers, even for integers a == b is usually false.

GCC also uses this heuristic.

Reviewed By: ebrevnov

Differential Revision: https://reviews.llvm.org/D85781
2020-08-13 19:54:27 +02:00
Dávid Bolvanský f9264995a6 Revert "[BPI] Improve static heuristics for integer comparisons"
This reverts commit 44587e2f7e. Sanitizer tests need to be updated.
2020-08-13 14:37:40 +02:00
Dávid Bolvanský 44587e2f7e [BPI] Improve static heuristics for integer comparisons
Similarly as for pointers, even for integers a == b is usually false.

GCC also uses this heuristic.

Reviewed By: ebrevnov

Differential Revision: https://reviews.llvm.org/D85781
2020-08-13 14:23:58 +02:00
Dávid Bolvanský a0485421d2 Revert "[BPI] Improve static heuristics for integer comparisons"
This reverts commit 385c9d673f.
2020-08-13 12:59:15 +02:00
Dávid Bolvanský 385c9d673f [BPI] Improve static heuristics for integer comparisons
Similarly as for pointers, even for integers a == b is usually false.

GCC also uses this heuristic.

Reviewed By: ebrevnov

Differential Revision: https://reviews.llvm.org/D85781
2020-08-13 12:45:40 +02:00
Ali Tamur 0581c0b0ee Revert "[SCEV] Look through single value PHIs."
This reverts commit e441b7a7a0.

This patch causes a compile error in tensorflow opensource project. The stack trace looks like:

Point of crash:
llvm/include/llvm/Analysis/LoopInfoImpl.h : line 35

(gdb) ptype *this
type = const class llvm::LoopBase<llvm::BasicBlock, llvm::Loop> [with BlockT = llvm::BasicBlock, LoopT = llvm::Loop]

(gdb) p *this
$1 = {ParentLoop = 0x0, SubLoops = std::vector of length 0, capacity 0, Blocks = std::vector of length 0, capacity 1,
  DenseBlockSet = {<llvm::SmallPtrSetImpl<llvm::BasicBlock const*>> = {<llvm::SmallPtrSetImplBase> = {<llvm::DebugEpochBase> = {Epoch = 3}, SmallArray = 0x1b2bf6c8, CurArray = 0x1b2bf6c8,
        CurArraySize = 8, NumNonEmpty = 0, NumTombstones = 0}, <No data fields>}, SmallStorage = {0xfffffffffffffffe, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}}, IsInvalid = true}

(gdb) p *this->DenseBlockSet->CurArray
$2 = (const void *) 0xfffffffffffffffe

I will try to get a case from tensorflow or use creduce to get a small case.
2020-08-12 23:13:24 -07:00
Florian Hahn e441b7a7a0 [SCEV] Look through single value PHIs.
Now that SCEVExpander can preserve LCSSA form,
we do not have to worry about LCSSA form when
trying to look through PHIs. SCEVExpander will take
care of inserting LCSSA PHI nodes as required.

This increases precision of the analysis in some cases.

Reviewed By: mkazantsev, bmahjour

Differential Revision: https://reviews.llvm.org/D71539
2020-08-12 10:03:42 +01:00
Dávid Bolvanský d68a2859ab [BPI] Teach BPI about bcmp function
bcmp is similar to memcmp
2020-08-11 20:44:53 +02:00
Florian Hahn 3483c28c5b [SCEV] ] If RHS >= Start, simplify (Start smax RHS) to RHS for trip counts.
This is the max version of D85046.

This change causes binary changes in 44 out of 237 benchmarks (out of
MultiSource/SPEC2000/SPEC2006)

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D85189
2020-08-11 13:20:24 +01:00
Thomas Lively 514445e035 [WebAssembly][ConstantFolding] Fold fp-to-int truncation intrinsics
Constant fold both the trapping and saturating versions of the
WebAssembly truncation intrinsics. The tests are adapted from the
WebAssembly spec tests for the corresponding instructions.

Requested in PR46982.

Differential Revision: https://reviews.llvm.org/D85392
2020-08-10 12:40:05 -07:00
Vitaly Buka dee812a297 [StackSafety] Fix union which produces wrapped sets 2020-08-09 23:20:17 -07:00
Vitaly Buka 3a34228bff [StackSafety] Don't keep FullSet in index
Optimization. Missing record is enterpreted as FullSet anyway.
2020-08-09 15:01:46 -07:00
Vitaly Buka 654266bea9 [StackSafety] Use getSignedMin() to serialize ranges
Almost NFC as it's important only for full sets which should not
be serialized at all.
2020-08-09 14:53:13 -07:00
Vitaly Buka eff04f9595 [NFC][StackSafety] Add index test
This directly covers generateParamAccessSummary
2020-08-09 14:34:00 -07:00
Vitaly Buka 2fa401fe53 [NFC][StackSafety] Add shell test requirement 2020-08-09 14:31:17 -07:00
Vitaly Buka 2a11d5dcc9 [NFC][StackSafety] Avoid some duplications in tests 2020-08-09 12:38:53 -07:00
Vitaly Buka 6d9b3cb2fb Revert "[NFC][StackSafety] Add index test"
This reverts commit 5fd49911db.

GUIDs don't match.
2020-08-08 21:26:35 -07:00
Vitaly Buka 5fd49911db [NFC][StackSafety] Add index test
This directly covers generateParamAccessSummary
2020-08-08 19:11:02 -07:00
Vitaly Buka b317321545 [NFC][StackSafety] noinline in alias tests 2020-08-08 18:21:52 -07:00
Vitaly Buka 7547508b7a Revert "[StackSafety] Skip ambiguous lifetime analysis"
This reverts commit 0b2616a804.

Crashes with safe-stack.
2020-08-07 14:02:50 -07:00
Max Kazantsev da9e7b1ab0 [Test] Added test showing missing range check elimination opportunity in IndVars
Seems that SCEV is not powerful enough to handle this.
2020-08-07 16:47:25 +07:00
Vitaly Buka 7fb9de2c6f [StackSafety,NFC] Fix tests in debug 2020-08-06 20:46:39 -07:00
Vitaly Buka 39cbcbe1b1 [StackSafety,NFC] Add more tests 2020-08-06 19:50:05 -07:00
Vitaly Buka d97636196a [StackSafety,NFC] Sort llvm-lto2 resolutions in tests 2020-08-06 19:46:52 -07:00
Vitaly Buka 92dcf12b2f [StackSafety,NFC] Use CHECK-EMPTY in tests 2020-08-06 19:19:51 -07:00
Vitaly Buka 0b2616a804 [StackSafety] Skip ambiguous lifetime analysis
If we can't identify alloca used in lifetime marker we
need to assume to worst case scenario.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D84630
2020-08-06 19:10:33 -07:00
dfukalov 4ccc38813e [AMDGPU][CostModel] Add f16, f64 and contract cases to fused costs estimation.
Add cases of fused fmul+fadd/fsub with f16 and f64 operands to cost model.
Also added operations with contract attribute.

Fixed line endings in test.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D84995
2020-08-06 21:43:27 +03:00
Arthur Eubanks d0acd97c68 [NewPM][LoopUnswitch] Pin loop-unswitch to legacy PM or use simple-loop-unswitch
As mentioned in
http://lists.llvm.org/pipermail/llvm-dev/2020-July/143395.html,
loop-unswitch has not been ported to the NPM. Instead people are using
simple-loop-unswitch.

Pin all tests in Transforms/LoopUnswitch to legacy PM and replace all
other uses of loop-unswitch with simple-loop-unswitch.

One test that didn't fit into the above was
2014-06-21-congruent-constant.ll which seems to only pass with
loop-unswitch. That is also pinned to legacy PM.

Now all tests containing "-loop-unswitch" anywhere in the test succeed with
NPM turned on by default.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D85360
2020-08-06 10:56:00 -07:00
Alina Sbirlea beb9993d96 [MSSA] Update test with more detailed and resilient checks. [NFC] 2020-08-05 16:46:44 -07:00
Arthur Eubanks 4103f4a936 [MSSA][NewPM] Handle tests with -print-memoryssa
-print-memoryssa in legacy PM is print<memoryssa> in NPM.
Pin tests with -print-memoryssa to legacy PM.
Add corresponding tests for NPM where missing.
This fixes "unknown pass name 'print-memoryssa'".

Some tests still fail in Analysis/MemorySSA due to other passes that
haven't been ported.

pr43427.ll and pr43438.ll required adding -aa-pipeline=basic-aa,
-loop-simplify (since it doesn't run on legacy PM by default), and
decrementing some of the MemoryPhi numbers.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D85333
2020-08-05 15:59:45 -07:00
Sam Parker f2675ab45f [ARM][CostModel] Implement getCFInstrCost
As with other targets, set the throughput cost of control-flow
instructions to free so that we don't miss out of vectorization
opportunities.

Differential Revision: https://reviews.llvm.org/D85283
2020-08-05 12:44:51 +01:00
David Green 3c7e7d40a9 [BasicAA] Enable -basic-aa-recphi by default
This option was added a while back, to help improve AA around pointer
phi loops. It looks for phi(gep(phi, const), x) loops, checking if x can
then prove more precise aliasing info.

Differential Revision: https://reviews.llvm.org/D82998
2020-08-04 10:43:42 +01:00
Florian Hahn b7856f9d8d [SCEV] Consolidate some smin/smax folding tests into single test file.
This patch moves a few spread out smin/smax tests to smin-smax-folds.ll
and adds additional test cases that expose further potential for
folds.
2020-08-04 10:24:11 +01:00
Alina Sbirlea 1ce82015f6 [MemorySSA] Restrict optimizations after a PhiTranslation.
Merging alias results from different paths, when a path did phi
translation is not necesarily correct. Conservatively terminate such paths.
Aimed to fix PR46156.

Differential Revision: https://reviews.llvm.org/D84905
2020-08-03 14:46:41 -07:00
Florian Hahn ee1c12708a [SCEV] If Start>=RHS, simplify (Start smin RHS) = RHS for trip counts.
In some cases, it seems like we can get rid of unnecessary s/umins by
using information from the loop guards (unless I am missing something).

One place where this seems to be helpful in practice is when computing
loop trip counts. This patch just changes howManyGreaterThans for now.
Note that this requires a loop for which we can check 'is guarded'.

On SPEC2000/SPEC2006/MultiSource, there are some notable changes for
some programs in the number of loops unrolled and trip counts computed.

```
Same hash: 179 (filtered out)
Remaining: 58
Metric: scalar-evolution.NumTripCountsComputed

Program                                        base    patch   diff
 test-suite...langs-C/compiler/compiler.test    25.00   31.00  24.0%
 test-suite.../Applications/SPASS/SPASS.test   2020.00 2323.00 15.0%
 test-suite...langs-C/allroots/allroots.test    29.00   32.00  10.3%
 test-suite.../Prolangs-C/loader/loader.test    17.00   18.00   5.9%
 test-suite...fice-ispell/office-ispell.test   253.00  265.00   4.7%
 test-suite...006/450.soplex/450.soplex.test   3552.00 3692.00  3.9%
 test-suite...chmarks/MallocBench/gs/gs.test   453.00  470.00   3.8%
 test-suite...ngs-C/assembler/assembler.test    29.00   30.00   3.4%
 test-suite.../Benchmarks/Ptrdist/bc/bc.test   263.00  270.00   2.7%
 test-suite...rks/FreeBench/pifft/pifft.test   722.00  741.00   2.6%
 test-suite...count/automotive-bitcount.test    41.00   42.00   2.4%
 test-suite...0/253.perlbmk/253.perlbmk.test   1417.00 1451.00  2.4%
 test-suite...000/197.parser/197.parser.test   387.00  396.00   2.3%
 test-suite...lications/sqlite3/sqlite3.test   1168.00 1189.00  1.8%
 test-suite...000/255.vortex/255.vortex.test   173.00  176.00   1.7%

Metric: loop-unroll.NumUnrolled

Program                                        base   patch  diff
 test-suite...langs-C/compiler/compiler.test     1.00   3.00 200.0%
 test-suite.../Applications/SPASS/SPASS.test   134.00 234.00 74.6%
 test-suite...count/automotive-bitcount.test     3.00   4.00 33.3%
 test-suite.../Prolangs-C/loader/loader.test     3.00   4.00 33.3%
 test-suite...langs-C/allroots/allroots.test     3.00   4.00 33.3%
 test-suite...Source/Benchmarks/sim/sim.test    10.00  12.00 20.0%
 test-suite...fice-ispell/office-ispell.test    21.00  25.00 19.0%
 test-suite.../Benchmarks/Ptrdist/bc/bc.test    32.00  38.00 18.8%
 test-suite...006/450.soplex/450.soplex.test   300.00 352.00 17.3%
 test-suite...rks/FreeBench/pifft/pifft.test    60.00  69.00 15.0%
 test-suite...chmarks/MallocBench/gs/gs.test    57.00  63.00 10.5%
 test-suite...ngs-C/assembler/assembler.test    10.00  11.00 10.0%
 test-suite...0/253.perlbmk/253.perlbmk.test   145.00 157.00  8.3%
 test-suite...000/197.parser/197.parser.test    43.00  46.00  7.0%
 test-suite...TimberWolfMC/timberwolfmc.test   205.00 214.00  4.4%
 Geomean difference                                           7.6%
```

Fixes https://bugs.llvm.org/show_bug.cgi?id=46939
Fixes https://bugs.llvm.org/show_bug.cgi?id=46924 on X86.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D85046
2020-08-03 17:22:42 +01:00
Florian Hahn ffb4735200 [SCEV] Precommit tests with signed counting down loop.
From PR46939.
2020-08-02 10:26:26 +01:00
Sanjay Patel e591713bff [ConstantFolding] fold abs intrinsic
The handling for minimum value is similar to cttz/ctlz with 0 just above this case.

Differential Revision: https://reviews.llvm.org/D84942
2020-07-31 14:08:44 -04:00
Vitaly Buka 89051ebace [NFC] GetUnderlyingObject -> getUnderlyingObject
I am going to touch them in the next patch anyway
2020-07-30 21:08:24 -07:00
Arthur Eubanks 47acbcf09a [tbaa] Rename type-based-aa -> tbaa
For consistency with legacy pass name.
Helps with 37 instances of "unknown pass name 'tbaa'" in check-llvm under NPM.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D84967
2020-07-30 19:51:35 -07:00
Sanjay Patel f7237ee74f [ConstantFolding] add tests for abs intrinsic; NFC 2020-07-30 09:28:30 -04:00
Yuanfang Chen 8224c5047e For some tests targeting SystemZ, -march=z13 ---> -mcpu=z13
z13 is not a target. It is a CPU.
2020-07-29 19:18:01 -07:00
Sanjay Patel 9ee7d7122c [ConstantFolding] fold integer min/max intrinsics
If both operands are undef, return undef.
If one operand is undef, clamp to limit constant.
2020-07-29 11:01:13 -04:00
Sanjay Patel 9f95895833 [ConstantFolding] add tests for integer min/max intrinsics; NFC 2020-07-29 11:01:13 -04:00
Simon Pilgrim d1abca187d [CostModel][X86] Add SSE costs for SMAX/SMIN/UMAX/UMIN intrinsics 2020-07-29 15:55:43 +01:00
Sanjay Patel 8c3262a7b4 [ConstantFolding] update test checks FP min/max intrinsics
There's a slight difference in functionality with the new CHECK lines:
before, we allowed either -0.0 or 0.0 for maxnum/minnum. That matches
the definition, but we should always get a deterministic result from
constant folding within the compiler, so now we assert that we got
the single expected result in all cases.
2020-07-29 09:43:33 -04:00
Simon Pilgrim 0a0f28254a [CostModel][X86] Add SSE costs for ABS intrinsics 2020-07-29 14:33:59 +01:00
David Green 9ddb28964c [ARM] Tune getCastInstrCost for extending masked loads and truncating masked stores
This patch uses the feature added in D79162 to fix the cost of a
sext/zext of a masked load, or a trunc for a masked store.
Previously, those were considered cheap or even free, but it's
not the case as we cannot split the load in the same way we would for
normal loads.

This updates the costs to better reflect reality, and adds a test for it
in test/Analysis/CostModel/ARM/cast.ll.

It also adds a vectorizer test that showcases the improvement: in some
cases, the vectorizer will now choose a smaller VF when
tail-predication is enabled, which results in better codegen. (Because
if it were to use a higher VF in those cases, the code we see above
would be generated, and the vmovs would block tail-predication later in
the process, resulting in very poor codegen overall)

Original Patch by Pierre van Houtryve

Differential Revision: https://reviews.llvm.org/D79163
2020-07-29 13:41:34 +01:00
Simon Pilgrim c5ef1f1edd [TTI] Add default cost expansion for abs/smax/smin/umax/umin intrinsics 2020-07-29 12:13:06 +01:00
Simon Pilgrim 3f7249046a [CostModel][X86] Add smax/smin/umin/umax intrinsics cost model tests
Costs currently fall back to scalar generic intrinsic calls
2020-07-28 19:56:11 +01:00
Simon Pilgrim c6920081a8 [CostModel][X86] Add abs intrinsics cost model tests
abs costs currently falls back in scalar generic intrinsic calls
2020-07-28 19:56:10 +01:00
Arthur Eubanks 2ca6c422d2 [FunctionAttrs] Rename functionattrs -> function-attrs
To match NewPM pass name, and also for readability.
Also rename rpo-functionattrs -> rpo-function-attrs while we're here.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D84694
2020-07-28 09:09:13 -07:00
Florian Hahn be2ea29ee1 [SCEV] Add additional tests.
Increase test coverage for upcoming changes to how SCEV deals with LCSSA
phis.
2020-07-28 16:15:57 +01:00
Jinsong Ji d28f86723f Re-land "[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support"
This reverts commit bf544fa1c3.

Fixed the typo in PPCInstrInfo.cpp.
2020-07-28 14:00:11 +00:00
Jinsong Ji bf544fa1c3 Revert "[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support"
This reverts commit adffce7153.

This is breaking test-suite, revert while investigation.
2020-07-27 21:07:00 +00:00
Jinsong Ji adffce7153 [PowerPC] Remove QPX/A2Q BGQ/BGP CNK support
Per RFC http://lists.llvm.org/pipermail/llvm-dev/2020-April/141295.html
no one is making use of QPX/A2Q/BGQ/BGP CNK anymore.

This patch remove the support of QPX/A2Q in llvm, BGQ/BGP in clang,
CNK support in openmp/polly.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D83915
2020-07-27 19:24:39 +00:00
Juneyoung Lee 32088f4f7f [ConstantFolding] Fold freeze if it is never undef or poison
This is a simple patch that adds constant folding for freeze
instruction.

IIUC, it isn't needed to update ConstantFold.cpp because there is no freeze
constexpr.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84597
2020-07-26 21:54:44 +09:00
Juneyoung Lee 1b802fe34d NFC; add a test for freeze's constprop 2020-07-26 21:03:23 +09:00
Arthur Eubanks 9bb6ce78be Rename scoped-noalias -> scoped-noalias-aa
Summary: To match NewPM name. Also the new name is clearer and more consistent.

Subscribers: jvesely, nhaehnle, hiraditya, asbirlea, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84542
2020-07-24 12:14:27 -07:00
Tarindu Jayatilaka 06283661b3 Add new function properties to FunctionPropertiesAnalysis
Added  LoadInstCount, StoreInstCount, MaxLoopDepth, LoopCount

Reviewed By: jdoerfert, mtrofin

Differential Revision: https://reviews.llvm.org/D82283
2020-07-23 12:46:47 -07:00
Tarindu Jayatilaka ee6f0e109c Add a Printer to the FunctionPropertiesAnalysis
A printer pass and a lit test case was added.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D82523
2020-07-23 11:57:11 -07:00
Max Kazantsev c1d8e39236 [Test] Add more simple tests for PR46786 2020-07-22 17:11:26 +07:00
Max Kazantsev b96114c1e1 [SCEV] Remove premature assert. PR46786
This assert was added to verify assumption that GEP's SCEV will be of pointer type,
basing on fact that it should be a SCEVAddExpr with (at least) last operand being
pointer. Two notes:
- GEP's SCEV does not have to be a SCEVAddExpr after all simplifications;
- In current state, GEP's SCEV does not have to have at least one pointer operands
  (all of them can become int during the transforms).

However, we might want to be at a point where it is true. We are currently removing
this assert and will try to enumerate the cases where "is pointer" notion might be
lost during the transforms. When all of them are fixed, we can return it.

Differential Revision: https://reviews.llvm.org/D84294
Reviewed By: lebedev.ri
2020-07-22 15:43:16 +07:00
David Green becaa6803a [ARM] Constant fold VCTP intrinsics
We can sometimes get into the situation where the operand to a vctp
intrinsic becomes constant, such as after a loop is fully unrolled. This
adds the constant folding needed for them, allowing them to simplify
away and hopefully simplifying remaining instructions.

Differential Revision: https://reviews.llvm.org/D84110
2020-07-21 11:39:31 +01:00
Matt Arsenault ad8e900cb3 Verifier: Disallow byval and similar for AMDGPU calling conventions
These imply stack-like semantics, which doesn't make any sense for
entry points.
2020-07-20 10:58:57 -04:00
Jameson Nash 8b354cc8db [ConstantFolding] check applicability of AllOnes constant creation first
The getAllOnesValue can only handle things that are bitcast from a
ConstantInt, while here we bitcast through a pointer, so we may see more
complex objects (like Array or Struct).

Differential Revision: https://reviews.llvm.org/D83870
2020-07-19 13:13:57 -04:00
Arthur Eubanks 9adbb5cb3a [SCEV] Fix ScalarEvolution tests under NPM
Many tests use opt's -analyze feature, which does not translate well to
NPM and has better alternatives. The alternative here is to explicitly
add a pass that calls ScalarEvolution::print().

The legacy pass manager RUNs aren't changing, but they are now pinned to
the legacy pass manager.  For each legacy pass manager RUN, I added a
corresponding NPM RUN using the 'print<scalar-evolution>' pass. For
compatibility with update_analyze_test_checks.py and existing test
CHECKs, 'print<scalar-evolution>' now prints what -analyze prints per
function.

This was generated by the following Python script and failures were
manually fixed up:

import sys
for i in sys.argv:
    with open(i, 'r') as f:
        s = f.read()
    with open(i, 'w') as f:
        for l in s.splitlines():
            if "RUN:" in l and ' -analyze ' in l and '\\' not in l:
                f.write(l.replace(' -analyze ', ' -analyze -enable-new-pm=0 '))
                f.write('\n')
                f.write(l.replace(' -analyze ', ' -disable-output ').replace(' -scalar-evolution ', ' "-passes=print<scalar-evolution>" ').replace(" | ", " 2>&1 | "))
                f.write('\n')
            else:
                f.write(l)

There are a couple failures still in ScalarEvolution under NPM, but
those are due to other unrelated naming conflicts.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D83798
2020-07-16 11:24:07 -07:00
David Green 311fafd2c9 [BasicAA] Fix -basicaa-recphi for geps with negative offsets
As shown in D82998, the basic-aa-recphi option can cause miscompiles for
gep's with negative constants. The option checks for recursive phi, that
recurse through a contant gep. If it finds one, it performs aliasing
calculations using the other phi operands with an unknown size, to
specify that an unknown number of elements after the initial value are
potentially accessed. This works fine expect where the constant is
negative, as the size is still considered to be positive. So this patch
expands the check to make sure that the constant is also positive.

Differential Revision: https://reviews.llvm.org/D83576
2020-07-16 17:22:40 +01:00
David Green 30fa576627 [BasicAA] Add additional negative phi tests. NFC 2020-07-16 15:32:38 +01:00
dfukalov 76a0c0ee6f [AMDGPU][CostModel] Improve cost estimation for fused {fadd|fsub}(a,fmul(b,c))
Summary:
If result of fmul(b,c) has one use, in almost all cases (except denormals are
IEEE) the pair of operations will be fused in one fma/mad/mac/etc.

Reviewers: rampitec

Reviewed By: rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits, kerbowa

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83919
2020-07-16 03:06:38 +03:00
Arthur Eubanks f413b53a67 [NPM][IVUsers] Rename ivusers -> iv-users
LPM passes were named iv-users, which seems nicer than ivusers.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D83803
2020-07-15 09:38:21 -07:00
Tyker 0257ba581c Fix tests after 16f777f421 2020-07-14 22:52:26 +02:00
Giorgis Georgakoudis aef60af34e [CallGraph] Ignore callback uses
Summary:
Ignore callback uses when adding a callback function
in the CallGraph. Callback functions are typically
created when outlining, e.g. for OpenMP, so they have
internal scope and linkage. They should not be added
to the ExternalCallingNode since they are only callable
by the specified caller function at creation time.

A CGSCC pass, such as OpenMPOpt, may need to update
the CallGraph by adding a new outlined callback function.
Without ignoring callback uses, adding breaks CGSCC
pass restrictions and results to a broken CallGraph.

Reviewers: jdoerfert

Subscribers: hiraditya, sstefan1, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83370
2020-07-14 13:08:49 -07:00
Tyker 16f777f421 [NFC] Add debug and stat counters to assume queries and assume builder
Summary:
Add debug counter and stats counter to assume queries and assume builder
here is the collected stats on a build of check-llvm + check-clang.
  "assume-builder.NumAssumeBuilt": 2720879,
  "assume-builder.NumAssumesMerged": 761396,
  "assume-builder.NumAssumesRemoved": 1576212,
  "assume-builder.NumBundlesInAssumes": 6518809,
  "assume-queries.NumAssumeQueries": 85566380,
  "assume-queries.NumUsefullAssumeQueries": 2727360,
the NumUsefullAssumeQueries stat is actually pessimistic because in a few places queries
ask to keep providing information to try to get better information. and this isn't counted
as a usefull query evem tho it can be usefull

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83506
2020-07-14 21:49:14 +02:00
David Sherwood c06b7e2ab5 [SVE] Fix implicit TypeSize->uint64_t conversion getCastInstrCost
In getCastInstrCost() when comparing different sizes for src and
dst types we should be using the TypeSize comparison operators
instead of relying upon TypeSize being converted a uin64_t.
Previously this meant we were dropping the scalable property and
treating fixed and scalable vector types the same.

Differential Revision: https://reviews.llvm.org/D83461
2020-07-14 08:16:31 +01:00
David Green e1135b486a Revert "[BasicAA] Enable -basic-aa-recphi by default"
This reverts commit af839a9618.

Some issues appear to be being caused by this. Reverting whilst we
investigate.
2020-07-10 13:43:54 +01:00
Roman Lebedev c2a61ef388
Revert "[CallGraph] Ignore callback uses"
This likely has broken test/Transforms/Attributor/IPConstantProp/ tests.
http://45.33.8.238/linux/22502/step_12.txt

This reverts commit 205dc0922d.
2020-07-10 00:02:07 +03:00
Giorgis Georgakoudis 205dc0922d [CallGraph] Ignore callback uses
Summary:
Ignore callback uses when adding a callback function
in the CallGraph. Callback functions are typically
created when outlining, e.g. for OpenMP, so they have
internal scope and linkage. They should not be added
to the ExternalCallingNode since they are only callable
by the specified caller function at creation time.

A CGSCC pass, such as OpenMPOpt, may need to update
the CallGraph by adding a new outlined callback function.
Without ignoring callback uses, adding breaks CGSCC
pass restrictions and results to a broken CallGraph.

Reviewers: jdoerfert

Subscribers: hiraditya, sstefan1, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83370
2020-07-09 13:13:46 -07:00
David Green af839a9618 [BasicAA] Enable -basic-aa-recphi by default
This option was added a while back, to help improve AA around pointer
phi loops. It looks for phi(gep(phi, const), x) loops, checking if x can
then prove more precise aliasing info.

Differential Revision: https://reviews.llvm.org/D82998
2020-07-09 14:54:53 +01:00
Arthur Eubanks 83158cf95d [BasicAA] Remove -basicaa alias
Follow up of https://reviews.llvm.org/D82607.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D83067
2020-07-07 10:44:23 -07:00
Stanislav Mekhanoshin f7a7efbf88 [AMDGPU] Tweak getTypeLegalizationCost()
Even though wide vectors are legal they still cost more as we
will have to eventually split them. Not all operations can
be uniformly done on vector types.

Conservatively add the cost of splitting at least to 8 dwords,
which is our widest possible load.

We are more or less lying to cost mode with this change but
this can prevent vectorizer from creation of wide vectors which
results in RA problems for us.

Differential Revision: https://reviews.llvm.org/D83078
2020-07-06 14:07:48 -07:00
Roman Lebedev a2619a60e4
Reland "[ScalarEvolution] createSCEV(): recognize `udiv`/`urem` disguised as an `sdiv`/`srem`"
This reverts commit d3e3f36ff1,
which reverter the original commit 2c16100e6f,
but with polly tests now actually passing.
2020-07-06 18:00:22 +03:00
David Green 146dad0077 [ARM] MVE FP16 cost adjustments
This adjusts the MVE fp16 cost model, similar to how we already do for
integer casts. It uses the base cost of 1 per cvt for most fp extend /
truncates, but adjusts it for loads and stores where we know that a
extending load has been used to get the load into the correct lane, and
only an MVE VCVTB is then needed.

Differential Revision: https://reviews.llvm.org/D81813
2020-07-06 15:57:51 +01:00
Mikhail Goncharov d3e3f36ff1
Revert "[ScalarEvolution] createSCEV(): recognize `udiv`/`urem` disguised as an `sdiv`/`srem`"
Summary:
This reverts commit 2c16100e6f.

ninja check-polly fails:
  Polly :: Isl/CodeGen/MemAccess/generate-all.ll
  Polly :: ScopInfo/multidim_srem.ll

Reviewers: kadircet, bollu

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83230
2020-07-06 16:41:59 +02:00
David Green afdb2ef2ed [ARM] Adjust default fp extend and trunc costs
This adds some default costs for fp extends and truncates, generally
costing them as 1 per lane. If the type is not legal then the cost will
include a call to an __aeabi_ function.

Some NEON code is also adjusted to make sure it applies to the expected
types, now that fp16 is a more common thing.

Differential Revision: https://reviews.llvm.org/D82458
2020-07-06 14:23:17 +01:00
David Green 60b8b2beea [ARM] Add extra extend and trunc costs for cast instructions
This expands the existing extend costs with a few extras for larger
types than legal, which will usually be split under MVE. It also adds
trunk support for the same thing. These should not have a large effect
on many things, but makes the costs explicit and keeps a certain balance
between the trunks and extends.

Differential Revision: https://reviews.llvm.org/D82457
2020-07-06 11:33:05 +01:00
David Green 55227f85d0 [ARM] Use BaseT::getMemoryOpCost for getMemoryOpCost
This alters getMemoryOpCost to use the Base TargetTransformInfo version
that includes some additional checks for whether extending loads are
legal. This will generally have the effect of making <2 x ..> and some
<4 x ..> loads/stores more expensive, which in turn should help favour
larger vector factors.

Notably it alters the cost of a <4 x half>, which with the current
codegen will be expensive if it is not extended.

Differential Revision: https://reviews.llvm.org/D82456
2020-07-06 10:58:40 +01:00
Arthur Eubanks 3d12e79094 [NewPM][LSR] Rename strength-reduce -> loop-reduce
The legacy pass was called "loop-reduce".

This lowers the number of check-llvm failures under NPM by 83.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D82925
2020-07-02 11:15:29 -07:00
David Green 30bd66544d [BasicAA] Fix recursive phi MustAlias calculations
With the option -basic-aa-recphi we can detect recursive phis that loop
through constant geps, which allows us to detect more no-alias case for
pointer IV's. If the other phi operand and the other alias value are
MustAlias though, we cannot presume that every element in the loop is
also MustAlias. We need to instead be conservative and return MayAlias.

Differential Revision: https://reviews.llvm.org/D82987
2020-07-02 14:01:38 +01:00
Roman Lebedev 2c16100e6f
[ScalarEvolution] createSCEV(): recognize `udiv`/`urem` disguised as an `sdiv`/`srem`
Summary:
While InstCombine trivially converts that `srem` into a `urem`,
it might happen later than wanted, in particular i'd like
for that to happen on  https://godbolt.org/z/bwuEmJ test case
early in pipeline, before first instcombine run, just before `-mem2reg`.

SCEV should recognize this case natively.

Reviewers: mkazantsev, efriedma, nikic, reames

Reviewed By: efriedma

Subscribers: clementval, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82721
2020-07-02 13:22:12 +03:00
David Green 68498ce8af [BasicAA] New basic-aa-recphi test. NFC 2020-07-02 10:54:01 +01:00
Roman Lebedev e7da7d9428
[NFCI] Actually provide correct check lines in sdiv.ll 2020-07-02 02:00:02 +03:00
Roman Lebedev 51ff7642a3
[NFC][ScalarEvolution] Add udiv-disguised-as-sdiv test
Much like 25521150d7,
but with division instead of remainder.

See https://reviews.llvm.org/D82721
2020-07-02 01:44:19 +03:00
Sergey Dmitriev cb8faaacb5 [CallGraph] Add support for callback call sites
Summary:
This patch changes call graph analysis to recognize callback call sites
and add an artificial 'reference' call record from the broker function
caller to the callback function in the call graph. A presence of such
reference enforces bottom-up traversal order for callback functions in
CG SCC pass manager because callback function logically becomes a callee
of the broker function caller.

Reviewers: jdoerfert, hfinkel, sstefan1, baziotis

Reviewed By: jdoerfert

Subscribers: hiraditya, kuter, sstefan1, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82572
2020-07-01 13:44:11 -07:00
Florian Hahn 1ccc49924a [AArch64] Add getCFInstrCost, treat branches as free for throughput.
D79164/2596da31740f changed getCFInstrCost to return 1 per default.
AArch64 did not have its own implementation, hence the throughput cost
of CFI instructions is overestimated. On most cores, most branches should
be predicated and essentially free throughput wise.

This restores a 9% performance regression on a SPEC2006 benchmark on
AArch64 with -O3 LTO & PGO.

This patch effectively restores pre 2596da3174 behavior for AArch64
and undoes the AArch64 test changes of the patch.

Reviewers: samparker, dmgreen, anemet

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D82755
2020-06-30 20:34:04 +01:00
Roman Lebedev 25521150d7
[NFC][ScalarEvolution] Add a test showing SCEV failure to recognize 'urem'
While InstCombine trivially converts that `srem` into a `urem`,
it might happen later than wanted. SCEV should recognize this natively.
2020-06-28 20:35:02 +03:00
Roman Lebedev 141e845da5
[SCEV] Make SCEVAddExpr actually always return pointer type if there is pointer operand (PR46457)
Summary:
The added assertion fails on the added test without the fix.

Reduced from test-suite/MultiSource/Benchmarks/MiBench/office-ispell/correct.c
In IR, getelementptr, obviously, takes pointer as it's base,
and returns a pointer.

When creating an SCEV expression, SCEV operands are sorted in hope
that it increases folding potential, and at the same time SCEVAddExpr's
type is the type of the last(!) operand.

Which means, in some exceedingly rare cases, pointer operand may happen to
end up not being the last operand, and as a result SCEV for GEP
will suddenly have a non-pointer return type.
We should ensure that does not happen.

In the end, actually storing the `Type *`, at the cost of increasing
memory footprint of `SCEVAddExpr`, appears to be the solution.
We can't just store a 'is a pointer' bit and create pointer type
on the fly since we don't have data layout in getType().

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=46457 | PR46457 ]]

Reviewers: efriedma, mkazantsev, reames, nikic

Reviewed By: efriedma

Subscribers: hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82633
2020-06-27 11:37:17 +03:00
Fangrui Song 4cd19a6e15 [BasicAA] Rename -disable-basicaa to -disable-basic-aa to be consistent with the canonical name "basic-aa" 2020-06-26 20:55:44 -07:00
Fangrui Song f31811f2dc [BasicAA] Rename deprecated -basicaa to -basic-aa
Follow-up to D82607
Revert an accidental change (empty.ll) of D82683
2020-06-26 20:41:37 -07:00
Arthur Eubanks 0077988a6f Fix full-store-partial-alias.ll
Accidentally renamed -disable-basicaa -> -disable-basic-aa
2020-06-26 15:46:47 -07:00
Arthur Eubanks feeed16a5f [NewPM][BasicAA] basicaa -> basic-aa in Analysis/BasicAA
Following https://reviews.llvm.org/D82607.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D82683
2020-06-26 14:58:01 -07:00
Arthur Eubanks 0c6bf90b56 [NewPM][BasicAA] Rename basicaa -> basic-aa, add alias
Summary:
BasicAA under the new pass manager is called "basic-aa", which fits more
with the other AA names which almost always contain a dash.

Keep an alias from basicaa -> basic-aa.

Will change all references of "basicaa" to "basic-aa", then remove the
alias.

Makes check-llvm failures under NPM go from 2307 to 1867.

Reviewers: asbirlea, ychen

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82607
2020-06-25 18:08:34 -07:00
David Green f14457f5d8 [ARM] Split cast cost tests, and add masked load/store tests. NFC
This file has grown quite large and could do with being split up. This
splits away the load/store + cast tests into a separate file. Some
masked load/store + cast tests have been added too, along with some
extra load/store + fpcast tests.
2020-06-25 13:24:17 +01:00
Eli Friedman 90ad786947 [IR] Prefer scalar type for struct indexes in GEP constant expressions.
This has two advantages: one, it's simpler, and two, it doesn't require
heroic pattern matching with scalable vectors.

Also includes a small fix to DataLayout to allow the scalable vector
testcase to work correctly.

Differential Revision: https://reviews.llvm.org/D82061
2020-06-23 16:14:36 -07:00
Vitaly Buka 5d964e262f [StackSafety] Check variable lifetime
We can't consider variable safe if out-of-lifetime access is possible.
So if StackLifetime can't prove that the instruction always uses
the variable when it's still alive, we consider it unsafe.
2020-06-22 03:45:29 -07:00
Vitaly Buka 8f592ed333 [StackSafety] Ignore unreachable instructions
Usually DominatorTree provides this info, but here we use
StackLifetime. The reason is that in the next patch StackLifetime
will be used for actual lifetime checks and we can avoid
forwarding the DominatorTree into this code.
2020-06-22 03:45:29 -07:00
Florian Hahn 9a7d80a32c Revert "[BasicAA] Use known lower bounds for index values for size based check."
This potentially related to https://bugs.llvm.org/show_bug.cgi?id=46335
and causes a slight compile-time regression. Revert while investigating.

This reverts commit d99a1848c4.
2020-06-20 10:06:05 +01:00
dfukalov 129388ddc4 [AMDGPU][CostModel] Add fneg cost estimation
Summary: The estimation uses AMDGPUTargetLowering::isFNegFree()

Reviewers: rampitec

Reviewed By: rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82065
2020-06-19 17:31:35 +03:00
Vitaly Buka 306c257b00 [SafeStack,NFC] Print liveness for all instrunctions 2020-06-19 02:32:17 -07:00
Vitaly Buka 7b27c09f63 [StackSafety,NFC] Don't test terminators
Code does not track terminators and do not expose them through interface.
State there is just a state of the last instruction or entry.
So this information is just redundant and doesn't need to be tested.
2020-06-19 02:32:17 -07:00
Tyker b7338fb1a6 [AssumeBundles] add cannonicalisation to the assume builder
Summary:
this reduces significantly the number of assumes generated without aftecting too much
the information that is preserved. this improves the compile-time cost
of enable-knowledge-retention significantly.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, asbirlea, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79650
2020-06-19 10:32:26 +02:00
Vitaly Buka fcd67665a8 [StackSafety] Add "Must Live" logic
Summary:
Extend StackLifetime with option to calculate liveliness
where alloca is only considered alive on basic block entry
if all non-dead predecessors had it alive at terminators.

Depends on D82043.

Reviewers: eugenis

Reviewed By: eugenis

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82124
2020-06-18 16:53:37 -07:00
Vitaly Buka f672791e08 [StackSafety] Add pass for StackLifetime testing
Summary: lifetime.ll is a copy of SafeStack/X86/coloring2.ll

Reviewers: eugenis

Reviewed By: eugenis

Subscribers: hiraditya, mgrang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82043
2020-06-18 16:34:18 -07:00
Paul Walker 4612f39120 [SVE] Add flag to specify SVE register size, using this to calculate legal vector types.
Adds aarch64-sve-vector-bits-{min,max} to allow the size of SVE
data registers (in bits) to be specified. This allows the code
generator to make assumptions it normally couldn't. As a starting
point this information is used to mark fixed length vector types
that can fit within the specified size as legal.

Reviewers: rengolin, efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80384
2020-06-18 12:11:16 +00:00
Sameer Sahasrabuddhe 7aad220795 [DA] conservatively mark the join of every divergent branch
For a loop, a join block is a block that is reachable along multiple
disjoint paths from the exiting block of a loop. If the exit condition
of the loop is divergent, then such join blocks must also be marked
divergent. This currently fails in some cases because not all join
blocks are identified correctly.

The workaround is to conservatively mark every join block of any
branch (not necessarily the exiting block of a loop) as divergent.

https://bugs.llvm.org/show_bug.cgi?id=46372

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D81806
2020-06-18 17:39:20 +05:30
Christopher Tetreault 8819202dfd [SVE] Eliminate bad VectorType::getNumElements() calls from ConstantFold
Summary:
Assume all usages of this function are explicitly fixed-width operations
and cast to FixedVectorType

Reviewers: efriedma, sdesmalen, c-rhodes, majnemer, dblaikie

Reviewed By: sdesmalen

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80262
2020-06-17 14:19:56 -07:00
Sameer Sahasrabuddhe d3963b3a5f [DA] propagate loop live-out values that get used in a branch
Values that are uniform within a loop but appear divergent to uses
outside the loop are "tainted" so that such uses are marked
divergent. But if such a use is a branch, then it's divergence needs
to be propagated. The simplest way to do that is to put the branch
back in the main worklist so that it is processed appropriately.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D81822
2020-06-17 09:21:00 +05:30
Tyker d7deef1206 Revert "[AssumeBundles] add cannonicalisation to the assume builder"
This reverts commit 90c50cad19.
2020-06-16 14:34:55 +02:00
Tyker 90c50cad19 [AssumeBundles] add cannonicalisation to the assume builder
Summary:
this reduces significantly the number of assumes generated without aftecting too much
the information that is preserved. this improves the compile-time cost
of enable-knowledge-retention significantly.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, asbirlea, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79650
2020-06-16 13:12:35 +02:00
Sam Parker 2596da3174 [CostModel] getCFInstrCost in getUserCost.
Have BasicTTI call the base implementation so that both agree on the
default behaviour, which the default being a cost of '1'. This has
required an X86 specific implementation as it seems to be very
reliant on those instructions being free. Changes are also made to
AMDGPU so that their implementations distinguish between cost kinds,
so that the unrolling isn't affected. PowerPC also has its own
implementation to prevent changes to the reg-usage vectorizer test.

The cost model test changes now reflect that ret instructions are not
generally free.

Differential Revision: https://reviews.llvm.org/D79164
2020-06-15 09:28:46 +01:00
David Green 7507186b94 [ARM] Additional cast cost tests.
This adds additional cast cpst tests useful for MVE, notably around half
types.
2020-06-14 14:30:07 +01:00
Vitaly Buka c1e47b47f8 [StackSafety] Run ThinLTO
Summary:
ThinLTO linking runs dataflow processing on collected
function parameters. Then StackSafetyGlobalInfoWrapperPass
in ThinLTO backend will run as usual looking up to external
symbol in the summary if needed.

Depends on D80985.

Reviewers: eugenis, pcc

Reviewed By: eugenis

Subscribers: inglorion, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D81242
2020-06-12 18:11:29 -07:00
Vitaly Buka e6ce0dc5de [StackSafety,NFC] Extract addOverflowNever 2020-06-12 17:42:32 -07:00
Vitaly Buka 999307323a [StackSafety] Fix byval handling
We don't need process paramenters which marked as
byval as we are not going to pass interested allocas
without copying.

If we pass value into byval argument, we just handle that
as Load of corresponding type and stop that branch of analysis.
2020-06-11 20:58:36 -07:00
Alina Sbirlea 519b019a0a Verify MemorySSA after all updates.
Verify after completing all updates.
Resolves PR46275.
2020-06-11 18:48:41 -07:00
Simon Pilgrim 28947bc23c [CostModel][X86] Add broadcast costs for vXi1 bool vectors
Doesn't mean much on non-AVX512 targets but better to keep with the other shuffles
2020-06-10 15:27:15 +01:00
Roman Lebedev c868335e24
[SCEV] ScalarEvolution::createSCEV(): clarify no-wrap flag propagation for shift by bitwidth-1
Summary:
There was this comment here previously:
```
-        // It is currently not resolved how to interpret NSW for left
-        // shift by BitWidth - 1, so we avoid applying flags in that
-        // case. Remove this check (or this comment) once the situation
-        // is resolved. See
-        // http://lists.llvm.org/pipermail/llvm-dev/2015-April/084195.html
-        // and http://reviews.llvm.org/D8890 .
```
But langref was fixed in rL286785, and the behavior is pretty obvious:
http://volta.cs.utah.edu:8080/z/MM4WZP
^ nuw can always be propagated. nsw can be propagated if
either nuw is specified, or the shift is by *less* than bitwidth-1.

This mimics similar D81189 Reassociate change, alive2 is happy about that one.

I'm not sure `NUW` isn't being printed, but that seems unrelated.

Reviewers: mkazantsev, reames, sanjoy, nlopes, craig.topper, efriedma

Reviewed By: efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81243
2020-06-06 13:02:07 +03:00
Philip Reames 32c09d527c [Tests] Migrate a number of tests to gc-live bundle representation 2020-06-05 16:44:04 -07:00
Roman Lebedev 39e3683534
[NFC][SCEV] Add test with 'or' with no common bits set 2020-06-05 12:18:15 +03:00
Roman Lebedev 39e3c92410
[NFC][SCEV] Some tests for shifts by bitwidth-2/bitwidth-1 w/ no-wrap flags 2020-06-05 11:45:09 +03:00
Vitaly Buka 6dd738e2f0 [StackSafety,NFC] Switch tests to aarch64 2020-06-05 00:24:02 -07:00
Yevgeny Rouban dcfa78a4cc Extend InvokeInst !prof branch_weights metadata to unwind branches
Allow InvokeInst to have the second optional prof branch weight for
its unwind branch. InvokeInst is a terminator with two successors.
It might have its unwind branch taken many times. If so
the BranchProbabilityInfo unwind branch heuristic can be inaccurate.
This patch allows a higher accuracy calculated with both branch
weights set.

Changes:
 - A new section about InvokeInst is added to
   the BranchWeightMetadata page. It states the old information that
   missed in the doc and adds new about the second branch weight.
 - Verifier is changed to allow either 1 or 2 branch weights
   for InvokeInst.
 - A new test is written for BranchProbabilityInfo to demonstrate
   the main improvement of the simple fix in calcMetadataWeights().
 - Several new testcases are created for Inliner. Those check that
    both weights are accounted for invoke instruction weight
    calculation.
 - PGOUseFunc::setBranchWeights() is fixed to be applicable to
   InvokeInst.

Reviewers: davidxl, reames, xur, yamauchi
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80618
2020-06-04 15:37:15 +07:00
Jay Foad c27214c234 [AMDGPU] Fold llvm.amdgcn.cos and llvm.amdgcn.sin intrinsics (fix tests)
Try to fix Windows buildbots.
2020-06-03 11:40:52 +01:00
Jay Foad c823cfde21 [AMDGPU] Fold llvm.amdgcn.cos and llvm.amdgcn.sin intrinsics
Differential Revision: https://reviews.llvm.org/D80702
2020-06-03 09:34:22 +01:00