Commit Graph

2559 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin d9c99043bd Option to ignore llvm[.compiler].used uses in hasAddressTaken()
Differential Revision: https://reviews.llvm.org/D96087
2021-02-25 10:06:24 -08:00
Stanislav Mekhanoshin 29e2d9461a Option to ignore assume like intrinsic uses in hasAddressTaken()
Differential Revision: https://reviews.llvm.org/D96081
2021-02-25 09:48:29 -08:00
Stelios Ioannou 30cb9c03b5 [AArch64] Add abs intrinsic costs
This patch adds cost-modelling for abs vector intrinsic.

Change-Id: I89007971bfb15f5b4a02a2eadfd43018e9a73976
2021-02-25 09:31:52 +00:00
Philip Reames c1706f2269 [tests] precommit tests for an upcoming AA improvement 2021-02-24 09:51:00 -08:00
Philip Reames 532d4814ac Revert "[tests] Mark an autogened test as such"
This reverts commit 43a569faeb.

Unhelpfully, the tool just added the header and didn't actually update any of the tests.  I didn't notice until after pushing.
2021-02-24 09:26:26 -08:00
Philip Reames 43a569faeb [tests] Mark an autogened test as such 2021-02-24 09:15:19 -08:00
Ta-Wei Tu 98c6110d9b [LoopNest] Use `getUniqueSuccessor()` instead when checking empty blocks
Blocks that contain only a single branch instruction to the next block can be skipped in analyzing the loop-nest structure.
This is currently done by `getSingleSuccessor()`.
However, the branch instruction might have multiple targets which happen to all be the same.
In this case, the block should still be considered as empty and skipped.

An example is `test/Transforms/LoopInterchange/update-condbranch-duplicate-successors.ll` (the LIT test for this patch is modified from it as well).

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D97286
2021-02-24 09:53:12 +08:00
David Green dd2dbf7ee2 [TTI] Change getOperandsScalarizationOverhead to take Type args
As a followup to D95291, getOperandsScalarizationOverhead was still
using a VF as a vector factor if the arguments were scalar, and would
assert on certain matrix intrinsics with differently sized vector
arguments. This patch removes the VF arg, instead passing the Types
through directly. This should allow it to more accurately compute the
cost without having to guess at which operands will be vectorized,
something difficult with more complex intrinsics.

This adjusts one SVE test as it is now calling the wrong intrinsic vs
veccall. Without invalid InstructCosts the cost of the scalarized
intrinsic is too low. This should get fixed when the cost of
scalarization is accounted for with scalable types.

Differential Revision: https://reviews.llvm.org/D96287
2021-02-23 13:04:59 +00:00
Dávid Bolvanský cd54c57919 Reland "[Libcalls, Attrs] Annotate libcalls with noundef"
Fixed Clang tests.
2021-02-20 06:18:48 +01:00
Dávid Bolvanský 94d034fb86 Revert "[Libcalls, Attrs] Annotate libcalls with noundef"
This reverts commit 33b0c63775. Bots are failing. Some Clang tests need to be updated too.
2021-02-20 04:18:42 +01:00
Dávid Bolvanský 33b0c63775 [Libcalls, Attrs] Annotate libcalls with noundef
I think we can use here same logic as for nonnull.

strlen(X) - X must be noundef => valid pointer.

for libcalls with size arg, we add noundef only if size is known and greater than 0 - so pointers must be noundef (valid ones)

Reviewed By: jdoerfert, aqjune

Differential Revision: https://reviews.llvm.org/D95122
2021-02-20 04:10:07 +01:00
Philip Reames cc574f85fa Add datalayout to test added in 7e3183d73
Realized after pushing this would probably fail on bots for other than x86-64.
2021-02-19 13:10:19 -08:00
Philip Reames 7e3183d735 Add test triggered by review discussion on D97077 2021-02-19 13:03:58 -08:00
Philip Reames 5de47ebff6 precommit test cleanup for D97077 2021-02-19 12:19:39 -08:00
Nikita Popov 71a8e4e7d6 [MemCopyOpt] Enable MemorySSA by default
This enables use of MemorySSA instead of MemDep in MemCpyOpt. To
allow this without significant compile-time impact, the MemCpyOpt
pass is moved directly before DSE (in the cases where this was not
already the case), which allows us to reuse the existing MemorySSA
analysis.

Unlike the MemDep-based implementation, the MemorySSA-based MemCpyOpt
can also perform simple optimizations across basic blocks.

Differential Revision: https://reviews.llvm.org/D94376
2021-02-19 18:06:25 +01:00
Philip Reames 4a5edea193 [SCEV] Use both known bits and sign bits when computing range of SCEV unknowns
When computing a range for a SCEVUnknown, today we use computeKnownBits for unsigned ranges, and computeNumSignBots for signed ranges. This means we miss opportunities to improve range results.

One common missed pattern is that we have a signed range of a value which CKB can determine is positive, but CNSB doesn't convey that information. The current range includes the negative part, and is thus double the size.

Per the removed comment, the original concern which delayed using both (after some code merging years back) was a compile time concern. CTMark results (provided by Nikita, thanks!) showed a geomean impact of about 0.1%. This doesn't seem large enough to avoid higher quality results.

Differential Revision: https://reviews.llvm.org/D96534
2021-02-19 08:29:12 -08:00
Nikita Popov 70e3c9a8b6 [BasicAA] Always strip single-argument phi nodes
We can always look through single-argument (LCSSA) phi nodes when
performing alias analysis. getUnderlyingObject() already does this,
but stripPointerCastsAndInvariantGroups() does not. We still look
through these phi nodes with the usual aliasPhi() logic, but
sometimes get sub-optimal results due to the restrictions on value
equivalence when looking through arbitrary phi nodes. I think it's
generally beneficial to keep the underlying object logic and the
pointer cast stripping logic in sync, insofar as it is possible.

With this patch we get marginally better results:

  aa.NumMayAlias | 5010069 | 5009861
  aa.NumMustAlias | 347518 | 347674
  aa.NumNoAlias | 27201336 | 27201528
  ...
  licm.NumPromoted | 1293 | 1296

I've renamed the relevant strip method to stripPointerCastsForAliasAnalysis(),
as we're past the point where we can explicitly spell out everything
that's getting stripped.

Differential Revision: https://reviews.llvm.org/D96668
2021-02-18 23:07:50 +01:00
David Green 33ba220611 [ARM] Ensure types provided to getIntrinsicCost are valid
It appears that pointer types were causing issues for the min/max cost
code in getIntrinsicInstrCost. This makes sure that when matching
icmp/select to a min/max, we only do that for normal int or float types.
2021-02-18 14:00:23 +00:00
David Green 1a6744e3dc [ARM] Add larger than legal ICmp costs
A v8i32 compare will produce a v8i1 predicate, but during codegen the
v8i32 will be split into two v4i32, potentially requiring two v4i1
predicates to be merged into a single v8i1. Because this merging of two
v4i1's into a v8i1 is very expensive, we need to make the cost of the
compare equally high.

This patch adds the cost of that to ARMTTIImpl::getCmpSelInstrCost.
Because we don't know whether the user of the predicate can be split,
and the cost model is mostly pre-instruction, we may be pessimistic but
that should only be for larger and legal types. This also adds min/max
detection to the costmodel where it can be detected, to keep those in
line with the cost of simple min/max instructions. Otherwise for the
most part, costs that were already expensive have become more expensive.

Differential Revision: https://reviews.llvm.org/D96692
2021-02-18 11:42:17 +00:00
David Green 1fbb3287fc [ARM] MVE ICmp costing tests. NFC 2021-02-18 10:50:34 +00:00
Stanislav Mekhanoshin a8d9d50762 [AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
2021-02-17 16:01:32 -08:00
David Green 6d835c5fcd [ARM] Add MVE abs costs
Similar to min/max, this increases the accuracy of abs intrinsics costs
under MVE.
2021-02-17 14:21:09 +00:00
David Green 415deff10b [ARM] MVE abs intrinsic costs. NFC 2021-02-17 13:54:17 +00:00
Sameer Sahasrabuddhe 11bf7da64a [NewPM] Introduce (GPU)DivergenceAnalysis in the new pass manager
The GPUDivergenceAnalysis is now renamed to just "DivergenceAnalysis"
since there is no conflict with LegacyDivergenceAnalysis. In the
legacy PM, this analysis can only be used through the legacy DA
serving as a wrapper. It is now made available as a pass in the new
PM, and has no relation with the legacy DA.

The new DA currently cannot handle irreducible control flow; its
presence can cause the analysis to run indefinitely. The analysis is
now modified to detect this and report all instructions in the
function as divergent. This is super conservative, but allows the
analysis to be used without hanging the compiler.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D96615
2021-02-16 10:26:45 +05:30
David Green 0a98efb049 [ARM] Add some basic Min/Max costs
This adds basic MVE costs for SMIN/SMAX/UMIN/UMAX, as well as MINNUM and
MAXNUM representing fmin and fmax. It tightens up the costs, not using a
ICmp+Select cost.

Differential Revision: https://reviews.llvm.org/D96603
2021-02-15 15:06:19 +00:00
Caroline Concatto b52e6c5891 [CostModel]Add cost model for experimental.vector.reverse
This patch uses the function getShuffleCost with SK_Reverse to compute the cost
for experimental.vector.reverse.
For scalable vector type, it adds a table will the legal types on
AArch64TTIImpl::getShuffleCost to not assert in BasicTTIImpl::getShuffleCost,
and for fixed vector, it relies on the existing cost model in BasicTTIImpl.

Depends on D94883

Differential Revision: https://reviews.llvm.org/D95603
2021-02-15 14:23:57 +00:00
Nikita Popov f197cf2126 [BasicAA] Merge aliasGEP code paths
At this point, we can treat the case of GEP/GEP aliasing and
GEP/non-GEP aliasing in essentially the same way. The only
differences are that we need to do an additional negative GEP base
check, and that we perform a bailout on unknown sizes for the
GEP/non-GEP case (the latter exists only to limit compile-time).

This change is not quite NFC due to the peculiar effect that
the DecomposedGEP for V2 can actually be non-trivial even if V2
is not a GEP. The reason for this is that getUnderlyingObject()
can look through LCSSA phi nodes, while stripPointerCasts() doesn't.
This can lead to slightly better results if single-entry phi nodes
occur inside a loop, where looking through the phi node via aliasPhi()
would subject it to phi cycle equivalence restrictions. It would
probably make sense to adjust pointer cast stripping (for AA) to
handle this case, and ensure consistent results.
2021-02-14 19:35:36 +01:00
Nikita Popov da46a2a87b [BasicAA] Add test for single arg phi in loop (NFC) 2021-02-14 19:35:36 +01:00
David Green 6abe362ed7 [ARM] Fix duplicate fdiv tests, changing them to frem. NFC 2021-02-13 15:16:11 +00:00
David Green 7c2e061188 [ARM] Extra vector shuffle tests of various kinds. NFC 2021-02-13 15:03:10 +00:00
Tyker 642e9225c6 reland [InstCombine] convert assumes to operand bundles
Instcombine will convert the nonnull and alignment assumption that use the boolean condtion
to an assumption that uses the operand bundles when knowledge retention is enabled.

Differential Revision: https://reviews.llvm.org/D82703
2021-02-13 13:03:11 +01:00
David Green b7c3de8d5a [ARM] MVE min/max cost tests. NFC 2021-02-13 11:12:12 +00:00
Sander de Smalen 1d42ba254f [BasicTTIImpl] Fix getCastInstrCost for scalable vectors by querying for ElementCount.
This fixes an overly restrictive assumption that the vector is a FixedVectorType,
in code that tries to calculate the cost of a cast operation when splitting
a too-wide vector. The algorithm works the same for scalable vectors, so this
patch removes the cast<FixedVectorType>.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D96253
2021-02-12 08:28:52 +00:00
Sander de Smalen 63d787e5d4 [CostModel] An extending load to illegal type is not free.
COST(zext (<4 x i32> load(...) to <4 x i64>)) != 0 when
<4 x i64> is an illegal result type that requires splitting
of the operation.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D96250
2021-02-12 07:59:21 +00:00
Philip Reames 8ef4b961a3 [knownbits] Preserve known bits for small shift recurrences
The motivation for this is that I'm looking at an example that uses shifts as induction variables. There's lots of other omissions, but one of the first I noticed is that we can't compute tight known bits. (This indirectly causes SCEV's range analysis to produce very poor results as well.)

Differential Revision: https://reviews.llvm.org/D96440
2021-02-11 17:56:36 -08:00
Philip Reames 72fc5b1b8e [tests] Autogen update test to remove whitespace diffs 2021-02-11 17:06:49 -08:00
Philip Reames b911a71427 [tests] precommit a tests for D96534 (and other range quality items) 2021-02-11 17:02:59 -08:00
Philip Reames 6538cef317 [tests] Autogen a few tests for ease of update 2021-02-11 16:54:06 -08:00
Philip Reames 81c51891ad [tests] Precommit tests for D96440 2021-02-11 10:48:07 -08:00
Michael Kruse 606aa622b2 Revert "[AssumptionCache] Avoid dangling llvm.assume calls in the cache"
This reverts commit b7d870eae7 and the
subsequent fix "[Polly] Fix build after AssumptionCache change (D96168)"
(commit e6810cab09).

It caused indeterminism in the output, such that e.g. the
polly-x86_64-linux buildbot failed accasionally.
2021-02-11 12:17:38 -06:00
David Green b1ef919aad [ARM] Add CostKind to getMVEVectorCostFactor.
This adds the CostKind to getMVEVectorCostFactor, so that it can
automatically account for CodeSize costs, where it returns a cost of 1
not the MVEFactor used for Throughput/Latency. This helps simplify the
caller code and allows us to get the codesize cost more correct in more
cases.
2021-02-11 15:33:59 +00:00
Tyker 5652e192fc Revert "[InstCombine] convert assumes to operand bundles"
This reverts commit 5eb2e994f9.
2021-02-10 01:32:00 +01:00
Tyker 5eb2e994f9 [InstCombine] convert assumes to operand bundles
Instcombine will convert the nonnull and alignment assumption that use the boolean condtion
to an assumption that uses the operand bundles when knowledge retention is enabled.

Differential Revision: https://reviews.llvm.org/D82703
2021-02-09 19:33:53 +01:00
Andrew Litteken 56615a2654 [IROutliner] Adding instruction strings to IRSimilarityPrinting diagnostics.
When doing some recent debugging of the IROutliner, and using the similarity pass for debugging, just having the basic block and function isn't really enough to get all the information. This adds the first and last instruction to the output of the IRSimilarityPrinting pass to give better information to a user.

Reviewer: paquette

Differential Revision: https://reviews.llvm.org/D94304
2021-02-09 12:11:47 -06:00
Johannes Doerfert b7d870eae7 [AssumptionCache] Avoid dangling llvm.assume calls in the cache
PR49043 exposed a problem when it comes to RAUW llvm.assumes. While
D96106 would fix it for GVNSink, it seems a more general concern. To
avoid future problems this patch moves away from the vector of weak
reference model used in the assumption cache. Instead, we track the
llvm.assume calls with a callback handle which will remove itself from
the cache if the call is deleted.

Fixes PR49043.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D96168
2021-02-06 12:18:39 -06:00
Nikita Popov be9889b350 [MemorySSA] Don't treat lifetime.end as NoAlias
MemorySSA currently treats lifetime.end intrinsics as not aliasing
anything. This breaks MemorySSA-based MemCpyOpt, because we'll happily
move a read of a pointer below a lifetime.end intrinsic, as no clobber
is reported.

I think the MemorySSA modelling here isn't correct: lifetime.end(p)
has approximately the same effect as doing a memcpy(p, undef), and
should be treated as a clobber.

This patch removes the special handling of lifetime.end, leaving
alias analysis to handle it appropriately.

Differential Revision: https://reviews.llvm.org/D95763
2021-02-04 20:58:28 +01:00
Caroline Concatto 2cbcf3e297 [AArch64][SVE]Add cost model for broadcast shuffle
This patch adds a cost model for  SK_Broadcast in
AArch64TTIImpl::getShuffleCost with scalable vector.
Without this patch, the scalable vector type relies on  BasicTTIImpl cost
implementation and assert.

Differential Revision: https://reviews.llvm.org/D95598
2021-02-03 09:53:22 +00:00
Gil Rapaport d475030dc2 [SCEV] Apply loop guards to divisibility tests
Extend applyLoopGuards() to take into account conditions/assumes proving some
value %v to be divisible by D by rewriting %v to (%v / D) * D. This lets the
loop unroller and the loop vectorizer identify more loops as not requiring
remainder loops.

Differential Revision: https://reviews.llvm.org/D95521
2021-02-02 08:09:39 +02:00
Florian Hahn f1e8136115
[SCEV] Bail out if URem operand cannot be zero-extended.
In some cases, LHS is larger than the target expression type. Bail out
in that case for now, to avoid crashing
2021-02-01 13:50:54 +00:00
Mindong Chen 00fcc03687 [SCEV] Fix incorrect loop exit count analysis.
In computeLoadConstantCompareExitLimit, the addrec used to compute the
exit count should be from the loop which the exiting block belongs to.

Reviewed by: mkazantsev

Differential Revision: https://reviews.llvm.org/D92367
2021-01-27 19:36:05 +08:00