Summary:
If two arguments are both readonly, then they have no memory dependency
that would violate noalias, even if they do actually overlap.
Reviewers: hfinkel, efriedma
Reviewed By: efriedma
Subscribers: efriedma, hiraditya, llvm-commits, tstellar
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60239
llvm-svn: 359047
This does two main things, firstly adding some at least basic addressing modes
for i64 types, and secondly treats floats and doubles sensibly when there is no
fpu. The floating point change can help codesize in some cases, especially with
D60294.
Most backends seems to not consider the exact VT in isLegalAddressingMode,
instead switching on type size. That is now what this does when the target does
not have an fpu (as the float data will be loaded using LDR's). i64's currently
use the address range of an LDRD (even though they may be legalised and loaded
with an LDR). This is at least better than marking them all as illegal
addressing modes.
I have not attempted to do much with vectors yet. That will need changing once
MVE is added.
Differential Revision: https://reviews.llvm.org/D60677
llvm-svn: 358845
Summary:
The immediate post dominator of the loop header may be part of the divergent loop.
Since this /was/ the divergence propagation bound the SDA would not detect joins of divergent paths outside the loop.
Reviewers: nhaehnle
Reviewed By: nhaehnle
Subscribers: mmasten, arsenm, jvesely, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59042
llvm-svn: 358681
On pre-AVX512 targets we can use MOVMSK to extract reduced boolean results. This is properly optimized, annoyingly AVX512 isn't and produces code that is almost as bad as the (unchanged) costs suggest......
Differential Revision: https://reviews.llvm.org/D60403
llvm-svn: 358574
Summary:
When inserting a new Def, MemorySSA may be have non-minimal number of Phis.
While inserting, the walk to find the previous definition may cleanup minimal Phis.
When the last definition is trivial to obtain, we do not cache it.
It is possible while getting the previous definition for a Def to get two different answers:
- one that was straight-forward to find when walking the first path (a trivial phi in this case), and
- another that follows a cleanup of the trivial phi, it determines it may need additional Phi nodes, it inserts them and returns a new phi in the same position as the former trivial one.
While the Phis added for the second path are all redundant, they are not complete (the walk is only done upwards), and they are not properly cleaned up afterwards.
A way to fix this problem is to cache the straight-forward answer we got on the first walk.
The caching is only kept for the duration of a getPreviousDef call, and for Phis we use TrackingVH, so removing the trivial phi will lead to replacing it with the next dominating phi in the cache.
Resolves PR40749.
Reviewers: george.burgess.iv
Subscribers: jlebar, Prazek, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60634
llvm-svn: 358313
Summary:
After introducing the limit for clobber walking, `walkToPhiOrClobber` would assert that the limit is at least 1 on entry.
The test included triggered that assert.
The callsite in `tryOptimizePhi` making the calls to `walkToPhiOrClobber` is structured like this:
```
while (true) {
if (getBlockingAccess()) { // calls walkToPhiOrClobber
}
for (...) {
walkToPhiOrClobber();
}
}
```
The cleanest fix is to check if the limit was reached inside `walkToPhiOrClobber`, and give an allowence of 1.
This approach not make any alias() calls (no calls to instructionClobbersQuery), so the performance condition is enforced.
The limit is set back to 0 if not used, as this provides info on the fact that we stopped before reaching a true clobber.
Reviewers: george.burgess.iv
Subscribers: jlebar, Prazek, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60479
llvm-svn: 358303
Current LCG doesn't check aliased functions. So if an internal function has a public alias it will not be added to CG SCC, but it is still reachable from outside through the alias.
So this patch adds aliased functions to SCC.
Differential Revision: https://reviews.llvm.org/D59898
llvm-svn: 357795
Summary:
This lets us avoid e.g. checking if A >=s B in getSMaxExpr(A, B) if we've
already established that (A smax B) is the best we can do.
Fixes PR41225.
Reviewers: asbirlea
Subscribers: mcrosier, jlebar, bixia, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60010
llvm-svn: 357320
Just as as llvm IR supports explicitly specifying numeric value ids
for instructions, and emits them by default in textual output, now do
the same for blocks.
This is a slightly incompatible change in the textual IR format.
Previously, llvm would parse numeric labels as string names. E.g.
define void @f() {
br label %"55"
55:
ret void
}
defined a label *named* "55", even without needing to be quoted, while
the reference required quoting. Now, if you intend a block label which
looks like a value number to be a name, you must quote it in the
definition too (e.g. `"55":`).
Previously, llvm would print nameless blocks only as a comment, and
would omit it if there was no predecessor. This could cause confusion
for readers of the IR, just as unnamed instructions did prior to the
addition of "%5 = " syntax, back in 2008 (PR2480).
Now, it will always print a label for an unnamed block, with the
exception of the entry block. (IMO it may be better to print it for
the entry-block as well. However, that requires updating many more
tests.)
Thus, the following is supported, and is the canonical printing:
define i32 @f(i32, i32) {
%3 = add i32 %0, %1
br label %4
4:
ret i32 %3
}
New test cases covering this behavior are added, and other tests
updated as required.
Differential Revision: https://reviews.llvm.org/D58548
llvm-svn: 356789
This adds new function getMemcpyCost to TTI so that the cost of a memcpy can be
modeled and queried. The default implementation returns Expensive, but targets
can override this function to model the cost more accurately.
Differential Revision: https://reviews.llvm.org/D59252
llvm-svn: 356555
There are a few different issues, mostly stemming from using
generation based checks for anything instead of subtarget
features. Stop adding flat-address-space as a feature for HSA, as it
should only be a device property. This was incorrectly allowing flat
instructions to select for SI.
Increase the default generation for HSA to avoid the encoding error
when emitting objects. This has some other side effects from various
checks which probably should be separate subtarget features (in the
cost model and for dealing with the DS offset folding issue).
Partial fix for bug 41070. It should probably be an error to try using
amdhsa without flat support.
llvm-svn: 356347
AMDGPU would like to have MVTs for v3i32, v3f32, v5i32, v5f32. This
commit does not add them, but makes preparatory changes:
* Fixed assumptions of power-of-2 vector type in kernel arg handling,
and added v5 kernel arg tests and v3/v5 shader arg tests.
* Added v5 tests for cost analysis.
* Added vec3/vec5 arg test cases.
Some of this patch is from Matt Arsenault, also of AMD.
Differential Revision: https://reviews.llvm.org/D58928
Change-Id: I7279d6b4841464d2080eb255ef3c589e268eabcd
llvm-svn: 356342
Summary:
This fixes an extremely long compile time caused by recursive analysis
of truncs, which were not previously subject to any depth limits unlike
some of the other ops. I decided to use the same control used for
sext/zext, since the routines analyzing these are sometimes mutually
recursive with the trunc analysis.
Reviewers: mkazantsev, sanjoy
Subscribers: sanjoy, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58994
llvm-svn: 355949
Summary:
This adds support for 64 bit buffer atomic arithmetic instructions but does not include
cmpswap as that depends on a fix to the way the register pairs are handled
Change-Id: Ib207ea65fb69487ccad5066ea647ae8ddfe2ce61
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58918
llvm-svn: 355520
In some cases, MaxBECount can be less precise than ExactBECount for AND
and OR (the AND case was PR26207). In the OR test case, both ExactBECounts are
undef, but MaxBECount are different, so we hit the assertion below. This
patch uses the same solution the AND case already uses.
Assertion failed:
((isa<SCEVCouldNotCompute>(ExactNotTaken) || !isa<SCEVCouldNotCompute>(MaxNotTaken))
&& "Exact is not allowed to be less precise than Max"), function ExitLimit
This patch also consolidates test cases for both AND and OR in a single
test case.
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13245
Reviewers: sanjoy, efriedma, mkazantsev
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D58853
llvm-svn: 355259
Summary:
The original assumption for the insertDef method was that it would not
materialize Defs out of no-where, hence it will not insert phis needed
after inserting a Def.
However, when cloning an instruction (use case used in LICM), we do
materialize Defs "out of no-where". If the block receiving a Def has at
least one other Def, then no processing is needed. If the block just
received its first Def, we must check where Phi placement is needed.
The only new usage of insertDef is in LICM, hence the trigger for the bug.
But the original goal of the method also fails to apply for the move()
method. If we move a Def from the entry point of a diamond to either the
left or right blocks, then the merge block must add a phi.
While this usecase does not currently occur, or may be viewed as an
incorrect transformation, MSSA must behave corectly given the scenario.
Resolves PR40749 and PR40754.
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, Prazek, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58652
llvm-svn: 355040
Based on an IR equivalent of target lowering's generic expansion - target specific costs will typically be lower (IR doesn't have a good mull/mulh equivalent) but we need a baseline.
Differential Revision: https://reviews.llvm.org/D57925
llvm-svn: 354774
The correct edge being deleted is not to the unswitched exit block, but to the
original block before it was split. That's the key in the map, not the
value.
The insert is correct. The new edge is to the .split block.
The splitting turns OriginalBB into:
OriginalBB -> OriginalBB.split.
Assuming the orignal CFG edge: ParentBB->OriginalBB, we must now delete
ParentBB->OriginalBB, not ParentBB->OriginalBB.split.
llvm-svn: 354656
Constant hoisting may have hidden a constant behind a bitcast so that
it isn't folded into its users. However, this prevents BPI from
calculating some of its heuristics that are based upon constant
values. So, I've added a simple helper function to look through these
casts.
Differential Revision: https://reviews.llvm.org/D58166
llvm-svn: 354119
Summary:
This verification may fail after certain transformations due to
BasicAA's fragility. Added a small explanation and a testcase that
triggers the assert in checkClobberSanity (before its removal).
Addresses PR40509.
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, llvm-commits, Prazek
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57973
llvm-svn: 353739
Currently, SCEV creates SCEVUnknown for every node of unreachable code. If we
have a huge amounts of such code, we will be littering SE with these nodes. We could
just state that they all are undef and save some memory.
Differential Revision: https://reviews.llvm.org/D57567
Reviewed By: sanjoy
llvm-svn: 353017
Summary:
The analysis result of DA caches pointers to AA, SCEV, and LI, but it
never checks for their invalidation. Fix that.
Reviewers: chandlerc, dmgreen, bogner
Reviewed By: dmgreen
Subscribers: hiraditya, bollu, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D56381
llvm-svn: 352986
Currently SCEV attempts to limit transformations so that they do not work with
big SCEVs (that may take almost infinite compile time). But for this, it uses heuristics
such as recursion depth and number of operands, which do not give us a guarantee
that we don't actually have big SCEVs. This situation is still possible, though it is not
likely to happen. However, the bug PR33494 showed a bunch of simple corner case
tests where we still produce huge SCEVs, even not reaching big recursion depth etc.
This patch introduces a concept of 'huge' SCEVs. A SCEV is huge if its expression
size (intoduced in D35989) exceeds some threshold value. We prohibit optimizing
transformations if any of SCEVs we are dealing with is huge. This gives us a reliable
check that we don't spend too much time working with them.
As the next step, we can possibly get rid of old limiting mechanisms, such as recursion
depth thresholds.
Differential Revision: https://reviews.llvm.org/D35990
Reviewed By: reames
llvm-svn: 352728
The code of AddRec simplification is using wrong loop when it creates a new
AddRecExpr. It should be using AddRecLoop which we have saved and against which
all gate checks are made, and not calling AddRec->getLoop() over and over
again because AddRec may change and become an AddRecurrency from outer loop
during the transform iterations.
Considering this change trivial, commiting for postcommit review.
llvm-svn: 352451
Followup to D56636, this time handling the UADDSAT case by expanding
uadd.sat(a, b) to umin(a, ~b) + b.
Differential Revision: https://reviews.llvm.org/D56869
llvm-svn: 352409
Add generic costs calculation for SADDSAT/SSUBSAT intrinsics, this uses generic costs for sadd_with_overflow/ssub_with_overflow, an extra sign comparison + a selects based on the sign/overflow.
This completes PR40316
Differential Revision: https://reviews.llvm.org/D57239
llvm-svn: 352315
For the power9 CPU, vector operations consume a pair of execution units rather
than one execution unit like a scalar operation. Update the target transform
cost functions to reflect the higher cost of vector operations when targeting
Power9.
Patch by RolandF.
Differential revision: https://reviews.llvm.org/D55461
llvm-svn: 352261
Add generic costs calculation for UADDSAT/USUBSAT intrinsics, this fallbacks to using generic costs for uadd_with_overflow/usub_with_overflow + a select.
Differential Revision: https://reviews.llvm.org/D56907
llvm-svn: 352044
This patch replaces the existing LLVMVectorSameWidth matcher with LLVMScalarOrSameVectorWidth.
The matching args must be either scalars or vectors with the same number of elements, but in either case the scalar/element type can differ, specified by LLVMScalarOrSameVectorWidth.
I've updated the _overflow intrinsics to demonstrate this - allowing it to return a i1 or <N x i1> overflow result, matching the scalar/vectorwidth of the other (add/sub/mul) result type.
The masked load/store/gather/scatter intrinsics have also been updated to use this, although as we specify the reference type to be llvm_anyvector_ty we guarantee the mask will be <N x i1> so no change in behaviour
Differential Revision: https://reviews.llvm.org/D57090
llvm-svn: 351957
First step towards PR40376, this patch adds support for getCmpSelInstrCost to use the (optional) Instruction CmpInst predicate to indicate the type of integer comparison we're performing and alter the costs accordingly.
Differential Revision: https://reviews.llvm.org/D57013
llvm-svn: 351810
Prior to SSE41 (and sometimes on AVX1), vector select has to be performed as a ((X & C)|(Y & ~C)) bit select.
Exposes a couple of issues with the min/max reduction costs (which only go down to SSE42 for some reason).
The increase pre-SSE41 selection costs also prevent a couple of tests from firing any longer, so I've either tweaked the target or added AVX tests as well to the existing SSE2 tests.
llvm-svn: 351685