There are a few bugs in the walker that this patch addresses.
Primarily:
- Caching can break when we have multiple BBs without phis
- We weren't optimizing some phis properly
- Because of how the DFS iterator works, there were times where we
wouldn't cache any results of our DFS
I left the test cases with FIXMEs in, because I'm not sure how much
effort it will take to get those to work (read: We'll probably
ultimately have to end up redoing the walker, or we'll have to come up
with some creative caching tricks), and more test coverage = better.
Differential Revision: http://reviews.llvm.org/D18065
llvm-svn: 264180
When you have multiple LCSSA (single-operand) PHIs that are converted
into two-operand PHIs due to versioning, only assert that the PHI
currently being converted has a single operand. I.e. we don't want to
check PHIs that were converted earlier in the loop.
Fixes PR27023.
Thanks to Karl-Johan Karlsson for the minimized testcase!
llvm-svn: 264081
It's a bug fix.
For rerolled loops SE trip count remains unchanged. It leads to incorrect work of the next passes.
My patch just resets SE info for rerolled loop forcing SE to re-evaluate it next time it requested.
I also added a verifier call in the exisitng test to be sure no invalid SE data remain. Without my fix this test would fail with -verify-scev.
Differential Revision: http://reviews.llvm.org/D18316
llvm-svn: 264051
Summary:
Without tree pruning clang has 2,667,552 points.
Wiht only dominators pruning: 1,515,586.
With both dominators & predominators pruning: 1,340,534.
Resubmit of r262103.
Differential Revision: http://reviews.llvm.org/D18341
llvm-svn: 264003
If we have a BB with only MemoryDefs, live-in calculations will ignore
it. This means we get results like this:
define void @foo(i8* %p) {
; 1 = MemoryDef(liveOnEntry)
store i8 0, i8* %p
br i1 undef, label %if.then, label %if.end
if.then:
; 2 = MemoryDef(1)
store i8 1, i8* %p
br label %if.end
if.end:
; 3 = MemoryDef(1)
store i8 2, i8* %p
ret void
}
...When there should be a MemoryPhi in the `if.end` BB.
This patch fixes that behavior.
llvm-svn: 263991
The sinpi/cospi can be replaced with sincospi to remove unnecessary
computations. However, we need to make sure that the calls are within
the same function!
This fixes PR26993.
llvm-svn: 263875
Summary:
ThinLTO is relying on linkInModule to import selected function.
However a lot of "magic" was hidden in linkInModule and the IRMover,
who would rename and promote global variables on the fly.
This is moving to an approach where the steps are decoupled and the
client is reponsible to specify the list of globals to import.
As a consequence some test are changed because they were relying on
the previous behavior which was importing the definition of *every*
single global without control on the client side.
Now the burden is on the client to decide if a global has to be imported
or not.
Reviewers: tejohnson
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D18122
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263863
The loop on IVOperand's incoming values assumes IVOperand to be an
induction variable on the loop over which `S Pred X` is invariant;
otherwise loop invariant incoming values to IVOperand are not guaranteed
to dominate the comparision.
This fixes PR26973.
llvm-svn: 263827
Summary:
These dependencies would be used in the future to reduce the number
of instrumented blocks(http://reviews.llvm.org/rL262103)
This is submitted as a separate CL because of previous problems with
ARM.
Subscribers: aemerson
Differential Revision: http://reviews.llvm.org/D18227
llvm-svn: 263797
Summary:
It can hurt performance to prefetch ahead too much. Be conservative for
now and don't prefetch ahead more than 3 iterations on Cyclone.
Reviewers: hfinkel
Subscribers: llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D17949
llvm-svn: 263772
Summary:
And use this TTI for Cyclone. As it was explained in the original RFC
(http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758), the HW
prefetcher work up to 2KB strides.
I am also adding tests for this and the previous change (D17943):
* Cyclone prefetching accesses with a large stride
* Cyclone not prefetching accesses with a small stride
* Generic Aarch64 subtarget not prefetching either
Reviewers: hfinkel
Subscribers: aemerson, rengolin, llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D17945
llvm-svn: 263771
Summary:
Use the new LoopVersioning facility (D16712) to add noalias metadata in
the vector loop if we versioned with memchecks. This can enable some
optimization opportunities further down the pipeline (see the included
test or the benchmark improvement quoted in D16712).
The test also covers the bug I had in the initial version in D16712.
The vectorizer did not previously use LoopVersioning. The reason is
that the vectorizer performs its transformations in single shot. It
creates an empty single-block vector loop that it then populates with
the widened, if-converted instructions. Thus creating an intermediate
versioned scalar loop seems wasteful.
So this patch (rather than bringing in LoopVersioning fully) adds a
special interface to LoopVersioning to allow the vectorizer to add
no-alias annotation while still performing its own versioning.
As the vectorizer propagates metadata from the instructions in the
original loop to the vector instructions we also check the pointer in
the original instruction and see if LoopVersioning can add no-alias
metadata based on the issued memchecks.
Reviewers: hfinkel, nadav, mzolotukhin
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D17191
llvm-svn: 263744
Summary:
If we decide to version a loop to benefit a transformation, it makes
sense to record the now non-aliasing accesses in the newly versioned
loop. This allows non-aliasing information to be used by subsequent
passes.
One example is 456.hmmer in SPECint2006 where after loop distribution,
we vectorize one of the newly distributed loops. To vectorize we
version this loop to fully disambiguate may-aliasing accesses. If we
add the noalias markers, we can use the same information in a later DSE
pass to eliminate some dead stores which amounts to ~25% of the
instructions of this hot memory-pipeline-bound loop. The overall
performance improves by 18% on our ARM64.
The scoped noalias annotation is added in LoopVersioning. The patch
then enables this for loop distribution. A follow-on patch will enable
it for the vectorizer. Eventually this should be run by default when
versioning the loop but first I'd like to get some feedback whether my
understanding and application of scoped noalias metadata is correct.
Essentially my approach was to have a separate alias domain for each
versioning of the loop. For example, if we first version in loop
distribution and then in vectorization of the distributed loops, we have
a different set of memchecks for each versioning. By keeping the scopes
in different domains they can conveniently be defined independently
since different alias domains don't affect each other.
As written, I also have a separate domain for each loop. This is not
necessary and we could save some metadata here by using the same domain
across the different loops. I don't think it's a big deal either way.
Probably the best is to review the tests first to see if I mapped this
problem correctly to scoped noalias markers. I have plenty of comments
in the tests.
Note that the interface is prepared for the vectorizer which needs the
annotateInstWithNoAlias API. The vectorizer does not use LoopVersioning
so we need a way to pass in the versioned instructions. This is also
why the maps have to become part of the object state.
Also currently, we only have an AA-aware DSE after the vectorizer if we
also run the LTO pipeline. Depending how widely this triggers we may
want to schedule a DSE toward the end of the regular pass pipeline.
Reviewers: hfinkel, nadav, ashutosh.nema
Subscribers: mssimpso, aemerson, llvm-commits, mcrosier
Differential Revision: http://reviews.llvm.org/D16712
llvm-svn: 263743
This is similar to D18133 where we allowed profile weights on select instructions.
This extends that change to also allow the 'unpredictable' attribute of branches to apply to selects.
A test to check that 'unpredictable' metadata is preserved when cloning instructions was checked in at:
http://reviews.llvm.org/rL263648
Differential Revision: http://reviews.llvm.org/D18220
llvm-svn: 263716
This splits out the logic that maps the `"statepoint-id"` attribute into
the actual statepoint ID, and the `"statepoint-num-patch-bytes"`
attribute into the number of patchable bytes the statpeoint is lowered
into. The new home of this logic is in IR/Statepoint.cpp, and this
refactoring will support similar functionality when lowering calls with
deopt operand bundles in the future.
llvm-svn: 263685
Summary:
Fix LSRInstance::HoistInsertPosition() to check the original insert
position block first for a canonical insertion point that is dominated
by all inputs. This leads to SCEV being able to reuse more instructions
since it currently tracks the instructions it creates for reuse by
keeping a table of <Value, insert point> pairs.
Reviewers: atrick
Subscribers: mcrosier, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18001
llvm-svn: 263644
The latent bug that LLE exposed in the LoopVectorizer was resolved
(PR26952).
The pass can be disabled with -mllvm -enable-loop-load-elim=0
llvm-svn: 263595
There is something strange going on with debug info (.eh_frame_hdr)
disappearing when msan.module_ctor are placed in comdat sections.
Moving this functionality under flag, disabled by default.
llvm-svn: 263579
This was a latent bug that got exposed by the change to add LoopSimplify
as a dependence to LoopLoadElimination. Since LoopInfo was corrupted
after LV, LoopSimplify mis-compiled nbench in the test-suite (more
details in the PR).
The problem was that when we create the blocks for predicated stores we
didn't add those to any loops.
The original testcase for store predication provides coverage for this
assuming we verify LI on the way out of LV.
Fixes PR26952.
llvm-svn: 263565
Since the static getGlobalIdentifier and getGUID methods are now called
for global values other than functions, reflect that by moving these
methods to the GlobalValue class.
llvm-svn: 263524
(Resubmitting after fixing missing file issue)
With the changes in r263275, there are now more than just functions in
the summary. Completed the renaming of data structures (started in
r263275) to reflect the wider scope. In particular, changed the
FunctionIndex* data structures to ModuleIndex*, and renamed related
variables and comments. Also renamed the files to reflect the changes.
A companion clang patch will immediately succeed this patch to reflect
this renaming.
llvm-svn: 263513
Summary:
Specifically, when we perform runtime loop unrolling of a loop that
contains a convergent op, we can only unroll k times, where k divides
the loop trip multiple.
Without this change, we'll happily unroll e.g. the following loop
for (int i = 0; i < N; ++i) {
if (i == 0) convergent_op();
foo();
}
into
int i = 0;
if (N % 2 == 1) {
convergent_op();
foo();
++i;
}
for (; i < N - 1; i += 2) {
if (i == 0) convergent_op();
foo();
foo();
}.
This is unsafe, because we've just added a control-flow dependency to
the convergent op in the prelude.
In general, runtime unrolling loops that contain convergent ops is safe
only if we don't have emit a prelude, which occurs when the unroll count
divides the trip multiple.
Reviewers: resistor
Subscribers: llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D17526
llvm-svn: 263509
Summary: This now try to reorder instructions in order to help create the optimizable pattern.
Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc, joker.eph, majnemer
Differential Revision: http://reviews.llvm.org/D16523
llvm-svn: 263503
With the changes in r263275, there are now more than just functions in
the summary. Completed the renaming of data structures (started in
r263275) to reflect the wider scope. In particular, changed the
FunctionIndex* data structures to ModuleIndex*, and renamed related
variables and comments. Also renamed the files to reflect the changes.
A companion clang patch will immediately succeed this patch to reflect
this renaming.
llvm-svn: 263490
As noted in:
https://llvm.org/bugs/show_bug.cgi?id=26636
This doesn't accomplish anything on its own. It's the first step towards preserving
and using branch weights with selects.
The next step would be to make sure we're propagating the info in all of the other
places where we create selects (SimplifyCFG, InstCombine, etc). I don't think there's
an easy fix to make this happen; we have to look at each transform individually to
determine how to correctly propagate the weights.
Along with that step, we need to then use the weights when making subsequent transform
decisions such as discussed in http://reviews.llvm.org/D16836.
The inliner test is independent but closely related. It verifies that metadata is
preserved when both branches and selects are cloned.
Differential Revision: http://reviews.llvm.org/D18133
llvm-svn: 263482
Summary:
Previously we had a notion of convergent functions but not of convergent
calls. This is insufficient to correctly analyze calls where the target
is unknown, e.g. indirect calls.
Now a call is convergent if it targets a known-convergent function, or
if it's explicitly marked as convergent. As usual, we can remove
convergent where we can prove that no convergent operations are
performed in the call.
Originally landed as r261544, then reverted in r261544 for (incidental)
build breakage. Re-landed here with no changes.
Reviewers: chandlerc, jingyue
Subscribers: llvm-commits, tra, jhen, hfinkel
Differential Revision: http://reviews.llvm.org/D17739
llvm-svn: 263481