[LoopVersioning] Annotate versioned loop with noalias metadata
Summary:
If we decide to version a loop to benefit a transformation, it makes
sense to record the now non-aliasing accesses in the newly versioned
loop. This allows non-aliasing information to be used by subsequent
passes.
One example is 456.hmmer in SPECint2006, where, after loop distribution,
we vectorize one of the newly distributed loops. To vectorize, we
version this loop to fully disambiguate may-aliasing accesses. If we
add the noalias markers, we can use the same information in a later DSE
pass to eliminate dead stores that amount to ~25% of the instructions
of this hot, memory-pipeline-bound loop. Overall performance improves
by 18% on our ARM64 hardware.
The scoped noalias annotation is added in LoopVersioning. The patch
then enables this for loop distribution. A follow-on patch will enable
it for the vectorizer. Eventually this should be run by default whenever
we version a loop, but first I'd like to get some feedback on whether my
understanding and application of scoped noalias metadata is correct.
Essentially, my approach is to use a separate alias domain for each
versioning of the loop. For example, if we first version in loop
distribution and then in vectorization of the distributed loops, we have
a different set of memchecks for each versioning. By keeping the scopes
in different domains, they can conveniently be defined independently,
since different alias domains don't affect each other.
As written, I also have a separate domain for each loop. This is not
necessary and we could save some metadata here by using the same domain
across the different loops. I don't think it's a big deal either way.
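To make this concrete, here is a hypothetical sketch (the metadata numbers
are made up and not taken from the patch) of the scopes after two independent
versionings, one by loop distribution and one by the vectorizer:

  ; first versioning (e.g. loop distribution): its own domain and scopes
  !0 = distinct !{!0, !"LVerDomain"}   ; domain for versioning #1
  !1 = distinct !{!1, !0}              ; scope for one pointer group
  !2 = distinct !{!2, !0}              ; scope for another pointer group
  ; second versioning (e.g. vectorizing a distributed loop): a fresh domain,
  ; so its scopes are defined independently of the ones above
  !3 = distinct !{!3, !"LVerDomain"}   ; domain for versioning #2
  !4 = distinct !{!4, !3}
  !5 = distinct !{!5, !3}

Since the two versionings never mix scopes from each other's domain, their
annotations cannot interfere with one another.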
Probably the best approach is to review the tests first to see whether I
mapped this problem correctly onto scoped noalias markers. I have plenty
of comments in the tests.
Note that the interface is prepared for the vectorizer, which needs the
annotateInstWithNoAlias API. The vectorizer does not use LoopVersioning,
so we need a way to pass in the versioned instructions. This is also
why the maps have to become part of the object state.
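As a rough sketch of what that would look like (the value names here are
invented, and the concrete output depends on the follow-on vectorizer
patch), the vectorizer hands its own clone of a memory access to
LoopVersioning, which attaches the scope it recorded for the corresponding
pointer group:

  ; original scalar access; the map in LoopVersioning ties %a's pointer
  ; group to scope !1
  %loadA = load i32, i32* %arrayidxA, align 4
  ; the vectorizer's widened clone; passing it to annotateInstWithNoAlias
  ; tags it with the same scope
  %wide.load = load <4 x i32>, <4 x i32>* %castA, align 4, !alias.scope !0

  !0 = !{!1}                           ; scope list containing !1
  !1 = distinct !{!1, !2}              ; scope for %a's pointer group
  !2 = distinct !{!2, !"LVerDomain"}   ; domain for this versioning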
Also, currently we only have an AA-aware DSE after the vectorizer if we
also run the LTO pipeline. Depending on how widely this triggers, we may
want to schedule a DSE pass toward the end of the regular pass pipeline.
Reviewers: hfinkel, nadav, ashutosh.nema
Subscribers: mssimpso, aemerson, llvm-commits, mcrosier
Differential Revision: http://reviews.llvm.org/D16712
llvm-svn: 263743
; RUN: opt -basicaa -loop-versioning -S < %s | FileCheck %s

; A very simple case. After versioning, %loadA and %loadB can't alias with
; the store.
;
; To make it easier to see what's going on, I expanded every noalias/scope
; metadata reference below in a comment. For a scope I use the format
; scope(domain), e.g. scope 17 in domain 15 is written as 17(15).

; CHECK-LABEL: @f(
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

define void @f(i32* %a, i32* %b, i32* %c) {
entry:
  br label %for.body

; CHECK: for.body.lver.orig:
; CHECK: for.body:
for.body:                                         ; preds = %for.body, %entry
  %ind = phi i64 [ 0, %entry ], [ %add, %for.body ]

  %arrayidxA = getelementptr inbounds i32, i32* %a, i64 %ind
; CHECK: %loadA = {{.*}} !alias.scope !0
;   A's scope: !0 -> { 1(2) }
  %loadA = load i32, i32* %arrayidxA, align 4

  %arrayidxB = getelementptr inbounds i32, i32* %b, i64 %ind
; CHECK: %loadB = {{.*}} !alias.scope !3
;   B's scope: !3 -> { 4(2) }
  %loadB = load i32, i32* %arrayidxB, align 4

  %mulC = mul i32 %loadA, %loadB

  %arrayidxC = getelementptr inbounds i32, i32* %c, i64 %ind
; CHECK: store {{.*}} !alias.scope !5, !noalias !7
;   C noalias A and B: !7 -> { 1(2), 4(2) }
  store i32 %mulC, i32* %arrayidxC, align 4

  %add = add nuw nsw i64 %ind, 1
  %exitcond = icmp eq i64 %add, 20
  br i1 %exitcond, label %for.end, label %for.body

for.end:                                          ; preds = %for.body
  ret void
}

; CHECK: !0 = !{!1}
; CHECK: !1 = distinct !{!1, !2}
; CHECK: !2 = distinct !{!2, !"LVerDomain"}
; CHECK: !3 = !{!4}
; CHECK: !4 = distinct !{!4, !2}
; CHECK: !5 = !{!6}
; CHECK: !6 = distinct !{!6, !2}
; CHECK: !7 = !{!1, !4}