; REQUIRES: aarch64-registered-target

; [Matrix] Mark expressions shared between multiple remarks.
;
; This patch adds support for explicitly highlighting sub-expressions shared
; by multiple leaf nodes. For example, consider the following code:
;
;   %shared.load = tail call <8 x double> @llvm.matrix.columnwise.load.v8f64.p0f64(double* %arg1, i32 %stride, i32 2, i32 4), !dbg !10, !noalias !10
;   %trans = tail call <8 x double> @llvm.matrix.transpose.v8f64(<8 x double> %shared.load, i32 2, i32 4), !dbg !10
;   tail call void @llvm.matrix.columnwise.store.v8f64.p0f64(<8 x double> %trans, double* %arg3, i32 10, i32 4, i32 2), !dbg !10
;   %load.2 = tail call <30 x double> @llvm.matrix.columnwise.load.v30f64.p0f64(double* %arg3, i32 %stride, i32 2, i32 15), !dbg !10, !noalias !10
;   %mult = tail call <60 x double> @llvm.matrix.multiply.v60f64.v8f64.v30f64(<8 x double> %trans, <30 x double> %load.2, i32 4, i32 2, i32 15), !dbg !11
;   tail call void @llvm.matrix.columnwise.store.v60f64.p0f64(<60 x double> %mult, double* %arg2, i32 10, i32 4, i32 15), !dbg !11
;
; We have two leaf nodes (the two stores), and the first store stores %trans,
; which is also used by the matrix multiply %mult. We generate a separate
; remark for each leaf (store). To denote that parts are shared, the shared
; expressions are marked as shared (), with a reference to the other remark
; that shares them. The operation summary also lists the shared operations
; separately.
;
; Reviewers: anemet, Gerolf, thegameg, hfinkel, andrew.w.kaylor, LuoYuanke
; Reviewed By: anemet
; Differential Revision: https://reviews.llvm.org/D72526

; This test needs to be target specific due to the cost estimate in the output.

; RUN: opt -lower-matrix-intrinsics -pass-remarks=lower-matrix-intrinsics -mtriple=arm64-apple-iphoneos < %s 2>&1 | FileCheck %s
; CHECK-LABEL: remark: test.h:40:20: Lowered with 6 stores, 6 loads, 24 compute ops
; CHECK-NEXT:  store(
; CHECK-NEXT:   transpose.2x6.double(load(addr %A)),
; CHECK-NEXT:   addr %B)
define void @transpose(<12 x double>* %A, <12 x double>* %B) !dbg !23 {
  %load = load <12 x double>, <12 x double>* %A, !dbg !24
  %t = call <12 x double> @llvm.matrix.transpose.v12f64.v12f64(<12 x double> %load, i32 2, i32 6), !dbg !24
  store <12 x double> %t, <12 x double>* %B, !dbg !24
  ret void
}

; CHECK-LABEL: remark: test.h:50:20: Lowered with 2 stores, 12 loads, 22 compute ops
; CHECK-NEXT:  store(
; CHECK-NEXT:   multiply.2x6.6x2.double(
; CHECK-NEXT:    load(addr %A),
; CHECK-NEXT:    load(addr %B)),
; CHECK-NEXT:   addr %C)
define void @multiply(<12 x double>* %A, <12 x double>* %B, <4 x double>* %C) !dbg !25 {
  %A.matrix = load <12 x double>, <12 x double>* %A, !dbg !26
  %B.matrix = load <12 x double>, <12 x double>* %B, !dbg !26
  %t = call <4 x double> @llvm.matrix.multiply(<12 x double> %A.matrix, <12 x double> %B.matrix, i32 2, i32 6, i32 2), !dbg !26
  store <4 x double> %t, <4 x double>* %C, !dbg !26
  ret void
}

; [Matrix] Update load/store intrinsics.
;
; This patch adjusts the load/store matrix intrinsics, formerly known as
; llvm.matrix.columnwise.load/store, to improve the naming and to allow
; passing extra information (volatile). The patch performs the following
; changes:
;  * Rename columnwise.load/store to column.major.load/store. This is more
;    expressive and also more in line with the naming in Clang.
;  * Change the stride arguments from i32 to i64. The stride can be larger
;    than i32, and this makes things more uniform with the way they are
;    handled in Clang.
;  * Add a new boolean argument to indicate whether the load/store is
;    volatile. The lowering respects it when emitting vector load/store
;    instructions.
;  * Update MatrixBuilder to require both Alignment and IsVolatile
;    arguments, which are passed through to the generated intrinsic. The
;    alignment is set using the `align` attribute.
; The changes are grouped together in a single patch to have a single commit
; that breaks compatibility. We should probably be fine with updating the
; intrinsics, as we did not yet officially support them in the last stable
; release. If there are any concerns, we can add auto-upgrade rules for the
; columnwise intrinsics.
;
; Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, nicolasvasilache, rjmccall, ftynse
; Reviewed By: anemet, nicolasvasilache
; Differential Revision: https://reviews.llvm.org/D81472

; CHECK-LABEL: remark: test.h:60:20: Lowered with 6 stores, 6 loads, 0 compute ops
; CHECK-NEXT:  store(
; CHECK-NEXT:   column.major.load.3x3.double(addr %A, 5),
; CHECK-NEXT:   addr %B)
define void @column.major.load(double* %A, <9 x double>* %B) !dbg !27 {
  %A.matrix = call <9 x double> @llvm.matrix.column.major.load(double* %A, i64 5, i1 false, i32 3, i32 3), !dbg !28
  store <9 x double> %A.matrix, <9 x double>* %B, !dbg !28
  ret void
}

; CHECK-LABEL: remark: test.h:70:20: Lowered with 6 stores, 6 loads, 0 compute ops
; CHECK-NEXT:  column.major.store.3x3.double(
; CHECK-NEXT:   column.major.load.3x3.double(addr %A, 5),
; CHECK-NEXT:   addr %B,
; CHECK-NEXT:   10)
define void @column.major.store(double* %A, double* %B) !dbg !29 {
  %A.matrix = call <9 x double> @llvm.matrix.column.major.load(double* %A, i64 5, i1 false, i32 3, i32 3), !dbg !30
  call void @llvm.matrix.column.major.store(<9 x double> %A.matrix, double* %B, i64 10, i1 false, i32 3, i32 3), !dbg !30
  ret void
}

; CHECK-LABEL: remark: test.h:80:20: Lowered with 6 stores, 6 loads, 12 compute ops
; CHECK-NEXT:  column.major.store.3x3.double(
; CHECK-NEXT:   fmul(
; CHECK-NEXT:    fadd(
; CHECK-NEXT:     column.major.load.3x3.double(addr %A, 5)
; CHECK-NEXT:     (reused) column.major.load.3x3.double(addr %A, 5)),
; CHECK-NEXT:    (reused) column.major.load.3x3.double(addr %A, 5)),
; CHECK-NEXT:   addr %B,
; CHECK-NEXT:   10)

; [Matrix] Add remark propagation along the inlined-at chain.
;
; This patch adds support for propagating matrix expressions along the
; inlined-at chain and emitting remarks at the traversed function scopes.
; To motivate this new behavior, consider the example below. Without the
; remark 'up-leveling', we would only get remarks in load.h and store.h,
; but we could not generate a remark describing the full expression in
; toplevel.cpp, which is the place where the user has the best chance of
; spotting/fixing potential problems.
;
; With this patch, we generate a remark for the load in load.h, one for the
; store in store.h and one for the complete expression in toplevel.cpp. For
; a bigger example, please see remarks-inlining.ll.
;
; load.h:
;   template <typename Ty, unsigned R, unsigned C> Matrix<Ty, R, C> load(Ty *Ptr) {
;     Matrix<Ty, R, C> Result;
;     Result.value = *reinterpret_cast<typename Matrix<Ty, R, C>::matrix_t *>(Ptr);
;     return Result;
;   }
;
; store.h:
;   template <typename Ty, unsigned R, unsigned C> void store(Matrix<Ty, R, C> M1, Ty *Ptr) {
;     *reinterpret_cast<typename decltype(M1)::matrix_t *>(Ptr) = M1.value;
;   }
;
; toplevel.cpp:
;   void test(double *A, double *B, double *C) {
;     store(add(load<double, 3, 5>(A), load<double, 3, 5>(B)), C);
;   }
;
; For a given function, we traverse the inlined-at chain for each matrix
; instruction (= instruction with shape information). We collect the matrix
; instructions in each DISubprogram we visit. This produces a mapping of
; DISubprogram -> (list of matrix instructions visible in the subprogram).
; We then generate remarks using the list of instructions for each
; subprogram in the inlined-at chain. Note that the list of instructions
; for a subprogram includes the instructions from its own subprograms
; recursively. For the example above, for the subprogram 'test' this
; includes the inline functions 'load' and 'store'. This allows surfacing
; the remarks at a level useful to users.
;
; Please note that the current approach may create a lot of extra remarks.
; Additional heuristics to cut off the traversal can be implemented in the
; future. For example, it might make sense to stop 'up-leveling' once all
; matrix instructions are at the same debug location.
;
; Reviewers: anemet, Gerolf, thegameg, hfinkel, andrew.w.kaylor, LuoYuanke
; Reviewed By: anemet
; Differential Revision: https://reviews.llvm.org/D73600

define void @binaryops(double* %A, double* %B) !dbg !31 {
  %A.matrix = call <9 x double> @llvm.matrix.column.major.load(double* %A, i64 5, i1 false, i32 3, i32 3), !dbg !32
  %R1.matrix = fadd <9 x double> %A.matrix, %A.matrix, !dbg !32
  %R2.matrix = fmul <9 x double> %R1.matrix, %A.matrix, !dbg !32
  call void @llvm.matrix.column.major.store(<9 x double> %R2.matrix, double* %B, i64 10, i1 false, i32 3, i32 3), !dbg !32
  ret void
}

; CHECK-LABEL: remark: test.h:90:20: Lowered with 6 stores, 6 loads, 12 compute ops
; CHECK-NEXT:  column.major.store.3x3.double(
; CHECK-NEXT:   fmul(
; CHECK-NEXT:    fadd(
; CHECK-NEXT:     column.major.load.3x3.double(addr %A, 5)
; CHECK-NEXT:     (reused) column.major.load.3x3.double(addr %A, 5)),
; CHECK-NEXT:    (reused) column.major.load.3x3.double(addr %A, 5)),
; CHECK-NEXT:   addr %B,
; CHECK-NEXT:   10)
; CHECK-NEXT: remark: test.h:90:20: Lowered with 2 stores, 12 loads, 22 compute ops
; CHECK-NEXT:  store(
; CHECK-NEXT:   multiply.2x6.6x2.double(
; CHECK-NEXT:    load(addr %C),
; CHECK-NEXT:    load(addr %D)),
; CHECK-NEXT:   addr %E)

define void @multiple_expressions(double* %A, double* %B, <12 x double>* %C, <12 x double>* %D, <4 x double>* %E) !dbg !33 {
  %A.matrix = call <9 x double> @llvm.matrix.column.major.load(double* %A, i64 5, i1 false, i32 3, i32 3), !dbg !34
  %R1.matrix = fadd <9 x double> %A.matrix, %A.matrix, !dbg !34
  %R2.matrix = fmul <9 x double> %R1.matrix, %A.matrix, !dbg !34
  call void @llvm.matrix.column.major.store(<9 x double> %R2.matrix, double* %B, i64 10, i1 false, i32 3, i32 3), !dbg !34

  %C.matrix = load <12 x double>, <12 x double>* %C, !dbg !34
  %D.matrix = load <12 x double>, <12 x double>* %D, !dbg !34
  %Mult.matrix = call <4 x double> @llvm.matrix.multiply(<12 x double> %C.matrix, <12 x double> %D.matrix, i32 2, i32 6, i32 2), !dbg !34
  store <4 x double> %Mult.matrix, <4 x double>* %E, !dbg !34

  ret void
}

; CHECK-LABEL: remark: test.h:100:20: Lowered with 6 stores, 6 loads, 12 compute ops
; CHECK-NEXT:  column.major.store.3x3.double(
; CHECK-NEXT:   fmul(
; CHECK-NEXT:    fadd(
; CHECK-NEXT:     column.major.load.3x3.double(addr %A, 5)
; CHECK-NEXT:     (reused) column.major.load.3x3.double(addr %A, 5)),
; CHECK-NEXT:    (reused) column.major.load.3x3.double(addr %A, 5)),
; CHECK-NEXT:   addr %B,
; CHECK-NEXT:   10)
define void @stackaddresses(double* %A, double* %B) !dbg !35 {
  %A.matrix = call <9 x double> @llvm.matrix.column.major.load(double* %A, i64 5, i1 false, i32 3, i32 3), !dbg !36
  %R1.matrix = fadd <9 x double> %A.matrix, %A.matrix, !dbg !36
  %R2.matrix = fmul <9 x double> %R1.matrix, %A.matrix, !dbg !36
  call void @llvm.matrix.column.major.store(<9 x double> %R2.matrix, double* %B, i64 10, i1 false, i32 3, i32 3), !dbg !36
  ret void
}

; CHECK-LABEL: remark: test.h:30:20: Lowered with 10 stores, 9 loads, 30 compute ops
; CHECK-NEXT:  store(
; CHECK-NEXT:   transpose.5x3.double(load(addr %A)),
; CHECK-NEXT:   stack addr %s1)
%S1 = type {<15 x double>*}
define void @get_underlying_object(%S1* %A) !dbg !21 {
entry:
  %s1 = alloca <15 x double>, !dbg !22
  %a1 = getelementptr %S1, %S1* %A, i32 0, i32 0, !dbg !22
  %a2 = load <15 x double>*, <15 x double>** %a1, !dbg !22
  %av = load <15 x double>, <15 x double>* %a2, !dbg !22

  %s2 = bitcast <15 x double>* %s1 to i64*, !dbg !22
  %s3 = bitcast i64* %s2 to <15 x double>*, !dbg !22

  %t = call <15 x double> @llvm.matrix.transpose.v15f64.v15f64(<15 x double> %av, i32 5, i32 3), !dbg !22

  store <15 x double> %t, <15 x double>* %s3, !dbg !22
  ret void
}

declare <12 x double> @llvm.matrix.transpose.v12f64.v12f64(<12 x double>, i32, i32)
declare <4 x double> @llvm.matrix.multiply(<12 x double>, <12 x double>, i32, i32, i32)
declare <9 x double> @llvm.matrix.column.major.load(double*, i64, i1, i32, i32)
declare <15 x double> @llvm.matrix.transpose.v15f64.v15f64(<15 x double>, i32, i32)
declare void @llvm.matrix.column.major.store(<9 x double>, double*, i64, i1, i32, i32)

!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!3, !4}

!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2)
!1 = !DIFile(filename: "test.h", directory: "/test")
!2 = !{}
!3 = !{i32 2, !"Dwarf Version", i32 4}
!4 = !{i32 2, !"Debug Info Version", i32 3}

!6 = !DISubroutineType(types: !7)
!7 = !{null, !8, !8, !11}
!8 = !DIDerivedType(tag: DW_TAG_restrict_type, baseType: !9)
!9 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !10, size: 32, align: 32)
!10 = !DIBasicType(name: "float", size: 32, align: 32, encoding: DW_ATE_float)
!11 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
!12 = !{!13}
!13 = !DILocalVariable(name: "a", arg: 1, scope: !5, file: !1, line: 1, type: !8)
!14 = !DILocation(line: 1, column: 27, scope: !5)

!5 = distinct !DISubprogram(name: "fn1", scope: !1, file: !1, line: 1, type: !6, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped, isOptimized: true, unit: !0, retainedNodes: !12)
!19 = !DILocation(line: 10, column: 20, scope: !5)
!20 = !DILocation(line: 10, column: 10, scope: !5)

!21 = distinct !DISubprogram(name: "fn2", scope: !1, file: !1, line: 1, type: !6, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped, isOptimized: true, unit: !0, retainedNodes: !12)
!22 = !DILocation(line: 30, column: 20, scope: !21)

!23 = distinct !DISubprogram(name: "fn3", scope: !1, file: !1, line: 1, type: !6, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped, isOptimized: true, unit: !0, retainedNodes: !12)
!24 = !DILocation(line: 40, column: 20, scope: !23)

!25 = distinct !DISubprogram(name: "fn4", scope: !1, file: !1, line: 1, type: !6, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped, isOptimized: true, unit: !0, retainedNodes: !12)
!26 = !DILocation(line: 50, column: 20, scope: !25)

!27 = distinct !DISubprogram(name: "fn5", scope: !1, file: !1, line: 1, type: !6, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped, isOptimized: true, unit: !0, retainedNodes: !12)
!28 = !DILocation(line: 60, column: 20, scope: !27)

!29 = distinct !DISubprogram(name: "fn6", scope: !1, file: !1, line: 1, type: !6, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped, isOptimized: true, unit: !0, retainedNodes: !12)
!30 = !DILocation(line: 70, column: 20, scope: !29)

!31 = distinct !DISubprogram(name: "fn7", scope: !1, file: !1, line: 1, type: !6, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped, isOptimized: true, unit: !0, retainedNodes: !12)
!32 = !DILocation(line: 80, column: 20, scope: !31)

!33 = distinct !DISubprogram(name: "fn8", scope: !1, file: !1, line: 1, type: !6, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped, isOptimized: true, unit: !0, retainedNodes: !12)
!34 = !DILocation(line: 90, column: 20, scope: !33)

!35 = distinct !DISubprogram(name: "fn9", scope: !1, file: !1, line: 1, type: !6, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped, isOptimized: true, unit: !0, retainedNodes: !12)
!36 = !DILocation(line: 100, column: 20, scope: !35)