2015-05-06 01:44:16 +08:00
|
|
|
; RUN: llc -mtriple=i686-pc-windows-msvc < %s | FileCheck %s
|
|
|
|
|
|
|
|
declare void @may_throw_or_crash()
|
|
|
|
declare i32 @_except_handler3(...)
|
|
|
|
declare i32 @_except_handler4(...)
|
|
|
|
declare i32 @__CxxFrameHandler3(...)
|
|
|
|
declare void @llvm.eh.begincatch(i8*, i8*)
|
|
|
|
declare void @llvm.eh.endcatch()
|
2015-06-12 07:37:18 +08:00
|
|
|
declare i32 @llvm.eh.typeid.for(i8*)
|
|
|
|
|
|
|
|
define internal i32 @catchall_filt() {
|
|
|
|
ret i32 1
|
|
|
|
}
|
2015-05-06 01:44:16 +08:00
|
|
|
|
2015-06-18 04:52:32 +08:00
|
|
|
define void @use_except_handler3() personality i32 (...)* @_except_handler3 {
|
2015-06-12 07:37:18 +08:00
|
|
|
entry:
|
2015-05-06 01:44:16 +08:00
|
|
|
invoke void @may_throw_or_crash()
|
2015-10-10 05:27:28 +08:00
|
|
|
to label %cont unwind label %lpad
|
2015-05-06 01:44:16 +08:00
|
|
|
cont:
|
|
|
|
ret void
|
2015-10-10 05:27:28 +08:00
|
|
|
lpad:
|
[IR] Reformulate LLVM's EH funclet IR
While we have successfully implemented a funclet-oriented EH scheme on
top of LLVM IR, our scheme has some notable deficiencies:
- catchendpad and cleanupendpad are necessary in the current design
but they are difficult to explain to others, even to seasoned LLVM
experts.
- catchendpad and cleanupendpad are optimization barriers. They cannot
be split and force all potentially throwing call-sites to be invokes.
This has a noticable effect on the quality of our code generation.
- catchpad, while similar in some aspects to invoke, is fairly awkward.
It is unsplittable, starts a funclet, and has control flow to other
funclets.
- The nesting relationship between funclets is currently a property of
control flow edges. Because of this, we are forced to carefully
analyze the flow graph to see if there might potentially exist illegal
nesting among funclets. While we have logic to clone funclets when
they are illegally nested, it would be nicer if we had a
representation which forbade them upfront.
Let's clean this up a bit by doing the following:
- Instead, make catchpad more like cleanuppad and landingpad: no control
flow, just a bunch of simple operands; catchpad would be splittable.
- Introduce catchswitch, a control flow instruction designed to model
the constraints of funclet oriented EH.
- Make funclet scoping explicit by having funclet instructions consume
the token produced by the funclet which contains them.
- Remove catchendpad and cleanupendpad. Their presence can be inferred
implicitly using coloring information.
N.B. The state numbering code for the CLR has been updated but the
veracity of it's output cannot be spoken for. An expert should take a
look to make sure the results are reasonable.
Reviewers: rnk, JosephTremoulet, andrew.w.kaylor
Differential Revision: http://reviews.llvm.org/D15139
llvm-svn: 255422
2015-12-12 13:38:55 +08:00
|
|
|
%cs = catchswitch within none [label %catch] unwind to caller
|
2015-10-10 05:27:28 +08:00
|
|
|
catch:
|
[IR] Reformulate LLVM's EH funclet IR
While we have successfully implemented a funclet-oriented EH scheme on
top of LLVM IR, our scheme has some notable deficiencies:
- catchendpad and cleanupendpad are necessary in the current design
but they are difficult to explain to others, even to seasoned LLVM
experts.
- catchendpad and cleanupendpad are optimization barriers. They cannot
be split and force all potentially throwing call-sites to be invokes.
This has a noticable effect on the quality of our code generation.
- catchpad, while similar in some aspects to invoke, is fairly awkward.
It is unsplittable, starts a funclet, and has control flow to other
funclets.
- The nesting relationship between funclets is currently a property of
control flow edges. Because of this, we are forced to carefully
analyze the flow graph to see if there might potentially exist illegal
nesting among funclets. While we have logic to clone funclets when
they are illegally nested, it would be nicer if we had a
representation which forbade them upfront.
Let's clean this up a bit by doing the following:
- Instead, make catchpad more like cleanuppad and landingpad: no control
flow, just a bunch of simple operands; catchpad would be splittable.
- Introduce catchswitch, a control flow instruction designed to model
the constraints of funclet oriented EH.
- Make funclet scoping explicit by having funclet instructions consume
the token produced by the funclet which contains them.
- Remove catchendpad and cleanupendpad. Their presence can be inferred
implicitly using coloring information.
N.B. The state numbering code for the CLR has been updated but the
veracity of it's output cannot be spoken for. An expert should take a
look to make sure the results are reasonable.
Reviewers: rnk, JosephTremoulet, andrew.w.kaylor
Differential Revision: http://reviews.llvm.org/D15139
llvm-svn: 255422
2015-12-12 13:38:55 +08:00
|
|
|
%p = catchpad within %cs [i8* bitcast (i32 ()* @catchall_filt to i8*)]
|
|
|
|
catchret from %p to label %cont
|
2015-05-06 01:44:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
; CHECK-LABEL: _use_except_handler3:
|
2015-05-30 05:58:11 +08:00
|
|
|
; CHECK: pushl %ebp
|
In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting with compiler time improvements
Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 297695
2017-03-14 08:34:14 +08:00
|
|
|
; CHECK-NEXT: movl %esp, %ebp
|
|
|
|
; CHECK-NEXT: pushl %ebx
|
|
|
|
; CHECK-NEXT: pushl %edi
|
|
|
|
; CHECK-NEXT: pushl %esi
|
|
|
|
; CHECK-NEXT: subl ${{[0-9]+}}, %esp
|
|
|
|
; CHECK-NEXT: movl %esp, -36(%ebp)
|
|
|
|
; CHECK-NEXT: movl $-1, -16(%ebp)
|
|
|
|
; CHECK-NEXT: movl $L__ehtable$use_except_handler3, -20(%ebp)
|
|
|
|
; CHECK-NEXT: leal -28(%ebp), %[[node:[^ ,]*]]
|
|
|
|
; CHECK-NEXT: movl $__except_handler3, -24(%ebp)
|
|
|
|
; CHECK-NEXT: movl %fs:0, %[[next:[^ ,]*]]
|
|
|
|
; CHECK-NEXT: movl %[[next]], -28(%ebp)
|
|
|
|
; CHECK-NEXT: movl %[[node]], %fs:0
|
|
|
|
; CHECK-NEXT: movl $0, -16(%ebp)
|
|
|
|
; CHECK-NEXT: calll _may_throw_or_crash
|
|
|
|
|
2015-07-10 06:09:41 +08:00
|
|
|
; CHECK: movl -28(%ebp), %[[next:[^ ,]*]]
|
In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting with compiler time improvements
Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 297695
2017-03-14 08:34:14 +08:00
|
|
|
; CHECK-NEXT: movl %[[next]], %fs:0
|
2015-05-06 01:44:16 +08:00
|
|
|
; CHECK: retl
|
In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting with compiler time improvements
Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 297695
2017-03-14 08:34:14 +08:00
|
|
|
; CHECK-NEXT: LBB1_2: # %catch{{$}}
|
2015-05-06 01:44:16 +08:00
|
|
|
|
2015-06-10 05:42:19 +08:00
|
|
|
; CHECK: .section .xdata,"dr"
|
|
|
|
; CHECK-LABEL: L__ehtable$use_except_handler3:
|
|
|
|
; CHECK-NEXT: .long -1
|
2015-06-12 07:37:18 +08:00
|
|
|
; CHECK-NEXT: .long _catchall_filt
|
2015-10-10 05:27:28 +08:00
|
|
|
; CHECK-NEXT: .long LBB1_2
|
2015-06-10 05:42:19 +08:00
|
|
|
|
2015-06-18 04:52:32 +08:00
|
|
|
define void @use_except_handler4() personality i32 (...)* @_except_handler4 {
|
2015-06-12 07:37:18 +08:00
|
|
|
entry:
|
2015-05-06 01:44:16 +08:00
|
|
|
invoke void @may_throw_or_crash()
|
2015-10-10 05:27:28 +08:00
|
|
|
to label %cont unwind label %lpad
|
2015-05-06 01:44:16 +08:00
|
|
|
cont:
|
|
|
|
ret void
|
2015-10-10 05:27:28 +08:00
|
|
|
lpad:
|
[IR] Reformulate LLVM's EH funclet IR
While we have successfully implemented a funclet-oriented EH scheme on
top of LLVM IR, our scheme has some notable deficiencies:
- catchendpad and cleanupendpad are necessary in the current design
but they are difficult to explain to others, even to seasoned LLVM
experts.
- catchendpad and cleanupendpad are optimization barriers. They cannot
be split and force all potentially throwing call-sites to be invokes.
This has a noticable effect on the quality of our code generation.
- catchpad, while similar in some aspects to invoke, is fairly awkward.
It is unsplittable, starts a funclet, and has control flow to other
funclets.
- The nesting relationship between funclets is currently a property of
control flow edges. Because of this, we are forced to carefully
analyze the flow graph to see if there might potentially exist illegal
nesting among funclets. While we have logic to clone funclets when
they are illegally nested, it would be nicer if we had a
representation which forbade them upfront.
Let's clean this up a bit by doing the following:
- Instead, make catchpad more like cleanuppad and landingpad: no control
flow, just a bunch of simple operands; catchpad would be splittable.
- Introduce catchswitch, a control flow instruction designed to model
the constraints of funclet oriented EH.
- Make funclet scoping explicit by having funclet instructions consume
the token produced by the funclet which contains them.
- Remove catchendpad and cleanupendpad. Their presence can be inferred
implicitly using coloring information.
N.B. The state numbering code for the CLR has been updated but the
veracity of it's output cannot be spoken for. An expert should take a
look to make sure the results are reasonable.
Reviewers: rnk, JosephTremoulet, andrew.w.kaylor
Differential Revision: http://reviews.llvm.org/D15139
llvm-svn: 255422
2015-12-12 13:38:55 +08:00
|
|
|
%cs = catchswitch within none [label %catch] unwind to caller
|
2015-10-10 05:27:28 +08:00
|
|
|
catch:
|
[IR] Reformulate LLVM's EH funclet IR
While we have successfully implemented a funclet-oriented EH scheme on
top of LLVM IR, our scheme has some notable deficiencies:
- catchendpad and cleanupendpad are necessary in the current design
but they are difficult to explain to others, even to seasoned LLVM
experts.
- catchendpad and cleanupendpad are optimization barriers. They cannot
be split and force all potentially throwing call-sites to be invokes.
This has a noticable effect on the quality of our code generation.
- catchpad, while similar in some aspects to invoke, is fairly awkward.
It is unsplittable, starts a funclet, and has control flow to other
funclets.
- The nesting relationship between funclets is currently a property of
control flow edges. Because of this, we are forced to carefully
analyze the flow graph to see if there might potentially exist illegal
nesting among funclets. While we have logic to clone funclets when
they are illegally nested, it would be nicer if we had a
representation which forbade them upfront.
Let's clean this up a bit by doing the following:
- Instead, make catchpad more like cleanuppad and landingpad: no control
flow, just a bunch of simple operands; catchpad would be splittable.
- Introduce catchswitch, a control flow instruction designed to model
the constraints of funclet oriented EH.
- Make funclet scoping explicit by having funclet instructions consume
the token produced by the funclet which contains them.
- Remove catchendpad and cleanupendpad. Their presence can be inferred
implicitly using coloring information.
N.B. The state numbering code for the CLR has been updated but the
veracity of it's output cannot be spoken for. An expert should take a
look to make sure the results are reasonable.
Reviewers: rnk, JosephTremoulet, andrew.w.kaylor
Differential Revision: http://reviews.llvm.org/D15139
llvm-svn: 255422
2015-12-12 13:38:55 +08:00
|
|
|
%p = catchpad within %cs [i8* bitcast (i32 ()* @catchall_filt to i8*)]
|
|
|
|
catchret from %p to label %cont
|
2015-05-06 01:44:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
; CHECK-LABEL: _use_except_handler4:
|
2015-05-30 05:58:11 +08:00
|
|
|
; CHECK: pushl %ebp
|
In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting with compiler time improvements
Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 297695
2017-03-14 08:34:14 +08:00
|
|
|
; CHECK-NEXT: movl %esp, %ebp
|
|
|
|
; CHECK-NEXT: pushl %ebx
|
|
|
|
; CHECK-NEXT: pushl %edi
|
|
|
|
; CHECK-NEXT: pushl %esi
|
|
|
|
; CHECK-NEXT: subl ${{[0-9]+}}, %esp
|
|
|
|
; CHECK-NEXT: movl %ebp, %eax
|
|
|
|
; CHECK-NEXT: movl %esp, -36(%ebp)
|
|
|
|
; CHECK-NEXT: movl $-2, -16(%ebp)
|
|
|
|
; CHECK-NEXT: movl $L__ehtable$use_except_handler4, %[[lsda:[^ ,]*]]
|
|
|
|
; CHECK-NEXT: movl ___security_cookie, %[[seccookie:[^ ,]*]]
|
|
|
|
; CHECK-NEXT: xorl %[[seccookie]], %[[lsda]]
|
|
|
|
; CHECK-NEXT: movl %[[lsda]], -20(%ebp)
|
|
|
|
; CHECK-NEXT: xorl %[[seccookie]], %[[tmp1:[^ ,]*]]
|
|
|
|
; CHECK-NEXT: movl %[[tmp1]], -40(%ebp)
|
|
|
|
; CHECK-NEXT: leal -28(%ebp), %[[node:[^ ,]*]]
|
|
|
|
; CHECK-NEXT: movl $__except_handler4, -24(%ebp)
|
|
|
|
; CHECK-NEXT: movl %fs:0, %[[next:[^ ,]*]]
|
|
|
|
; CHECK-NEXT: movl %[[next]], -28(%ebp)
|
|
|
|
; CHECK-NEXT: movl %[[node]], %fs:0
|
|
|
|
; CHECK-NEXT: movl $0, -16(%ebp)
|
|
|
|
; CHECK-NEXT: calll _may_throw_or_crash
|
|
|
|
|
2015-07-10 06:09:41 +08:00
|
|
|
; CHECK: movl -28(%ebp), %[[next:[^ ,]*]]
|
In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting with compiler time improvements
Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 297695
2017-03-14 08:34:14 +08:00
|
|
|
; CHECK-NEXT: movl %[[next]], %fs:0
|
|
|
|
; CHECK-NEXT: addl $28, %esp
|
|
|
|
; CHECK-NEXT: popl %esi
|
|
|
|
; CHECK-NEXT: popl %edi
|
|
|
|
; CHECK-NEXT: popl %ebx
|
|
|
|
; CHECK-NEXT: popl %ebp
|
|
|
|
; CHECK-NEXT: retl
|
|
|
|
; CHECK-NEXT: LBB2_2: # %catch{{$}}
|
2015-05-06 01:44:16 +08:00
|
|
|
|
2015-06-10 05:42:19 +08:00
|
|
|
; CHECK: .section .xdata,"dr"
|
|
|
|
; CHECK-LABEL: L__ehtable$use_except_handler4:
|
|
|
|
; CHECK-NEXT: .long -2
|
|
|
|
; CHECK-NEXT: .long 0
|
[StackProtector] Fix computation of GSCookieOffset and EHCookieOffset with SEH4
Summary:
Fix the computation of the offsets present in the scopetable when using the
SEH (__except_handler4).
This patch added an intrinsic to track the position of the allocation on the
stack of the EHGuard. This position is needed when producing the ScopeTable.
```
struct _EH4_SCOPETABLE {
DWORD GSCookieOffset;
DWORD GSCookieXOROffset;
DWORD EHCookieOffset;
DWORD EHCookieXOROffset;
_EH4_SCOPETABLE_RECORD ScopeRecord[1];
};
struct _EH4_SCOPETABLE_RECORD {
DWORD EnclosingLevel;
long (*FilterFunc)();
union {
void (*HandlerAddress)();
void (*FinallyFunc)();
};
};
```
The code to generate the EHCookie is added in `X86WinEHState.cpp`.
Which is adding these instructions when using SEH4.
```
Lfunc_begin0:
# BB#0: # %entry
pushl %ebp
movl %esp, %ebp
pushl %ebx
pushl %edi
pushl %esi
subl $28, %esp
movl %ebp, %eax <<-- Loading FramePtr
movl %esp, -36(%ebp)
movl $-2, -16(%ebp)
movl $L__ehtable$use_except_handler4_ssp, %ecx
xorl ___security_cookie, %ecx
movl %ecx, -20(%ebp)
xorl ___security_cookie, %eax <<-- XOR FramePtr and Cookie
movl %eax, -40(%ebp) <<-- Storing EHGuard
leal -28(%ebp), %eax
movl $__except_handler4, -24(%ebp)
movl %fs:0, %ecx
movl %ecx, -28(%ebp)
movl %eax, %fs:0
movl $0, -16(%ebp)
calll _may_throw_or_crash
LBB1_1: # %cont
movl -28(%ebp), %eax
movl %eax, %fs:0
addl $28, %esp
popl %esi
popl %edi
popl %ebx
popl %ebp
retl
```
And the corresponding offset is computed:
```
Luse_except_handler4_ssp$parent_frame_offset = -36
.p2align 2
L__ehtable$use_except_handler4_ssp:
.long -2 # GSCookieOffset
.long 0 # GSCookieXOROffset
.long -40 # EHCookieOffset <<----
.long 0 # EHCookieXOROffset
.long -2 # ToState
.long _catchall_filt # FilterFunction
.long LBB1_2 # ExceptionHandler
```
Clang is not yet producing function using SEH4, but it's a work in progress.
This patch is a step toward having a valid implementation of SEH4.
Unfortunately, it is not yet fully working. The EH registration block is not
allocated at the right offset on the stack.
Reviewers: rnk, majnemer
Subscribers: llvm-commits, chrisha
Differential Revision: http://reviews.llvm.org/D21231
llvm-svn: 273281
2016-06-21 23:58:55 +08:00
|
|
|
; CHECK-NEXT: .long -40
|
2015-06-10 05:42:19 +08:00
|
|
|
; CHECK-NEXT: .long 0
|
|
|
|
; CHECK-NEXT: .long -2
|
2015-06-12 07:37:18 +08:00
|
|
|
; CHECK-NEXT: .long _catchall_filt
|
2015-10-10 05:27:28 +08:00
|
|
|
; CHECK-NEXT: .long LBB2_2
|
2015-06-10 05:42:19 +08:00
|
|
|
|
[StackProtector] Fix computation of GSCookieOffset and EHCookieOffset with SEH4
Summary:
Fix the computation of the offsets present in the scopetable when using the
SEH (__except_handler4).
This patch added an intrinsic to track the position of the allocation on the
stack of the EHGuard. This position is needed when producing the ScopeTable.
```
struct _EH4_SCOPETABLE {
DWORD GSCookieOffset;
DWORD GSCookieXOROffset;
DWORD EHCookieOffset;
DWORD EHCookieXOROffset;
_EH4_SCOPETABLE_RECORD ScopeRecord[1];
};
struct _EH4_SCOPETABLE_RECORD {
DWORD EnclosingLevel;
long (*FilterFunc)();
union {
void (*HandlerAddress)();
void (*FinallyFunc)();
};
};
```
The code to generate the EHCookie is added in `X86WinEHState.cpp`.
Which is adding these instructions when using SEH4.
```
Lfunc_begin0:
# BB#0: # %entry
pushl %ebp
movl %esp, %ebp
pushl %ebx
pushl %edi
pushl %esi
subl $28, %esp
movl %ebp, %eax <<-- Loading FramePtr
movl %esp, -36(%ebp)
movl $-2, -16(%ebp)
movl $L__ehtable$use_except_handler4_ssp, %ecx
xorl ___security_cookie, %ecx
movl %ecx, -20(%ebp)
xorl ___security_cookie, %eax <<-- XOR FramePtr and Cookie
movl %eax, -40(%ebp) <<-- Storing EHGuard
leal -28(%ebp), %eax
movl $__except_handler4, -24(%ebp)
movl %fs:0, %ecx
movl %ecx, -28(%ebp)
movl %eax, %fs:0
movl $0, -16(%ebp)
calll _may_throw_or_crash
LBB1_1: # %cont
movl -28(%ebp), %eax
movl %eax, %fs:0
addl $28, %esp
popl %esi
popl %edi
popl %ebx
popl %ebp
retl
```
And the corresponding offset is computed:
```
Luse_except_handler4_ssp$parent_frame_offset = -36
.p2align 2
L__ehtable$use_except_handler4_ssp:
.long -2 # GSCookieOffset
.long 0 # GSCookieXOROffset
.long -40 # EHCookieOffset <<----
.long 0 # EHCookieXOROffset
.long -2 # ToState
.long _catchall_filt # FilterFunction
.long LBB1_2 # ExceptionHandler
```
Clang is not yet producing function using SEH4, but it's a work in progress.
This patch is a step toward having a valid implementation of SEH4.
Unfortunately, it is not yet fully working. The EH registration block is not
allocated at the right offset on the stack.
Reviewers: rnk, majnemer
Subscribers: llvm-commits, chrisha
Differential Revision: http://reviews.llvm.org/D21231
llvm-svn: 273281
2016-06-21 23:58:55 +08:00
|
|
|
define void @use_except_handler4_ssp() sspstrong personality i32 (...)* @_except_handler4 {
|
|
|
|
entry:
|
|
|
|
invoke void @may_throw_or_crash()
|
|
|
|
to label %cont unwind label %lpad
|
|
|
|
cont:
|
|
|
|
ret void
|
|
|
|
lpad:
|
|
|
|
%cs = catchswitch within none [label %catch] unwind to caller
|
|
|
|
catch:
|
|
|
|
%p = catchpad within %cs [i8* bitcast (i32 ()* @catchall_filt to i8*)]
|
|
|
|
catchret from %p to label %cont
|
|
|
|
}
|
|
|
|
|
|
|
|
; CHECK-LABEL: _use_except_handler4_ssp:
|
|
|
|
; CHECK: pushl %ebp
|
In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting with compiler time improvements
Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 297695
2017-03-14 08:34:14 +08:00
|
|
|
; CHECK-NEXT: movl %esp, %ebp
|
|
|
|
; CHECK-NEXT: pushl %ebx
|
|
|
|
; CHECK-NEXT: pushl %edi
|
|
|
|
; CHECK-NEXT: pushl %esi
|
|
|
|
; CHECK-NEXT: subl ${{[0-9]+}}, %esp
|
|
|
|
; CHECK-NEXT: movl %ebp, %[[ehguard:[^ ,]*]]
|
|
|
|
; CHECK-NEXT: movl %esp, -36(%ebp)
|
|
|
|
; CHECK-NEXT: movl $-2, -16(%ebp)
|
|
|
|
; CHECK-NEXT: movl $L__ehtable$use_except_handler4_ssp, %[[lsda:[^ ,]*]]
|
|
|
|
; CHECK-NEXT: movl ___security_cookie, %[[seccookie:[^ ,]*]]
|
|
|
|
; CHECK-NEXT: xorl %[[seccookie]], %[[lsda]]
|
|
|
|
; CHECK-NEXT: movl %[[lsda]], -20(%ebp)
|
|
|
|
; CHECK-NEXT: xorl %[[seccookie]], %[[ehguard]]
|
|
|
|
; CHECK-NEXT: movl %[[ehguard]], -40(%ebp)
|
|
|
|
; CHECK-NEXT: leal -28(%ebp), %[[node:[^ ,]*]]
|
|
|
|
; CHECK-NEXT: movl $__except_handler4, -24(%ebp)
|
|
|
|
; CHECK-NEXT: movl %fs:0, %[[next:[^ ,]*]]
|
|
|
|
; CHECK-NEXT: movl %[[next]], -28(%ebp)
|
|
|
|
; CHECK-NEXT: movl %[[node]], %fs:0
|
|
|
|
; CHECK-NEXT: movl $0, -16(%ebp)
|
|
|
|
; CHECK-NEXT: calll _may_throw_or_crash
|
[StackProtector] Fix computation of GSCookieOffset and EHCookieOffset with SEH4
Summary:
Fix the computation of the offsets present in the scopetable when using the
SEH (__except_handler4).
This patch added an intrinsic to track the position of the allocation on the
stack of the EHGuard. This position is needed when producing the ScopeTable.
```
struct _EH4_SCOPETABLE {
DWORD GSCookieOffset;
DWORD GSCookieXOROffset;
DWORD EHCookieOffset;
DWORD EHCookieXOROffset;
_EH4_SCOPETABLE_RECORD ScopeRecord[1];
};
struct _EH4_SCOPETABLE_RECORD {
DWORD EnclosingLevel;
long (*FilterFunc)();
union {
void (*HandlerAddress)();
void (*FinallyFunc)();
};
};
```
The code to generate the EHCookie is added in `X86WinEHState.cpp`.
Which is adding these instructions when using SEH4.
```
Lfunc_begin0:
# BB#0: # %entry
pushl %ebp
movl %esp, %ebp
pushl %ebx
pushl %edi
pushl %esi
subl $28, %esp
movl %ebp, %eax <<-- Loading FramePtr
movl %esp, -36(%ebp)
movl $-2, -16(%ebp)
movl $L__ehtable$use_except_handler4_ssp, %ecx
xorl ___security_cookie, %ecx
movl %ecx, -20(%ebp)
xorl ___security_cookie, %eax <<-- XOR FramePtr and Cookie
movl %eax, -40(%ebp) <<-- Storing EHGuard
leal -28(%ebp), %eax
movl $__except_handler4, -24(%ebp)
movl %fs:0, %ecx
movl %ecx, -28(%ebp)
movl %eax, %fs:0
movl $0, -16(%ebp)
calll _may_throw_or_crash
LBB1_1: # %cont
movl -28(%ebp), %eax
movl %eax, %fs:0
addl $28, %esp
popl %esi
popl %edi
popl %ebx
popl %ebp
retl
```
And the corresponding offset is computed:
```
Luse_except_handler4_ssp$parent_frame_offset = -36
.p2align 2
L__ehtable$use_except_handler4_ssp:
.long -2 # GSCookieOffset
.long 0 # GSCookieXOROffset
.long -40 # EHCookieOffset <<----
.long 0 # EHCookieXOROffset
.long -2 # ToState
.long _catchall_filt # FilterFunction
.long LBB1_2 # ExceptionHandler
```
Clang is not yet producing function using SEH4, but it's a work in progress.
This patch is a step toward having a valid implementation of SEH4.
Unfortunately, it is not yet fully working. The EH registration block is not
allocated at the right offset on the stack.
Reviewers: rnk, majnemer
Subscribers: llvm-commits, chrisha
Differential Revision: http://reviews.llvm.org/D21231
llvm-svn: 273281
2016-06-21 23:58:55 +08:00
|
|
|
; CHECK: movl -28(%ebp), %[[next:[^ ,]*]]
|
In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting with compiler time improvements
Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 297695
2017-03-14 08:34:14 +08:00
|
|
|
; CHECK-NEXT: movl %[[next]], %fs:0
|
[StackProtector] Fix computation of GSCookieOffset and EHCookieOffset with SEH4
Summary:
Fix the computation of the offsets present in the scopetable when using the
SEH (__except_handler4).
This patch added an intrinsic to track the position of the allocation on the
stack of the EHGuard. This position is needed when producing the ScopeTable.
```
struct _EH4_SCOPETABLE {
DWORD GSCookieOffset;
DWORD GSCookieXOROffset;
DWORD EHCookieOffset;
DWORD EHCookieXOROffset;
_EH4_SCOPETABLE_RECORD ScopeRecord[1];
};
struct _EH4_SCOPETABLE_RECORD {
DWORD EnclosingLevel;
long (*FilterFunc)();
union {
void (*HandlerAddress)();
void (*FinallyFunc)();
};
};
```
The code to generate the EHCookie is added in `X86WinEHState.cpp`.
Which is adding these instructions when using SEH4.
```
Lfunc_begin0:
# BB#0: # %entry
pushl %ebp
movl %esp, %ebp
pushl %ebx
pushl %edi
pushl %esi
subl $28, %esp
movl %ebp, %eax <<-- Loading FramePtr
movl %esp, -36(%ebp)
movl $-2, -16(%ebp)
movl $L__ehtable$use_except_handler4_ssp, %ecx
xorl ___security_cookie, %ecx
movl %ecx, -20(%ebp)
xorl ___security_cookie, %eax <<-- XOR FramePtr and Cookie
movl %eax, -40(%ebp) <<-- Storing EHGuard
leal -28(%ebp), %eax
movl $__except_handler4, -24(%ebp)
movl %fs:0, %ecx
movl %ecx, -28(%ebp)
movl %eax, %fs:0
movl $0, -16(%ebp)
calll _may_throw_or_crash
LBB1_1: # %cont
movl -28(%ebp), %eax
movl %eax, %fs:0
addl $28, %esp
popl %esi
popl %edi
popl %ebx
popl %ebp
retl
```
And the corresponding offset is computed:
```
Luse_except_handler4_ssp$parent_frame_offset = -36
.p2align 2
L__ehtable$use_except_handler4_ssp:
.long -2 # GSCookieOffset
.long 0 # GSCookieXOROffset
.long -40 # EHCookieOffset <<----
.long 0 # EHCookieXOROffset
.long -2 # ToState
.long _catchall_filt # FilterFunction
.long LBB1_2 # ExceptionHandler
```
Clang is not yet producing function using SEH4, but it's a work in progress.
This patch is a step toward having a valid implementation of SEH4.
Unfortunately, it is not yet fully working. The EH registration block is not
allocated at the right offset on the stack.
Reviewers: rnk, majnemer
Subscribers: llvm-commits, chrisha
Differential Revision: http://reviews.llvm.org/D21231
llvm-svn: 273281
2016-06-21 23:58:55 +08:00
|
|
|
; CHECK: retl
|
In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting with compiler time improvements
Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 297695
2017-03-14 08:34:14 +08:00
|
|
|
; CHECK-NEXT: [[catch:[^ ,]*]]: # %catch{{$}}
|
|
|
|
|
|
|
|
|
[StackProtector] Fix computation of GSCookieOffset and EHCookieOffset with SEH4
Summary:
Fix the computation of the offsets present in the scopetable when using the
SEH (__except_handler4).
This patch added an intrinsic to track the position of the allocation on the
stack of the EHGuard. This position is needed when producing the ScopeTable.
```
struct _EH4_SCOPETABLE {
DWORD GSCookieOffset;
DWORD GSCookieXOROffset;
DWORD EHCookieOffset;
DWORD EHCookieXOROffset;
_EH4_SCOPETABLE_RECORD ScopeRecord[1];
};
struct _EH4_SCOPETABLE_RECORD {
DWORD EnclosingLevel;
long (*FilterFunc)();
union {
void (*HandlerAddress)();
void (*FinallyFunc)();
};
};
```
The code to generate the EHCookie is added in `X86WinEHState.cpp`.
Which is adding these instructions when using SEH4.
```
Lfunc_begin0:
# BB#0: # %entry
pushl %ebp
movl %esp, %ebp
pushl %ebx
pushl %edi
pushl %esi
subl $28, %esp
movl %ebp, %eax <<-- Loading FramePtr
movl %esp, -36(%ebp)
movl $-2, -16(%ebp)
movl $L__ehtable$use_except_handler4_ssp, %ecx
xorl ___security_cookie, %ecx
movl %ecx, -20(%ebp)
xorl ___security_cookie, %eax <<-- XOR FramePtr and Cookie
movl %eax, -40(%ebp) <<-- Storing EHGuard
leal -28(%ebp), %eax
movl $__except_handler4, -24(%ebp)
movl %fs:0, %ecx
movl %ecx, -28(%ebp)
movl %eax, %fs:0
movl $0, -16(%ebp)
calll _may_throw_or_crash
LBB1_1: # %cont
movl -28(%ebp), %eax
movl %eax, %fs:0
addl $28, %esp
popl %esi
popl %edi
popl %ebx
popl %ebp
retl
```
And the corresponding offset is computed:
```
Luse_except_handler4_ssp$parent_frame_offset = -36
.p2align 2
L__ehtable$use_except_handler4_ssp:
.long -2 # GSCookieOffset
.long 0 # GSCookieXOROffset
.long -40 # EHCookieOffset <<----
.long 0 # EHCookieXOROffset
.long -2 # ToState
.long _catchall_filt # FilterFunction
.long LBB1_2 # ExceptionHandler
```
Clang is not yet producing function using SEH4, but it's a work in progress.
This patch is a step toward having a valid implementation of SEH4.
Unfortunately, it is not yet fully working. The EH registration block is not
allocated at the right offset on the stack.
Reviewers: rnk, majnemer
Subscribers: llvm-commits, chrisha
Differential Revision: http://reviews.llvm.org/D21231
llvm-svn: 273281
2016-06-21 23:58:55 +08:00
|
|
|
|
|
|
|
; CHECK: .section .xdata,"dr"
|
|
|
|
; CHECK-LABEL: L__ehtable$use_except_handler4_ssp:
|
|
|
|
; CHECK-NEXT: .long -2
|
|
|
|
; CHECK-NEXT: .long 0
|
|
|
|
; CHECK-NEXT: .long -40
|
|
|
|
; CHECK-NEXT: .long 0
|
|
|
|
; CHECK-NEXT: .long -2
|
|
|
|
; CHECK-NEXT: .long _catchall_filt
|
|
|
|
; CHECK-NEXT: .long [[catch]]
|
|
|
|
|
2015-06-18 04:52:32 +08:00
|
|
|
define void @use_CxxFrameHandler3() personality i32 (...)* @__CxxFrameHandler3 {
|
2015-05-06 01:44:16 +08:00
|
|
|
invoke void @may_throw_or_crash()
|
|
|
|
to label %cont unwind label %catchall
|
|
|
|
cont:
|
|
|
|
ret void
|
2015-09-17 06:14:46 +08:00
|
|
|
|
2015-05-06 01:44:16 +08:00
|
|
|
catchall:
|
[IR] Reformulate LLVM's EH funclet IR
While we have successfully implemented a funclet-oriented EH scheme on
top of LLVM IR, our scheme has some notable deficiencies:
- catchendpad and cleanupendpad are necessary in the current design
but they are difficult to explain to others, even to seasoned LLVM
experts.
- catchendpad and cleanupendpad are optimization barriers. They cannot
be split and force all potentially throwing call-sites to be invokes.
This has a noticable effect on the quality of our code generation.
- catchpad, while similar in some aspects to invoke, is fairly awkward.
It is unsplittable, starts a funclet, and has control flow to other
funclets.
- The nesting relationship between funclets is currently a property of
control flow edges. Because of this, we are forced to carefully
analyze the flow graph to see if there might potentially exist illegal
nesting among funclets. While we have logic to clone funclets when
they are illegally nested, it would be nicer if we had a
representation which forbade them upfront.
Let's clean this up a bit by doing the following:
- Instead, make catchpad more like cleanuppad and landingpad: no control
flow, just a bunch of simple operands; catchpad would be splittable.
- Introduce catchswitch, a control flow instruction designed to model
the constraints of funclet oriented EH.
- Make funclet scoping explicit by having funclet instructions consume
the token produced by the funclet which contains them.
- Remove catchendpad and cleanupendpad. Their presence can be inferred
implicitly using coloring information.
N.B. The state numbering code for the CLR has been updated but the
veracity of it's output cannot be spoken for. An expert should take a
look to make sure the results are reasonable.
Reviewers: rnk, JosephTremoulet, andrew.w.kaylor
Differential Revision: http://reviews.llvm.org/D15139
llvm-svn: 255422
2015-12-12 13:38:55 +08:00
|
|
|
%cs = catchswitch within none [label %catch] unwind to caller
|
2015-09-17 06:14:46 +08:00
|
|
|
catch:
|
[IR] Reformulate LLVM's EH funclet IR
While we have successfully implemented a funclet-oriented EH scheme on
top of LLVM IR, our scheme has some notable deficiencies:
- catchendpad and cleanupendpad are necessary in the current design
but they are difficult to explain to others, even to seasoned LLVM
experts.
- catchendpad and cleanupendpad are optimization barriers. They cannot
be split and force all potentially throwing call-sites to be invokes.
This has a noticable effect on the quality of our code generation.
- catchpad, while similar in some aspects to invoke, is fairly awkward.
It is unsplittable, starts a funclet, and has control flow to other
funclets.
- The nesting relationship between funclets is currently a property of
control flow edges. Because of this, we are forced to carefully
analyze the flow graph to see if there might potentially exist illegal
nesting among funclets. While we have logic to clone funclets when
they are illegally nested, it would be nicer if we had a
representation which forbade them upfront.
Let's clean this up a bit by doing the following:
- Instead, make catchpad more like cleanuppad and landingpad: no control
flow, just a bunch of simple operands; catchpad would be splittable.
- Introduce catchswitch, a control flow instruction designed to model
the constraints of funclet oriented EH.
- Make funclet scoping explicit by having funclet instructions consume
the token produced by the funclet which contains them.
- Remove catchendpad and cleanupendpad. Their presence can be inferred
implicitly using coloring information.
N.B. The state numbering code for the CLR has been updated but the
veracity of it's output cannot be spoken for. An expert should take a
look to make sure the results are reasonable.
Reviewers: rnk, JosephTremoulet, andrew.w.kaylor
Differential Revision: http://reviews.llvm.org/D15139
llvm-svn: 255422
2015-12-12 13:38:55 +08:00
|
|
|
%p = catchpad within %cs [i8* null, i32 64, i8* null]
|
|
|
|
catchret from %p to label %cont
|
2015-05-06 01:44:16 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
; CHECK-LABEL: _use_CxxFrameHandler3:
|
2015-05-30 05:58:11 +08:00
|
|
|
; CHECK: pushl %ebp
|
In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting with compiler time improvements
Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 297695
2017-03-14 08:34:14 +08:00
|
|
|
; CHECK-NEXT: movl %esp, %ebp
|
|
|
|
; CHECK-NEXT: pushl %ebx
|
|
|
|
; CHECK-NEXT: pushl %edi
|
|
|
|
; CHECK-NEXT: pushl %esi
|
|
|
|
; CHECK-NEXT: subl ${{[0-9]+}}, %esp
|
|
|
|
; CHECK-NEXT: movl %esp, -28(%ebp)
|
|
|
|
; CHECK-NEXT: movl $-1, -16(%ebp)
|
|
|
|
; CHECK-NEXT: leal -24(%ebp), %[[node:[^ ,]*]]
|
|
|
|
; CHECK-NEXT: movl $___ehhandler$use_CxxFrameHandler3, -20(%ebp)
|
|
|
|
; CHECK-NEXT: movl %fs:0, %[[next:[^ ,]*]]
|
|
|
|
; CHECK-NEXT: movl %[[next]], -24(%ebp)
|
|
|
|
; CHECK-NEXT: movl %[[node]], %fs:0
|
|
|
|
; CHECK-NEXT: movl $0, -16(%ebp)
|
|
|
|
; CHECK-NEXT: calll _may_throw_or_crash
|
2015-07-10 06:09:41 +08:00
|
|
|
; CHECK: movl -24(%ebp), %[[next:[^ ,]*]]
|
In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting with compiler time improvements
Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 297695
2017-03-14 08:34:14 +08:00
|
|
|
; CHECK-NEXT: movl %[[next]], %fs:0
|
2015-05-06 01:44:16 +08:00
|
|
|
; CHECK: retl
|
2015-05-21 07:08:04 +08:00
|
|
|
|
2015-05-30 01:00:57 +08:00
|
|
|
; CHECK: .section .xdata,"dr"
|
In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting with compiler time improvements
Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 297695
2017-03-14 08:34:14 +08:00
|
|
|
; CHECK-NEXT: .p2align 2
|
2015-05-30 01:00:57 +08:00
|
|
|
; CHECK-LABEL: L__ehtable$use_CxxFrameHandler3:
|
|
|
|
; CHECK-NEXT: .long 429065506
|
|
|
|
; CHECK-NEXT: .long 2
|
|
|
|
; CHECK-NEXT: .long ($stateUnwindMap$use_CxxFrameHandler3)
|
|
|
|
; CHECK-NEXT: .long 1
|
|
|
|
; CHECK-NEXT: .long ($tryMap$use_CxxFrameHandler3)
|
|
|
|
; CHECK-NEXT: .long 0
|
|
|
|
; CHECK-NEXT: .long 0
|
|
|
|
; CHECK-NEXT: .long 0
|
|
|
|
; CHECK-NEXT: .long 1
|
|
|
|
|
2015-05-21 07:08:04 +08:00
|
|
|
; CHECK-LABEL: ___ehhandler$use_CxxFrameHandler3:
|
|
|
|
; CHECK: movl $L__ehtable$use_CxxFrameHandler3, %eax
|
In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting with compiler time improvements
Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 297695
2017-03-14 08:34:14 +08:00
|
|
|
; CHECK-NEXT: jmp ___CxxFrameHandler3 # TAILCALL
|
2015-06-10 09:02:30 +08:00
|
|
|
|
|
|
|
; CHECK: .safeseh __except_handler3
|
In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting with compiler time improvements
Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 297695
2017-03-14 08:34:14 +08:00
|
|
|
; CHECK-NEXT: .safeseh __except_handler4
|
|
|
|
; CHECK-NEXT: .safeseh ___ehhandler$use_CxxFrameHandler3
|