; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
[x86] Introduce a pass to begin more systematically fixing PR36028 and similar issues.
The key idea is to lower COPY nodes populating EFLAGS by scanning the
uses of EFLAGS and introducing dedicated code to preserve the necessary
state in a GPR. In the vast majority of cases, these uses are cmovCC and
jCC instructions. For such cases, we can very easily save and restore
the necessary information by simply inserting a setCC into a GPR where
the original flags are live, and then testing that GPR directly to feed
the cmov or conditional branch.
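As a rough illustration of the setCC-and-test idea, here is a small standalone
C++ model. It is not code from the pass: Flags, setne, testb, jneTaken and
clobber are hypothetical names standing in for EFLAGS, the setne, testb and jne
instructions, and any instruction or call that overwrites the flags.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-bit model of EFLAGS; only ZF and CF are tracked.
struct Flags {
  bool zf;
  bool cf;
};

// Models `setne %reg`: materialize the "not equal" condition as 0 or 1.
inline std::uint8_t setne(Flags f) { return f.zf ? 0 : 1; }

// Models `testb %reg, %reg`: ZF is set exactly when the register is zero,
// and CF is cleared.
inline Flags testb(std::uint8_t reg) { return Flags{reg == 0, false}; }

// Models `jne`: the branch is taken when ZF is clear.
inline bool jneTaken(Flags f) { return !f.zf; }

// Hypothetical stand-in for anything that overwrites EFLAGS (e.g. a call).
inline Flags clobber() { return Flags{true, true}; }

// Decide a `jne` on `original` even though the flags are clobbered in between:
// save the condition in a byte while the flags are live, then test that byte.
inline bool branchAfterClobber(Flags original) {
  std::uint8_t saved = setne(original);  // inserted where the flags are live
  Flags clobbered = clobber();           // the original flags are now gone
  (void)clobbered;
  return jneTaken(testb(saved));         // rebuilt from the saved byte
}
```

In this model branchAfterClobber(f) agrees with jneTaken(f) for every f, which
is the property the lowering needs for a jCC or cmovCC user.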
However, things are a bit more tricky if arithmetic is using the flags.
This patch handles the vast majority of cases that seem to come up in
practice: adc, adcx, adox, rcl, and rcr; all without taking advantage of
partially preserved EFLAGS as LLVM doesn't currently model that at all.
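The text above does not say how a saved flag is turned back into CF for adc and
friends. One plausible approach, assumed here purely for illustration, is to
add 0xFF to the saved 0/1 byte: the 8-bit add carries out exactly when the byte
was 1. The sketch below models that arithmetic in standalone C++; Add8, add8,
rematerializeCarry and adc32 are hypothetical names, not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Result of an 8-bit add: the truncated sum and the carry-out (CF).
struct Add8 {
  std::uint8_t sum;
  bool carry;
};

// Models an 8-bit `add`, computing the carry-out from a wider sum.
inline Add8 add8(std::uint8_t a, std::uint8_t b) {
  unsigned wide = static_cast<unsigned>(a) + static_cast<unsigned>(b);
  return Add8{static_cast<std::uint8_t>(wide), wide > 0xFFu};
}

// savedCarry is the 0/1 byte that a `setb` would have produced while the
// original flags were live. Adding 0xFF overflows 8 bits only for 1.
inline bool rematerializeCarry(std::uint8_t savedCarry) {
  return add8(savedCarry, 0xFF).carry;
}

// Models a 32-bit `adc`: a + b + CF, wrapping modulo 2^32.
inline std::uint32_t adc32(std::uint32_t a, std::uint32_t b, bool cf) {
  return a + b + (cf ? 1u : 0u);
}
```

With this, adc32(a, b, rematerializeCarry(saved)) consumes the same carry that
was live when the byte was saved.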
There are a large number of operations that technically observe EFLAGS
currently but shouldn't in this case -- they typically are using DF.
Currently, they will not be handled by this approach. However, I have
never seen this issue come up in practice. It is already pretty rare to
have these patterns come up in practical code with LLVM. I had to resort
to writing MIR tests to cover most of the logic in this pass already.
I suspect even with its current amount of coverage of arithmetic users
of EFLAGS it will be a significant improvement over the current use of
pushf/popf. It will also produce substantially faster code in most of
the common patterns.
This patch also removes all of the old lowering for EFLAGS copies, and
the hack that forced us to use a frame pointer when EFLAGS copies were
found anywhere in a function so that the dynamic stack adjustment wasn't
a problem. None of this is needed as we now lower all of these copies
directly in MI and without requiring stack adjustments.
Lots of thanks to Reid who came up with several aspects of this
approach, and Craig who helped me work out a couple of things tripping
me up while working on this.
Differential Revision: https://reviews.llvm.org/D45146
llvm-svn: 329657
2018-04-10 09:41:17 +08:00
; RUN: llc -mtriple=i386-linux-gnu -verify-machineinstrs %s -o - | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK32
; RUN: llc -mtriple=x86_64-linux-gnu -verify-machineinstrs %s -o - | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK64
CodeGen peephole: fold redundant phys reg copies
Code generation often exposes redundant physical register copies through
virtual registers such as:
%vreg = COPY %PHYSREG
...
%PHYSREG = COPY %vreg
There are cases where no intervening clobber of %PHYSREG occurs, and the
later copy could therefore be removed. In some cases this further allows
us to remove the initial copy.
This patch contains a motivating example which comes from the x86 build
of Chrome, specifically cc::ResourceProvider::UnlockForRead uses
libstdc++'s implementation of hash_map. That example has two tests live
at the same time, and after machine sinking LLVM has confused itself
enough and thinks spilling EFLAGS is a great idea even though it's
never restored and the comparison results are both live.
Before this patch we have:
DEC32m %RIP, 1, %noreg, <ga:@L>, %noreg, %EFLAGS<imp-def>
%vreg1<def> = COPY %EFLAGS; GR64:%vreg1
%EFLAGS<def> = COPY %vreg1; GR64:%vreg1
JNE_1 <BB#1>, %EFLAGS<imp-use>
Both copies are useless. This patch tries to eliminate the later copy in
a generic manner.
dec is especially confusing to LLVM when compared with sub.
I wrote this patch to treat all physical registers generically, but only
remove redundant copies of non-allocatable physical registers because
the allocatable ones caused issues (e.g. when calling conventions weren't
properly modeled) and should be handled later by the register allocator
anyways.
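To make the folding rule concrete, here is a toy standalone C++ model of it.
This is only a sketch under simplifying assumptions (a single basic block,
registers as strings, clobbers listed explicitly); Inst and
foldRedundantPhysCopies are hypothetical names and not the peephole pass's
actual data structures or API.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy instruction: either a COPY (dst = COPY src) or an opaque operation
// that clobbers a set of physical registers.
struct Inst {
  bool isCopy;
  std::string dst, src;            // meaningful only when isCopy is true
  std::set<std::string> clobbers;  // meaningful only when isCopy is false
};

// Drops `%phys = COPY %vreg` when %vreg was defined earlier by
// `%vreg = COPY %phys` and nothing in between wrote %phys. Only registers
// listed in nonAllocatable are folded, as described in the message above.
inline std::vector<Inst> foldRedundantPhysCopies(
    const std::vector<Inst> &block,
    const std::set<std::string> &nonAllocatable) {
  std::vector<Inst> out;
  std::map<std::string, std::string> mirrors;  // vreg -> phys it still equals
  for (const Inst &I : block) {
    if (!I.isCopy) {
      // A clobber of a physical register invalidates every vreg mirroring it.
      for (auto it = mirrors.begin(); it != mirrors.end();)
        it = I.clobbers.count(it->second) ? mirrors.erase(it) : std::next(it);
      out.push_back(I);
      continue;
    }
    auto found = mirrors.find(I.src);
    if (found != mirrors.end() && found->second == I.dst)
      continue;  // redundant copy back into the physical register: drop it
    // Any other copy writes I.dst, so forget mirrors of it and its old value.
    for (auto it = mirrors.begin(); it != mirrors.end();)
      it = (it->second == I.dst) ? mirrors.erase(it) : std::next(it);
    mirrors.erase(I.dst);
    if (nonAllocatable.count(I.src))
      mirrors[I.dst] = I.src;
    out.push_back(I);
  }
  return out;
}
```

Fed the DEC32m / COPY / COPY / JNE_1 sequence shown earlier with EFLAGS as the
only non-allocatable register, this model drops the second COPY and keeps the
other three instructions. Removing the now-dead first copy is left to a later
cleanup in this sketch.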
The following tests used to fail when the patch also replaced allocatable
registers:
CodeGen/X86/StackColoring.ll
CodeGen/X86/avx512-calling-conv.ll
CodeGen/X86/copy-propagation.ll
CodeGen/X86/inline-asm-fpstack.ll
CodeGen/X86/musttail-varargs.ll
CodeGen/X86/pop-stack-cleanup.ll
CodeGen/X86/preserve_mostcc64.ll
CodeGen/X86/tailcallstack64.ll
CodeGen/X86/this-return-64.ll
This happens because COPY has other special meaning for e.g. dependency
breakage and x87 FP stack.
Note that all other backends' tests pass.
Reviewers: qcolombet
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15157
llvm-svn: 254665
2015-12-04 07:43:56 +08:00

; The peephole optimizer can elide some physical register copies such as
; EFLAGS. Make sure the flags are used directly, instead of needlessly
; saving and restoring specific conditions.

@L = external global i32
@M = external global i8

declare i32 @bar(i64)

define i1 @plus_one() nounwind {
; CHECK32-LABEL: plus_one:
; CHECK32: # %bb.0: # %entry
; CHECK32-NEXT: movb M, %al
; CHECK32-NEXT: incl L
; CHECK32-NEXT: jne .LBB0_2
; CHECK32-NEXT: # %bb.1: # %entry
; CHECK32-NEXT: andb $8, %al
; CHECK32-NEXT: je .LBB0_2
; CHECK32-NEXT: # %bb.3: # %exit2
; CHECK32-NEXT: xorl %eax, %eax
; CHECK32-NEXT: retl
; CHECK32-NEXT: .LBB0_2: # %exit
; CHECK32-NEXT: movb $1, %al
; CHECK32-NEXT: retl
;
; CHECK64-LABEL: plus_one:
; CHECK64: # %bb.0: # %entry
; CHECK64-NEXT: movb {{.*}}(%rip), %al
; CHECK64-NEXT: incl {{.*}}(%rip)
; CHECK64-NEXT: jne .LBB0_2
; CHECK64-NEXT: # %bb.1: # %entry
; CHECK64-NEXT: andb $8, %al
; CHECK64-NEXT: je .LBB0_2
; CHECK64-NEXT: # %bb.3: # %exit2
; CHECK64-NEXT: xorl %eax, %eax
; CHECK64-NEXT: retq
; CHECK64-NEXT: .LBB0_2: # %exit
; CHECK64-NEXT: movb $1, %al
; CHECK64-NEXT: retq
entry:
  %loaded_L = load i32, i32* @L
  %val = add nsw i32 %loaded_L, 1 ; N.B. will emit inc.
  store i32 %val, i32* @L
  %loaded_M = load i8, i8* @M
  %masked = and i8 %loaded_M, 8
  %M_is_true = icmp ne i8 %masked, 0
  %L_is_false = icmp eq i32 %val, 0
  %cond = and i1 %L_is_false, %M_is_true
  br i1 %cond, label %exit2, label %exit

exit:
  ret i1 true

exit2:
  ret i1 false
}

define i1 @plus_forty_two() nounwind {
; CHECK32-LABEL: plus_forty_two:
; CHECK32: # %bb.0: # %entry
; CHECK32-NEXT: movb M, %al
; CHECK32-NEXT: addl $42, L
; CHECK32-NEXT: jne .LBB1_2
; CHECK32-NEXT: # %bb.1: # %entry
; CHECK32-NEXT: andb $8, %al
; CHECK32-NEXT: je .LBB1_2
; CHECK32-NEXT: # %bb.3: # %exit2
; CHECK32-NEXT: xorl %eax, %eax
; CHECK32-NEXT: retl
; CHECK32-NEXT: .LBB1_2: # %exit
; CHECK32-NEXT: movb $1, %al
; CHECK32-NEXT: retl
;
; CHECK64-LABEL: plus_forty_two:
; CHECK64: # %bb.0: # %entry
; CHECK64-NEXT: movb {{.*}}(%rip), %al
; CHECK64-NEXT: addl $42, {{.*}}(%rip)
; CHECK64-NEXT: jne .LBB1_2
; CHECK64-NEXT: # %bb.1: # %entry
; CHECK64-NEXT: andb $8, %al
; CHECK64-NEXT: je .LBB1_2
; CHECK64-NEXT: # %bb.3: # %exit2
; CHECK64-NEXT: xorl %eax, %eax
; CHECK64-NEXT: retq
; CHECK64-NEXT: .LBB1_2: # %exit
; CHECK64-NEXT: movb $1, %al
; CHECK64-NEXT: retq
entry:
  %loaded_L = load i32, i32* @L
  %val = add nsw i32 %loaded_L, 42 ; N.B. won't emit inc.
  store i32 %val, i32* @L
  %loaded_M = load i8, i8* @M
  %masked = and i8 %loaded_M, 8
  %M_is_true = icmp ne i8 %masked, 0
  %L_is_false = icmp eq i32 %val, 0
  %cond = and i1 %L_is_false, %M_is_true
  br i1 %cond, label %exit2, label %exit

exit:
  ret i1 true

exit2:
  ret i1 false
}

define i1 @minus_one() nounwind {
; CHECK32-LABEL: minus_one:
; CHECK32: # %bb.0: # %entry
; CHECK32-NEXT: movb M, %al
; CHECK32-NEXT: decl L
; CHECK32-NEXT: jne .LBB2_2
; CHECK32-NEXT: # %bb.1: # %entry
; CHECK32-NEXT: andb $8, %al
; CHECK32-NEXT: je .LBB2_2
; CHECK32-NEXT: # %bb.3: # %exit2
; CHECK32-NEXT: xorl %eax, %eax
; CHECK32-NEXT: retl
; CHECK32-NEXT: .LBB2_2: # %exit
; CHECK32-NEXT: movb $1, %al
; CHECK32-NEXT: retl
;
; CHECK64-LABEL: minus_one:
; CHECK64: # %bb.0: # %entry
; CHECK64-NEXT: movb {{.*}}(%rip), %al
; CHECK64-NEXT: decl {{.*}}(%rip)
; CHECK64-NEXT: jne .LBB2_2
; CHECK64-NEXT: # %bb.1: # %entry
; CHECK64-NEXT: andb $8, %al
; CHECK64-NEXT: je .LBB2_2
; CHECK64-NEXT: # %bb.3: # %exit2
; CHECK64-NEXT: xorl %eax, %eax
; CHECK64-NEXT: retq
; CHECK64-NEXT: .LBB2_2: # %exit
; CHECK64-NEXT: movb $1, %al
; CHECK64-NEXT: retq
entry:
  %loaded_L = load i32, i32* @L
  %val = add nsw i32 %loaded_L, -1 ; N.B. will emit dec.
  store i32 %val, i32* @L
  %loaded_M = load i8, i8* @M
  %masked = and i8 %loaded_M, 8
  %M_is_true = icmp ne i8 %masked, 0
  %L_is_false = icmp eq i32 %val, 0
  %cond = and i1 %L_is_false, %M_is_true
  br i1 %cond, label %exit2, label %exit

exit:
  ret i1 true

exit2:
  ret i1 false
}

define i1 @minus_forty_two() nounwind {
; CHECK32-LABEL: minus_forty_two:
; CHECK32: # %bb.0: # %entry
; CHECK32-NEXT: movb M, %al
; CHECK32-NEXT: addl $-42, L
; CHECK32-NEXT: jne .LBB3_2
; CHECK32-NEXT: # %bb.1: # %entry
; CHECK32-NEXT: andb $8, %al
; CHECK32-NEXT: je .LBB3_2
; CHECK32-NEXT: # %bb.3: # %exit2
; CHECK32-NEXT: xorl %eax, %eax
; CHECK32-NEXT: retl
; CHECK32-NEXT: .LBB3_2: # %exit
; CHECK32-NEXT: movb $1, %al
; CHECK32-NEXT: retl
;
; CHECK64-LABEL: minus_forty_two:
; CHECK64: # %bb.0: # %entry
; CHECK64-NEXT: movb {{.*}}(%rip), %al
; CHECK64-NEXT: addl $-42, {{.*}}(%rip)
; CHECK64-NEXT: jne .LBB3_2
; CHECK64-NEXT: # %bb.1: # %entry
; CHECK64-NEXT: andb $8, %al
; CHECK64-NEXT: je .LBB3_2
; CHECK64-NEXT: # %bb.3: # %exit2
; CHECK64-NEXT: xorl %eax, %eax
; CHECK64-NEXT: retq
; CHECK64-NEXT: .LBB3_2: # %exit
; CHECK64-NEXT: movb $1, %al
; CHECK64-NEXT: retq
entry:
  %loaded_L = load i32, i32* @L
  %val = add nsw i32 %loaded_L, -42 ; N.B. won't emit dec.
  store i32 %val, i32* @L
  %loaded_M = load i8, i8* @M
  %masked = and i8 %loaded_M, 8
  %M_is_true = icmp ne i8 %masked, 0
  %L_is_false = icmp eq i32 %val, 0
  %cond = and i1 %L_is_false, %M_is_true
  br i1 %cond, label %exit2, label %exit

exit:
  ret i1 true

exit2:
  ret i1 false
}

define i64 @test_intervening_call(i64* %foo, i64 %bar, i64 %baz) nounwind {
; CHECK32-LABEL: test_intervening_call:
; CHECK32: # %bb.0: # %entry
; CHECK32-NEXT: pushl %ebx
; CHECK32-NEXT: pushl %esi
; CHECK32-NEXT: pushl %eax
; CHECK32-NEXT: movl {{[0-9]+}}(%esp), %eax
; CHECK32-NEXT: movl {{[0-9]+}}(%esp), %edx
; CHECK32-NEXT: movl {{[0-9]+}}(%esp), %ebx
; CHECK32-NEXT: movl {{[0-9]+}}(%esp), %ecx
; CHECK32-NEXT: movl {{[0-9]+}}(%esp), %esi
; CHECK32-NEXT: lock cmpxchg8b (%esi)
; CHECK32-NEXT: setne %bl
; CHECK32-NEXT: subl $8, %esp
; CHECK32-NEXT: pushl %edx
; CHECK32-NEXT: pushl %eax
; CHECK32-NEXT: calll bar
; CHECK32-NEXT: addl $16, %esp
; CHECK32-NEXT: testb %bl, %bl
; CHECK32-NEXT: jne .LBB4_3
; CHECK32-NEXT: # %bb.1: # %t
; CHECK32-NEXT: movl $42, %eax
; CHECK32-NEXT: jmp .LBB4_2
; CHECK32-NEXT: .LBB4_3: # %f
; CHECK32-NEXT: xorl %eax, %eax
; CHECK32-NEXT: .LBB4_2: # %t
; CHECK32-NEXT: xorl %edx, %edx
|
|
|
; CHECK32-NEXT: addl $4, %esp
; CHECK32-NEXT: popl %esi
; CHECK32-NEXT: popl %ebx
; CHECK32-NEXT: retl
;
; CHECK64-LABEL: test_intervening_call:
; CHECK64: # %bb.0: # %entry
; CHECK64-NEXT: pushq %rbx
; CHECK64-NEXT: movq %rsi, %rax
; CHECK64-NEXT: lock cmpxchgq %rdx, (%rdi)
; CHECK64-NEXT: setne %bl
; CHECK64-NEXT: movq %rax, %rdi
; CHECK64-NEXT: callq bar
; CHECK64-NEXT: testb %bl, %bl
; CHECK64-NEXT: jne .LBB4_2
; CHECK64-NEXT: # %bb.1: # %t
; CHECK64-NEXT: movl $42, %eax
; CHECK64-NEXT: popq %rbx
; CHECK64-NEXT: retq
; CHECK64-NEXT: .LBB4_2: # %f
; CHECK64-NEXT: xorl %eax, %eax
; CHECK64-NEXT: popq %rbx
; CHECK64-NEXT: retq
entry:
CodeGen peephole: fold redundant phys reg copies
Code generation often exposes redundant physical register copies through
virtual registers such as:
%vreg = COPY %PHYSREG
...
%PHYSREG = COPY %vreg
There are cases where no intervening clobber of %PHYSREG occurs, and the
later copy could therefore be removed. In some cases this further allows
us to remove the initial copy.
This patch contains a motivating example which comes from the x86 build
of Chrome, specifically cc::ResourceProvider::UnlockForRead uses
libstdc++'s implementation of hash_map. That example has two tests live
at the same time, and after machine sinking LLVM has confused itself
enough and thinks spilling EFLAGS is a great idea even though it's
never restored and the comparison results are both live.
Before this patch we have:
DEC32m %RIP, 1, %noreg, <ga:@L>, %noreg, %EFLAGS<imp-def>
%vreg1<def> = COPY %EFLAGS; GR64:%vreg1
%EFLAGS<def> = COPY %vreg1; GR64:%vreg1
JNE_1 <BB#1>, %EFLAGS<imp-use>
Both copies are useless. This patch tries to eliminate the later copy in
a generic manner.
dec is especially confusing to LLVM when compared with sub.
I wrote this patch to treat all physical registers generically, but only
remove redundant copies of non-allocatable physical registers because
the allocatable ones caused issues (e.g. when calling conventions weren't
properly modeled) and should be handled later by the register allocator
anyways.
The following tests used to fail when the patch also replaced allocatable
registers:
CodeGen/X86/StackColoring.ll
CodeGen/X86/avx512-calling-conv.ll
CodeGen/X86/copy-propagation.ll
CodeGen/X86/inline-asm-fpstack.ll
CodeGen/X86/musttail-varargs.ll
CodeGen/X86/pop-stack-cleanup.ll
CodeGen/X86/preserve_mostcc64.ll
CodeGen/X86/tailcallstack64.ll
CodeGen/X86/this-return-64.ll
This happens because COPY has other special meaning for e.g. dependency
breakage and x87 FP stack.
Note that all other backends' tests pass.
Reviewers: qcolombet
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15157
llvm-svn: 254665
2015-12-04 07:43:56 +08:00
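The fold described in the commit message above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the actual peephole code in LLVM: the `Instr` class, the `fold_redundant_phys_copies` function, and the `implicit_defs` field are hypothetical names invented for illustration, and instructions are reduced to an opcode plus the registers they define. It removes only the later `%PHYSREG = COPY %vreg`; cleaning up the initial copy once it becomes dead is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Instr:
    """Toy stand-in for a machine instruction (hypothetical, not LLVM's MachineInstr)."""
    op: str
    dst: Optional[str] = None                    # explicitly defined register
    src: Optional[str] = None                    # explicitly read register
    implicit_defs: FrozenSet[str] = frozenset()  # registers clobbered implicitly


def fold_redundant_phys_copies(block: List[Instr],
                               non_allocatable: FrozenSet[str]) -> List[Instr]:
    """Drop `%PHYS = COPY %vreg` when %vreg still mirrors an unclobbered %PHYS."""
    mirrors: Dict[str, str] = {}  # vreg -> non-allocatable phys reg it was copied from
    kept: List[Instr] = []
    for ins in block:
        # The later copy is redundant if the vreg still holds the phys reg's value.
        if (ins.op == "COPY" and ins.dst in non_allocatable
                and ins.src is not None and mirrors.get(ins.src) == ins.dst):
            continue
        # Any definition (explicit or implicit) invalidates affected mirrors.
        defined = set(ins.implicit_defs)
        if ins.dst is not None:
            defined.add(ins.dst)
        mirrors = {v: p for v, p in mirrors.items()
                   if v not in defined and p not in defined}
        # Record `%vreg = COPY %PHYS` for non-allocatable phys regs only.
        if (ins.op == "COPY" and ins.src in non_allocatable
                and ins.dst is not None and ins.dst not in non_allocatable):
            mirrors[ins.dst] = ins.src
        kept.append(ins)
    return kept


# The sequence quoted in the commit message, reduced to the toy model.
before = [
    Instr("DEC32m", implicit_defs=frozenset({"EFLAGS"})),
    Instr("COPY", dst="vreg1", src="EFLAGS"),
    Instr("COPY", dst="EFLAGS", src="vreg1"),
    Instr("JNE_1"),
]
after = fold_redundant_phys_copies(before, frozenset({"EFLAGS"}))
```

If an instruction that clobbers EFLAGS (a call, for instance) sits between the two copies, the mirror entry is discarded and the later copy is kept, which is the situation the test_intervening_call function in this file exercises.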
; cmpxchg sets EFLAGS, call clobbers it, then br uses EFLAGS.
%cx = cmpxchg i64* %foo, i64 %bar, i64 %baz seq_cst seq_cst
%v = extractvalue { i64, i1 } %cx, 0
%p = extractvalue { i64, i1 } %cx, 1
call i32 @bar(i64 %v)
br i1 %p, label %t, label %f
t:
ret i64 42
f:
ret i64 0
}
define i64 @test_two_live_flags(i64* %foo0, i64 %bar0, i64 %baz0, i64* %foo1, i64 %bar1, i64 %baz1) nounwind {
; CHECK32-LABEL: test_two_live_flags:
; CHECK32: # %bb.0: # %entry
; CHECK32-NEXT: pushl %ebp
; CHECK32-NEXT: pushl %ebx
; CHECK32-NEXT: pushl %edi
; CHECK32-NEXT: pushl %esi
; CHECK32-NEXT: pushl %eax
; CHECK32-NEXT: movl {{[0-9]+}}(%esp), %edi
; CHECK32-NEXT: movl {{[0-9]+}}(%esp), %ebp
; CHECK32-NEXT: movl {{[0-9]+}}(%esp), %eax
; CHECK32-NEXT: movl {{[0-9]+}}(%esp), %edx
; CHECK32-NEXT: movl {{[0-9]+}}(%esp), %ebx
; CHECK32-NEXT: movl {{[0-9]+}}(%esp), %ecx
; CHECK32-NEXT: movl {{[0-9]+}}(%esp), %esi
; CHECK32-NEXT: lock cmpxchg8b (%esi)
; CHECK32-NEXT: setne {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Folded Spill
; CHECK32-NEXT: movl {{[0-9]+}}(%esp), %eax
; CHECK32-NEXT: movl %edi, %edx
; CHECK32-NEXT: movl %ebp, %ecx
; CHECK32-NEXT: movl {{[0-9]+}}(%esp), %ebx
; CHECK32-NEXT: movl {{[0-9]+}}(%esp), %esi
; CHECK32-NEXT: lock cmpxchg8b (%esi)
; CHECK32-NEXT: sete %al
; CHECK32-NEXT: cmpb $0, {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Folded Reload
; CHECK32-NEXT: jne .LBB5_4
; CHECK32-NEXT: # %bb.1: # %entry
; CHECK32-NEXT: testb %al, %al
; CHECK32-NEXT: je .LBB5_4
; CHECK32-NEXT: # %bb.2: # %t
; CHECK32-NEXT: movl $42, %eax
; CHECK32-NEXT: jmp .LBB5_3
; CHECK32-NEXT: .LBB5_4: # %f
; CHECK32-NEXT: xorl %eax, %eax
; CHECK32-NEXT: .LBB5_3: # %t
; CHECK32-NEXT: xorl %edx, %edx
; CHECK32-NEXT: addl $4, %esp
; CHECK32-NEXT: popl %esi
; CHECK32-NEXT: popl %edi
; CHECK32-NEXT: popl %ebx
; CHECK32-NEXT: popl %ebp
; CHECK32-NEXT: retl
;
; CHECK64-LABEL: test_two_live_flags:
; CHECK64: # %bb.0: # %entry
; CHECK64-NEXT: movq %rsi, %rax
; CHECK64-NEXT: lock cmpxchgq %rdx, (%rdi)
; CHECK64-NEXT: setne %dl
; CHECK64-NEXT: movq %r8, %rax
; CHECK64-NEXT: lock cmpxchgq %r9, (%rcx)
; CHECK64-NEXT: sete %al
; CHECK64-NEXT: testb %dl, %dl
; CHECK64-NEXT: jne .LBB5_3
; CHECK64-NEXT: # %bb.1: # %entry
; CHECK64-NEXT: testb %al, %al
; CHECK64-NEXT: je .LBB5_3
; CHECK64-NEXT: # %bb.2: # %t
; CHECK64-NEXT: movl $42, %eax
; CHECK64-NEXT: retq
; CHECK64-NEXT: .LBB5_3: # %f
; CHECK64-NEXT: xorl %eax, %eax
; CHECK64-NEXT: retq
entry:
%cx0 = cmpxchg i64* %foo0, i64 %bar0, i64 %baz0 seq_cst seq_cst
%p0 = extractvalue { i64, i1 } %cx0, 1
%cx1 = cmpxchg i64* %foo1, i64 %bar1, i64 %baz1 seq_cst seq_cst
%p1 = extractvalue { i64, i1 } %cx1, 1
%flag = and i1 %p0, %p1
br i1 %flag, label %t, label %f
t:
ret i64 42
f:
ret i64 0
}
define i1 @asm_clobbering_flags(i32* %mem) nounwind {
; CHECK32-LABEL: asm_clobbering_flags:
; CHECK32: # %bb.0: # %entry
; CHECK32-NEXT: movl {{[0-9]+}}(%esp), %ecx
; CHECK32-NEXT: movl (%ecx), %edx
; CHECK32-NEXT: testl %edx, %edx
; CHECK32-NEXT: setg %al
; CHECK32-NEXT: #APP
; CHECK32-NEXT: bsfl %edx, %edx
; CHECK32-NEXT: #NO_APP
; CHECK32-NEXT: movl %edx, (%ecx)
; CHECK32-NEXT: retl
;
; CHECK64-LABEL: asm_clobbering_flags:
; CHECK64: # %bb.0: # %entry
; CHECK64-NEXT: movl (%rdi), %ecx
; CHECK64-NEXT: testl %ecx, %ecx
; CHECK64-NEXT: setg %al
; CHECK64-NEXT: #APP
; CHECK64-NEXT: bsfl %ecx, %ecx
; CHECK64-NEXT: #NO_APP
; CHECK64-NEXT: movl %ecx, (%rdi)
; CHECK64-NEXT: retq
entry:
%val = load i32, i32* %mem, align 4
%cmp = icmp sgt i32 %val, 0
%res = tail call i32 asm "bsfl $1,$0", "=r,r,~{cc},~{dirflag},~{fpsr},~{flags}"(i32 %val)
store i32 %res, i32* %mem, align 4
ret i1 %cmp
}