[PeepholeOptimizer] Refactor the advanced copy optimization to take advantage of
the isRegSequence property.
This is a follow-up to r215394 and r215404, which respectively introduce the
isRegSequence property and use it for ARM.
Thanks to the property introduced by the previous commits, this patch is able
to optimize the following sequence:
vmov d0, r2, r3
vmov d1, r0, r1
vmov r0, s0
vmov r1, s2
udiv r0, r1, r0
vmov r1, s1
vmov r2, s3
udiv r1, r2, r1
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
bx lr
into:
udiv r0, r0, r2
udiv r1, r1, r3
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
bx lr
This patch refactors how the copy optimizations are done in the peephole
optimizer. Prior to this patch, we had one copy-related optimization that
replaced a copy or bitcast with a generic copy that is more suitable in terms
of register file.
With this patch, the peephole optimizer features two copy-related optimizations:
1. One for rewriting generic copies to generic copies:
PeepholeOptimizer::optimizeCoalescableCopy.
2. One for replacing non-generic copies with generic copies:
PeepholeOptimizer::optimizeUncoalescableCopy.
The goals of these two optimizations are slightly different: the first (#1)
rewrites the operands of the instruction, whereas the second (#2) kills off the
non-generic instruction and replaces it with a (sequence of) generic
instruction(s).
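To make the split concrete, here is a minimal, self-contained C++ sketch of
that two-way dispatch. The types and function bodies are simplified stand-ins
chosen for illustration; they are not the actual PeepholeOptimizer signatures.

#include <cstdio>

// Simplified stand-in; the real pass inspects MachineInstr via TargetInstrInfo.
struct MachineInstrStub {
  bool IsGenericCopy;  // a generic COPY-like instruction (case #1)
  bool IsRegSequence;  // a target instruction flagged isRegSequence (case #2)
};

// Case #1: rewrite the operands of the generic copy in place; it survives.
bool optimizeCoalescableCopyStub(MachineInstrStub &) {
  std::printf("coalescable: rewriting operands in place\n");
  return true;
}

// Case #2: emit generic COPYs for the values the instruction produces; the
// original, non-generic instruction becomes dead and is cleaned up afterwards.
bool optimizeUncoalescableCopyStub(MachineInstrStub &) {
  std::printf("uncoalescable: replacing with generic COPYs\n");
  return true;
}

bool optimizeCopyLikeInstr(MachineInstrStub &MI) {
  if (MI.IsGenericCopy)
    return optimizeCoalescableCopyStub(MI);
  if (MI.IsRegSequence)
    return optimizeUncoalescableCopyStub(MI);
  return false;
}

int main() {
  MachineInstrStub GenericCopy{true, false};
  MachineInstrStub VmovPair{false, true}; // e.g. vmov d0, r2, r3
  optimizeCopyLikeInstr(GenericCopy);
  optimizeCopyLikeInstr(VmovPair);
  return 0;
}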
Both optimizations rely on the ValueTracker introduced in r212100.
The ValueTracker has been refactored to use the information from the
TargetInstrInfo for non-generic instructions. As part of the refactoring, we
switched the tracking from the index of the definition to the actual register
(virtual or physical). This change provides better consistency with
register-related APIs and eases the use of the TargetInstrInfo.
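As a rough illustration of that tracking change, the sketch below uses
hypothetical stand-in types (not the real ValueTracker interface): the tracker
now reports a register plus sub-register index, which a consumer can feed
directly into register-oriented rewrites.

#include <cstdio>

// Hypothetical stand-in for the tracked state after the refactoring: the
// defining register (virtual or physical) and a sub-register index, instead
// of "definition number N of instruction I".
struct TrackedValue {
  unsigned Reg;     // 0 models "tracking failed / no register"
  unsigned SubReg;  // 0 models "the whole register"
  bool isValid() const { return Reg != 0; }
};

// The result plugs straight into register-oriented rewrites, e.g. turning a
// non-generic copy into "DstReg = COPY SrcReg:SubReg".
void rewriteAsCopy(unsigned DstReg, const TrackedValue &Src) {
  if (!Src.isValid())
    return;
  std::printf("vreg%u = COPY vreg%u:sub%u\n", DstReg, Src.Reg, Src.SubReg);
}

int main() {
  TrackedValue Src{42, 1}; // value tracked back to sub-register 1 of vreg42
  rewriteAsCopy(7, Src);
  return 0;
}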
Moreover, this patch introduces a new helper class, CopyRewriter, that eases
the rewriting of generic copies (i.e., #1).
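The snippet below guesses at the shape of that abstraction purely for
illustration: a rewriter that walks the destination/source register pairs of a
copy-like instruction and substitutes a better source when one is found. The
class name and interface here are hypothetical, not the CopyRewriter that
actually landed.

#include <cstdio>
#include <utility>
#include <vector>

// Hypothetical model of a copy-like instruction as (destination, source)
// register pairs.
struct CopyLikeInstr {
  std::vector<std::pair<unsigned, unsigned>> DstSrcPairs;
};

class CopyRewriterSketch {
  CopyLikeInstr &MI;

public:
  explicit CopyRewriterSketch(CopyLikeInstr &Instr) : MI(Instr) {}

  // Rewrite each source operand in place when the callback (standing in for
  // the value tracking) finds a better register; returns true on any change.
  template <typename FindBetterSrcFn>
  bool rewriteSources(FindBetterSrcFn FindBetterSrc) {
    bool Changed = false;
    for (auto &DstSrc : MI.DstSrcPairs) {
      unsigned NewSrc = FindBetterSrc(DstSrc.second);
      if (NewSrc != 0 && NewSrc != DstSrc.second) {
        DstSrc.second = NewSrc; // operand rewrite, i.e. case #1
        Changed = true;
      }
    }
    return Changed;
  }
};

int main() {
  CopyLikeInstr Copy{{{1, 10}, {2, 11}}};
  CopyRewriterSketch Rewriter(Copy);
  // Pretend value tracking learned that register 11 is itself a copy of 20.
  Rewriter.rewriteSources([](unsigned Src) { return Src == 11 ? 20u : 0u; });
  for (auto &DstSrc : Copy.DstSrcPairs)
    std::printf("reg%u <- reg%u\n", DstSrc.first, DstSrc.second);
  return 0;
}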
Finally, this patch adds a dead code elimination pass right after the peephole
optimizer to get rid of dead code that may appear after rewriting.
This is related to <rdar://problem/12702965>.
Review: http://reviews.llvm.org/D4874
llvm-svn: 216088
; RUN: llc -O1 -mtriple=armv7s-apple-ios -mcpu=swift < %s -disable-adv-copy-opt=true | FileCheck -check-prefix=NOOPT --check-prefix=CHECK %s
; RUN: llc -O1 -mtriple=armv7s-apple-ios -mcpu=swift < %s -disable-adv-copy-opt=false | FileCheck -check-prefix=OPT --check-prefix=CHECK %s
; CHECK-LABEL: simpleVectorDiv
; ABI: %A => r0, r1.
; %B => r2, r3
; ret => r0, r1
; We want to compute:
; r0 = r0 / r2
; r1 = r1 / r3
;
; NOOPT: vmov [[B:d[0-9]+]], r2, r3
; NOOPT-NEXT: vmov [[A:d[0-9]+]], r0, r1
; Move the low part of B into a register.
; Unfortunately, we cannot express that the 's' register is the low
; part of B, i.e., sIdx == BIdx x 2. E.g., B = d1, B_low = s2.
; NOOPT-NEXT: vmov [[B_LOW:r[0-9]+]], s{{[0-9]+}}
; NOOPT-NEXT: vmov [[A_LOW:r[0-9]+]], s{{[0-9]+}}
; NOOPT-NEXT: udiv [[RES_LOW:r[0-9]+]], [[A_LOW]], [[B_LOW]]
; NOOPT-NEXT: vmov [[B_HIGH:r[0-9]+]], s{{[0-9]+}}
; NOOPT-NEXT: vmov [[A_HIGH:r[0-9]+]], s{{[0-9]+}}
; NOOPT-NEXT: udiv [[RES_HIGH:r[0-9]+]], [[A_HIGH]], [[B_HIGH]]
; NOOPT-NEXT: vmov.32 [[RES:d[0-9]+]][0], [[RES_LOW]]
; NOOPT-NEXT: vmov.32 [[RES]][1], [[RES_HIGH]]
; NOOPT-NEXT: vmov r0, r1, [[RES]]
; NOOPT-NEXT: bx lr
;
; OPT-NOT: vmov
[PeepholeOptimizer] Take advantage of the isInsertSubreg property in the
advanced copy optimization.
This is the final patch toward transforming:
udiv r0, r0, r2
udiv r1, r1, r3
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
bx lr
into:
udiv r0, r0, r2
udiv r1, r1, r3
bx lr
Indeed, thanks to this patch, this optimization is able to look through
vmov.32 d16[0], r0
vmov.32 d16[1], r1
and to rewrite the following sequence:
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
into simple generic GPR copies that the coalescer manages to remove.
<rdar://problem/12702965>
llvm-svn: 216144
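The following small, self-contained sketch shows what "looking through" those
lane inserts means, using made-up data structures in place of the
ValueTracker's machine-level bookkeeping: each lane of d16 is traced back to
the GPR that filled it, so the final vmov r0, r1, d16 can be rewritten as two
plain GPR copies.

#include <cstdio>
#include <map>
#include <utility>

// (vector register, lane index) -> GPR last inserted into that lane, which is
// the effect of "vmov.32 d16[0], r0" and "vmov.32 d16[1], r1".
using LaneDefMap = std::map<std::pair<unsigned, unsigned>, unsigned>;

// Trace a lane read back to the GPR that defined it; 0 means the value could
// not be traced (the lane was written by something we cannot look through).
unsigned lookThroughLaneInserts(const LaneDefMap &Defs, unsigned VecReg,
                                unsigned Lane) {
  auto It = Defs.find({VecReg, Lane});
  return It == Defs.end() ? 0 : It->second;
}

int main() {
  enum : unsigned { R0 = 1, R1 = 2, D16 = 100 };
  LaneDefMap Defs;
  Defs[{D16, 0}] = R0; // vmov.32 d16[0], r0
  Defs[{D16, 1}] = R1; // vmov.32 d16[1], r1
  // "vmov r0, r1, d16" reads lanes 0 and 1; both trace back to plain GPRs, so
  // the round trip through d16 becomes GPR-to-GPR copies that the register
  // coalescer can then remove.
  std::printf("lane 0 -> reg %u, lane 1 -> reg %u\n",
              lookThroughLaneInserts(Defs, D16, 0),
              lookThroughLaneInserts(Defs, D16, 1));
  return 0;
}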
; OPT: udiv r0, r0, r2
; OPT-NEXT: udiv r1, r1, r3
; OPT-NEXT: bx lr
define <2 x i32> @simpleVectorDiv(<2 x i32> %A, <2 x i32> %B) nounwind {
entry:
%div = udiv <2 x i32> %A, %B
ret <2 x i32> %div
}