llvm-project/llvm/test/CodeGen/ARM/adv-copy-opt.ll

; RUN: llc -O1 -mtriple=armv7s-apple-ios -mcpu=swift < %s -disable-adv-copy-opt=true | FileCheck -check-prefix=NOOPT --check-prefix=CHECK %s
; RUN: llc -O1 -mtriple=armv7s-apple-ios -mcpu=swift < %s -disable-adv-copy-opt=false | FileCheck -check-prefix=OPT --check-prefix=CHECK %s
; RUN: llc -O1 -mtriple=thumbv7s-apple-ios -mcpu=swift < %s -disable-adv-copy-opt=true | FileCheck -check-prefix=NOOPT --check-prefix=CHECK %s
; RUN: llc -O1 -mtriple=thumbv7s-apple-ios -mcpu=swift < %s -disable-adv-copy-opt=false | FileCheck -check-prefix=OPT --check-prefix=CHECK %s

; CHECK-LABEL: simpleVectorDiv
; ABI: %A => r0, r1.
;      %B => r2, r3
;      ret => r0, r1
; We want to compute:
; r0 = r0 / r2
; r1 = r1 / r3
;
; NOOPT: vmov	[[A:d[0-9]+]], r0, r1
; NOOPT-NEXT: vmov	[[B:d[0-9]+]], r2, r3
; Move the low part of B into a register.
; Unfortunately, we cannot express that the 's' register is the low
; part of B, i.e., sIdx == BIdx x 2. E.g., B = d1, B_low = s2.
; NOOPT-NEXT: vmov	[[B_LOW:r[0-9]+]], s{{[0-9]+}}
; NOOPT-NEXT: vmov	[[B_HIGH:r[0-9]+]], s{{[0-9]+}}
; NOOPT-NEXT: vmov	[[A_LOW:r[0-9]+]], s{{[0-9]+}}
; NOOPT-NEXT: vmov	[[A_HIGH:r[0-9]+]], s{{[0-9]+}}
; NOOPT-NEXT: udiv	[[RES_LOW:r[0-9]+]], [[A_LOW]], [[B_LOW]]
; NOOPT-NEXT: vmov.32	[[RES:d[0-9]+]][0], [[RES_LOW]]
; NOOPT-NEXT: udiv	[[RES_HIGH:r[0-9]+]], [[A_HIGH]], [[B_HIGH]]
; NOOPT-NEXT: vmov.32	[[RES]][1], [[RES_HIGH]]
; NOOPT-NEXT: vmov	r0, r1, [[RES]]
; NOOPT-NEXT: bx	lr
;
; OPT-NOT: vmov
; OPT: udiv	r1, r1, r3
; OPT-NEXT: 	udiv	r0, r0, r2
; OPT-NEXT: bx	lr
define <2 x i32> @simpleVectorDiv(<2 x i32> %A, <2 x i32> %B) nounwind {
entry:
  %div = udiv <2 x i32> %A, %B
  ret <2 x i32> %div
}
[PeepholeOptimizer] Refactor the advanced copy optimization to take advantage of the isRegSequence property. This is a follow-up of r215394 and r215404, which respectively introduces the isRegSequence property and uses it for ARM. Thanks to the property introduced by the previous commits, this patch is able to optimize the following sequence: vmov d0, r2, r3 vmov d1, r0, r1 vmov r0, s0 vmov r1, s2 udiv r0, r1, r0 vmov r1, s1 vmov r2, s3 udiv r1, r2, r1 vmov.32 d16[0], r0 vmov.32 d16[1], r1 vmov r0, r1, d16 bx lr into: udiv r0, r0, r2 udiv r1, r1, r3 vmov.32 d16[0], r0 vmov.32 d16[1], r1 vmov r0, r1, d16 bx lr This patch refactors how the copy optimizations are done in the peephole optimizer. Prior to this patch, we had one copy-related optimization that replaced a copy or bitcast by a generic, more suitable (in terms of register file), copy. With this patch, the peephole optimizer features two copy-related optimizations: 1. One for rewriting generic copies to generic copies: PeepholeOptimizer::optimizeCoalescableCopy. 2. One for replacing non-generic copies with generic copies: PeepholeOptimizer::optimizeUncoalescableCopy. The goals of these two optimizations are slightly different: one rewrite the operand of the instruction (#1), the other kills off the non-generic instruction and replace it by a (sequence of) generic instruction(s). Both optimizations rely on the ValueTracker introduced in r212100. The ValueTracker has been refactored to use the information from the TargetInstrInfo for non-generic instruction. As part of the refactoring, we switched the tracking from the index of the definition to the actual register (virtual or physical). This one change is to provide better consistency with register related APIs and to ease the use of the TargetInstrInfo. Moreover, this patch introduces a new helper class CopyRewriter used to ease the rewriting of generic copies (i.e., #1). Finally, this patch adds a dead code elimination pass right after the peephole optimizer to get rid of dead code that may appear after rewriting. This is related to <rdar://problem/12702965>. Review: http://reviews.llvm.org/D4874 llvm-svn: 216088 2014-08-21 01:41:48 +08:00			`; RUN: llc -O1 -mtriple=armv7s-apple-ios -mcpu=swift < %s -disable-adv-copy-opt=true \| FileCheck -check-prefix=NOOPT --check-prefix=CHECK %s`
			`; RUN: llc -O1 -mtriple=armv7s-apple-ios -mcpu=swift < %s -disable-adv-copy-opt=false \| FileCheck -check-prefix=OPT --check-prefix=CHECK %s`
[ARM] Move the implementation of the target hooks related to copy-related instruction from ARMInstrInfo to ARMBaseInstrInfo. That way, thumb mode can also benefit from the advanced copy optimization. <rdar://problem/12702965> llvm-svn: 216274 2014-08-23 02:05:22 +08:00			`; RUN: llc -O1 -mtriple=thumbv7s-apple-ios -mcpu=swift < %s -disable-adv-copy-opt=true \| FileCheck -check-prefix=NOOPT --check-prefix=CHECK %s`
			`; RUN: llc -O1 -mtriple=thumbv7s-apple-ios -mcpu=swift < %s -disable-adv-copy-opt=false \| FileCheck -check-prefix=OPT --check-prefix=CHECK %s`
[PeepholeOptimizer] Refactor the advanced copy optimization to take advantage of the isRegSequence property. This is a follow-up of r215394 and r215404, which respectively introduces the isRegSequence property and uses it for ARM. Thanks to the property introduced by the previous commits, this patch is able to optimize the following sequence: vmov d0, r2, r3 vmov d1, r0, r1 vmov r0, s0 vmov r1, s2 udiv r0, r1, r0 vmov r1, s1 vmov r2, s3 udiv r1, r2, r1 vmov.32 d16[0], r0 vmov.32 d16[1], r1 vmov r0, r1, d16 bx lr into: udiv r0, r0, r2 udiv r1, r1, r3 vmov.32 d16[0], r0 vmov.32 d16[1], r1 vmov r0, r1, d16 bx lr This patch refactors how the copy optimizations are done in the peephole optimizer. Prior to this patch, we had one copy-related optimization that replaced a copy or bitcast by a generic, more suitable (in terms of register file), copy. With this patch, the peephole optimizer features two copy-related optimizations: 1. One for rewriting generic copies to generic copies: PeepholeOptimizer::optimizeCoalescableCopy. 2. One for replacing non-generic copies with generic copies: PeepholeOptimizer::optimizeUncoalescableCopy. The goals of these two optimizations are slightly different: one rewrite the operand of the instruction (#1), the other kills off the non-generic instruction and replace it by a (sequence of) generic instruction(s). Both optimizations rely on the ValueTracker introduced in r212100. The ValueTracker has been refactored to use the information from the TargetInstrInfo for non-generic instruction. As part of the refactoring, we switched the tracking from the index of the definition to the actual register (virtual or physical). This one change is to provide better consistency with register related APIs and to ease the use of the TargetInstrInfo. Moreover, this patch introduces a new helper class CopyRewriter used to ease the rewriting of generic copies (i.e., #1). Finally, this patch adds a dead code elimination pass right after the peephole optimizer to get rid of dead code that may appear after rewriting. This is related to <rdar://problem/12702965>. Review: http://reviews.llvm.org/D4874 llvm-svn: 216088 2014-08-21 01:41:48 +08:00
			`; CHECK-LABEL: simpleVectorDiv`
			`; ABI: %A => r0, r1.`
			`; %B => r2, r3`
			`; ret => r0, r1`
			`; We want to compute:`
			`; r0 = r0 / r2`
			`; r1 = r1 / r3`
			`;`
ARM: Enable MachineScheduler and disable PostRAScheduler for swift. Reapply r242500 now that the swift schedmodel includes LDRLIT. This is mostly done to disable the PostRAScheduler which optimizes for instruction latencies which isn't a good fit for out-of-order architectures. This also allows to leave out the itinerary table in swift in favor of the SchedModel ones. This change leads to performance improvements/regressions by as much as 10% in some benchmarks, in fact we loose 0.4% performance over the llvm-testsuite for reasons that appear to be unknown or out of the compilers control. rdar://20803802 documents the investigation of these effects. While it is probably a good idea to perform the same switch for the other ARM out-of-order CPUs, I limited this change to swift as I cannot perform the benchmark verification on the other CPUs. Differential Revision: http://reviews.llvm.org/D10513 llvm-svn: 242588 2015-07-18 07:18:30 +08:00			`; NOOPT: vmov [[A:d[0-9]+]], r0, r1`
			`; NOOPT-NEXT: vmov [[B:d[0-9]+]], r2, r3`
[PeepholeOptimizer] Refactor the advanced copy optimization to take advantage of the isRegSequence property. This is a follow-up of r215394 and r215404, which respectively introduces the isRegSequence property and uses it for ARM. Thanks to the property introduced by the previous commits, this patch is able to optimize the following sequence: vmov d0, r2, r3 vmov d1, r0, r1 vmov r0, s0 vmov r1, s2 udiv r0, r1, r0 vmov r1, s1 vmov r2, s3 udiv r1, r2, r1 vmov.32 d16[0], r0 vmov.32 d16[1], r1 vmov r0, r1, d16 bx lr into: udiv r0, r0, r2 udiv r1, r1, r3 vmov.32 d16[0], r0 vmov.32 d16[1], r1 vmov r0, r1, d16 bx lr This patch refactors how the copy optimizations are done in the peephole optimizer. Prior to this patch, we had one copy-related optimization that replaced a copy or bitcast by a generic, more suitable (in terms of register file), copy. With this patch, the peephole optimizer features two copy-related optimizations: 1. One for rewriting generic copies to generic copies: PeepholeOptimizer::optimizeCoalescableCopy. 2. One for replacing non-generic copies with generic copies: PeepholeOptimizer::optimizeUncoalescableCopy. The goals of these two optimizations are slightly different: one rewrite the operand of the instruction (#1), the other kills off the non-generic instruction and replace it by a (sequence of) generic instruction(s). Both optimizations rely on the ValueTracker introduced in r212100. The ValueTracker has been refactored to use the information from the TargetInstrInfo for non-generic instruction. As part of the refactoring, we switched the tracking from the index of the definition to the actual register (virtual or physical). This one change is to provide better consistency with register related APIs and to ease the use of the TargetInstrInfo. Moreover, this patch introduces a new helper class CopyRewriter used to ease the rewriting of generic copies (i.e., #1). Finally, this patch adds a dead code elimination pass right after the peephole optimizer to get rid of dead code that may appear after rewriting. This is related to <rdar://problem/12702965>. Review: http://reviews.llvm.org/D4874 llvm-svn: 216088 2014-08-21 01:41:48 +08:00			`; Move the low part of B into a register.`
			`; Unfortunately, we cannot express that the 's' register is the low`
			`; part of B, i.e., sIdx == BIdx x 2. E.g., B = d1, B_low = s2.`
			`; NOOPT-NEXT: vmov [[B_LOW:r[0-9]+]], s{{[0-9]+}}`
Revert "ARM: Enable MachineScheduler and disable PostRAScheduler for swift." This reverts commit r242500. It broke some internal tests and Matthias asked me to revert it while he is investigating. llvm-svn: 242553 2015-07-18 02:14:19 +08:00			`; NOOPT-NEXT: vmov [[B_HIGH:r[0-9]+]], s{{[0-9]+}}`
ARM: Enable MachineScheduler and disable PostRAScheduler for swift. Reapply r242500 now that the swift schedmodel includes LDRLIT. This is mostly done to disable the PostRAScheduler which optimizes for instruction latencies which isn't a good fit for out-of-order architectures. This also allows to leave out the itinerary table in swift in favor of the SchedModel ones. This change leads to performance improvements/regressions by as much as 10% in some benchmarks, in fact we loose 0.4% performance over the llvm-testsuite for reasons that appear to be unknown or out of the compilers control. rdar://20803802 documents the investigation of these effects. While it is probably a good idea to perform the same switch for the other ARM out-of-order CPUs, I limited this change to swift as I cannot perform the benchmark verification on the other CPUs. Differential Revision: http://reviews.llvm.org/D10513 llvm-svn: 242588 2015-07-18 07:18:30 +08:00			`; NOOPT-NEXT: vmov [[A_LOW:r[0-9]+]], s{{[0-9]+}}`
Revert "ARM: Enable MachineScheduler and disable PostRAScheduler for swift." This reverts commit r242500. It broke some internal tests and Matthias asked me to revert it while he is investigating. llvm-svn: 242553 2015-07-18 02:14:19 +08:00			`; NOOPT-NEXT: vmov [[A_HIGH:r[0-9]+]], s{{[0-9]+}}`
ARM: Enable MachineScheduler and disable PostRAScheduler for swift. Reapply r242500 now that the swift schedmodel includes LDRLIT. This is mostly done to disable the PostRAScheduler which optimizes for instruction latencies which isn't a good fit for out-of-order architectures. This also allows to leave out the itinerary table in swift in favor of the SchedModel ones. This change leads to performance improvements/regressions by as much as 10% in some benchmarks, in fact we loose 0.4% performance over the llvm-testsuite for reasons that appear to be unknown or out of the compilers control. rdar://20803802 documents the investigation of these effects. While it is probably a good idea to perform the same switch for the other ARM out-of-order CPUs, I limited this change to swift as I cannot perform the benchmark verification on the other CPUs. Differential Revision: http://reviews.llvm.org/D10513 llvm-svn: 242588 2015-07-18 07:18:30 +08:00			`; NOOPT-NEXT: udiv [[RES_LOW:r[0-9]+]], [[A_LOW]], [[B_LOW]]`
Revert "ARM: Enable MachineScheduler and disable PostRAScheduler for swift." This reverts commit r242500. It broke some internal tests and Matthias asked me to revert it while he is investigating. llvm-svn: 242553 2015-07-18 02:14:19 +08:00			`; NOOPT-NEXT: vmov.32 [[RES:d[0-9]+]][0], [[RES_LOW]]`
ARM: Enable MachineScheduler and disable PostRAScheduler for swift. Reapply r242500 now that the swift schedmodel includes LDRLIT. This is mostly done to disable the PostRAScheduler which optimizes for instruction latencies which isn't a good fit for out-of-order architectures. This also allows to leave out the itinerary table in swift in favor of the SchedModel ones. This change leads to performance improvements/regressions by as much as 10% in some benchmarks, in fact we loose 0.4% performance over the llvm-testsuite for reasons that appear to be unknown or out of the compilers control. rdar://20803802 documents the investigation of these effects. While it is probably a good idea to perform the same switch for the other ARM out-of-order CPUs, I limited this change to swift as I cannot perform the benchmark verification on the other CPUs. Differential Revision: http://reviews.llvm.org/D10513 llvm-svn: 242588 2015-07-18 07:18:30 +08:00			`; NOOPT-NEXT: udiv [[RES_HIGH:r[0-9]+]], [[A_HIGH]], [[B_HIGH]]`
[PeepholeOptimizer] Refactor the advanced copy optimization to take advantage of the isRegSequence property. This is a follow-up of r215394 and r215404, which respectively introduces the isRegSequence property and uses it for ARM. Thanks to the property introduced by the previous commits, this patch is able to optimize the following sequence: vmov d0, r2, r3 vmov d1, r0, r1 vmov r0, s0 vmov r1, s2 udiv r0, r1, r0 vmov r1, s1 vmov r2, s3 udiv r1, r2, r1 vmov.32 d16[0], r0 vmov.32 d16[1], r1 vmov r0, r1, d16 bx lr into: udiv r0, r0, r2 udiv r1, r1, r3 vmov.32 d16[0], r0 vmov.32 d16[1], r1 vmov r0, r1, d16 bx lr This patch refactors how the copy optimizations are done in the peephole optimizer. Prior to this patch, we had one copy-related optimization that replaced a copy or bitcast by a generic, more suitable (in terms of register file), copy. With this patch, the peephole optimizer features two copy-related optimizations: 1. One for rewriting generic copies to generic copies: PeepholeOptimizer::optimizeCoalescableCopy. 2. One for replacing non-generic copies with generic copies: PeepholeOptimizer::optimizeUncoalescableCopy. The goals of these two optimizations are slightly different: one rewrite the operand of the instruction (#1), the other kills off the non-generic instruction and replace it by a (sequence of) generic instruction(s). Both optimizations rely on the ValueTracker introduced in r212100. The ValueTracker has been refactored to use the information from the TargetInstrInfo for non-generic instruction. As part of the refactoring, we switched the tracking from the index of the definition to the actual register (virtual or physical). This one change is to provide better consistency with register related APIs and to ease the use of the TargetInstrInfo. Moreover, this patch introduces a new helper class CopyRewriter used to ease the rewriting of generic copies (i.e., #1). Finally, this patch adds a dead code elimination pass right after the peephole optimizer to get rid of dead code that may appear after rewriting. This is related to <rdar://problem/12702965>. Review: http://reviews.llvm.org/D4874 llvm-svn: 216088 2014-08-21 01:41:48 +08:00			`; NOOPT-NEXT: vmov.32 [[RES]][1], [[RES_HIGH]]`
			`; NOOPT-NEXT: vmov r0, r1, [[RES]]`
			`; NOOPT-NEXT: bx lr`
			`;`
			`; OPT-NOT: vmov`
ARM: Enable MachineScheduler and disable PostRAScheduler for swift. Reapply r242500 now that the swift schedmodel includes LDRLIT. This is mostly done to disable the PostRAScheduler which optimizes for instruction latencies which isn't a good fit for out-of-order architectures. This also allows to leave out the itinerary table in swift in favor of the SchedModel ones. This change leads to performance improvements/regressions by as much as 10% in some benchmarks, in fact we loose 0.4% performance over the llvm-testsuite for reasons that appear to be unknown or out of the compilers control. rdar://20803802 documents the investigation of these effects. While it is probably a good idea to perform the same switch for the other ARM out-of-order CPUs, I limited this change to swift as I cannot perform the benchmark verification on the other CPUs. Differential Revision: http://reviews.llvm.org/D10513 llvm-svn: 242588 2015-07-18 07:18:30 +08:00			`; OPT: udiv r1, r1, r3`
			`; OPT-NEXT: udiv r0, r0, r2`
[PeepholeOptimizer] Refactor the advanced copy optimization to take advantage of the isRegSequence property. This is a follow-up of r215394 and r215404, which respectively introduces the isRegSequence property and uses it for ARM. Thanks to the property introduced by the previous commits, this patch is able to optimize the following sequence: vmov d0, r2, r3 vmov d1, r0, r1 vmov r0, s0 vmov r1, s2 udiv r0, r1, r0 vmov r1, s1 vmov r2, s3 udiv r1, r2, r1 vmov.32 d16[0], r0 vmov.32 d16[1], r1 vmov r0, r1, d16 bx lr into: udiv r0, r0, r2 udiv r1, r1, r3 vmov.32 d16[0], r0 vmov.32 d16[1], r1 vmov r0, r1, d16 bx lr This patch refactors how the copy optimizations are done in the peephole optimizer. Prior to this patch, we had one copy-related optimization that replaced a copy or bitcast by a generic, more suitable (in terms of register file), copy. With this patch, the peephole optimizer features two copy-related optimizations: 1. One for rewriting generic copies to generic copies: PeepholeOptimizer::optimizeCoalescableCopy. 2. One for replacing non-generic copies with generic copies: PeepholeOptimizer::optimizeUncoalescableCopy. The goals of these two optimizations are slightly different: one rewrite the operand of the instruction (#1), the other kills off the non-generic instruction and replace it by a (sequence of) generic instruction(s). Both optimizations rely on the ValueTracker introduced in r212100. The ValueTracker has been refactored to use the information from the TargetInstrInfo for non-generic instruction. As part of the refactoring, we switched the tracking from the index of the definition to the actual register (virtual or physical). This one change is to provide better consistency with register related APIs and to ease the use of the TargetInstrInfo. Moreover, this patch introduces a new helper class CopyRewriter used to ease the rewriting of generic copies (i.e., #1). Finally, this patch adds a dead code elimination pass right after the peephole optimizer to get rid of dead code that may appear after rewriting. This is related to <rdar://problem/12702965>. Review: http://reviews.llvm.org/D4874 llvm-svn: 216088 2014-08-21 01:41:48 +08:00			`; OPT-NEXT: bx lr`
			`define <2 x i32> @simpleVectorDiv(<2 x i32> %A, <2 x i32> %B) nounwind {`
			`entry:`
			`%div = udiv <2 x i32> %A, %B`
			`ret <2 x i32> %div`
			`}`