; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -fast-isel-sink-local-values < %s -fast-isel -mtriple=i686-unknown-unknown -O0 -mcpu=skx | FileCheck %s

define i32 @_Z3foov() {
; CHECK-LABEL: _Z3foov:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: subl $16, %esp
; CHECK-NEXT: .cfi_def_cfa_offset 20
; CHECK-NEXT: movw $10959, {{[0-9]+}}(%esp) # imm = 0x2ACF
; CHECK-NEXT: movw $-15498, {{[0-9]+}}(%esp) # imm = 0xC376
; CHECK-NEXT: movw $19417, {{[0-9]+}}(%esp) # imm = 0x4BD9
; CHECK-NEXT: movzwl {{[0-9]+}}(%esp), %eax
; CHECK-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; CHECK-NEXT: cmpw $0, {{[0-9]+}}(%esp)
; CHECK-NEXT: movb $1, %al
; CHECK-NEXT: movb %al, {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Spill
; CHECK-NEXT: jne .LBB0_2
; CHECK-NEXT: # %bb.1: # %lor.rhs
; CHECK-NEXT: xorl %eax, %eax
; CHECK-NEXT: # kill: def $al killed $al killed $eax
; CHECK-NEXT: movb %al, {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Spill
; CHECK-NEXT: jmp .LBB0_2
; CHECK-NEXT: .LBB0_2: # %lor.end
; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
; CHECK-NEXT: movb {{[-0-9]+}}(%e{{[sb]}}p), %cl # 1-byte Reload
; CHECK-NEXT: andb $1, %cl
; CHECK-NEXT: movzbl %cl, %ecx
; CHECK-NEXT: cmpl %ecx, %eax
; CHECK-NEXT: setl %al
; CHECK-NEXT: andb $1, %al
; CHECK-NEXT: movzbl %al, %eax
; CHECK-NEXT: xorl $-1, %eax
; CHECK-NEXT: cmpl $0, %eax
; CHECK-NEXT: movb $1, %al
; CHECK-NEXT: movb %al, {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Spill
; CHECK-NEXT: jne .LBB0_4
; CHECK-NEXT: # %bb.3: # %lor.rhs4
; CHECK-NEXT: xorl %eax, %eax
; CHECK-NEXT: # kill: def $al killed $al killed $eax
; CHECK-NEXT: movb %al, {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Spill
; CHECK-NEXT: jmp .LBB0_4
; CHECK-NEXT: .LBB0_4: # %lor.end5
; CHECK-NEXT: movb {{[-0-9]+}}(%e{{[sb]}}p), %al # 1-byte Reload
; CHECK-NEXT: andb $1, %al
; CHECK-NEXT: movzbl %al, %eax
; CHECK-NEXT: # kill: def $ax killed $ax killed $eax
; CHECK-NEXT: movw %ax, {{[0-9]+}}(%esp)
; CHECK-NEXT: movzwl {{[0-9]+}}(%esp), %eax
; CHECK-NEXT: addl $16, %esp
; CHECK-NEXT: .cfi_def_cfa_offset 4
; CHECK-NEXT: retl
entry:
  %aa = alloca i16, align 2
  %bb = alloca i16, align 2
  %cc = alloca i16, align 2
  store i16 10959, i16* %aa, align 2
  store i16 -15498, i16* %bb, align 2
  store i16 19417, i16* %cc, align 2
  %0 = load i16, i16* %aa, align 2
  %conv = zext i16 %0 to i32
  %1 = load i16, i16* %cc, align 2
  %tobool = icmp ne i16 %1, 0
  br i1 %tobool, label %lor.end, label %lor.rhs

lor.rhs: ; preds = %entry
  br label %lor.end

lor.end: ; preds = %lor.rhs, %entry
  %2 = phi i1 [ true, %entry ], [ false, %lor.rhs ]
  %conv1 = zext i1 %2 to i32
  %cmp = icmp slt i32 %conv, %conv1
  %conv2 = zext i1 %cmp to i32
  %neg = xor i32 %conv2, -1
  %tobool3 = icmp ne i32 %neg, 0
  br i1 %tobool3, label %lor.end5, label %lor.rhs4

lor.rhs4: ; preds = %lor.end
  br label %lor.end5

lor.end5: ; preds = %lor.rhs4, %lor.end
  %3 = phi i1 [ true, %lor.end ], [ false, %lor.rhs4 ]
  %conv6 = zext i1 %3 to i16
  store i16 %conv6, i16* %bb, align 2
  %4 = load i16, i16* %bb, align 2
  %conv7 = zext i16 %4 to i32
  ret i32 %conv7
}