[BPF] Fix a bug in peephole optimization
One of current peephole optimiations is to remove SLL/SRL if
the sub register has been zero extended. This phase has two bugs
and one limitations.
First, for the physical subregister used in pseudo insn COPY
like below, it permits incorrect optimization.
%0:gpr32 = COPY $w0
...
%4:gpr = MOV_32_64 %0:gpr32
%5:gpr = SLL_ri %4:gpr(tied-def 0), 32
%6:gpr = SRA_ri %5:gpr(tied-def 0), 32
The $w0 could be from the return value of a previous function call
and its upper 32-bit value might contain some non-zero values.
The same applies to function arguments.
Second, the current code may permits removing SLL/SRA like below:
%0:gpr32 = COPY $w0
%1:gpr32 = COPY %0:gpr32
...
%4:gpr = MOV_32_64 %1:gpr32
%5:gpr = SLL_ri %4:gpr(tied-def 0), 32
%6:gpr = SRA_ri %5:gpr(tied-def 0), 32
The reason is that it did not follow def-use chain to skip all
intermediate 32bit-to-32bit COPY instructions.
The current implementation is also very conservative for PHI
instructions. If any PHI insn component is another PHI or COPY insn,
it will just permit SLL/SRA.
This patch fixed the issue as follows:
- During def/use chain traversal, if any physical register is read,
SLL/SRA will be preserved as these physical registers are mostly
from function return values or current function arguments.
- Recursively visit all COPY and PHI instructions.
2019-11-21 01:52:29 +08:00
|
|
|
; RUN: llc -O2 -march=bpfel -mcpu=v2 -mattr=+alu32 < %s | FileCheck %s
|
|
|
|
;
|
|
|
|
; For the below test case, both 'ret' and 'b' at 'ret == b'
|
|
|
|
; need SLL/SLR. For 'ret', 'ret = a' may receive the value
|
|
|
|
; from argument with high 32-bit invalid data.
|
|
|
|
;
|
|
|
|
; extern int helper(int);
|
|
|
|
; int test(int a, int b, int c, int d) {
|
|
|
|
; int ret;
|
|
|
|
; if (a < b)
|
|
|
|
; ret = (c < d) ? a : 0;
|
|
|
|
; else
|
|
|
|
; ret = (c < a) ? 1 : 2;
|
|
|
|
; return helper(ret == b);
|
|
|
|
; }
|
|
|
|
|
|
|
|
define dso_local i32 @test(i32 %a, i32 %b, i32 %c, i32 %d) local_unnamed_addr {
|
|
|
|
entry:
|
|
|
|
%cmp = icmp slt i32 %a, %b
|
|
|
|
%cmp1 = icmp slt i32 %c, %d
|
|
|
|
%cond = select i1 %cmp1, i32 %a, i32 0
|
|
|
|
%cmp2 = icmp slt i32 %c, %a
|
|
|
|
%cond3 = select i1 %cmp2, i32 1, i32 2
|
|
|
|
%ret.0 = select i1 %cmp, i32 %cond, i32 %cond3
|
|
|
|
%cmp4 = icmp eq i32 %ret.0, %b
|
|
|
|
%conv = zext i1 %cmp4 to i32
|
|
|
|
%call = tail call i32 @helper(i32 %conv)
|
|
|
|
ret i32 %call
|
|
|
|
}
|
[BPF] simplify zero extension with MOV_32_64
The current pattern matching for zext results in the following code snippet
being produced,
w1 = w0
r1 <<= 32
r1 >>= 32
Because BPF implementations require zero extension on 32bit loads this
both adds a few extra unneeded instructions but also makes it a bit
harder for the verifier to track the r1 register bounds. For example in
this verifier trace we see at the end of the snippet R2 offset is unknown.
However, if we track this correctly we see w1 should have the same bounds
as r8. R8 smax is less than U32 max value so a zero extend load should keep
the same value. Adding a max value of 800 (R8=inv(id=0,smax_value=800)) to
an off=0, as seen in R7 should create a max offset of 800. However at the
end of the snippet we note the R2 max offset is 0xffffFFFF.
R0=inv(id=0,smax_value=800)
R1_w=inv(id=0,umax_value=2147483647,var_off=(0x0; 0x7fffffff))
R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
R8_w=inv(id=0,smax_value=800,umax_value=4294967295,var_off=(0x0; 0xffffffff))
R9=inv800 R10=fp0 fp-8=mmmm????
58: (1c) w9 -= w8
59: (bc) w1 = w8
60: (67) r1 <<= 32
61: (77) r1 >>= 32
62: (bf) r2 = r7
63: (0f) r2 += r1
64: (bf) r1 = r6
65: (bc) w3 = w9
66: (b7) r4 = 0
67: (85) call bpf_get_stack#67
R0=inv(id=0,smax_value=800)
R1_w=ctx(id=0,off=0,imm=0)
R2_w=map_value(id=0,off=0,ks=4,vs=1600,umax_value=4294967295,var_off=(0x0; 0xffffffff))
R3_w=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff))
R4_w=inv0 R6=ctx(id=0,off=0,imm=0)
R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
R8_w=inv(id=0,smax_value=800,umax_value=4294967295,var_off=(0x0; 0xffffffff))
R9_w=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff))
R10=fp0 fp-8=mmmm????
After this patch R1 bounds are not smashed by the <<=32 >>=32 shift and we
get correct bounds on R2 umax_value=800.
Further it reduces 3 insns to 1.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Differential Revision: https://reviews.llvm.org/D73985
2020-05-28 02:18:16 +08:00
|
|
|
; CHECK: r{{[0-9]+}} = w{{[0-9]+}}
|
|
|
|
; CHECK: r{{[0-9]+}} = w{{[0-9]+}}
|
[BPF] Fix a bug in peephole optimization
One of current peephole optimiations is to remove SLL/SRL if
the sub register has been zero extended. This phase has two bugs
and one limitations.
First, for the physical subregister used in pseudo insn COPY
like below, it permits incorrect optimization.
%0:gpr32 = COPY $w0
...
%4:gpr = MOV_32_64 %0:gpr32
%5:gpr = SLL_ri %4:gpr(tied-def 0), 32
%6:gpr = SRA_ri %5:gpr(tied-def 0), 32
The $w0 could be from the return value of a previous function call
and its upper 32-bit value might contain some non-zero values.
The same applies to function arguments.
Second, the current code may permits removing SLL/SRA like below:
%0:gpr32 = COPY $w0
%1:gpr32 = COPY %0:gpr32
...
%4:gpr = MOV_32_64 %1:gpr32
%5:gpr = SLL_ri %4:gpr(tied-def 0), 32
%6:gpr = SRA_ri %5:gpr(tied-def 0), 32
The reason is that it did not follow def-use chain to skip all
intermediate 32bit-to-32bit COPY instructions.
The current implementation is also very conservative for PHI
instructions. If any PHI insn component is another PHI or COPY insn,
it will just permit SLL/SRA.
This patch fixed the issue as follows:
- During def/use chain traversal, if any physical register is read,
SLL/SRA will be preserved as these physical registers are mostly
from function return values or current function arguments.
- Recursively visit all COPY and PHI instructions.
2019-11-21 01:52:29 +08:00
|
|
|
; CHECK: if r{{[0-9]+}} == r{{[0-9]+}} goto
|
|
|
|
|
|
|
|
declare dso_local i32 @helper(i32) local_unnamed_addr
|