[LiveInterval] Allow updating subranges with slightly out-dated IR
During register coalescing, we update the live-intervals on-the-fly.
To do that we are in this strange mode where the live-intervals can
be slightly out-of-sync (more precisely they are forward looking)
compared to what the IR actually represents.
This happens because the register coalescer only updates the IR once
it is done updating the live-intervals. It has to work this way
because updating the IR on-the-fly would clobber some information
about what the live-ranges being updated look like.
This is problematic for updates that rely on the IR to accurately
represent the state of the live-ranges. Right now, we have only
one of those: stripValuesNotDefiningMask.
To reconcile this need for out-of-sync IR, this patch introduces a
new argument to LiveInterval::refineSubRanges that allows the code
doing the live range updates to reason about what the code will
look like after the coalescer has rewritten the registers.
Essentially this captures how a subregister index will be offset
to match its position in a new register class.
E.g., let's say we want to merge:
V1.sub1:<2 x s32> = COPY V2.sub3:<4 x s32>
We do that by choosing a class where sub1:<2 x s32> and sub3:<4 x s32>
overlap, i.e., by choosing a class where we can find "offset + 1 == 3".
Put differently, we align V2's sub3 with V1's sub1:
V2: sub0 sub1 sub2 sub3
V1: <offset>  sub0 sub1
This offset will look like a composed subregidx in the class:
V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32>
Now, as long as the uses and defs of V1 have not been rewritten, all the
checks for V1 need to account for this offset to match what the live
intervals intend to capture.
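The alignment above can be modelled with plain integers. The following is a minimal sketch, not LLVM code: subregister indices are treated as element positions, and `compose_offset` and `rewritten_index` are hypothetical helpers introduced only for illustration.

```python
def compose_offset(dst_idx, src_idx):
    """Offset that aligns V1.sub<dst_idx> with V2.sub<src_idx>.

    Solves "offset + dst_idx == src_idx" from the description above.
    """
    offset = src_idx - dst_idx
    if offset < 0:
        raise ValueError("this sketch only models shifting the destination")
    return offset


def rewritten_index(old_idx, offset):
    """Position of V1.sub<old_idx> once V1 lives in the merged class."""
    return old_idx + offset


# V1.sub1:<2 x s32> = COPY V2.sub3:<4 x s32>
offset = compose_offset(dst_idx=1, src_idx=3)
mapping = {f"V1.sub{i}": f"sub{rewritten_index(i, offset)}" for i in range(2)}
```

In this model, any check that reads a V1 operand from the not-yet-rewritten IR has to apply the offset first, which is the information the new refineSubRanges argument is meant to carry.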
Prior to this patch, we would fail to recognize the uses and defs of V1
and would end up with machine verifier errors: No live segment at def.
This could lead to miscompiles, as we would drop some live-ranges and
thus miss some interferences.
For this problem to trigger, we need to reach stripValuesNotDefiningMask
while having a mismatch between the IR and the live-ranges (i.e.,
we have to apply a subreg offset to the IR.)
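This requires the following three conditions:
1. An update of overlapping subreg lanes: e.g., dsub0 == <ssub0, ssub1>
2. An update with Tuple registers with a possibility to coalesce the
subreg index: e.g., v1.dsub_1 == v2.dsub_3
3. Subreg liveness enabled.
Condition #1 is required to trigger the splitting of subranges that implies
looking at the IR to decide what is alive and what is not, i.e., calling
stripValuesNotDefiningMask.
Condition #2 is what causes the IR to be out-of-sync with what the register
coalescer maintains for the live-ranges information.
Condition #3 is needed because the problem has to do with subrange updates.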
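Why condition 1 above matters can be seen on a toy model of lane masks. This is a hedged sketch in Python, not the LLVM implementation: the bit assignments and the `refine` helper are assumptions made for illustration, and real lane masks are target-defined.

```python
# Hypothetical lane bits: each S register is one lane, each D register
# covers two adjacent S lanes (dsub_0 == <ssub_0, ssub_1>).
SSUB_0, SSUB_1, SSUB_2, SSUB_3 = 0b0001, 0b0010, 0b0100, 0b1000
DSUB_0 = SSUB_0 | SSUB_1
DSUB_1 = SSUB_2 | SSUB_3


def refine(subranges, mask):
    """Split the subrange masks so that `mask` is covered exactly.

    Returns the new list of masks and whether any existing subrange had
    to be split. A split is the case where the values of the old
    subrange must be re-examined against the instructions.
    """
    result, split, remaining = [], False, mask
    for sr in subranges:
        matching = sr & mask
        if matching and matching != sr:
            # Partial overlap: one subrange becomes two.
            result += [sr & ~mask, matching]
            split = True
        else:
            result.append(sr)
        remaining &= ~matching
    if remaining:
        # Lanes of `mask` that no existing subrange covered.
        result.append(remaining)
    return result, split


# A subrange tracking all of dsub_0, then an update touching only ssub_0.
after, needed_split = refine([DSUB_0, DSUB_1], SSUB_0)
```

Without overlapping lanes, an update either matches a subrange exactly or misses it, so no split happens and the IR is never consulted in this model.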
None of the targets that currently use subreg liveness (i.e., the targets
that fulfill #3: Hexagon, AMDGPU, PowerPC, and SystemZ, IIRC) expose #1
and #2, so this patch also artificially enables subreg liveness for ARM,
so that a nice test case can be attached.
2019-11-13 08:32:12 +08:00
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc %s -start-before simple-register-coalescing -mtriple=arm-apple-ios -stop-after machine-scheduler -o - -arm-enable-subreg-liveness -verify-machineinstrs | FileCheck %s

# Check that when we merge live-ranges that imply offsetting
# the definition of a subregister by some other subreg index,
# we take that new index into account while updating the subrange.
#
# For this specific test case, the coalescer is going to get rid
# of `%5.dsub_1:dtriple = COPY %4.dsub_3` by aligning
# %5.dsub_1:<3 x s64> with %4.dsub_3:<4 x s64>.
# This is done by moving to a bigger register class <5 x s64>
# and offsetting %5 definitions with a new subregidx:
# NewVar: <5 x s64>  dsub_0 dsub_1 dsub_2 dsub_3 dsub_4
# %4:     <4 x s64>  dsub_0 dsub_1 dsub_2 dsub_3
# %5:     <3 x s64>  <==offset===> dsub_0 dsub_1 dsub_2
#
# In other words, %5.dsub_0 needs to be mapped to NewVar.dsub_2, %5.dsub_1
# to NewVar.dsub_3 and so on. So essentially we are offsetting %5 by
# dsub_2.
#
# When updating the live-ranges, the register coalescer actually
# has not rewritten the original code, so we need to fake the
# rewrite to do that update.
# This used to be wrong and this test was failing with a machine
# verifier error: No live segment at def.
#
# The test case runs through the coalescer *and* the scheduler, just
# to force the live intervals to be carried around so that the verifier
# gets a chance to verify those. If we were to just run the coalescer,
# the live intervals would be dropped before running the verifier since
# no other pass would need that analysis around.
#
# Note: The test case looks slightly more complicated than just the
# offsetting part. That's because the bug needs three things to
# trigger:
# 1. Overlapping subreg lanes: here, dsub0 == <ssub0, ssub1>
# 2. Tuple registers with a possibility to coalesce the subreg index:
#    here, what we explain with %5.dsub_1 == %4.dsub_3
# 3. Subreg liveness enabled.
# #1 is required to trigger the splitting of subranges that implies
# looking at the IR to decide what is alive and what is not.
# #2 is what causes the IR to be out-of-sync with what the reg coalescer
# maintains for the live-ranges information.
# #3 is, well, the problem has to do with subrange updates!
#
# In the end, the expected result is to have all the variables
# being coalesced in one big (qqqq) variable.
---
name: main
alignment: 1
tracksRegLiveness: true
frameInfo:
  maxAlignment: 1
machineFunctionInfo: {}
body: |
  bb.0:
    liveins: $d2, $s1, $d4

    ; CHECK-LABEL: name: main
    ; CHECK: liveins: $d2, $s1, $d4
    ; CHECK: undef %4.dsub_0:qqqqpr_with_ssub_4 = COPY $d4
    ; CHECK: %4.ssub_4:qqqqpr_with_ssub_4 = COPY $s1
    ; CHECK: %4.dsub_1:qqqqpr_with_ssub_4 = COPY $d2
    ; CHECK: %4.dsub_3:qqqqpr_with_ssub_4 = COPY %4.dsub_1
    ; CHECK: KILL implicit-def %4.dsub_2, implicit %4.qqsub_0
    ; CHECK: %4.dsub_4:qqqqpr_with_ssub_4 = COPY %4.dsub_1
[MIR][ARM] MachineOperand comments
This adds infrastructure to print and parse MIR MachineOperand comments.
The motivation for the ARM backend is to print condition code names instead of
magic constants that are difficult to read (for human beings). For example,
instead of this:
dead renamable $r2, $cpsr = tEOR killed renamable $r2, renamable $r1, 14, $noreg
t2Bcc %bb.4, 0, killed $cpsr
we now print this:
dead renamable $r2, $cpsr = tEOR killed renamable $r2, renamable $r1, 14 /* CC::al */, $noreg
t2Bcc %bb.4, 0 /* CC::eq */, killed $cpsr
This shows that MachineOperand comments are enclosed between /* and */. In this
example, the EOR instruction is not conditionally executed (i.e. it is "always
executed"), which is encoded by the 14 immediate machine operand. Thus, now
this machine operand has /* CC::al */ as a comment. The 0 on the next
conditional branch instruction represents the equal condition code, thus now
this operand has /* CC::eq */ as a comment.
As it is a comment, the MI lexer/parser completely ignores it. The benefit is
that this keeps the change in the lexer minimal and no target-specific
parsing needs to be done. The changes on the MIPrinter side are also
minimal, as there is only one target hook that is used to create the machine
operand comments.
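As an illustration of why the parser side stays trivial, the sketch below annotates an ARM predicate immediate and then strips the comment again. It is a Python model, not the MIPrinter or MI lexer code; `COND_NAMES`, `annotate`, and `strip_comments` are hypothetical names. The numeric values follow the standard ARM condition-code encoding.

```python
import re

# Standard ARM condition-code encoding (0 = eq ... 14 = al).
COND_NAMES = ["eq", "ne", "hs", "lo", "mi", "pl", "vs", "vc",
              "hi", "ls", "ge", "lt", "gt", "le", "al"]


def annotate(imm):
    """Render a predicate immediate followed by its operand comment."""
    return f"{imm} /* CC::{COND_NAMES[imm]} */"


def strip_comments(text):
    """Drop /* ... */ comments, as a lexer that ignores them would."""
    return re.sub(r"\s*/\*.*?\*/", "", text)


printed = f"t2Bcc %bb.4, {annotate(0)}, killed $cpsr"
parsed = strip_comments(printed)
```

Because the comment carries no information that the immediate does not already encode, dropping it on input loses nothing in this model.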
Differential Revision: https://reviews.llvm.org/D74306
2020-02-24 22:19:21 +08:00
    ; CHECK: tBX_RET 14 /* CC::al */, $noreg, implicit %4.ssub_4_ssub_5_ssub_6_ssub_7_ssub_8_ssub_9
    %3:dpr_vfp2 = COPY $d4
    undef %0.ssub_0:dpr_vfp2 = COPY $s1
    %1:dpr_vfp2 = COPY $d2
    undef %4.dsub_0:dquad = COPY %3
    %4.dsub_1:dquad = COPY %1
    %4.dsub_2:dquad = COPY %0
    %4.dsub_3:dquad = COPY %1
    KILL implicit-def undef %5.dsub_0:dtriple, implicit %4
    %5.dsub_1:dtriple = COPY %4.dsub_3
    %5.dsub_2:dtriple = COPY %1
    tBX_RET 14, $noreg, implicit %5

...