[RegisterCoalescer] Fix the creation of subranges when rematerialization is used

* Context *

During register coalescing, we use rematerialization when coalescing is not
possible. That means we may rematerialize a super register when only a smaller
register is actually used.
E.g.,
0B v1 = ldimm 0xFF
1B v2 = COPY v1.low8bits
2B   = v2
=>
0B v1 = ldimm 0xFF
1B v2 = ldimm 0xFF
2B   = v2.low8bits

where xB denotes the slot indexes.
Here v2 grew from an 8-bit register to a 16-bit register.

When that happens and subregister liveness is enabled, we create subranges for
the newly created value.
E.g., before remat, the live range of v2 looked like:
main range: [1r, 2r)
(Read: v2 is defined at the register slot of index 1 and used before the
register slot of index 2.)

After remat, it should look like:
main range: [1r, 2r)
low 8 bits: [1r, 2r)
high 8 bits: [1r, 1d) <-- dead def

I.e., the unused lanes of v2 should be marked as a dead definition.
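
The segment notation above can be sketched with a small standalone model. This
is an illustration only, not LLVM's SlotIndex or LiveRange API: ToyIndex,
ToySegment, makeDeadDef and isDeadDef are hypothetical names, and only the
register ('r') and dead ('d') slots are modeled.

```cpp
#include <cassert>
#include <string>

// Toy model of the notation used in this message (hypothetical, not LLVM's
// SlotIndex). Only two slots per instruction index are modeled, with r < d.
enum class Slot { Register, Dead };

struct ToyIndex {
  unsigned Instr; // instruction index, e.g. 1 for "1B"
  Slot S;         // slot within that instruction
};

// Half-open segment [Start, End).
struct ToySegment {
  ToyIndex Start;
  ToyIndex End;
};

// A value that is defined but never read is live only from the register
// slot to the dead slot of its defining instruction.
inline ToySegment makeDeadDef(unsigned Instr) {
  return {{Instr, Slot::Register}, {Instr, Slot::Dead}};
}

inline bool isDeadDef(const ToySegment &Seg) {
  return Seg.Start.Instr == Seg.End.Instr &&
         Seg.Start.S == Slot::Register && Seg.End.S == Slot::Dead;
}

// Prints a segment in the same form as the examples, e.g. "[1r, 2r)".
inline std::string toString(const ToySegment &Seg) {
  auto Fmt = [](const ToyIndex &I) {
    return std::to_string(I.Instr) + (I.S == Slot::Register ? "r" : "d");
  };
  return "[" + Fmt(Seg.Start) + ", " + Fmt(Seg.End) + ")";
}
```

In this model, the high 8 bits of v2 after remat correspond to makeDeadDef(1),
while the low 8 bits keep the segment from 1r to 2r.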

* The Problem *

Prior to this patch, the live ranges from the previous example would have the
full live range for all subranges:
main range: [1r, 2r)
low 8 bits: [1r, 2r)
high 8 bits: [1r, 2r) <-- too long

* The Fix *

Technically, the code that this patch changes is not wrong:
When we create the subranges for the newly rematerialized value, we create only
one subrange for the whole bit mask.
In other words, at this point the live range of v2 looks like this:
main range: [1r, 2r)
low & high: [1r, 2r)

Then, it goes wrong when we call LiveInterval::refineSubRanges on the low 8 bits:
main range: [1r, 2r)
low 8 bits: [1r, 2r)
high 8 bits: [1r, 2r) <-- too long

Ideally, we would like LiveInterval::refineSubRanges to be able to do the right
thing and mark the dead lanes as such. However, this is not possible, because by
the time we update / refine the live ranges, the IR hasn't been updated yet,
therefore we actually don't have enough information to do the right thing.
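
The effect of the refinement step can be sketched with a toy model. This is a
simplification under stated assumptions, not the actual implementation of
LiveInterval::refineSubRanges: ToySubRange and toyRefine are hypothetical
names, segments are kept as plain strings, and bit 0 / bit 1 stand in for the
low / high 8 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a subrange: a lane mask plus its segments.
struct ToySubRange {
  uint32_t LaneMask;
  std::vector<std::string> Segments; // e.g. "[1r, 2r)"
};

// Toy refinement: any subrange that only partially overlaps Lanes is split
// in two. The split has no information about which lanes are really used,
// so both halves keep a copy of the same segments.
inline void toyRefine(std::vector<ToySubRange> &SubRanges, uint32_t Lanes) {
  const std::size_t N = SubRanges.size();
  for (std::size_t I = 0; I < N; ++I) {
    const uint32_t Matching = SubRanges[I].LaneMask & Lanes;
    if (Matching == 0 || Matching == SubRanges[I].LaneMask)
      continue; // no overlap, or already exactly covered
    SubRanges[I].LaneMask &= ~Matching;
    // Copy before push_back, which may reallocate the vector.
    ToySubRange Split{Matching, SubRanges[I].Segments};
    SubRanges.push_back(Split);
  }
}
```

Starting from a single "low & high" subrange and refining on the low lane
yields two subranges with identical segments, which is the "too long" high
subrange shown above.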

Another option to fix the problem would have been to call
LiveIntervals::shrinkToUses after the IR is updated. This is not desirable as
this may have a noticeable impact on compile time.

Instead, when we create the subranges for the rematerialized value, this patch
explicitly creates one subrange for the lanes that were used before
rematerialization and one for the lanes that were not. The used one inherits
the live range of the main range and the unused one is created empty. The
existing rematerialization code then detects that the unused lanes are not
live and correctly sets dead def intervals for them.
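
The approach can be sketched with the same kind of toy model. Everything here
is hypothetical (ToyInterval, toyCreateSubRanges, toyAddDeadDefs); in
particular, toyAddDeadDefs is only a stand-in for what the existing
rematerialization code does, and bit 0 / bit 1 are assumed to represent the
low / high 8 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct ToySubRange {
  uint32_t LaneMask;
  std::vector<std::string> Segments; // e.g. "[1r, 2r)"
};

struct ToyInterval {
  std::vector<std::string> MainSegments;
  std::vector<ToySubRange> SubRanges;
};

// Mirrors the shape of the patched code: the used lanes inherit the main
// range, the unused lanes get an empty subrange.
inline void toyCreateSubRanges(ToyInterval &LI, uint32_t FullMask,
                               uint32_t UsedLanes) {
  const uint32_t UnusedLanes = FullMask & ~UsedLanes;
  ToySubRange Used{UsedLanes, LI.MainSegments};
  ToySubRange Unused{UnusedLanes, {}};
  LI.SubRanges.push_back(Used);
  LI.SubRanges.push_back(Unused);
}

// Stand-in for the later rematerialization step (an assumption of this
// sketch): a subrange with no live segment gets a dead def at DefInstr.
inline void toyAddDeadDefs(ToyInterval &LI, unsigned DefInstr) {
  const std::string Idx = std::to_string(DefInstr);
  for (ToySubRange &SR : LI.SubRanges)
    if (SR.Segments.empty())
      SR.Segments.push_back("[" + Idx + "r, " + Idx + "d)");
}
```

With a main range running from 1r to 2r, a full mask of 0x3 and used lanes
0x1, the model ends with the low lane covering the main range and the high
lane holding only a dead def at index 1.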

https://llvm.org/PR41372
Quentin Colombet 2019-12-04 15:36:35 -08:00
parent 1f822f212c
commit 2ec71ea7c7
2 changed files with 55 additions and 2 deletions


@@ -1733,8 +1733,15 @@ void RegisterCoalescer::updateRegDefsUses(unsigned SrcReg, unsigned DstReg,
       if (SubIdx != 0 && MO.isUse() && MRI->shouldTrackSubRegLiveness(DstReg)) {
         if (!DstInt->hasSubRanges()) {
           BumpPtrAllocator &Allocator = LIS->getVNInfoAllocator();
-          LaneBitmask Mask = MRI->getMaxLaneMaskForVReg(DstInt->reg);
-          DstInt->createSubRangeFrom(Allocator, Mask, *DstInt);
+          LaneBitmask FullMask = MRI->getMaxLaneMaskForVReg(DstInt->reg);
+          LaneBitmask UsedLanes = TRI->getSubRegIndexLaneMask(SubIdx);
+          LaneBitmask UnusedLanes = FullMask & ~UsedLanes;
+          DstInt->createSubRangeFrom(Allocator, UsedLanes, *DstInt);
+          // The unused lanes are just empty live-ranges at this point.
+          // It is the caller responsibility to set the proper
+          // dead segments if there is an actual dead def of the
+          // unused lanes. This may happen with rematerialization.
+          DstInt->createSubRange(Allocator, UnusedLanes);
         }
         SlotIndex MIIdx = UseMI->isDebugValue()
                               ? LIS->getSlotIndexes()->getIndexBefore(*UseMI)


@@ -0,0 +1,46 @@
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -mcpu=z13 -O3 -misched=ilpmin -systemz-subreg-liveness -verify-machineinstrs -start-before simple-register-coalescing %s -mtriple s390x-ibm-linux -stop-after machine-scheduler -o - | FileCheck %s
# Check that when the register coalescer rematerializes a register to set
# only a sub register, it sets the subranges of the unused lanes as being dead
# at the definition point.
#
# The way that test exercises that comes in two steps:
# - First, we need the register coalescer to rematerialize something.
# In that test, %0 is rematerializable and will be rematerialized in
# %1 since %1 and %0 cannot be directly coalesced (they interfere).
# - Second, we indirectly check that the subranges are valid for %1
# when, in the machine scheduler, we move the instructions that define %1
# closer to the return instruction (i.e., we move MSFI and the rematerialized
# definition of %0 (i.e., %1 = LGHI 25) down). When doing that displacement,
# the scheduler updates the live-ranges of %1. When the subrange for the
# unused lane (here the subrange for %1.subreg_h32) was not correct, the
# scheduler would hit an assertion or access some invalid memory location
# making the compiler crash.
#
# Bottom line, this test checks what was intended if at the end, both %0 and %1
# are defined with `LGHI 25` and the instructions defining %1 are right before
# the return instruction.
#
# PR41372
---
name: main
tracksRegLiveness: true
body: |
bb.0:
; CHECK-LABEL: name: main
; CHECK: [[LGHI:%[0-9]+]]:gr64bit = LGHI 25
; CHECK: CHIMux [[LGHI]].subreg_l32, 0, implicit-def $cc
; CHECK: [[LGHI1:%[0-9]+]]:gr64bit = LGHI 25
; CHECK: undef [[LGHI1]].subreg_l32:gr64bit = MSFI [[LGHI1]].subreg_l32, -117440512
; CHECK: Return implicit [[LGHI1]].subreg_l32
%0:gr64bit = LGHI 25
%1:gr32bit = COPY %0.subreg_l32
%1:gr32bit = MSFI %1, -117440512
%2:grx32bit = COPY %0.subreg_l32
CHIMux killed %2, 0, implicit-def $cc
%3:gr32bit = COPY killed %1
Return implicit %3
...