RAGreedy has two fields of RegisterClassInfo, one called RCI and another RegClassInfo from its base class.
RCI is initialized without freezeReservedRegs first, while RegClassInfo does. Therefore, if reserved registers
information is changed between last time freezeReservedRegs is called and RAGreedy, it's not picked up by RCI.
Instead of having both fields in RAGreedy, remove RCI and use RegClassInfo instead. Also removed is the TRI field
which is present in its base class.
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D125926
Add a new TargetRegisterInfo hook to allow targets to tweak the
priority of live ranges, so that AllocationPriority of the register
class will be treated as more important than whether the range is local
to a basic block or global. This is determined per-MachineFunction.
Differential Revision: https://reviews.llvm.org/D125102
Last chance recoloring didn't try recoloring a done register with the
same class since it believed there was no point. This doesn't
necessarily apply if the members in that class overlap. Allow the
recoloring to proceed if the assigned interfering physical register
overlaps with the candidate register.
This avoids an allocation failure with overlapping tuples. This
testcase could be handled better, and I don't believe should reach
last chance recoloring. The failure only manifests with the mutually
unsatisfiable register hints to overlapping tuples. The earlier
assignment decisions probably should have figured out that using these
hints was a bad idea.
This is a replacement for the original fix attempted in
c46aab01c0.
This fixes "overlapping insert" assertion failures when trying to
unwind an unsuccessful recoloring attempt.
The problem would occur when there are multiple recoloring candidates
which recursively required recoloring. If one recoloring candidate was
successfully recolored at one level, and the next recoloring candidate
was unsuccessful, we would not roll back the first candidates
successful recoloring. The forgotten successful recoloring may have
been assigned to something that conflicts with a register that needs
to be restored in a parent recoloring attempt.
See the testcase added in issue48473 for a more concrete example with
explanation.
This reverts commit c46aab01c0.
This evidently blocks compiling in some cases that used to work
before. I'm also not fully convinced this is the correct place to fix
this problem.
Discussed extensively on D98232. The functionality introduced in D35816
never worked correctly. In D98232, it was fixed, but, as it was
introducing a large compile-time regression, and the value of the
original patch was called into doubt, we disabled it by default
everywhere. A year later, it appears that caused no grief, so it seems
safe to remove the disabled code.
This should be accompanied by re-opening bug 26810.
Differential Revision: https://reviews.llvm.org/D121128
growRegion() does not scale in code with BBs with a very large number of edges.
In such code growRegion() becomes a compile-time bottleneck, consuming 60% of
the total compilation time.
This patch adds a limit to the complexity of growRegion() by incrementing a counter
in each iteration. We bail out once the limit is reached.
Differential Revision: https://reviews.llvm.org/D120752
This example is not compilable without handling eviction of specific
subregisters. Last chance recoloring was deciding it could try
evicting an overlapping superregister, which doesn't help make any
progress. The LiveIntervalUnion would then assert due to an
overlapping / identical range when trying the new assignment.
Unfortunately this is also producing a verifier error after the
allocation fails. I've seen a number of these, and not sure if we
should just start deleting the function on error rather than trying to
figure out how to put together valid MIR.
I'm not super confident this is the right place to fix this. I also
have a number of failing testcases I need to fix by handling partial
evictions of superregisters.
This is because a subsequent patch will propose obtaining the VRAI from
the advisor, which will enable feature caching for the ML advisor, for
better compile time. Making this change first as it's both innocuous and
keeps the future patch to be reviewed small.
When evicting interference, it causes an asseertion error
since LiveIntervals::intervalIsInOneMBB assumes that input
is not empty.
This patch fixed bug mentioned in D118020.
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D118124
This patch simplifies the interface between RAGreedy and the eviction
adviser by passing the allocator to the adviser, which allows the latter
to extract needed information as needed, rather than requiring it be passed
piecemeal at construction time (which would also complicate later
evolution).
Part of this, the patch also moves ExtraRegInfo back to RAGreedy. We
keep the encapsulation of ExtraRegInfo because it has benefits (e.g.
improved readability by abstracting access to the cascade info) and also
simpler re-initialization at regalloc pass re-entry time (we just flush
the Optional).
Differential Revision: https://reviews.llvm.org/D116669
This was suggested in D114831. It should simplify the relation between
eviction advisor and the allocator, and simplify ingesting more features
tied to the internals of the allocator, in the future.
This change simply pulls out RAGreedy, places it in the llvm namespace,
and cleans up a bit the includes in the new header file.
Differential Revision: https://reviews.llvm.org/D116114
This patch introduces the eviction analysis and the eviction advisor,
the default implementation, and the scaffolding for introducing the
other implementations of the advisor.
Differential Revision: https://reviews.llvm.org/D115707
This would allow sharing the LiveRangeStageManager between different
RegAllocEvictionAdvisors. One scenario is for ML training, where we want
to capture what the default advisor would do, for bootstrapping (speeds
up training).
Differential Revision: https://reviews.llvm.org/D114831
There are 2 eviction queries. One is made by tryAssign, when it attempts to
free an interference occupying the hint of the candidate. The other is
during 'regular' interference resolution, where we scan over all
physical registers and try to see if we can evict live ranges in favor
of the candidate. We currently use the same logic in both cases, just
that the former never passes the cost to any subsequent query.
Technically, the 2 decisions could be implemented with different
policies.
This patch splits the 2.
RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153639.html
Differential Revision: https://reviews.llvm.org/D114019
To correctly use Query, one had to first call collectInterferingVRegs to
pre-cache the query result, then call interferingVRegs. Failing the
former, interferingVRegs could be stale. This did cause a bug which was
addressed in D98232, but the underlying usability issue of the Query API
wasn't.
This patch addresses the latter by making collectInterferingVRegs an
implementation detail, and having interferingVRegs play both roles. One
side-effect of this is that interferingVRegs is not const anymore.
Differential Revision: https://reviews.llvm.org/D112882
This simple heuristic uses the estimated live range length combined
with the number of registers in the class to switch which heuristic to
use. This was taking the raw number of registers in the class, even
though not all of them may be available. AMDGPU heavily relies on
dynamically reserved numbers of registers based on user attributes to
satisfy occupancy constraints, so the raw number is highly misleading.
There are still a few problems here. In the original testcase that
made me notice this, the live range size is incorrect after the
scheduler rearranges instructions, since the instructions don't have
the original InstrDist offsets. Additionally, I think it would be more
appropriate to use the number of disjointly allocatable registers in
the class. For the AMDGPU register tuples, there are a large number of
registers in each tuple class, but only a small fraction can actually
be allocated at the same time since they all overlap with each
other. It seems we do not have a query that corresponds to the number
of independently allocatable registers. Relatedly, I'm still debugging
some allocation failures where overlapping tuples seem to not be
handled correctly.
The test changes are mostly noise. There are a handful of x86 tests
that look like regressions with an additional spill, and a handful
that now avoid a spill. The worst looking regression is likely
test/Thumb2/mve-vld4.ll which introduces a few additional
spills. test/CodeGen/AMDGPU/soft-clause-exceeds-register-budget.ll
shows a massive improvement by completely eliminating a large number
of spills inside a loop.
It was introduced in 1a6dc92 and only enabled on PowerPC/AMDGPU. That
should be enabled for all targets.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D108010
ACC registers are a combination of four consecutive vector registers.
If the vector registers are assigned first this often forces a number
of copies to appear just before the ACC register is created. If the ACC
register is assigned first then fewer copies are generated when the vector
registers are assigned.
This patch tries to force the register allocator to assign the ACC registers first
and then the UACC registers and then the vector pair registers. It does this
by changing the priority of the register classes.
This patch also adds hints to help the register allocator assign UACC registers from
known ACC registers and vector pair registers from known UACC registers.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D105854
AMDGPU normally spills SGPRs to VGPRs. Previously, since all register
classes are handled at the same time, this was problematic. We don't
know ahead of time how many registers will be needed to be reserved to
handle the spilling. If no VGPRs were left for spilling, we would have
to try to spill to memory. If the spilled SGPRs were required for exec
mask manipulation, it is highly problematic because the lanes active
at the point of spill are not necessarily the same as at the restore
point.
Avoid this problem by fully allocating SGPRs in a separate regalloc
run from VGPRs. This way we know the exact number of VGPRs needed, and
can reserve them for a second run. This fixes the most serious
issues, but it is still possible using inline asm to make all VGPRs
unavailable. Start erroring in the case where we ever would require
memory for an SGPR spill.
This is implemented by giving each regalloc pass a callback which
reports if a register class should be handled or not. A few passes
need some small changes to deal with leftover virtual registers.
In the AMDGPU implementation, a new pass is introduced to take the
place of PrologEpilogInserter for SGPR spills emitted during the first
run.
One disadvantage of this is currently StackSlotColoring is no longer
used for SGPR spills. It would need to be run again, which will
require more work.
Error if the standard -regalloc option is used. Introduce new separate
-sgpr-regalloc and -vgpr-regalloc flags, so the two runs can be
controlled individually. PBQB is not currently supported, so this also
prevents using the unhandled allocator.
Re-land the patch with a fix of clang test.
Cost of spill location is computed basing on relative branch frequency
where corresponding spill/reload/copy are located.
While the number itself is highly depends on incoming IR,
the total cost can be used when do some changes in RA.
Revert "Revert "[GreedyRA ORE] Add Cost of spill locations into remark""
This reverts commit 680f3d6de7.
Cost of spill location is computed basing on relative branch frequency
where corresponding spill/reload/copy are located.
While the number itself is highly depends on incoming IR,
the total cost can be used when do some changes in RA.
Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100020
Pseudo probe are currently given a slot index like other regular instructions. This affects register pressure and lifetime weight computation because of enlarged lifetime length with pseudo probe instructions. As a consequence, program could get different code generated w/ and w/o pseudo probes. I'm closing the gap by excluding pseudo probes from stack index and downstream register allocation related passes.
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D100334
Greedy RA adds copies of virtual registers when splitting live interval.
This stat might be useful.
Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100017
Patchpoint instructions have operands which is actually zero cost
(or the same as register) to use the value from the stack.
In terms of statistic it makes same to separate them.
Move from computation instructions related to stack spill/reload to
number of stack slot referenced.
Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100016