2022-05-09 22:45:56 +08:00
# Instruction referencing for debug info

This document explains how LLVM uses value tracking, or instruction
referencing, to determine variable locations for debug info in the code
generation stage of compilation. This content is aimed at those working on code
generation targets and optimisation passes. It may also be of interest to anyone
curious about low-level debug info handling.

# Problem statement

At the end of compilation, LLVM must produce a DWARF location list (or similar)
describing what register or stack location a variable can be found in, for each
instruction in that variable's lexical scope. We could track the virtual
register that the variable resides in through compilation; however, this is
vulnerable to register optimisations during regalloc, and to instruction
movements.

# Solution: instruction referencing

Rather than identifying the virtual register that a variable value resides in,
in instruction referencing mode LLVM refers to the machine instruction
and operand position that the value is defined in. Consider the LLVM IR way of
referring to instruction values:

```llvm
%2 = add i32 %0, %1
call void @llvm.dbg.value(metadata i32 %2,
```

In LLVM IR, the IR Value is synonymous with the instruction that computes the
value, to the extent that in memory a Value is a pointer to the computing
instruction. Instruction referencing implements this relationship in the
codegen backend of LLVM, after instruction selection. Consider the X86 machine
instructions below, with instruction referencing debug info, corresponding to
the earlier LLVM IR:

```text
%2:gr32 = ADD32rr %0, %1, implicit-def $eflags, debug-instr-number 1
DBG_INSTR_REF 1, 0, !123, !456, debug-location !789
```

While the function remains in SSA form, virtual register `%2` is sufficient to
identify the value computed by the instruction. However, the function
eventually leaves SSA form, and register optimisations will obscure which
register the desired value is in. A more consistent way of identifying
the instruction's value is to refer to the `MachineOperand` where the value is
defined, independently of which register is defined by that `MachineOperand`. In
the code above, the `DBG_INSTR_REF` instruction refers to instruction number
one, operand zero, while the `ADD32rr` has a `debug-instr-number` attribute
attached indicating that it is instruction number one.
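To make the lookup concrete, here is a minimal Python sketch of resolving a reference of the form "instruction number, operand index". It is a toy model rather than LLVM code: the `Instr` record, its fields and `resolve_instr_ref` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    opcode: str
    operands: List[str]  # operands[i] is the register name at position i
    debug_instr_number: Optional[int] = None


def resolve_instr_ref(block, instr_num, operand_idx):
    """Return the register defined by (instr_num, operand_idx), or None."""
    for instr in block:
        if instr.debug_instr_number == instr_num:
            return instr.operands[operand_idx]
    return None  # no instruction carries that number: "optimised out"


add = Instr("ADD32rr", ["%2", "%0", "%1"], debug_instr_number=1)
block = [add]
before_regalloc = resolve_instr_ref(block, 1, 0)

# Model register allocation rewriting the defined register in place.
add.operands[0] = "$eax"
after_regalloc = resolve_instr_ref(block, 1, 0)
```

Because the reference names the defining operand rather than a register, rewriting `%2` to `$eax` in this model does not invalidate it; the same lookup simply yields the new register.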

De-coupling variable locations from registers avoids difficulties involving
register allocation and optimisation, but requires additional instrumentation
when the instructions themselves are optimised. Optimisations that replace
instructions with optimised versions that compute the same value must either
preserve the instruction number, or record a substitution from the old
instruction / operand number pair to the new instruction / operand pair; see
`MachineFunction::substituteDebugValuesForInst`. If debug info maintenance is
not performed, or an instruction is eliminated as dead code, the variable
location is safely dropped and marked "optimised out". The exception is
instructions that are mutated rather than replaced, which always need debug info
maintenance.

# Register allocator considerations

When the register allocator runs, debugging instructions do not directly refer
to any virtual registers, and thus there is no need for expensive location
maintenance during regalloc (i.e. `LiveDebugVariables`). Debug instructions are
unlinked from the function, then linked back in after register allocation
completes.

The exception is `PHI` instructions: these become implicit definitions at
control flow merges once regalloc finishes, and any debug numbers attached to
`PHI` instructions are lost. To circumvent this, debug numbers of `PHI`s are
recorded at the start of register allocation (`phi-node-elimination`), then
`DBG_PHI` instructions are inserted after regalloc finishes. This requires some
maintenance of which register a variable is located in during regalloc, but at
single positions (block entry points) rather than ranges of instructions.

An example, before regalloc:

```text
bb.2:
  %2 = PHI %1, %bb.0, %2, %bb.1, debug-instr-number 1
```

After:

```text
bb.2:
  DBG_PHI $rax, 1
```
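The record-then-reinsert scheme above can be sketched as follows. This is a simplified Python model, not the LLVM implementation: the tuple layout, both function names and the single `vreg_to_physreg` map are assumptions made for illustration, and real register allocation is considerably more involved.

```python
def record_phi_numbers(block_phis):
    """Before PHI elimination: remember (block, vreg) for each numbered PHI."""
    return {
        num: (block, vreg)
        for block, vreg, num in block_phis
        if num is not None
    }


def emit_dbg_phis(recorded, vreg_to_physreg):
    """After regalloc: one DBG_PHI per recorded number, grouped by block."""
    out = {}
    for num, (block, vreg) in sorted(recorded.items()):
        reg = vreg_to_physreg[vreg]
        out.setdefault(block, []).append(f"DBG_PHI {reg}, {num}")
    return out


# (block, defined vreg, debug-instr-number or None)
phis = [("bb.2", "%2", 1), ("bb.3", "%5", None)]
recorded = record_phi_numbers(phis)
dbg_phis = emit_dbg_phis(recorded, {"%2": "$rax", "%5": "$rcx"})
```

Only the numbered `PHI` produces a `DBG_PHI`, and the only register knowledge needed is where the value lives at the block entry point.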

# `LiveDebugValues`

After optimisations and code layout complete, information about variable
values must be translated into variable locations, i.e. registers and stack
slots. This is performed in the [`LiveDebugValues` pass][LiveDebugValues], where
the debug instructions and machine code are separated out into two independent
functions:
* One that assigns values to variable names,
* One that assigns values to machine registers and stack slots.

LLVM's existing SSA tools are used to place `PHI`s for each function, between
variable values and the values contained in machine locations, with value
propagation eliminating any unnecessary `PHI`s. The two can then be joined up
to map variables to values, then values to locations, for each instruction in
the function.

Key to this process is being able to identify the movement of values between
registers and stack locations, so that the location of values can be preserved
for the full time that they are resident in the machine.
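As an illustration of the final join, the sketch below assumes the two functions have already been solved and are available as one dictionary per instruction position. The names `join`, `var_values` and `value_locs` are hypothetical and the value name `v1` is arbitrary; LLVM's actual data structures differ.

```python
def join(var_values, value_locs):
    """Map each variable to a location via the value it holds, per position."""
    result = []
    for vars_here, locs_here in zip(var_values, value_locs):
        result.append({
            # A missing entry yields None: the variable is "optimised out".
            var: locs_here.get(val)
            for var, val in vars_here.items()
        })
    return result


# Position 0: value v1 lives in $eax.
# Position 1: v1 has moved to $ebx and $eax has been overwritten.
# Position 2: v1 is no longer resident anywhere in the machine.
var_values = [{"x": "v1"}, {"x": "v1"}, {"x": "v1"}]
value_locs = [{"v1": "$eax"}, {"v1": "$ebx"}, {}]
locations = join(var_values, value_locs)
```

The variable-to-value map never changes in this example; only the value-to-location map does, which is why tracking value movement is enough to keep the variable's location up to date.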

# Required target support and transition guide

Instruction referencing will work on any target, but likely with poor coverage.
Supporting instruction referencing well requires:
* Target hooks to be implemented to allow `LiveDebugValues` to follow values
  through the machine,
* Target-specific optimisations to be instrumented, to preserve instruction
  numbers.

## Target hooks

`TargetInstrInfo::isCopyInstrImpl` must be implemented to recognise any
instructions that are copy-like; `LiveDebugValues` uses this to identify when
values move between registers.

`TargetInstrInfo::isLoadFromStackSlotPostFE` and
`TargetInstrInfo::isStoreToStackSlotPostFE` are needed to identify restore and
spill instructions. They should return the destination and source register
respectively. `LiveDebugValues` will track the movement of a value from / to
the stack slot. In addition, any instruction that writes to a stack spill slot
should have a `MachineMemOperand` attached, so that `LiveDebugValues` can
recognise that a slot has been clobbered.
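The effect of these hooks can be pictured with a small Python model of how locations might be tracked once copies, spills and restores are recognisable. The instruction tuples and the `track` function are hypothetical; in particular, a non-copy write is modelled simply as forgetting the old value, which is a simplification.

```python
def track(instrs, start):
    """Follow values through copies, spills, restores and clobbers.

    `start` maps a location (register or stack slot) to the value it holds.
    Returns the location-to-value map after every instruction.
    """
    locs = dict(start)
    history = []
    for kind, dst, src in instrs:
        if kind in ("copy", "spill", "restore"):
            # The value moves: dst now holds whatever src held, if anything.
            if src in locs:
                locs[dst] = locs[src]
            else:
                locs.pop(dst, None)
        elif kind == "def":
            # Any other write clobbers dst; src is unused for this kind.
            locs.pop(dst, None)
        history.append(dict(locs))
    return history


instrs = [
    ("spill", "slot0", "$eax"),    # store $eax to a stack slot
    ("def", "$eax", None),         # $eax is overwritten
    ("restore", "$ebx", "slot0"),  # reload the slot into $ebx
    ("def", "slot0", None),        # the slot itself is clobbered
]
history = track(instrs, {"$eax": "v1"})
```

If the spill or restore were not recognised, the model would lose `v1` at the second instruction; if the final slot write were not recognised, it would wrongly keep reporting `slot0` as holding `v1`.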

## Target-specific optimisation instrumentation

Optimisations come in two flavours: those that mutate a `MachineInstr` to make
it do something different, and those that create a new instruction to replace
the operation of the old.

The former _must_ be instrumented. The relevant question is whether any
register def in any operand will produce a different value as a result of the
mutation. If the answer is yes, then there is a risk that a `DBG_INSTR_REF`
instruction referring to that operand will end up assigning the different
value to a variable, presenting the debugging developer with an unexpected
variable value. In such scenarios, call `MachineInstr::dropDebugNumber()` on the
mutated instruction to erase its instruction number. Any `DBG_INSTR_REF`
referring to it will then produce an empty variable location, which appears
as "optimised out" in the debugger.
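A toy Python model of this rule is shown below. The dictionaries, `drop_debug_number` and `find_def` are hypothetical stand-ins and do not reflect LLVM's API; the point is only that a reference to a dropped number finds nothing, rather than finding the wrong value.

```python
def drop_debug_number(instr):
    """Erase the instruction number so no reference can resolve to instr."""
    instr["debug_instr_number"] = None


def find_def(instrs, num):
    """Return the instruction carrying debug number `num`, or None."""
    return next((i for i in instrs if i["debug_instr_number"] == num), None)


add = {"opcode": "ADD32rr", "debug_instr_number": 1}
instrs = [add]
found_before = find_def(instrs, 1)

# A hypothetical optimisation mutates the add into a subtract, so the
# def in operand 0 now computes a different value.
add["opcode"] = "SUB32rr"
drop_debug_number(add)
found_after = find_def(instrs, 1)
```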

For the latter flavour of optimisation, to increase coverage you should record
an instruction number substitution: a mapping from the old instruction number /
operand pair to the new instruction number / operand pair. Consider replacing
a three-address add instruction with a two-address add:

```text
%2:gr32 = ADD32rr %0, %1, debug-instr-number 1
```

becomes

```text
%2:gr32 = ADD32rr %0(tied-def 0), %1, debug-instr-number 2
```

with a substitution from "instruction number 1 operand 0" to "instruction number
2 operand 0" recorded in the `MachineFunction`. In `LiveDebugValues`,
`DBG_INSTR_REF`s will be mapped through the substitution table to find the most
recent instruction number / operand number of the value they refer to.
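Mapping a reference through the table can be sketched in a few lines of Python. This assumes the table is a plain dictionary keyed by `(instruction number, operand)` pairs, which is an illustrative choice and not how LLVM stores it; `resolve` is a hypothetical name.

```python
def resolve(substitutions, ref):
    """Follow substitutions from ref to the most recent (instr, operand)."""
    seen = set()
    while ref in substitutions:
        if ref in seen:
            # Defensive: a cycle would otherwise loop forever.
            raise ValueError(f"substitution cycle at {ref}")
        seen.add(ref)
        ref = substitutions[ref]
    return ref


# Instruction 1 was replaced by 2, which was later replaced by 3.
substitutions = {(1, 0): (2, 0), (2, 0): (3, 0)}
latest = resolve(substitutions, (1, 0))
untouched = resolve(substitutions, (5, 1))
```

A reference with no table entry resolves to itself, so instructions that were never replaced need no special handling.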

Use `MachineFunction::substituteDebugValuesForInst` to automatically produce
substitutions between an old and new instruction. It assumes that any operand
that is a def in the old instruction is a def in the new instruction at the
same operand position. This works most of the time, as in the `ADD32rr`
replacement above.

If operand numbers do not line up between the old and new instruction, use
`MachineInstr::getDebugInstrNum` to acquire the instruction number for the new
instruction, and `MachineFunction::makeDebugValueSubstitution` to record the
mapping between register definitions in the old and new instructions. If some
values computed by the old instruction are no longer computed by the new
instruction, record no substitution; `LiveDebugValues` will safely drop the
now unavailable variable value.
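The manual case can be pictured with the following Python sketch, which builds table entries from an explicit operand mapping. `make_substitutions` and the `operand_map` convention (with `None` meaning "no longer computed") are hypothetical and only loosely modelled on the calls named above.

```python
def make_substitutions(old_num, new_num, operand_map):
    """Build entries for each old def operand that still has a counterpart."""
    return {
        (old_num, old_op): (new_num, new_op)
        for old_op, new_op in operand_map.items()
        if new_op is not None  # no counterpart: record nothing
    }


# Old instruction 1 defines values at operands 0 and 1. Its replacement,
# instruction 2, defines the first value at operand 1 and does not
# compute the second value at all.
table = make_substitutions(1, 2, {0: 1, 1: None})
```

The missing entry for operand 1 is deliberate: a reference to it finds no substitution and no defining instruction, so the variable value is dropped rather than misreported.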

Should your target clone instructions, much the same as the `TailDuplicator`
optimisation pass, do not attempt to preserve the instruction numbers or
record any substitutions. `MachineFunction::CloneMachineInstr` should drop the
instruction number of any cloned instruction, to avoid duplicate numbers
appearing to `LiveDebugValues`. Dealing with duplicated instructions is a
natural extension to instruction referencing that's currently unimplemented.

[LiveDebugValues]: SourceLevelDebugging.html#livedebugvalues-expansion-of-variable-locations