llvm/test/DebugInfo/MIR/X86/live-debug-values-reg-copy.mir failed with
EXPENSIVE_CHECKS enabled, causing the patch to be reverted in
rG2c496bb5309c972d59b11f05aee4782ddc087e71.
This patch relands the patch with a proper fix to the
live-debug-values-reg-copy.mir tests, by ensuring the MIR encodes the
callee-saves correctly so that the CalleeSaved info is taken from MIR
directly, rather than letting it be recalculated by the PEI pass. I've
done this by running `llc -stop-before=prologepilog` on the LLVM
IR as captured in the test files, adding the extra MOV instructions
that were manually added in the original test file, then running `llc
-run-pass=prologepilog` and finally re-added the comments for the MOV
instructions.
Commit message from D66935:
This patch fixes a bug exposed by D65653 where a subsequent invocation
of `determineCalleeSaves` ends up with a different size for the callee
save area, leading to different frame-offsets in debug information.
In the invocation by PEI, `determineCalleeSaves` tries to determine
whether it needs to spill an extra callee-saved register to get an
emergency spill slot. To do this, it calls 'estimateStackSize' and
manually adds the size of the callee-saves to this. PEI then allocates
the spill objects for the callee saves and the remaining frame layout
is calculated accordingly.
A second invocation in LiveDebugValues causes estimateStackSize to return
the size of the stack frame including the callee-saves. Given that the
size of the callee-saves is added to this, these callee-saves are counted
twice, which leads `determineCalleeSaves` to believe the stack has
become big enough to require spilling an extra callee-save as emergency
spillslot. It then updates CalleeSavedStackSize with a larger value.
Since CalleeSavedStackSize is used in the calculation of the frame
offset in getFrameIndexReference, this leads to incorrect offsets for
variables/locals when this information is recalculated after PEI.
This patch fixes the lldb unit tests in `functionalities/thread/concurrent_events/*`
Changes after D66935:
Ensures AArch64FunctionInfo::getCalleeSavedStackSize does not return
the uninitialized CalleeSavedStackSize when running `llc` on a specific
pass where the MIR code has already been expected to have gone through PEI.
Instead, getCalleeSavedStackSize (when passed the MachineFrameInfo) will try
to recalculate the CalleeSavedStackSize from the CalleeSavedInfo. In debug
mode, the compiler will assert the recalculated size equals the cached
size as calculated through a call to determineCalleeSaves.
This fixes two tests:
test/DebugInfo/AArch64/asan-stack-vars.mir
test/DebugInfo/AArch64/compiler-gen-bbs-livedebugvalues.mir
that otherwise fail when compiled using msan.
Reviewed By: omjavaid, efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68783
llvm-svn: 375425
This patch fixes a bug exposed by D65653 where a subsequent invocation
of `determineCalleeSaves` ends up with a different size for the callee
save area, leading to different frame-offsets in debug information.
In the invocation by PEI, `determineCalleeSaves` tries to determine
whether it needs to spill an extra callee-saved register to get an
emergency spill slot. To do this, it calls 'estimateStackSize' and
manually adds the size of the callee-saves to this. PEI then allocates
the spill objects for the callee saves and the remaining frame layout
is calculated accordingly.
A second invocation in LiveDebugValues causes estimateStackSize to return
the size of the stack frame including the callee-saves. Given that the
size of the callee-saves is added to this, these callee-saves are counted
twice, which leads `determineCalleeSaves` to believe the stack has
become big enough to require spilling an extra callee-save as emergency
spillslot. It then updates CalleeSavedStackSize with a larger value.
Since CalleeSavedStackSize is used in the calculation of the frame
offset in getFrameIndexReference, this leads to incorrect offsets for
variables/locals when this information is recalculated after PEI.
Reviewers: omjavaid, eli.friedman, thegameg, efriedma
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D66935
llvm-svn: 372204
Add an explicit construction of the ArrayRef, gcc 5 and earlier don't
seem to select the ArrayRef constructor which takes a C array when the
construction is implicit.
Original commit message:
- Avoid a crash when IPRA calls ARMFrameLowering::determineCalleeSaves
with a null RegScavenger. Simply not updating the register scavenger
is fine because IPRA only cares about the SavedRegs vector, the acutal
code of the function has already been generated at this point.
- Add a new hook to TargetRegisterInfo to get the set of registers which
can be clobbered inside a call, even if the compiler can see both
sides, by linker-generated code.
Differential revision: https://reviews.llvm.org/D64908
llvm-svn: 367819
This optimisation isn't generally profitable for ARM, because we can
save/restore many registers in the prologue and epilogue using the PUSH
and POP instructions, but mostly use individual LDR/STR instructions for
other spills.
Differential revision: https://reviews.llvm.org/D64910
llvm-svn: 367670
- Avoid a crash when IPRA calls ARMFrameLowering::determineCalleeSaves
with a null RegScavenger. Simply not updating the register scavenger
is fine because IPRA only cares about the SavedRegs vector, the acutal
code of the function has already been generated at this point.
- Add a new hook to TargetRegisterInfo to get the set of registers which
can be clobbered inside a call, even if the compiler can see both
sides, by linker-generated code.
Differential revision: https://reviews.llvm.org/D64908
llvm-svn: 367669
This is extremly slow on AMDGPU, which has a lot of physical register
and a lot of register classes.
determineCalleeSaves, via MachineRegisterInfo::isPhysRegUsed already
added all of the super registers to the saved set.
llvm-svn: 365370
I'm not sure if it's worth it or not to add a hook to disable the pass
for an arbitrary function.
This pass is taking up to 5% of compile time in tiny programs by
iterating through all of the physical registers in every register
class. This pass should be rewritten in terms of regunits. For now,
skip doing anything for entry point functions. The vast majority of
functions in the real world aren't callable, so just not running this
will give the majority of the benefit.
llvm-svn: 365255
To determine the list of clobbered registers, the RegUsageInfoCollector pass
uses the list of callee saved registers provided by the target and then augments
it with the list of registers which have all their subregisters saved. It then
basically does the difference between all the registers and the saved registers
to come up with what is clobbered (plus it checks that the register is defined
within that functions).
The patch fixes a bug where when register does not have any subregister lane,
hence when checking if any of its subregister are not saved, we would find none
and think the register is saved as well.
That's obviously wrong.
The code was actually kind of checking for something like that with the
CoveredBySubRegs bit. What this bit says is that a register is completely
covered by its subregisters.
We required that this bit was set, to check that a register was saved by its
subregister lanes, since without this bit, we potentially would miss to check
some part of the register.
However, this bit is used de facto on registers that don't have any
subregisters (e.g., on ARM) and the code was not prepared for that.
This patch fixes this by checking that a register has subregisters before
declaring it saved when none of its lanes are modified.
llvm-svn: 361901
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
MachineFunction can only be used in code using lib/CodeGen, hence we
can keep a more specific reference to LLVMTargetMachine rather than just
TargetMachine around.
Do the same for references in ScheduleDAG and RegUsageInfoCollector.
llvm-svn: 346183
On AMDGPU we have 70 register classes, so iterating over all 70
each time and exiting is costly on the CPU, this flips the loop
around so that it loops over the 70 register classes first,
and exits without doing the inner loop if needed.
On my test just starting radv this takes
RegUsageInfoCollector::runOnMachineFunction
from 6.0% of total time to 2.7% of total time,
and reduces the startup from 2.24s to 2.19s
Patch by David Airlie!
Differential Revision: https://reviews.llvm.org/D48582
llvm-svn: 340993
- Remove unnecessary anchor function
- Remove unnecessary override of getAnalysisUsage
- Use reference instead of pointers where things cannot be nullptr
- Use ArrayRef instead of std::vector where possible
llvm-svn: 337989
- Avoid duplication of regmask size calculation.
- Simplify allocateRegisterMask() call.
- Rename allocateRegisterMask() to allocateRegMask() to be consistent
with naming in MachineOperand.
llvm-svn: 337986
Previously, this pass would look at the (static) set returned by
getCallPreservedMask() and add those back as preserved in the case when
isSafeForNoCSROpt() returns false.
A problem is that a target may have to save some registers even when NoCSROpt
takes place. For instance, on SystemZ, the return register is needed upon
return from a function.
Furthermore, getCallPreservedMask() only includes the registers that the
target actually wishes to emit save/restore instructions for. This means that
subregs and (fully saved) superregs are missing.
This patch instead takes the (dynamic) set returned by target for the
function from determineCalleeSaves() and then adds sub/super regs to build
the set to be used when building the RegMask for the function.
Review: Quentin Colombet, Ulrich Weigand
https://reviews.llvm.org/D46315
llvm-svn: 333261
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.
In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.
Differential Revision: https://reviews.llvm.org/D43624
llvm-svn: 332240
Don't assume the alias of a defined reg is always already in the set.
As the test case in https://bugs.llvm.org/show_bug.cgi?id=36587 discovered,
it is wrong to assume that all the aliases of the defined register in the
*current function* is already present in the UsedPhysRegsMask.
This patch changes this so that any definition in the current function of a
phys-reg always results in all its aliases inserted into the set of defined
registers.
Review: Quentin Colombet
https://reviews.llvm.org/D45157
llvm-svn: 331509
output
As part of the unification of the debug format and the MIR format,
always use `printReg` to print all kinds of registers.
Updated the tests using '_' instead of '%noreg' until we decide which
one we want to be the default one.
Differential Revision: https://reviews.llvm.org/D40421
llvm-svn: 319445
This header already includes a CodeGen header and is implemented in
lib/CodeGen, so move the header there to match.
This fixes a link error with modular codegeneration builds - where a
header and its implementation are circularly dependent and so need to be
in the same library, not split between two like this.
llvm-svn: 317379
The previous algorithm for RegUsageInfoCollector had pretty bad
performance on architectures with a lot of registers that alias
a lot one another, because we potentially iterate for every register
over all the aliasing registers. This costs even more if the function
is small and doesn't define a lot of registers.
This patch changes the algorithm to one that while iterating over
all the registers it will iterate over the aliasing registers only
if the register itself is defined.
This should be faster based on the assumption that only a subset
of the whole LLVM registers set is actually defined in the function.
Differential Revision: https://reviews.llvm.org/D30880
llvm-svn: 297673
This patch fixes a very subtle bug in regmask calculation. Thanks to zan
jyu Wong <zyfwong@gmail.com> for bringing this to notice.
For example if CL is only clobbered than CH should not be marked
clobbered but CX, RCX and ECX should be mark clobbered. Previously for
each modified register all of its aliases are marked clobbered by
markRegClobbred() in RegUsageInfoCollector.cpp but that is wrong because
when CL is clobbered then MRI::isPhysRegModified() will return true for
CL, CX, ECX, RCX which is correct behavior but then for CX, EXC, RCX we
mark CH also clobbered as CH is aliased to CX,ECX,RCX so
markRegClobbred() is not required because isPhysRegModified already take
cares of proper aliasing register. A very simple test case has been
added to verify this change.
Please find relevant bug report here :
http://llvm.org/PR28567
Patch by Vivek Pandya <vivekvpandya@gmail.com>
Differential Revision: https://reviews.llvm.org/D22400
llvm-svn: 276235
IPRA try to optimize caller saved register by propagating register
usage information from callee to caller so it is beneficial to have
caller saved registers compare to callee saved registers when IPRA
is enabled. Please find more detailed explanation here
https://groups.google.com/d/msg/llvm-dev/XRzGhJ9wtZg/tjAJqb0eEgAJ.
This change makes local function do not have any callee preserved
register when IPRA is enabled. A simple test case is also added to
verify this change.
Patch by Vivek Pandya <vivekvpandya@gmail.com>
Differential Revision: http://reviews.llvm.org/D21561
llvm-svn: 275347
Add an option to enable the analysis of MachineFunction register
usage to extract the list of clobbered registers.
When enabled, the CodeGen order is changed to be bottom up on the Call
Graph.
The analysis is split in two parts, RegUsageInfoCollector is the
MachineFunction Pass that runs post-RA and collect the list of
clobbered registers to produce a register mask.
An immutable pass, RegisterUsageInfo, stores the RegMask produced by
RegUsageInfoCollector, and keep them available. A future tranformation
pass will use this information to update every call-sites after
instruction selection.
Patch by Vivek Pandya <vivekvpandya@gmail.com>
Differential Revision: http://reviews.llvm.org/D20769
llvm-svn: 272403