On AMDGPU we have 70 register classes, so iterating over all 70
each time and exiting is costly on the CPU, this flips the loop
around so that it loops over the 70 register classes first,
and exits without doing the inner loop if needed.
On my test just starting radv this takes
RegUsageInfoCollector::runOnMachineFunction
from 6.0% of total time to 2.7% of total time,
and reduces the startup from 2.24s to 2.19s
Patch by David Airlie!
Differential Revision: https://reviews.llvm.org/D48582
llvm-svn: 340993
- Remove unnecessary anchor function
- Remove unnecessary override of getAnalysisUsage
- Use reference instead of pointers where things cannot be nullptr
- Use ArrayRef instead of std::vector where possible
llvm-svn: 337989
- Avoid duplication of regmask size calculation.
- Simplify allocateRegisterMask() call.
- Rename allocateRegisterMask() to allocateRegMask() to be consistent
with naming in MachineOperand.
llvm-svn: 337986
Previously, this pass would look at the (static) set returned by
getCallPreservedMask() and add those back as preserved in the case when
isSafeForNoCSROpt() returns false.
A problem is that a target may have to save some registers even when NoCSROpt
takes place. For instance, on SystemZ, the return register is needed upon
return from a function.
Furthermore, getCallPreservedMask() only includes the registers that the
target actually wishes to emit save/restore instructions for. This means that
subregs and (fully saved) superregs are missing.
This patch instead takes the (dynamic) set returned by target for the
function from determineCalleeSaves() and then adds sub/super regs to build
the set to be used when building the RegMask for the function.
Review: Quentin Colombet, Ulrich Weigand
https://reviews.llvm.org/D46315
llvm-svn: 333261
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.
In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.
Differential Revision: https://reviews.llvm.org/D43624
llvm-svn: 332240
Don't assume the alias of a defined reg is always already in the set.
As the test case in https://bugs.llvm.org/show_bug.cgi?id=36587 discovered,
it is wrong to assume that all the aliases of the defined register in the
*current function* is already present in the UsedPhysRegsMask.
This patch changes this so that any definition in the current function of a
phys-reg always results in all its aliases inserted into the set of defined
registers.
Review: Quentin Colombet
https://reviews.llvm.org/D45157
llvm-svn: 331509
output
As part of the unification of the debug format and the MIR format,
always use `printReg` to print all kinds of registers.
Updated the tests using '_' instead of '%noreg' until we decide which
one we want to be the default one.
Differential Revision: https://reviews.llvm.org/D40421
llvm-svn: 319445
This header already includes a CodeGen header and is implemented in
lib/CodeGen, so move the header there to match.
This fixes a link error with modular codegeneration builds - where a
header and its implementation are circularly dependent and so need to be
in the same library, not split between two like this.
llvm-svn: 317379
The previous algorithm for RegUsageInfoCollector had pretty bad
performance on architectures with a lot of registers that alias
a lot one another, because we potentially iterate for every register
over all the aliasing registers. This costs even more if the function
is small and doesn't define a lot of registers.
This patch changes the algorithm to one that while iterating over
all the registers it will iterate over the aliasing registers only
if the register itself is defined.
This should be faster based on the assumption that only a subset
of the whole LLVM registers set is actually defined in the function.
Differential Revision: https://reviews.llvm.org/D30880
llvm-svn: 297673
This patch fixes a very subtle bug in regmask calculation. Thanks to zan
jyu Wong <zyfwong@gmail.com> for bringing this to notice.
For example if CL is only clobbered than CH should not be marked
clobbered but CX, RCX and ECX should be mark clobbered. Previously for
each modified register all of its aliases are marked clobbered by
markRegClobbred() in RegUsageInfoCollector.cpp but that is wrong because
when CL is clobbered then MRI::isPhysRegModified() will return true for
CL, CX, ECX, RCX which is correct behavior but then for CX, EXC, RCX we
mark CH also clobbered as CH is aliased to CX,ECX,RCX so
markRegClobbred() is not required because isPhysRegModified already take
cares of proper aliasing register. A very simple test case has been
added to verify this change.
Please find relevant bug report here :
http://llvm.org/PR28567
Patch by Vivek Pandya <vivekvpandya@gmail.com>
Differential Revision: https://reviews.llvm.org/D22400
llvm-svn: 276235
IPRA try to optimize caller saved register by propagating register
usage information from callee to caller so it is beneficial to have
caller saved registers compare to callee saved registers when IPRA
is enabled. Please find more detailed explanation here
https://groups.google.com/d/msg/llvm-dev/XRzGhJ9wtZg/tjAJqb0eEgAJ.
This change makes local function do not have any callee preserved
register when IPRA is enabled. A simple test case is also added to
verify this change.
Patch by Vivek Pandya <vivekvpandya@gmail.com>
Differential Revision: http://reviews.llvm.org/D21561
llvm-svn: 275347
Add an option to enable the analysis of MachineFunction register
usage to extract the list of clobbered registers.
When enabled, the CodeGen order is changed to be bottom up on the Call
Graph.
The analysis is split in two parts, RegUsageInfoCollector is the
MachineFunction Pass that runs post-RA and collect the list of
clobbered registers to produce a register mask.
An immutable pass, RegisterUsageInfo, stores the RegMask produced by
RegUsageInfoCollector, and keep them available. A future tranformation
pass will use this information to update every call-sites after
instruction selection.
Patch by Vivek Pandya <vivekvpandya@gmail.com>
Differential Revision: http://reviews.llvm.org/D20769
llvm-svn: 272403