discussions with Andy. Fundamentally, the previous algorithm is both
counter productive on several fronts and prioritizing things which
aren't necessarily the most important: static branch prediction.
The new algorithm uses the existing loop CFG structure information to
walk through the CFG itself to layout blocks. It coalesces adjacent
blocks within the loop where the CFG allows based on the most likely
path taken. Finally, it topologically orders the block chains that have
been formed. This allows it to choose a (mostly) topologically valid
ordering which still priorizes fallthrough within the structural
constraints.
As a final twist in the algorithm, it does violate the CFG when it
discovers a "hot" edge, that is an edge that is more than 4x hotter than
the competing edges in the CFG. These are forcibly merged into
a fallthrough chain.
Future transformations that need te be added are rotation of loop exit
conditions to be fallthrough, and better isolation of cold block chains.
I'm also planning on adding statistics to model how well the algorithm
does at laying out blocks based on the probabilities it receives.
The old tests mostly still pass, and I have some new tests to add, but
the nested loops are still behaving very strangely. This almost seems
like working-as-intended as it rotated the exit branch to be
fallthrough, but I'm not convinced this is actually the best layout. It
is well supported by the probabilities for loops we currently get, but
those are pretty broken for nested loops, so this may change later.
llvm-svn: 142743
element types, even though the element extraction code does. It is surprising
that this bug has been here for so long. Fixes <rdar://problem/10318778>.
llvm-svn: 142740
able to constant fold load instructions where the argument is a constant.
Second, we should be able to watch multiple PHI nodes through the loop; this
patch only supports PHIs in loop headers, more can be done here.
With this patch, we now constant evaluate:
static const int arr[] = {1, 2, 3, 4, 5};
int test() {
int sum = 0;
for (int i = 0; i < 5; ++i) sum += arr[i];
return sum;
}
llvm-svn: 142731
that the set of callee-saved registers is correct for the specific platform.
<rdar://problem/10313708> & ctor_dtor_count & ctor_dtor_count-2
llvm-svn: 142706
The assumption in the back-end is that PHIs are not allowed at the start of the
landing pad block for SjLj exceptions.
<rdar://problem/10313708>
llvm-svn: 142689
Next step in the ongoing saga of NEON load/store assmebly parsing. Handle
VLD1 instructions that take a two-register register list.
Adjust the instruction definitions to only have the single encoded register
as an operand. The super-register from the pseudo is kept as an implicit def,
so passes which come after pseudo-expansion still know that the instruction
defines the other subregs.
llvm-svn: 142670
ZExtPromotedInteger and SExtPromotedInteger based on the operation we legalize.
SetCC return type needs to be legalized via PromoteTargetBoolean.
llvm-svn: 142660
it's a bit more plausible to use this instead of CodePlacementOpt. The
code for this was shamelessly stolen from CodePlacementOpt, and then
trimmed down a bit. There doesn't seem to be much utility in returning
true/false from this pass as we may or may not have rewritten all of the
blocks. Also, the statistic of counting how many loops were aligned
doesn't seem terribly important so I removed it. If folks would like it
to be included, I'm happy to add it back.
This was probably the most egregious of the missing features, and now
I'm going to start gathering some performance numbers and looking at
specific loop structures that have different layout between the two.
Test is updated to include both basic loop alignment and nested loop
alignment.
llvm-svn: 142645
canonical example I used when developing it, and is one of the primary
motivating real-world use cases for __builtin_expect (when burried under
a macro).
I'm working on more test cases here, but I'm trying to make sure both
that the pass is doing the right thing with the test cases and that they
aren't too brittle to changes elsewhere in the code generation pipeline.
Feedback and/or suggestions on how to test this are very welcome.
Especially feedback on whether testing the block comments is a good
strategy; I couldn't find any good examples to steal from but all the
other ideas I had were a lot uglier or more fragile.
llvm-svn: 142644
block frequency analyses. This differs substantially from the existing
block-placement pass in LLVM:
1) It operates on the Machine-IR in the CodeGen layer. This exposes much
more (and more precise) information and opportunities. Also, the
results are more stable due to fewer transforms ocurring after the
pass runs.
2) It uses the generalized probability and frequency analyses. These can
model static heuristics, code annotation derived heuristics as well
as eventual profile loading. By basing the optimization on the
analysis interface it can work from any (or a combination) of these
inputs.
3) It uses a more aggressive algorithm, both building chains from tho
bottom up to maximize benefit, and using an SCC-based walk to layout
chains of blocks in a profitable ordering without O(N^2) iterations
which the old pass involves.
The pass is currently gated behind a flag, and not enabled by default
because it still needs to grow some important features. Most notably, it
needs to support loop aligning and careful layout of loop structures
much as done by hand currently in CodePlacementOpt. Once it supports
these, and has sufficient testing and quality tuning, it should replace
both of these passes.
Thanks to Nick Lewycky and Richard Smith for help authoring & debugging
this, and to Jakob, Andy, Eric, Jim, and probably a few others I'm
forgetting for reviewing and answering all my questions. Writing
a backend pass is *sooo* much better now than it used to be. =D
llvm-svn: 142641
the last compiler built for the previous flavour is used for the next,
for example the Debug clang compiler was being used for the initial build
of the Release LLVM. Flavors should be independent of each other. This
especially matters if the compiler built for the previous flavour doesn't
actually work!
llvm-svn: 142607
AsmParser. This patch adds validation for target data layout strings upon
construction of TargetData objects. An attempt to construct a TargetData object
from a malformed string will trigger an assertion.
llvm-svn: 142605
When checking the availability of instructions using the TLI, a 'promoted'
instruction IS available. It means that the value is bitcasted to another type
for which there is an operation. The correct check for the availablity of an
instruction is to check if it should be expanded.
llvm-svn: 142542
On spec/gcc, this caused a codesize improvement of ~1.9% for ARM mode and ~4.9% for Thumb(2) mode. This is
codesize including literal pools.
The pools themselves doubled in size for ARM mode and quintupled for Thumb mode, leaving suggestion that there
is still perhaps redundancy in LLVM's use of constant pools that could be decreased by sharing entries.
Fixes PR11087.
llvm-svn: 142530
Add a paste operator '#' to take two identifier-like strings and joint
them. Internally paste gets represented as a !strconcat() with any
necessary casts to string added.
This will be used to implement basic for loop functionality as in:
for i = [0, 1, 2, 3, 4, 5, 6, 7] {
def R#i : Register<...>
}
llvm-svn: 142525
Stop parsing a value if we are in name parsing mode and we see a left
brace. A left brace indicates the start of an object body when we are
parsing a name.
llvm-svn: 142521
Add a mode control to value and ID parsers. The two modes are:
- Parse a value. Expect the parsed ID to map to an existing object.
- Parse a name. Expect the parsed ID to not map to any existing object.
The first is used when parsing an identifier to be looked up, for
example a record field or template argument. The second is used for
parsing declarations. Paste functionality implies that declarations
can contain arbitrary expressions so we need to be able to call into
the general value parser to parse declarations with paste operators.
So we need a way to parse a value-like thing without expecting that
the result will map to some existing object. This parse mode provides
that.
llvm-svn: 142519
Add a Value named "NAME" to each Record. This will be set to the def or defm
name when instantiating multiclasses. This will replace the #NAME# processing
hack once paste functionality is in place.
llvm-svn: 142518
Use lookahead to determine whether a number is really a number or is
part of something forming an identifier. This won't come into play
until the paste operator is recognized as a unique token.
llvm-svn: 142513
Add a peek function to let the Lexer look at a character arbitrarily
far ahead in the stream without consuming anything. We need this to
disambiguate numbers and operands of a paste operation. For example:
def foo#8i
Without lookahead the lexer will treat '8' as a number rather than as
part of a string to be pasted to form an identifier.
llvm-svn: 142512
Add Record names to be changed even on Records that aren't yet
registered. We need to be able to do this for paste functionality
because we do not want to register def names before they are unique
and that can only happen once all paste operations are done. This
change lets us update Record names formed by paste operations and
register the result later.
llvm-svn: 142510
Record names may not be fully resolved at this point so ask for the
record name as a string explicitly. This avoids a potential assert.
llvm-svn: 142502
Allow template arg names to be Inits. This is further work to
implement paste as it allows template names to participate in paste
operations.
llvm-svn: 142500
Convert SetValue to take the value name as an Init. This allows us to
set values for variables whose names are not yet fully resolved.
llvm-svn: 142499
Add a couple of utility functions to take a variable name and qualify
it with the namespace of the enclosing class and/or multiclass. This
is inpreparation for making template arg names first-class Inits.
llvm-svn: 142498
Make the VarInit name an Init itself. We need this to implement paste
functionality so we can reference variables whose names are not yet
completely resolved.
llvm-svn: 142497
Add accessors to get Record values by Init name. This lets us look up
Record values whose names are not yet fully resolved. More work
toward paste.
llvm-svn: 142496
Add a couple of utility functions to get at the name init and return
the name init as a string. This will be used for paste functionality.
llvm-svn: 142494
and switches, with arbitrary numbers of successors. Still optimized for
the common case of 2 successors for a conditional branch.
Add a test case for switch metadata showing up in the BlockFrequencyInfo pass.
llvm-svn: 142493
encoding of probabilities. In the absense of metadata, it continues to
fall back on static heuristics.
This allows __builtin_expect, after lowering through llvm.expect
a branch instruction's metadata, to actually enter the branch
probability model. This is one component of resolving PR2577.
llvm-svn: 142492
layer already had support for printing the results of this analysis, but
the wiring was missing.
Now that printing the analysis works, actually bring some of this
analysis, and the BranchProbabilityInfo analysis that it wraps, under
test! I'm planning on fixing some bugs and doing other work here, so
having a nice place to add regression tests and a way to observe the
results is really useful.
llvm-svn: 142491
- This currently just moves over all of the behavior from LLVM. Eventually all of the configure checks that are directly needed by the LLVM build setup should probably go away, and the project should manage their own configuration checks if necessary.
- This is the 1st half of this work, the actual Makefile.common hasn't moved over yet. I've tried to stage this in such a way that incremental builds will properly reconfigure for most active developers (the Makefiles don't handle reconfiguring in a perfectly reliable way, and I haven't found an easy way to make them do so).
llvm-svn: 142456
register and then compare against that" method when it's too large. We have to
move the value into the register in the "movw, movt" pair of instructions.
llvm-svn: 142440
register and then compare against that" method when it's too large. We have to
move the value into the register in the "movw, movt" pair of instructions.
llvm-svn: 142437
Clean up the patterns, fix comments, and avoid confusing both tools
and coders. Note that the special adds/subs SelectionDAG nodes no
longer have the dummy cc_out operand.
llvm-svn: 142397
predecessor to remove the jump to it as well. Delay clearing the 'landing pad'
flag until after the jumps have been removed. (There is an implicit assumption
in several modules that an MBB which jumps to a landing pad has only two
successors.)
<rdar://problem/10304224>
llvm-svn: 142390
-Fix binary codes and rename operands in .td files so that automatically
generated function MipsCodeEmitter::getBinaryCodeForInstr gives correct
encoding for instructions.
-Define new class FMem for instructions that access memory.
-Define new class FFRGPR for instructions that move data between GPR and
FPU general and control registers.
-Define custom encoder methods for memory operands, and also for size
operands of ext and ins instructions.
-Only static relocation model is currently implemented.
Patch by Sasa Stankovic
llvm-svn: 142378
svn r139159 caused SelectionDAG::getConstant() to promote BUILD_VECTOR operands
with illegal types, even before type legalization. For this testcase, that led
to one BUILD_VECTOR with i16 operands and another with promoted i32 operands,
which triggered the assertion.
llvm-svn: 142370
Some of these can be true at the same time and there are a lot to add,
so this should be turned into a bitfield. Some of the other accessors
should probably be folded into this.
llvm-svn: 142318
.file filenumber "directory" "filename"
This removes one join+split of the directory+filename in MC internals. Because
bitcode files have independent fields for directory and filenames in debug info,
this patch may change the .o files written by existing .bc files.
llvm-svn: 142300
Use the custom inserter for the ARM setjmp intrinsics. Instead of creating the
SjLj dispatch table in IR, where it frequently violates serveral assumptions --
in particular assumptions made by the landingpad instruction about what can
branch to a landing pad and what cannot. Performing this in the back-end allows
us to violate these assumptions without the IR getting angry at us.
It also allows us to perform a small optimization. We can shove the address of
the dispatch's basic block into the function context and not have to add code
around the setjmp to check for the return value and jump to the dispatch.
Neat, huh?
<rdar://problem/10116753>
llvm-svn: 142294
NEON immediates are "interesting". Start of the work to handle parsing them
in an 'as' compatible manner. Getting the matcher to play nicely with
these and the floating point immediates from VFP is an extra fun wrinkle.
llvm-svn: 142293
Invalid strings in asm files will result in parse errors. Invalid string literals passed to TargetData constructors will result in an assertion.
llvm-svn: 142288
combining of the landingpad instruction. The ObjC personality function acts
almost identically to the C++ personality function. In particular, it uses
"null" as a "catch-all" value.
llvm-svn: 142256
Once the intrinsics are marked as having a custom inserter, it will call this
method to emit the dispatch table into the machine function.
llvm-svn: 142245
Some code want to check that *any* call within a function has the 'returns
twice' attribute, not just that the current function has one.
llvm-svn: 142221
In machine code, you can't just replaceRegWith() the same way you can
replaceAllUsesWith() in IR. Virtual registers may have different
register classes that need to be merged first.
llvm-svn: 142201
profile metadata at the same time. Use it to preserve metadata attached
to a branch when re-writing it in InstCombine.
Add metadata to the canonicalize_branch InstCombine test, and check that
it is tranformed correctly.
Reviewed by Nick Lewycky!
llvm-svn: 142168
The decision was to pack the bits. Currently no codegen supports this.
Currently, all of the bits in the vector are saved into the same address
in memory.
llvm-svn: 142149