Next step in the ongoing saga of NEON load/store assmebly parsing. Handle
VLD1 instructions that take a two-register register list.
Adjust the instruction definitions to only have the single encoded register
as an operand. The super-register from the pseudo is kept as an implicit def,
so passes which come after pseudo-expansion still know that the instruction
defines the other subregs.
llvm-svn: 142670
ZExtPromotedInteger and SExtPromotedInteger based on the operation we legalize.
SetCC return type needs to be legalized via PromoteTargetBoolean.
llvm-svn: 142660
addDecl() and makeDeclVisibleInContextInternal()
functions, and made the latter private since it
does not and should not have external clients.
llvm-svn: 142655
be sure to consider all of the possible lookup results. We were
assert()'ing (but behaving correctly) for unresolved values. Fixes
PR11134 / <rdar://problem/10290422>.
llvm-svn: 142652
it's a bit more plausible to use this instead of CodePlacementOpt. The
code for this was shamelessly stolen from CodePlacementOpt, and then
trimmed down a bit. There doesn't seem to be much utility in returning
true/false from this pass as we may or may not have rewritten all of the
blocks. Also, the statistic of counting how many loops were aligned
doesn't seem terribly important so I removed it. If folks would like it
to be included, I'm happy to add it back.
This was probably the most egregious of the missing features, and now
I'm going to start gathering some performance numbers and looking at
specific loop structures that have different layout between the two.
Test is updated to include both basic loop alignment and nested loop
alignment.
llvm-svn: 142645
canonical example I used when developing it, and is one of the primary
motivating real-world use cases for __builtin_expect (when burried under
a macro).
I'm working on more test cases here, but I'm trying to make sure both
that the pass is doing the right thing with the test cases and that they
aren't too brittle to changes elsewhere in the code generation pipeline.
Feedback and/or suggestions on how to test this are very welcome.
Especially feedback on whether testing the block comments is a good
strategy; I couldn't find any good examples to steal from but all the
other ideas I had were a lot uglier or more fragile.
llvm-svn: 142644
block frequency analyses. This differs substantially from the existing
block-placement pass in LLVM:
1) It operates on the Machine-IR in the CodeGen layer. This exposes much
more (and more precise) information and opportunities. Also, the
results are more stable due to fewer transforms ocurring after the
pass runs.
2) It uses the generalized probability and frequency analyses. These can
model static heuristics, code annotation derived heuristics as well
as eventual profile loading. By basing the optimization on the
analysis interface it can work from any (or a combination) of these
inputs.
3) It uses a more aggressive algorithm, both building chains from tho
bottom up to maximize benefit, and using an SCC-based walk to layout
chains of blocks in a profitable ordering without O(N^2) iterations
which the old pass involves.
The pass is currently gated behind a flag, and not enabled by default
because it still needs to grow some important features. Most notably, it
needs to support loop aligning and careful layout of loop structures
much as done by hand currently in CodePlacementOpt. Once it supports
these, and has sufficient testing and quality tuning, it should replace
both of these passes.
Thanks to Nick Lewycky and Richard Smith for help authoring & debugging
this, and to Jakob, Andy, Eric, Jim, and probably a few others I'm
forgetting for reviewing and answering all my questions. Writing
a backend pass is *sooo* much better now than it used to be. =D
llvm-svn: 142641
of arbitrary pointers, allowing direct dereferences
of literal addresses. Also disabled special-cased
generation of certain expression results (especially
casts), substituting the IR interpreter.
llvm-svn: 142638
addDeclInternal(). This function suppresses any
calls to FindExternalVisibleDeclsByName() while
a Decl is added to a DeclContext. This behavior
is required for the ASTImporter, because in the
case of the LLDB client the ASTImporter would be
called recursively to import the visible decls,
which leads to assertions because the recursive
call is seeing partially-formed types.
I also modified the ASTImporter to use
addDeclInternal() in all places where it would
otherwise use addDecl(). This fix should not
affect the rest of Clang, passes Clang's
testsuite, and fixes several serious LLDB bugs.
llvm-svn: 142634
o create a fresh target; and
o set the first breakpoint
Example (using lldb to set a breakpoint on lldb's Driver::MainLoop function):
./dotest.py -v +b -x '-F Driver::MainLoop()' -p TestStartupDelays.py
...
1: test_startup_delay (TestStartupDelays.StartupDelaysBench)
Test start up delays creating a target and setting a breakpoint. ...
lldb startup delays benchmark:
create fresh target: Avg: 0.106732 (Laps: 15, Total Elapsed Time: 1.600985)
set first breakpoint: Avg: 0.102589 (Laps: 15, Total Elapsed Time: 1.538832)
ok
llvm-svn: 142628
tables (like the .apple_namespaces) and it would cause us to index DWARF that
didn't need to be indexed.
Updated the MappedHash.h (generic Apple accelerator table) and the DWARF
specific one (HashedNameToDIE.h) to be up to date with the latest and
greatest hash table format.
llvm-svn: 142627