allowing the memcpy to be eliminated.
Unfortunately, the alignment requirements on byval arguments without
explicit alignment are very weak and impossible to predict in the
mid-level optimizer, so this doesn't kick in much with current
frontends. The fix is to change clang to set the alignment on all
byval arguments.
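A hedged C-level illustration of the situation (function names hypothetical):
clang lowers a struct passed by value as a byval argument, and the front-end
copy of the aggregate can only be forwarded once the alignment of the byval
slot is known:

struct S { double d[4]; };
void callee(struct S s);      /* lowered as a byval argument */
void caller(struct S *p) {
  /* The front end copies *p into a temporary that is passed byval; with an
     explicit alignment on the byval argument, the mid-level optimizer can
     forward the copy and eliminate the memcpy. */
  callee(*p);
}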
llvm-svn: 119916
so don't claim they are. They are allocated using DAG.getNode, so attempts
to access MemSDNode fields result in reading off the end of the allocated
memory. This fixes crashes with "llc -debug" due to debug code trying to
print MemSDNode fields for these barrier nodes (since the crashes are not
deterministic, use valgrind to see this). Add some nasty checking to try
to catch this kind of thing in the future.
llvm-svn: 119901
It is now possible to navigate the B+-tree using NodeRef::subtree() and
NodeRef::size() without knowing the key and value template types used in the
tree.
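Per the commit text, a generic walk along these lines becomes possible (the
traversal function itself is a hypothetical sketch):

#include "llvm/ADT/IntervalMap.h"

// Recursively visit every branch node without knowing KeyT or ValT.
static void visitTree(llvm::IntervalMapImpl::NodeRef Node, unsigned Height) {
  if (Height == 0)
    return;                                  // children are leaf nodes
  for (unsigned i = 0, e = Node.size(); i != e; ++i)
    visitTree(Node.subtree(i), Height - 1);
}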
llvm-svn: 119880
Key and value objects are not necessarily destroyed immediately when they are
erased from the container, but they will be destroyed eventually by the
IntervalMap destructor.
llvm-svn: 119873
llvm/include/llvm/ADT/IntervalMap.h:334: error: '((llvm::IntervalMapImpl::DesiredNodeBytes / static_cast<unsigned int>(((2 * sizeof (KeyT)) + sizeof (ValT)))) >? 3u)' is not a valid template argument for type 'unsigned int' because it is a non-constant expression
llvm-svn: 119790
This is a sorted interval map data structure for small keys and values with
automatic coalescing and bidirectional iteration over coalesced intervals.
Except for coalescing intervals, it provides similar functionality to std::map.
It is, however, much more compact for small keys and values, and hopefully
faster too.
The container object itself can hold the first few intervals without any
allocations, then it switches to a cache-conscious B+-tree representation. A
recycling allocator can be shared between many containers, even between
containers holding different types.
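A minimal usage sketch, assuming the interface described here (doSomething is
a hypothetical consumer):

#include "llvm/ADT/IntervalMap.h"
using namespace llvm;

typedef IntervalMap<unsigned, char> Map;

void doSomething(unsigned Start, unsigned Stop, char Val);

void demo(Map::Allocator &Alloc) {  // the allocator can be shared by many maps
  Map M(Alloc);
  M.insert(10, 20, 'x');            // map the interval [10;20] to 'x'
  M.insert(21, 30, 'x');            // coalesces with [10;20] into [10;30]
  for (Map::const_iterator I = M.begin(), E = M.end(); I != E; ++I)
    doSomething(I.start(), I.stop(), I.value());
}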
The IntervalMap is initially intended to be used with SlotIndex intervals for:
- Backing store for LiveIntervalUnion that is smaller and faster than std::set.
- Backing store for LiveInterval with less overhead than std::vector for typical
intervals and O(N log N) merging of large intervals. 99% of virtual registers
need 4 entries or fewer and would benefit from the small object optimization.
- Backing store for LiveDebugVariable which doesn't exist yet, but will track
debug variables during register allocation.
This is a work in progress. Missing items are:
- Performance metrics.
- erase().
- insert() shrinkage.
- clear().
- More performance metrics.
- Simplification and detemplatization.
llvm-svn: 119787
MCStreamer instead of just MCObjectStreamer. Address changes cannot
be as efficient, since we have to use DW_LNE_set_address, but at least
most of the logic is shared.
This will be used so that, with CodeGen still using EmitDwarfLocDirective,
llvm-gcc is able to produce debug_line sections without needing an
assembler that supports .loc.
llvm-svn: 119777
lr" instruction cannot be tested just yet. It requires matching a "condition
code", but adding one of those makes things go south quickly...
llvm-svn: 119774
This is a sorted interval map data structure for small keys and values with
automatic coalescing and bidirectional iteration over coalesced intervals.
Except for coalescing intervals, it provides similar functionality to std::map.
It is, however, much more compact for small keys and values, and hopefully
faster too.
The container object itself can hold the first few intervals without any
allocations, then it switches to a cache-conscious B+-tree representation. A
recycling allocator can be shared between many containers, even between
containers holding different types.
The IntervalMap is initially intended to be used with SlotIndex intervals for:
- Backing store for LiveIntervalUnion that is smaller and faster than std::set.
- Backing store for LiveInterval with less overhead than std::vector for typical
intervals and O(N log N) merging of large intervals. 99% of virtual registers
need 4 entries or fewer and would benefit from the small object optimization.
- Backing store for LiveDebugVariable which doesn't exist yet, but will track
debug variables during register allocation.
This is a work in progress. Missing items are:
- Performance metrics.
- erase().
- insert() shrinkage.
- clear().
- More performance metrics.
- Simplification and detemplatization.
llvm-svn: 119772
This makes it symmetric with the 'u' modifier that forces an unsigned type.
This is needed for unsigned vector shifts, where the shift amount still needs
to be signed. PR8482 (Radar 8603521).
llvm-svn: 119742
This function was being called from two different places for completely
unrelated reasons. During type legalization, it was called to expand 64-bit
shift operations. During operation legalization, it was called to handle
Neon vector shifts. The vector shift code was not written to check for
illegal types, since it was assumed to be called only after type legalization.
Fixed this by splitting off the 64-bit shift expansion into a separate
function. I don't have a particular testcase for this; I just noticed it
by inspection.
llvm-svn: 119738
if the extension types were not the same. The result was that if you
fed a select with sext and zext loads, as in the testcase, then it
would get turned into a zext (or sext) of the select, which is wrong
in cases where it should have been a sext (resp. zext). Reported
and diagnosed by Sebastien Deldon.
llvm-svn: 119728
preserves LCSSA form out of ScalarEvolution and into the LoopInfo
class. Use it to check that SimplifyInstruction simplifications
are not breaking LCSSA form. Fixes PR8622.
llvm-svn: 119727
Remove movePastCSLoadStoreOps and associated code for simple pointer
increments. Update routines that depended upon other opcodes for save/restore.
Adjust all testcases accordingly.
llvm-svn: 119725
this was a tree of hashtables, and a query recursed into the table for the
immediate dominator ad infinitum if the initial lookup failed. This led to
really bad performance on tall, narrow CFGs.
We can instead replace it with what is conceptually a multimap of value numbers
to leaders (actually represented by a hashtable with a list of Value*'s as the
value type), and then determine which leader from that set to use very cheaply,
thanks to the DFS numberings maintained by DominatorTree. Because there are
typically few duplicates of a given value, this scan tends to be quite fast.
Additionally, we use a custom linked list and BumpPtr allocation to avoid any
unnecessary allocation in representing the value side of the multimap.
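A hypothetical sketch of that leader lookup (the names are invented; the real
structure lives inside GVN):

#include "llvm/ADT/DenseMap.h"
#include "llvm/Analysis/Dominators.h"
using namespace llvm;

struct LeaderEntry {        // node in the custom BumpPtr-allocated list
  Value *Val;
  BasicBlock *BB;           // block where Val becomes available
  LeaderEntry *Next;
};

static Value *findLeader(const DenseMap<unsigned, LeaderEntry *> &Leaders,
                         DominatorTree &DT, unsigned Num, BasicBlock *BB) {
  // Scan the (typically short) leader list for this value number and take
  // the first entry whose block dominates BB; with DFS numbers maintained
  // by DominatorTree, each dominance query is a cheap comparison.
  for (LeaderEntry *E = Leaders.lookup(Num); E; E = E->Next)
    if (DT.dominates(E->BB, BB))
      return E->Val;
  return 0;
}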
This change brings with it a 15% (!) improvement in the total running time of GVN on 403.gcc, which I
think is pretty good considering that includes all the "real work" being done by MemDep as well.
The one downside to this approach is that we can no longer use GVN to perform
simple conditional propagation, but that seems like an acceptable loss since we
now have LVI and CorrelatedValuePropagation to pick up the slack. If you see
conditional propagation that's not happening, please file bugs against LVI or
CVP.
llvm-svn: 119714
refusing to optimize two memcpy's like this:
copy A <- B
copy C <- A
if it couldn't prove that noalias(B, C). We can still eliminate
the copy by producing a memmove instead of a memcpy.
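A hedged C-level illustration (assuming A is dead after the second copy):

#include <string.h>

void doCopies(char *A, const char *B, char *C, size_t N) {
  memcpy(A, B, N);   /* copy A <- B */
  memcpy(C, A, N);   /* copy C <- A */
  /* If A is otherwise dead, the pair can become a single
     memmove(C, B, N): memmove stays correct even when B and C overlap,
     so noalias(B, C) no longer needs to be proven. */
}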
llvm-svn: 119694
if it is passed as a byval argument. The byval argument is only ever
read, so it is safe to read from the original global instead. This allows
us to promote away the %agg.tmp alloca in PR8582.
llvm-svn: 119686
sahf movl 344(%rdi),%r14d
we used to produce:
t.s:2:1: error: unexpected token in argument list
^
we now produce:
t.s:1:11: error: unexpected token in argument list
sahf movl 344(%rdi),%r14d
^
rdar://8581401
llvm-svn: 119676
The attached patch fixes IRBuilder and the NoFolder class so that when
NoFolder is used, the instructions it generates are treated just like
the ones IRBuilder creates directly (inserted into the block, given a
name and debug info, as applicable).
It does this by
1) having NoFolder return Instruction*s instead of Value*s,
2) having IRBuilder call Insert(Value, Name) on values obtained from
the folder like it does on instructions it creates directly, and
3) adding an Insert(Constant*, const Twine& = "") overload which just
returns the constant so that the other folders shouldn't have any
extra overhead as long as inlining is enabled.
While I was there, I also added some missing (CreateFNeg and various
Create*Cast) methods to NoFolder.
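A minimal sketch of the effect, assuming headers of this era
(llvm/Support/IRBuilder.h and llvm/Support/NoFolder.h):

#include "llvm/Support/IRBuilder.h"
#include "llvm/Support/NoFolder.h"
using namespace llvm;

void emitSum(BasicBlock *BB, Value *L, Value *R) {
  IRBuilder<true, NoFolder> Builder(BB);
  // With NoFolder this always creates a real 'add' instruction; after this
  // patch it is inserted into BB and named like any directly created one.
  Value *Sum = Builder.CreateAdd(L, R, "sum");
  (void)Sum;
}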
llvm-svn: 119614
and testing is easier. A good example is the unknown-location.ll test, which
can now just look for ".loc 1 0 0". We also don't use a DW_LNE_set_address for
every address change anymore.
llvm-svn: 119613
It is generally not sufficient to check whether the starting offset is within
range of the maximum offset that can be used efficiently on the target.
llvm-svn: 119565
This makes it more clear that the symbol is an internal, compiler-generated
name and gives a little more description about its contents.
llvm-svn: 119564
It was mistakenly looking at the pointer type when checking for the size of
global variables. This is a partial fix for Radar 8673120.
llvm-svn: 119563
needs to be checked that this won't break LCSSA form.
Change the existing checking method to a more direct one:
rather than seeing if all predecessors belong to the loop,
check that the replacement value is either not in any loop or
is in a loop that contains the phi node.
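A hedged sketch of that direct check (the function name is hypothetical):

#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Instructions.h"
using namespace llvm;

static bool replacementPreservesLCSSA(PHINode *PN, Value *Replacement,
                                      LoopInfo &LI) {
  Instruction *I = dyn_cast<Instruction>(Replacement);
  if (!I)
    return true;                 // constants and arguments live in no loop
  Loop *L = LI.getLoopFor(I->getParent());
  // Fine if the value is not in any loop, or its loop contains the phi.
  return !L || L->contains(PN->getParent());
}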
llvm-svn: 119556
and xor. The 32-bit move immediates can be hoisted out of loops by machine
LICM, but the isel hacks were preventing that.
Instead, let the peephole optimization pass recognize registers that are
defined by immediates, and the ARM target hook will fold the immediates in.
Other changes include 1) do not fold and / xor into cmp to isel TST / TEQ
instructions if there are multiple uses. This happens when the 'and' is live
out; machine sinking would have sunk the computation, and that ends up
pessimizing code. The peephole pass would recognize situations where the 'and'
can be toggled to define CPSR and eliminate the comparison anyway.
2) Move the peephole pass to after machine LICM, sink, and CSE to avoid
blocking important optimizations.
rdar://8663787, rdar://8241368
llvm-svn: 119548
instructions out of InstCombine and into InstructionSimplify. While
there, introduce an m_AllOnes pattern to simplify matching integers
and vectors with all bits set to one.
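A small sketch of the new pattern in use (the helper name is hypothetical):

#include "llvm/Support/PatternMatch.h"
using namespace llvm;
using namespace llvm::PatternMatch;

// Recognize 'xor X, -1' (i.e. 'not X'); m_AllOnes() matches both a scalar
// integer with all bits set and a vector whose elements are all ones.
static Value *matchNot(Value *V) {
  Value *X;
  if (match(V, m_Xor(m_Value(X), m_AllOnes())))
    return X;
  return 0;
}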
llvm-svn: 119536
hasConstantValue. I was leery of using SimplifyInstruction
while the IR was still in a half-baked state, which is the
reason for delaying the simplification until the IR is fully
cooked.
llvm-svn: 119494
simplified to itself (this can only happen in unreachable blocks).
Change it to return null instead. Hopefully this will fix some
buildbot failures.
llvm-svn: 119490
SrcMgrDiagHandler, we can improve clang diagnostics for inline asm:
instead of reporting them on the line of the original asm statement,
we can report them on the correct line, wherever the string literal
came from. For something like this:
void foo() {
  asm("push %rax\n"
      ".code32\n");
}
we used to get this: (note that the line in t.c isn't helpful)
t.c:4:7: error: warning: ignoring directive for now
asm("push %rax\n"
^
<inline asm>:2:1: note: instantiated into assembly here
.code32
^
now we get:
t.c:5:8: error: warning: ignoring directive for now
".code32\n"
^
<inline asm>:2:1: note: instantiated into assembly here
.code32
^
Note that we're pointing to line 5 properly now.
llvm-svn: 119488
cookie argument to the SourceMgr diagnostic handling. This cleanly separates
LLVMContext's inline asm handler from the SourceMgr error handling
definition, increasing type safety and cleaning things up.
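A hedged sketch of how a client might use the cookie, assuming the post-patch
SourceMgr interface (the struct, handler, and installer names are
hypothetical):

#include "llvm/Support/SourceMgr.h"
using namespace llvm;

struct ClientState { unsigned ErrorCount; };  // hypothetical client context

static void diagCallback(const SMDiagnostic &Diag, void *Cookie) {
  // Recover the client's state from the cookie instead of using globals.
  static_cast<ClientState *>(Cookie)->ErrorCount++;
  (void)Diag;  // a real client would forward Diag to its own diagnostics
}

void installHandler(SourceMgr &SrcMgr, ClientState &State) {
  SrcMgr.setDiagHandler(diagCallback, &State);
}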
llvm-svn: 119486
instructions have to distinguish between lists of single- and double-precision
registers in order for the ASM matcher to do a proper job. In all other
respects, a list of single- or double-precision registers is the same as a list
of GPR registers.
llvm-svn: 119460
class, uses DominatorTree, which is an analysis. This change moves all of
the tricky hasConstantValue logic to SimplifyInstruction, and replaces it
with a very simple literal implementation. I already taught users of
hasConstantValue that need tricky stuff to use SimplifyInstruction instead.
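A plausible sketch of such a simple literal implementation (hedged; not
necessarily the exact code):

#include "llvm/Instructions.h"
using namespace llvm;

// The phi trivially has a "constant value" iff every incoming value is the
// same one; anything cleverer is now SimplifyInstruction's job.
static Value *literalHasConstantValue(PHINode *PN) {
  Value *Common = PN->getIncomingValue(0);
  for (unsigned i = 1, e = PN->getNumIncomingValues(); i != e; ++i)
    if (PN->getIncomingValue(i) != Common)
      return 0;
  return Common;
}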
I didn't update InlineFunction because the IR looks like it might be in a
funky state at the point it calls hasConstantValue, which makes calling
SimplifyInstruction dangerous since it can in theory do a lot of tricky
reasoning. This may be a pessimization, for example in the case where
all phi node operands are either undef or a fixed constant.
llvm-svn: 119459
systematically, CollapsePhi will always return null here. Note
that CollapsePhi did an extra check, isSafeReplacement, which
the SimplifyInstruction logic does not do. I think that check
was bogus - I guess we will soon find out! (It was originally
added in commit 41998 without a testcase).
llvm-svn: 119456
Always spill the full representative register at any point where any subregister
is live.
This fixes PR8620 which caused the old logic to get confused and not spill
anything at all.
The fundamental problem here is that the coalescer is too aggressive about
physical register coalescing. It sometimes makes it impossible to allocate
registers without these emergency spills.
llvm-svn: 119375
Stop defining types with "__neon_" prefixes and then using typedefs without
the prefix; there's no reason to do that anymore. Remove types that combine
multiple Neon vectors and treat them as a single long vector; they are not
used.
llvm-svn: 119369
The system APIs will be shifted over to returning an error_code, and returning
other return values as out parameters to the function.
Code that needs to check error conditions will use the errc enum values, which
are the same as the posix_errno defines (EBADF, E2BIG, etc...) and are
compatible with the error codes in WinError.h due to some magic in
system_error.
An example would be:
if (error_code ec = KillEvil("Java")) { // error_code can be converted to bool.
  handle_error(ec);
}
llvm-svn: 119360
over a phi node by applying it to each operand may be wrong if the
operation and the phi node are mutually interdependent (the testcase
has a simple example of this). So only do this transform if it would
be correct to perform the operation in each predecessor of the block
containing the phi, i.e. if the other operands all dominate the phi.
This should fix the FFMPEG snow.c regression reported by İsmail Dönmez.
llvm-svn: 119347