BuildSchedGraph was quadratic in the number of calls in the basic
block. After this fix, it keeps only a single call at the top of the
DefList so compile time doesn't blow up on large blocks. This reduces
postRA sched time on an external test case from 81s to 0.3s. Note that
r130800 (reduced ARM register alias defs) also partially fixes the
issue by reducing the constant overhead of checking call interference
by an order of magnitude.
Fixes <rdar://problem/7662664> very poor compile time with post RA scheduling.
llvm-svn: 130943
who used this flag, and it now emits CFI and doesn't emit this anymore. All
other targets left this flag "false".
<rdar://problem/8486371>
llvm-svn: 130918
Joining physregs is inherently dangerous because it uses a heuristic to avoid
creating invalid code. Linear scan had an emergency spilling mechanism to deal
with those rare cases. The new greedy allocator does not.
The greedy register allocator is much better at taking hints, so this has almost
no impact on code size and quality. The few cases where it matters show up as
unit tests that now have -join-physregs enabled explicitly.
llvm-svn: 130896
landing pad as its successor.
SjLj exception handling jumps to the correct landing pad via a switch statement
that's generated right before code-gen. Loosen the constraint in the machine
instruction verifier to allow for this. Note, this isn't the most rigorous check
since we cannot determine where that switch statement came from. But it's
marginally better than turning this check off when SjLj exceptions are used.
<rdar://problem/9187612>
llvm-svn: 130881
Original message:
Teach MachineCSE how to do simple cross-block CSE involving physregs. This allows, for example, eliminating duplicate cmpl's on x86. Part of rdar://problem/8259436 .
llvm-svn: 130877
it is both inefficient and unexpected by dwarfdump. Change to
a DW_FORM_data4.
While in here, change the predicate name to reflect that the position
is not really absolute (it is an offset), just that the linker needs a
relocation.
llvm-svn: 130846
but according to my super-optimizer there are only two missed simplifications
of -instsimplify kind when compiling bzip2, and this is one of them. It amuses
me to have bzip2 be perfectly optimized as far as instsimplify goes!
llvm-svn: 130840
This adds functionality to remove sign/zero extension during indvars
without generating a canonical IV and rewriting all IV users. It's
disabled by default so should have no effect on codegen. Work in progress.
llvm-svn: 130829
LiveVariables doesn't understand that clobbering D0 and D1 completely overwrites
Q0, so if Q0 is live-in to a function, its live range will extend beyond a
function call that only clobbers D0 and D1. This shows up in the
ARM/2009-11-01-NeonMoves test case.
LiveVariables should probably implement the much stricter rules for physreg
liveness that RAFast imposes - a physreg is killed by the first use of any
alias.
llvm-svn: 130801
Only create a canonical IV for backedge taken count if it will
actually be used by LinearFunctionTestReplace. And some related
cleanup, preparing to reduce dependence on canonical IVs.
No significant effect on x86 or arm in the test-suite.
llvm-svn: 130799
Register coalescing can sometimes create live ranges that end in the middle of a
basic block without any killing instruction. When SplitKit detects this, it will
repair the live range by shrinking it to its uses.
Live range splitting also needs to know about this. When the range shrinks so
much that it becomes allocatable, live range splitting fails because it can't
find a good split point. It is paranoid about making progress, so an allocatable
range is considered an error.
The coalescer should really not be creating these bad live ranges. They appear
when coalescing dead copies.
llvm-svn: 130787
max(a,b) >= a -> true. According to my super-optimizer, these are
by far the most common simplifications (of the -instsimplify kind)
that occur in the testsuite and aren't caught by -std-compile-opts.
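
As an illustration only (not the InstSimplify code itself), the identity
can be checked exhaustively over a small integer type. The function name
below is hypothetical, and std::max stands in for the compare-and-select
pattern the optimizer sees:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: returns true if max(a, b) >= a holds for every
// pair of signed 8-bit values. This is a brute-force sanity check of the
// identity, not the way the optimizer proves it.
bool maxIsAlwaysAtLeastFirstOperand() {
  for (int a = INT8_MIN; a <= INT8_MAX; ++a)
    for (int b = INT8_MIN; b <= INT8_MAX; ++b)
      if (!(std::max(a, b) >= a))
        return false;
  return true;
}
```

Because the comparison can never be false, the whole expression can be
replaced by the constant true without looking at either operand.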
llvm-svn: 130780
model constants which can be added to base registers via add-immediate
instructions which don't require an additional register to materialize
the immediate.
llvm-svn: 130743
Def operands may also have an <undef> flag, but that just means that a
sub-register redef doesn't actually read the super-register. For physical
registers, it has no meaning.
llvm-svn: 130714
This works around a limitation in gdb that is reported by the following inherit.exp test failures from the gdb testsuite.
gdb.cp/inherit.exp: print g_vB.vB::vb
gdb.cp/inherit.exp: print g_vB.vB::vx
gdb.cp/inherit.exp: print g_vC.vC::vc
gdb.cp/inherit.exp: print g_vC.vC::vx
gdb.cp/inherit.exp: print g_vD.vB::vb
...
llvm-svn: 130702
When an interfering live range ends at a dead slot index between two
instructions, make sure that the inserted copy instruction gets a slot index
after the dead ones. This makes it possible to avoid the interference.
Ideally, there shouldn't be interference ending at a deleted instruction, but
physical register coalescing can sometimes do that to sub-registers.
This fixes PR9823.
llvm-svn: 130687
comments claimed it did this, but the LHS value was actually an unused variable.
The new system considers only the '-foo' part when comparing it for typos
against flags that have values, but still looks at the whole string for flags
that don't. That way, we'll still correct '-inst=combine' to '-instcombine'.
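
A minimal sketch of that comparison rule, assuming a plain Levenshtein
edit distance as the similarity measure. The names editDistance and
flagDistance are hypothetical and are not the CommandLine library's API:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Classic Levenshtein edit distance using a single rolling row.
unsigned editDistance(const std::string &a, const std::string &b) {
  std::vector<unsigned> row(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j)
    row[j] = static_cast<unsigned>(j);
  for (std::size_t i = 1; i <= a.size(); ++i) {
    unsigned prevDiag = row[0]; // distance for (i-1, j-1)
    row[0] = static_cast<unsigned>(i);
    for (std::size_t j = 1; j <= b.size(); ++j) {
      unsigned prevAbove = row[j]; // distance for (i-1, j)
      unsigned cost = (a[i - 1] == b[j - 1]) ? 0u : 1u;
      row[j] = std::min({row[j - 1] + 1, prevAbove + 1, prevDiag + cost});
      prevDiag = prevAbove;
    }
  }
  return row[b.size()];
}

// Hypothetical helper: for flags that take a value, compare only the part
// of the typed argument before '='; otherwise compare the whole string.
unsigned flagDistance(const std::string &typed, const std::string &flag,
                      bool flagTakesValue) {
  std::string lhs = typed;
  if (flagTakesValue) {
    std::size_t eq = typed.find('=');
    if (eq != std::string::npos)
      lhs = typed.substr(0, eq);
  }
  return editDistance(lhs, flag);
}
```

Under these assumptions, '-inst=combine' stays one edit away from the
valueless flag '-instcombine', while a value such as '=bar' no longer
inflates the distance to a flag that accepts values.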
llvm-svn: 130685
for all symbol differences and can drop the old EmitPCRelSymbolValue
method.
This also makes getExprForFDESymbol on ELF equal to the one on MachO, and it
can be made non-virtual.
llvm-svn: 130634
after folding ADD32ri to ADD32mi, so don't do that.
This only happens when the greedy register allocator gets itself in trouble and
spills %vreg9 here:
16L %vreg9<def> = MOVPC32r 0, %ESP<imp-use>; GR32:%vreg9
48L %vreg9<def> = ADD32ri %vreg9, <es:_GLOBAL_OFFSET_TABLE_>[TF=1], %EFLAGS<imp-def,dead>; GR32:%vreg9
That should never happen; the live range should be split instead.
llvm-svn: 130625
Currently the output should be almost identical to the one produced by CodeGen
to make the transition easier.
The only two differences I know of are:
* Some files get an extra advance loc of size 0. This will be fixed when
relaxations are enabled.
* The optimization of declaring an EH symbol as an external variable is not
implemented. This is a subset of adding the nounwind attribute, so if we really
want this at -O0 we should probably do it at the IL level.
llvm-svn: 130623
This obviously helps a lot if the division would be turned into a libcall
(think i64 udiv on i386), but div is also one of the few remaining instructions
on modern CPUs that become more expensive when the bitwidth gets bigger.
This also helps register pressure on i386 when dividing chars: divb needs
two 8-bit parts of a 16-bit register as input, whereas divl uses two registers.
int foo(unsigned char a) { return a/10; }
int bar(unsigned char a, unsigned char b) { return a/b; }
compiles into (x86_64)
_foo:
imull $205, %edi, %eax
shrl $11, %eax
ret
_bar:
movzbl %dil, %eax
divb %sil, %al
movzbl %al, %eax
ret
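
The multiply-and-shift sequence in _foo can be sanity-checked against a
plain division for every 8-bit input. This is only a sketch of the
arithmetic; the function names are hypothetical and the constants 205 and
11 are taken from the assembly above:

```cpp
#include <cstdint>

// Mirrors the _foo sequence: multiply by 205, then shift right by 11.
// 205 / 2048 is slightly above 1/10, which is close enough for 8-bit inputs.
uint32_t divideBy10ViaMultiply(uint8_t a) {
  return (static_cast<uint32_t>(a) * 205u) >> 11;
}

// Checks that the multiply/shift result equals a / 10 for all 256 inputs.
bool multiplyShiftMatchesDivision() {
  for (unsigned a = 0; a <= 255; ++a)
    if (divideBy10ViaMultiply(static_cast<uint8_t>(a)) != a / 10)
      return false;
  return true;
}
```

The small constant only works because the dividend is known to fit in 8
bits; a full 32-bit division by 10 needs a wider magic number.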
llvm-svn: 130615
This shouldn't happen in practice because the icmp would be a constant.
Add a check so we don't miscompile code if something goes wrong.
llvm-svn: 130446
between two reads (threading).
Fix an off-by-one in the indirect counter table that I meant to revert after an
earlier experiment. Whoops!
Implement GCOV_PREFIX. Doesn't handle GCOV_PREFIX_STRIP yet.
Fix an off-by-one in string emission. Extra whoops!
Tolerate DISubprograms that have null Function*'s attached to them. I don't yet
understand what this means, but it happens when you have a global static with
a non-trivial constructor/destructor.
Fix a crash on switch statements with a single successor (default-only).
llvm-svn: 130443