give it a bit more responsibility. Also implement it for MachO.
If hacked to use cfi, 32 bit MachO will produce
.cfi_personality 155, L___gxx_personality_v0$non_lazy_ptr
and 64 bit will produce
.cfi_personality ___gxx_personality_v0
The general idea is that .cfi_personality gets passed the final symbol. It is
up to codegen to produce it if using indirect representation (like 32 bit
MachO), but it is up to MC to decide which relocations to create.
llvm-svn: 130341
Modified LinearFunctionTestReplace to push the condition on the dead
list instead of eagerly deleting it. This can cause unnecessary
IV rewrites, which should have no effect on codegen and will not be an
issue once we stop generating canonical IVs.
llvm-svn: 130340
successors) and use inverse depth first search to traverse the BBs. However
that doesn't work when the CFG has infinite loops. Simply doing a linear
traversal of all BBs works just fine.
rdar://9344645
llvm-svn: 130324
only check arguments with pointer types. Update the documentation
of IntrReadArgMem to reflect this.
While here, add support for TBAA tags on intrinsic calls.
llvm-svn: 130317
effective in avoiding recomputation of LCSSA form; the widespread
use of instsimplify (which looks through phi nodes) means it was
not preserving LCSSA form anyway; and instcombine is no longer
scheduled in the middle of the loop passes so this doesn't matter
anymore.
llvm-svn: 130301
Added a type check in ScalarEvolution::computeSCEVAtScope to handle the case in which operands of an
AddRecExpr in the current scope are folded.
llvm-svn: 130271
an earlier load could be widened to encompass a later load. For example,
if we see:
X = load i8* P, align 4
Y = load i8* (P+3), align 1
and we have a 32-bit native integer type, we can widen the former load
to i32 which then makes the second load redundant. GVN can't actually
do anything with this load/load relation yet, so this isn't testable, but
it is the next step to resolving PR6627, and a fairly general class of
"merge neighboring loads" missed optimizations.
llvm-svn: 130250
more callee-saved registers and introduce copies. Only allow it if scheduling
a node above calls would end up lessening register pressure.
Call operands also have added ABI restrictions for register allocation, so be
extra careful with hoisting them above calls.
rdar://9329627
llvm-svn: 130245
1. Only run the early (in the module pass pipe) instcombine/simplifycfg
if the "unit at a time" passes they are cleaning up after actually run.
2. Move the "clean up after the unroller" pass to the very end of the
function-level pass pipeline. Loop unroll uses instsimplify now,
so it doesn't create a ton of trash. Moving instcombine later allows
it to clean up after opportunities are exposed by GVN, DSE, etc.
3. Introduce some phase ordering tests for things that are specifically
intended to be simplified by the full optimizer as a whole.
This resolves PR2338, and is progress towards PR6627, which will be
generating code that looks similar to test2.
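As a rough illustration (hedged; not one of the new tests, names invented), the
kind of source a phase ordering test exercises relies on several passes running
in the right order, e.g. inlining followed by the later scalar cleanups:

// Only the combination of inlining and the subsequent scalar passes reduces
// this to 'return 1'; the helper and function names are hypothetical.
static int store_and_reload(int *p) {
  *p = 1;
  return *p;
}

int phase_ordering_example() {
  int x;
  return store_and_reload(&x);  // expected to fold to: return 1
}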
llvm-svn: 130241
when X has multiple uses. This is useful for exposing secondary optimizations,
but the X86 backend isn't ready for this when X has a single use. For example,
this can disable load folding.
This is inching towards resolving PR6627.
llvm-svn: 130238
This has two effects: 1. We never inflate to a larger register class than what
the sub-target can handle. 2. Completely unconstrained virtual registers get the
largest possible register class.
llvm-svn: 130229
The hook will be used by the register allocator when recomputing register
classes after removing constraints.
Thumb1 code doesn't allow anything larger than tGPR, and x86 needs to ensure
that the spill size doesn't change.
llvm-svn: 130228
This worked until now because the stars were aligned (i.e. the number of complex
address elements is always 0 or 2+, and when it is 2+ at least two elements are
accessed together).
llvm-svn: 130225
Add support for switch and indirectbr edges. This works by densely numbering
all blocks which have such terminators, and then separately numbering the
possible successors. The predecessors write down a number, the successor knows
its own number (as a ConstantInt) and sends that and the pointer to the number
the predecessor wrote down to the runtime, who looks up the counter in a
per-function table.
Coverage data should now be functional, but I haven't tested it on anything
other than my 2-file synthetic test program for coverage.
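A minimal sketch of the counting scheme (hedged; the names, types, and table
shape below are assumptions for illustration, not the real instrumentation ABI):

static const int kNumPreds = 4;  // blocks ending in switch/indirectbr
static const int kNumSuccs = 8;  // densely numbered possible successors
static long long EdgeCounters[kNumPreds][kNumSuccs];  // per-function table

static unsigned PredNumber;  // each predecessor writes its number here

// Emitted in the successor: it knows its own number (a constant) and passes
// it together with a pointer to the slot the predecessor wrote down.
static void CountComplexEdge(unsigned SuccNumber, unsigned *PredSlot) {
  ++EdgeCounters[*PredSlot][SuccNumber];
}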
llvm-svn: 130186
return it as a clobber. This allows GVN to do smart things.
Enhance GVN to be smart about the case when a small load is clobbered
by a larger overlapping load. In this case, forward the value. This
allows us to compile stuff like this:
int test(void *P) {
  int tmp = *(unsigned int*)P;
  return tmp+*((unsigned char*)P+1);
}
into:
_test: ## @test
movl (%rdi), %ecx
movzbl %ch, %eax
addl %ecx, %eax
ret
which has one load. We already handled the case where the smaller
load was from a must-aliased base pointer.
llvm-svn: 130180
Fixes Thumb2 ADCS and SBCS lowering: <rdar://problem/9275821>.
t2ADCS/t2SBCS are now pseudo instructions, consistent with ARM, so the
assembly printer correctly prints the 's' suffix.
Fixes Thumb2 adde -> SBC matching to check for live/dead carry flags.
Fixes the internal ARM machine opcode mnemonic for ADCS/SBCS.
Fixes ARM SBC lowering to check for live carry (potential bug).
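As a rough source-level illustration (not taken from the commit), 64-bit
arithmetic on a 32-bit Thumb2 target is what exercises these carry-using pairs:

// 64-bit add/sub lowers to an ADDS+ADC or SUBS+SBC pair, where the second
// instruction must see a live carry flag (illustration only).
long long add64(long long a, long long b) { return a + b; }
long long sub64(long long a, long long b) { return a - b; }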
llvm-svn: 130048
<h2>Section Example</h2>
<div> <!-- h2+div is applied -->
  <p>Section preamble.</p>
  <h3>Subsection Example</h3>
  <p> <!-- h3+p is applied -->
    Subsection body
  </p>
  <!-- End of section body -->
</div>
FIXME: Handle H5 better.
llvm-svn: 130040
fix bugs exposed by the gcc dejagnu testsuite:
1. The load may actually be used by a dead instruction, which
would cause an assert.
2. The load may not be used by the current chain of instructions,
and we could move it past a side-effecting instruction. Change
how we process uses to define the problem away.
llvm-svn: 130018
On x86 this allows folding a load into the cmp, greatly reducing register pressure.
movzbl (%rdi), %eax
cmpl $47, %eax
->
cmpb $47, (%rdi)
This shaves 8k off gcc.o on i386. I'll leave applying the patch in README.txt to Chris :)
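A hedged example of source that tends to produce this pattern (the function
name is invented for illustration):

// Before: movzbl (%rdi),%eax ; cmpl $47,%eax
// After:  cmpb $47,(%rdi)
bool StartsWithSlash(const unsigned char *S) {
  return *S == 47;  // '/'
}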
llvm-svn: 130005
This tends to happen a lot with bitfield code generated by clang. A simple example for x86_64 is
uint64_t foo(uint64_t x) { return (x&1) << 42; }
which used to compile into bloated code:
shlq $42, %rdi ## encoding: [0x48,0xc1,0xe7,0x2a]
movabsq $4398046511104, %rax ## encoding: [0x48,0xb8,0x00,0x00,0x00,0x00,0x00,0x04,0x00,0x00]
andq %rdi, %rax ## encoding: [0x48,0x21,0xf8]
ret ## encoding: [0xc3]
With this patch we can fold the immediate into the and:
andq $1, %rdi ## encoding: [0x48,0x83,0xe7,0x01]
movq %rdi, %rax ## encoding: [0x48,0x89,0xf8]
shlq $42, %rax ## encoding: [0x48,0xc1,0xe0,0x2a]
ret ## encoding: [0xc3]
It's possible to save another byte by using 'andl' instead of 'andq' but I currently see no way of doing
that without making this code even more complicated. See the TODOs in the code.
llvm-svn: 129990
add <rd>, sp, #<imm8>
ldr <rd>, [sp, #<imm8>]
when the offset from sp is a multiple of 4 and in the range 0-1020.
This saves code size by utilizing 16-bit instructions.
rdar://9321541
llvm-svn: 129971
An exception is thrown via a call to __cxa_throw, which we don't expect to
return. Therefore, the "true" part of the invoke goes to a BB that has
'unreachable' as its only instruction. This is lowered into an empty MachineBB.
The landing pad for this invoke, however, is directly after the "true" MBB.
When the empty MBB is removed, the landing pad is directly below the BB with the
invoke call. The unconditional branch is removed and then the two blocks are
merged together.
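A hedged C++ illustration of the source pattern being described (the names are
invented; the destructor in scope is what forces the invoke):

#include <cstdio>

struct Guard {
  ~Guard() { std::puts("cleanup"); }  // needs to run during unwinding
};

void AlwaysThrows() {
  Guard G;
  throw 42;  // becomes an invoke of __cxa_throw; its normal successor holds
             // only 'unreachable' and lowers to an empty MachineBB
}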
The testcase is too big for a regression test.
<rdar://problem/9305728>
llvm-svn: 129965
These intervals are allocatable immediately after splitting, but they may be
evicted because of later splitting. This is rare, but when it happens they
should be split again.
The remainder intervals that cannot be allocated after splitting still move
directly to spilling.
SplitEditor::finish can optionally provide a mapping from new live intervals
back to the original interval indexes returned by openIntv().
Each original interval index can map to multiple new intervals after connected
components have been separated. Dead code elimination may also add existing
intervals to the list.
The reverse mapping allows the SplitEditor client to treat the new intervals
differently depending on the split region they came from.
llvm-svn: 129925
This patch depends on the prior fix r129908 that changed to use std::find,
rather than std::binary_search, on an unordered array.
Patch by Dan Bailey
llvm-svn: 129909
necessary since gcov counts transitions between blocks. It can't see if you've
run every line in a straight-line function, so we add an edge for it to notice.
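For example (hedged illustration, not from the commit), a function whose body
is a single basic block has no block-to-block transition for gcov to count
without the extra edge:

int straight_line(int x) {
  return x + 1;  // one basic block, no internal edges
}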
llvm-svn: 129905
<h2>Section Example</h2>
<div> <!-- h2+div is applied -->
  <p>Section preamble.</p>
  <h3>Subsection Example</h3>
  <p> <!-- h3+p is applied -->
    Subsection body
  </p>
  <!-- End of section body -->
</div>
llvm-svn: 129901
TII::isTriviallyReMaterializable() shouldn't depend on any properties of the
register being defined by the instruction. Rematerialization is going to create
a new virtual register anyway.
llvm-svn: 129882
On the x86-64 and thumb2 targets, some registers are more expensive to encode
than others in the same register class.
Add a CostPerUse field to the TableGen register description, and make it
available from TRI->getCostPerUse. This represents the cost of a REX prefix or a
32-bit instruction encoding required by choosing a high register.
Teach the greedy register allocator to prefer cheap registers for busy live
ranges (as indicated by spill weight).
llvm-svn: 129864
used by Clang. To help Clang integration, the PTX target has been split
into two targets: ptx32 and ptx64, depending on the desired pointer size.
- Add GCCBuiltin class to all intrinsics
- Split PTX target into ptx32 and ptx64
llvm-svn: 129851
llvm is built with unsigned chars where an immediate such as 0xff would be
zero-extended to 64 bits, turning "cmp $0xff,%eax" into
"cmp $0xffffffffffffffff,%eax".
llvm-svn: 129845
manually and pass all (now) 4 arguments to the mul libcall. Add a new
ExpandLibCall for just this (copied gratuitously from type legalization).
Fixes rdar://9292577
llvm-svn: 129842
- As before, there is a minor semantic change here (evidenced by the test
change) for Darwin triples that have no version component. I debated changing
the default behavior of isOSVersionLT, but decided it made more sense for
triples to be explicit.
llvm-svn: 129805
- There is a minor semantic change here (evidenced by the test change) for
Darwin triples that have no version component. I debated changing the default
behavior of isOSVersionLT, but decided it made more sense for triples to be
explicit.
llvm-svn: 129802
Making use of VFP / NEON floating point multiply-accumulate / subtraction is
difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has latency that is one cycle shorter than a pair
of vmul + vadd, a RAW hazard during the first few cycles (4? on Cortex-A8) can
cause additional pipeline stalls. So it's frequently better to simply codegen
vmul + vadd.
2. A vmla followed by a vmul, vadd, or vsub causes the second fp instruction to
stall for 4 cycles. We need to schedule them apart.
3. A vmla followed by a vmla is a special case. Obviously, issuing back-to-back
RAW vmla + vmla is very bad. But this isn't ideal either:
vmul
vadd
vmla
Instead, we want to expand the second vmla:
vmla
vmul
vadd
Even with the 4 cycle vmul stall, the second sequence is still 2 cycles
faster.
Up to now, isel simply avoided codegen'ing fp vmla / vmls. This works well enough
but it isn't the optimal solution. This patch attempts to make it possible to
use vmla / vmls in cases where it is profitable.
A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to
compute a fmul and a fmla.
C. Add additional isel checks for vmla, avoid cases where vmla is feeding into
fp instructions (except for the #3 exceptional case).
D. Add ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the
vmla / vmls will trigger one of the special hazards.
Enable these fp vmlx codegen changes for Cortex-A9.
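As a rough source-level illustration (not from the commit), the canonical
candidate is a single-use multiply feeding an add or subtract:

// A single-use fmul feeding an fadd is the pattern isel may now combine into
// one vmla when profitable (vmls for the subtract form); names invented.
float MulAcc(float Acc, float A, float B) {
  return Acc + A * B;
}

float MulSub(float Acc, float A, float B) {
  return Acc - A * B;
}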
llvm-svn: 129775
Add an avoidWriteAfterWrite() target hook to identify register classes that
suffer from write-after-write hazards. For those register classes, try to avoid
writing the same register in two consecutive instructions.
This is currently disabled by default. We should not spill to avoid hazards!
The command line flag -avoid-waw-hazard can be used to enable waw avoidance.
llvm-svn: 129772