BranchProbabilityInfo provides an interface for IR passes to query the
likelihood that control follows a CFG edge. This patch provides an
initial implementation of static branch prediction that will populate
BranchProbabilityInfo for branches with no external profile
information using very simple heuristics. It currently isn't hooked up
to any external profile data, so static prediction does all the work.
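As a rough illustration (not part of this commit), a pass could consume the
analysis along these lines; this is a minimal C++ sketch against a newer tree
layout, and the header path and query names are assumptions that may not match
this initial revision exactly:
#include "llvm/Analysis/BranchProbabilityInfo.h"
#include "llvm/IR/BasicBlock.h"
using namespace llvm;
// Returns true if the edge Src->Dst is considered more likely taken than not.
static bool isLikelyEdge(const BranchProbabilityInfo &BPI,
                         const BasicBlock *Src, const BasicBlock *Dst) {
  // With no external profile, these numbers come from the static heuristics.
  BranchProbability Prob = BPI.getEdgeProbability(Src, Dst);
  return Prob > BranchProbability(1, 2);
}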
llvm-svn: 132613
queries in the case of a DAG, where a query reaches a node
visited earlier, but it's not on a cycle. This avoids
MayAlias results in cases where BasicAA is expected to
return MustAlias or PartialAlias in order to protect TBAA.
llvm-svn: 132609
of reserved registers.
Use RegisterClassInfo in RABasic as well. This slightly changes some
allocation orders because RegisterClassInfo puts CSR aliases last.
llvm-svn: 132581
- Check for MTCTR8 in addition to MTCTR when looking up a hazard.
- When lowering an indirect call use CTR8 when targeting 64bit.
- Introduce BCTR8 that uses CTR8 and use it on 64bit when expanding ISD::BRIND.
The last change fixes PR8487. With those changes, we are able to compile
working "ls" and "sh" binaries on FreeBSD/PowerPC64.
llvm-svn: 132552
which edge to split by pred/succ pair, which means that we can end up splitting
the wrong edge (by case value) in the switch statement entirely. Fixes PR10031!
llvm-svn: 132535
Added asserts whenever attempting to use a potentially
uninitialized pass. This helps people trying to develop a new pass and
people trying to understand the bug reports filed by the former people.
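A hypothetical minimal pass, shown only to illustrate the registration step
that keeps a pass from being used uninitialized; the pass itself is invented
for this sketch, and whether this simple RegisterPass route satisfies every
new assert is an assumption:
#include "llvm/IR/Function.h"
#include "llvm/Pass.h"
using namespace llvm;
namespace {
struct CountBlocks : public FunctionPass {
  static char ID;
  CountBlocks() : FunctionPass(ID) {}
  bool runOnFunction(Function &F) override {
    (void)F.size(); // number of basic blocks; analysis only
    return false;   // the IR is not modified
  }
};
} // end anonymous namespace
char CountBlocks::ID = 0;
// Registering the pass is what sets up its PassInfo; forgetting this step is
// the kind of mistake the new asserts are meant to catch early.
static RegisterPass<CountBlocks> X("count-blocks", "Count basic blocks");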
llvm-svn: 132520
When compiling a program with lots of small functions like
483.xalancbmk, this makes RAFast 11% faster.
Add some comments to clarify the difference between unallocatable and
reserved registers. It's quite subtle.
The fast register allocator depends on EFLAGS' not being allocatable on
x86. That way it can completely avoid tracking liveness, and it won't
mind when there are multiple uses of a single def.
llvm-svn: 132514
Some register classes are only used for instruction operand constraints.
They should never be used for virtual registers. Previously, those
register classes were given an empty allocation order, but now you can
say 'let isAllocatable=0' in the register class definition.
TableGen computes whether a register belongs to any allocatable register
class and makes that information available in TargetRegisterDesc::inAllocatableClass.
The goal here is to eliminate use cases for overriding allocation_order_*
methods.
llvm-svn: 132508
I was confused about whether new uint8_t[] would zero-initialize the returned
array, and it seems gcc-4.0 is confused as well.
This should fix the test failures on darwin 9.
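For reference, a small standalone C++ sketch of the distinction that caused
the confusion (the helper function is hypothetical):
#include <cstddef>
#include <cstdint>
#include <cstring>
uint8_t *makeBuffer(std::size_t N) {
  // new uint8_t[N] default-initializes the elements, i.e. leaves them with
  // indeterminate values; only new uint8_t[N]() value-initializes them to 0.
  uint8_t *Buf = new uint8_t[N];
  std::memset(Buf, 0, N); // zero explicitly instead of relying on the compiler
  return Buf;
}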
llvm-svn: 132500
Instead, use a simpler approach and let a DBG_VALUE follow its predecessor
instruction. After the live debug value analysis pass, all DBG_VALUE
instructions are placed in the right place. Thanks Jakob for the hint!
llvm-svn: 132483
Parsing a register name/number for .cfi directives can't assume that a
register name starts with a '%' token. Be more flexible and check for a
register number instead. Still unlikely to be perfect, but it allows us
to parse both plain identifiers as register names and integers as register
numbers, which is what we're wanting to support at this point.
llvm-svn: 132466
register classes.
It provides information for each register class that cannot be
determined statically, like:
- The number of allocatable registers in a class after filtering out the
reserved and invalid registers.
- The preferred allocation order with registers that overlap callee-saved
registers last.
- The last callee-saved register that overlaps a given physical register.
This information usually doesn't change between functions, so it is
reused for compiling multiple functions when possible. The many
possible combinations of reserved and callee-saved registers make it
infeasible to compute this information statically in TableGen.
Use RegisterClassInfo to count available registers in various heuristics
in SimpleRegisterCoalescing, making the pass run 4% faster.
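A minimal sketch of how a pass is expected to query the new helper; the header
path and types below follow a newer tree layout, so treat the exact spellings
as assumptions:
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/RegisterClassInfo.h"
using namespace llvm;
static void queryClassInfo(const MachineFunction &MF,
                           const TargetRegisterClass *RC) {
  RegisterClassInfo RCI;
  // Only recomputes the cached tables if the callee-saved or reserved
  // registers changed since the last function.
  RCI.runOnMachineFunction(MF);
  unsigned NumRegs = RCI.getNumAllocatableRegs(RC); // reserved regs filtered out
  ArrayRef<MCPhysReg> Order = RCI.getOrder(RC);     // CSR aliases come last
  (void)NumRegs;
  (void)Order;
}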
llvm-svn: 132450
In the given testcase, the "Clobber" was pointing to a load, and GVN was incorrectly assuming that meant that the "Clobber" load overlapped the load being analyzed (when they are actually unrelated).
The included testcase tests both this commit and r132434.
Part two of rdar://9429882. (r132434 was mislabeled.)
llvm-svn: 132442
floating-point comparison, generate a mask of 0s or 1s, and generally
DTRT with NaNs. Only profitable when the user wants a materialized 0
or 1 at runtime. rdar://problem/5993888
llvm-svn: 132404
Add TargetRegisterInfo::hasSubClassEq and use it to check for compatible
register classes instead of trying to list all register classes in
X86's getLoadStoreRegOpcode.
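A hedged sketch of the predicate in its target-independent form (in current
trees it lives on TargetRegisterClass and the header path is the newer
spelling; inside the X86 backend the query is written against concrete classes
such as GR64 and its sub-classes):
#include "llvm/CodeGen/TargetRegisterInfo.h"
using namespace llvm;
// True if RC is SuperRC itself or any register class contained in it, which
// is exactly the "compatible class" test getLoadStoreRegOpcode needs.
static bool isCompatibleClass(const TargetRegisterClass *SuperRC,
                              const TargetRegisterClass *RC) {
  return SuperRC->hasSubClassEq(RC);
}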
llvm-svn: 132398
patch we add a flag to enable a new type legalization decision - to promote
integer elements in vectors. Currently, the rest of the codegen does not support
this kind of legalization. This flag will be removed when the transition is
complete.
llvm-svn: 132394
For targets with no itinerary (x86) it is a nop by default. For
targets with issue width already expressed in the itinerary (ARM) it
bypasses a scoreboard check but otherwise does not affect the
schedule. It does make the code more consistent and complete and
allows new targets to specify their issue width in an arbitrary way.
llvm-svn: 132385
turns out that it could cause an infinite loop in some situations. If this code
is triggered and it converts a cleanup into a catchall, but that cleanup was
already inside a cleanup, then _Unwind_SjLj_Resume could loop forever. I.e.,
the code doesn't consume the exception object and passes it on to
_Unwind_SjLj_Resume. But _USjLjR expects it to be consumed (since it's landing
at a catchall instead of a cleanup). So it uses the values that are presently
there, which are the values that tell it to jump to the fake landing pad.
<rdar://problem/9508402>
llvm-svn: 132381
When assigned ranges are evicted, they are put in the RS_Evicted stage and are
not allowed to evict anything else. That prevents looping automatically.
When evicting ranges just to get a cheaper register, use only spill weights to
find the possible candidates. Avoid breaking hints for this purpose, it is not
worth it.
Start implementing more complex eviction heuristics, guarded by the temporary
-complex-eviction flag. The initial version permits a heavier range to be
evicted if it doesn't have any uses where the evicting range is live. This makes
it a good candidate for live range splitting.
llvm-svn: 132358
nand), atomic.swap and atomic.cmp.swap, all in i8, i16 and i32 versions.
The intrinsics are implemented by creating pseudo-instructions, which are
then expanded in the method MipsTargetLowering::EmitInstrWithCustomInserter.
Patch by Sasa Stankovic.
llvm-svn: 132323
same dwarf number. This will be used for creating a dwarf number to register
mapping.
The only case that needs this so far is the XMM/YMM registers that unfortunately
do have the same numbers.
llvm-svn: 132314
This only affects targets like Mips where branch instructions may kill virtual
registers. Most other targets branch on flag values, so virtual registers are
not involved.
The problem is that MachineBasicBlock::updateTerminator deletes branches and
inserts new ones while LiveVariables keeps a list of pointers to instructions
that kill virtual registers. That list wasn't properly updated in
MBB::SplitCriticalEdge.
llvm-svn: 132298
This is important for the correct lowering of unwind instructions
(which doesn't matter at all) and llvm.eh.resume calls (which does).
Take 2, now with more basic competence.
llvm-svn: 132295
This is important for the correct lowering of unwind instructions
(which doesn't matter at all) and llvm.eh.resume calls (which does).
llvm-svn: 132291
variable. Noticed by inspection.
Simulate memset in EvaluateFunction where the target of the memset and the
value we're setting are both the null value. Fixes PR10047!
llvm-svn: 132288
handler.
At this moment, only GCC-style exceptions are supported. Other kinds
of exceptions, including "traditional" SEH and Microsoft Visual C++ exceptions,
need more work--and a compiler exception model that isn't specific to
GCC-style exceptions!
In particular, I imagine that it would be possible to mix "traditional" SEH
with GCC-style EH or Microsoft C++ EH. Currently LLVM has no way (beyond some
target-specific defaults and whole-module compiler switches) of knowing which
scheme to use when.
llvm-svn: 132283
This patch does not change the behavior of the type legalizer. The codegen
produces the same code.
This infrastructural change is needed in order to enable complex decisions
for vector types (needed by the vector-select patch).
llvm-svn: 132263
transformed by the inliner into a branch to the enclosing landing pad
(when inlined through an invoke). If not so optimized, it is lowered by
DWARF EH preparation into a call to _Unwind_Resume (or _Unwind_SjLj_Resume
as appropriate). Its chief advantage is that it takes both the
exception value and the selector value as arguments, meaning that there
is zero effort in recovering these; however, the frontend is required
to pass these down, which is not actually particularly difficult.
Also document the behavior of landing pads a bit better, and make it
clearer that it's okay that personality functions don't always land at
landing pads. This is just a fact of life. Don't write optimizations that
rely on pushing things over an unwind edge.
llvm-svn: 132253
to load/store i64 values. Since there's no current support to explicitly
declare such restrictions, implement it by using specific hardcoded register
pairs during isel.
llvm-svn: 132248
Delete the Kill and Def markers in BlockInfo. They are no longer
necessary when BlockInfo describes a continuous live range.
This only affects the relatively rare kind of basic block where a live
range looks like this:
|---x o---|
Now live range splitting can pretend that it is looking at two blocks:
|---x
o---|
This allows the code to be simplified a bit.
llvm-svn: 132245
It is important that this function returns the same number of live blocks as
countLiveBlocks(CurLI) because live range splitting uses the number of live
blocks to ensure it is making progress.
This is in preparation for supporting duplicate UseBlock entries for basic blocks
that have a virtual register live-in and live-out, but not live-through.
llvm-svn: 132244
register allocation dependent and will occasionally break. WIP in the
register allocator to model paired/etc registers.
rdar://9119939
llvm-svn: 132242
There was no way to check if a given register/mode pair was valid. We now return
an error code (-2) instead of asserting. If anyone thinks that an assert
at this point is really needed, we can autogen a hasValidDwarfRegNum instead.
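A small sketch of the calling pattern this enables; it is written against
MCRegisterInfo, where the lookup lives in newer trees, so the exact class and
header are assumptions for this revision:
#include "llvm/MC/MCRegisterInfo.h"
using namespace llvm;
static bool emitDwarfRegNum(const MCRegisterInfo &MRI, unsigned Reg, bool IsEH) {
  auto DwarfReg = MRI.getDwarfRegNum(Reg, IsEH);
  if (DwarfReg < 0)
    return false; // no valid DWARF number for this register/mode pair
  // ... emit DwarfReg ...
  return true;
}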
llvm-svn: 132236
was saying that the matching superregister class of GR32_NOREX in GR64_NOREX_NOSP
is GR64_NOREX, which drops the NOSP constraint. This fixes PR10032.
llvm-svn: 132225
subregisters:
When a value is in a subregister, at least report the location as being
the superregister. We should extend the .td files to encode the bit
range so that we can produce a DW_OP_bit_piece.
llvm-svn: 132224
suffix (e.g. .xdata$myfunc). The suffix part isn't implemented yet, but
I'll get to it in the next patch.
Fix up all callers of the affected functions. Make them pass said suffix to
the function.
llvm-svn: 132205
- the selector for the landing pad must provide all available information
about the handlers, filters, and cleanups within that landing pad
- calls to _Unwind_Resume must be converted to branches to the enclosing
lpad so as to avoid re-entering the unwinder when the lpad claimed it
was going to handle the exception in some way
This is quite specific to libUnwind-based unwinding. In an effort to not
interfere too badly with other unwinders, and with existing hacks in frontends,
this only triggers on _Unwind_Resume (not _Unwind_Resume_or_Rethrow) and does
nothing with selectors if it cannot find a selector call for either lpad.
llvm-svn: 132200
- Flip order of bitfields. This gets our output matching GAS.
- Handle case where the end of the prolog wasn't specified.
- If the resulting unwind info struct is less than 8 bytes, pad to 8 bytes.
Add a test for the latter two.
llvm-svn: 132188
still report leaks, but they're spurious now. Valgrind cannot peer into
std::vector objects--or any dynamic array, for that matter--because it doesn't
know how big the array is.
llvm-svn: 132174
This looks like it flagged an actual bug. Devang, please review. I added
the parentheses that change behavior, but make the behavior more closely
match the commit log's intent.
llvm-svn: 132165
Rework how the MCWin64EHUnwindInfo instances are stored. Fix issues with
chained unwind areas exposed by the test that were related to this.
The ChainedParent field had the wrong address, because when the chained unwind
info was added, the addresses shifted around. Now we store the pointers to the
structures, which are now allocated from the MC heap.
llvm-svn: 132106
Use a proper worklist for use-def traversal without holding onto an
iterator. Now that we process all IV uses, we need complete logic for
reusing existing derived IV defs. See HoistStep.
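The worklist pattern itself is generic; here is a hedged, self-contained C++
sketch of the idea (nothing below is the actual IndVarSimplify code), showing
why a worklist is safer than holding a use_iterator while uses are rewritten:
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;
static void visitTransitiveUsers(Instruction *Root) {
  SmallVector<Instruction *, 8> Worklist;
  SmallPtrSet<Instruction *, 16> Visited;
  Worklist.push_back(Root);
  while (!Worklist.empty()) {
    Instruction *I = Worklist.pop_back_val();
    if (!Visited.insert(I).second)
      continue; // already processed
    // Collect users onto the worklist first; any rewriting of I happens
    // afterwards, so no use_iterator is held across the mutation.
    for (User *U : I->users())
      if (auto *UI = dyn_cast<Instruction>(U))
        Worklist.push_back(UI);
    // ... transform I here ...
  }
}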
llvm-svn: 132103
This doesn't change functionality (much), but it allows for a more fine-grained
eviction policy. The current policy only compares spill weights, and that is not
always the best thing to do. Spill weights are designed to serve linear scan,
and they don't consider live range splitting.
Add a mechanism so canEvict() can request that a live range be evicted and
split/spilled. This is to avoid infinite eviction loops.
llvm-svn: 132101
The practical effects here are that x86-64 fast-isel can now handle trunc from i8 to i1, and ARM fast-isel can handle many more constructs involving integers narrower than 32 bits (including loads, stores, and many integer casts).
rdar://9437928 .
llvm-svn: 132099
them.
I had to add a special SwitchSectionNoChange method to MCStreamer just for
.seh_handlerdata. If this isn't OK, please let me know, and I'll find some
other way to fix .seh_handlerdata streaming.
llvm-svn: 132084
target register, matching BX. I filed this bug because I was confused at first:
PR10007 - ARM branch instructions have inconsistent predicate operand placement
<http://llvm.org/bugs/show_bug.cgi?id=10007>
llvm-svn: 132041
case of a switch instruction. Back off this optimization when this would
eliminate all of the predecessors to the latch.
Sorry, I am unable to reduce a reasonably sized test case.
rdar://9486843
llvm-svn: 132022
Add a size alignment check to the .seh_stackalloc directive parser. Add a
more descriptive error message to the .seh_handler directive parser.
Add methods to the TargetAsmInfo struct in support of all this.
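A hypothetical version of the size check to make the intent concrete; the
function name, diagnostic text, and exact constraint are assumptions, not the
parser's actual code:
#include "llvm/MC/MCParser/MCAsmParser.h"
#include "llvm/Support/SMLoc.h"
#include <cstdint>
using namespace llvm;
static bool checkSEHStackAllocSize(MCAsmParser &Parser, SMLoc Loc, int64_t Size) {
  // Win64 unwind codes express allocation sizes in multiples of 8 bytes.
  if (Size <= 0 || (Size & 7) != 0)
    return Parser.Error(Loc, "stack allocation size must be a positive multiple of 8");
  return false; // no error
}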
llvm-svn: 131992
deficiencies exist:
- Works only if ABI is o32.
- Zero-sized structures cannot be passed.
- There is a lot of redundancy in generated code.
llvm-svn: 131986
after checking for a GEP, so that it matches what GetUnderlyingObject
does. This fixes an obscure bug turned up by bugpoint in the testcase
for PR9931.
llvm-svn: 131971
non-zero.
- Teach X86 cmov optimization to eliminate the cmov from ctlz, cttz extension
when the source of X86ISD::BSR / X86ISD::BSF is proven to be non-zero.
rdar://9490949
llvm-svn: 131948
variable arguments in LowerCall and LowerFormalArguments. This should also fix
the bug in which handling of variable arguments is incorrect when the front-end
optimizes away unused fixed arguments.
llvm-svn: 131942
The following improvements are accomplished as a result of applying this patch:
- Fixed frame objects' offsets (relative to either the virtual frame pointer or
the stack pointer) are set before instruction selection is completed. There is
no need to wait until Prologue/Epilogue Insertion is run to set them.
- Calculation of final offsets of fixed frame objects is straightforward. It is
no longer necessary to assign negative offsets to fixed objects for incoming
arguments in order to distinguish them from the others.
- Since a fixed object has its relative offset set during instruction
selection, there is no need to conservatively set its alignment to 4.
- It is no longer necessary to reorder non-fixed frame objects in
MipsFrameLowering::adjustMipsStackFrame.
llvm-svn: 131915
I haven't implemented any of the ones that take registers yet. The problem is
that for x86-64 the streamer methods expect a native x86 register number (note:
%r8-%r15 want 8-15 instead of 0-7; same for %xmm8-%xmm15). I haven't figured
out exactly how I want to do that yet.
llvm-svn: 131899
aligned.
Teach memcpyopt to not give up all hope when confronted with an underaligned
memcpy feeding an overaligned byval. If the *source* of the memcpy can be
determined to be adequately aligned, or if it can be forced to be, we can
eliminate the memcpy.
This addresses PR9794. We now compile the example into:
define i32 @f(%struct.p* nocapture byval align 8 %q) nounwind ssp {
entry:
%call = call i32 @g(%struct.p* byval align 8 %q) nounwind
ret i32 %call
}
in both x86-64 and x86-32 mode. We still don't get a tailcall though,
because tailcalls apparently can't handle byval.
llvm-svn: 131884
result is non-zero. Implement an example optimization (PR9814), which allows us to
transform:
A / ((1 << B) >>u 2)
into:
A >>u (B-2)
which we compile into:
_divu3: ## @divu3
leal -2(%rsi), %ecx
shrl %cl, %edi
movl %edi, %eax
ret
instead of:
_divu3: ## @divu3
movb %sil, %cl
movl $1, %esi
shll %cl, %esi
shrl $2, %esi
movl %edi, %eax
xorl %edx, %edx
divl %esi, %eax
ret
llvm-svn: 131860
failing to form a memset, then having to delete it" but my approximation
isn't safe for self-recurrent loops. Instead of doing a hack, just
do it the right way.
llvm-svn: 131858
I also changed -simplifycfg, -jump-threading and -codegenprepare to use this
to produce slightly better code without any extra cleanup passes (AFAICT this
was the only place in -simplifycfg where now-dead conditions of replaced
terminators weren't being cleaned up). The only other user of this function is
-sccp, but I didn't read that thoroughly enough to figure out whether it might
be holding pointers to instructions that could be deleted by this.
llvm-svn: 131855
Original log message:
When BasicAA can determine that two pointers have the same base but
differ by a dynamic offset, return PartialAlias instead of MayAlias.
See the comment in the code for details. This fixes PR9971.
llvm-svn: 131809
preparation for reversing StackDirection.
Fixed objects are created in the following order:
1. Incoming arguments passed on stack.
2. va_arg objects (include both arguments that are passed in registers and
pointer to the location of the first va_arg argument).
3. $gp restore slot.
4. Outgoing arguments passed on stack.
5. Pointer to alloca'd space.
llvm-svn: 131767
No functionality enabled by default. Use -disable-iv-rewrite.
Extended IVUsers to keep track of the phi that represents the users' IV.
Added the WidenIV transform to replace a narrow IV with a wide IV
by doing a one-for-one replacement of IV users instead of expanding the
SCEV expressions. [sz]exts are removed and truncs are inserted.
llvm-svn: 131744
There's really nothing to implement. All this really does is swap to a
pseudo-section that later gets written to the unwind info struct. That
needs to be implemented in the object streamers.
llvm-svn: 131734
text section.
Assume the following bit of annotated assembly:
.section .data.rel.ro,"aw",%progbits
.align 2
.LAlpha:
.long startval(GOTOFF)
.text
.align 2
.type main,%function
.align 4
main: ;;; assume "main" starts at offset 0x20
0x0 push {r11, lr}
0x4 movw r0, :lower16:(.LAlpha-(.LBeta+8))
;;; ==> (.AddrOf(.LAlpha) - ((.AddrOf(.LBeta) - .AddrOf(".")) + 8)
;;; ==> (??? - ((16-4) + 8) = -20
0x8 movt r0, :upper16:(.LAlpha-(.LBeta+8))
;;; ==> (.AddrOf(.LAlpha) - ((.AddrOf(.LBeta) - .AddrOf(".")) + 8)
;;; ==> (??? - ((16-8) + 8) = -16
0xc ... blah
.LBeta:
0x10 add r0, pc, r0
0x14 ... blah
.LGamma:
0x18 add r1, pc, r1
Above snippet results in the following relocs in the .o file for the
first pair of movw/movt instructions
00000024 R_ARM_MOVW_PREL_NC .LAlpha
00000028 R_ARM_MOVT_PREL .LAlpha
And the encoded instructions in the .o file for main: must be
00000020 <main>:
20: e92d4800 push {fp, lr}
24: e30f0fec movw r0, #65516 ; 0xffec i.e. -20
28: e34f0ff0 movt r0, #65520 ; 0xfff0 i.e. -16
However, llc (prior to this commit) generates the following sequence
00000020 <main>:
20: e92d4800 push {fp, lr}
24: e30f0fec movw r0, #65516 ; 0xffec - i.e. -20
28: e34f0fff movt r0, #65535 ; 0xffff - i.e. -1
What has to happen in the ArmAsmBackend is that if the relocation is PC
relative, the 16 bits encoded as part of movw and movt must be both addends,
not addresses. It makes sense to encode addresses by right shifting the value
by 16, but the result is incorrect for PIC.
i.e., the right shift by 16 for movt is ONLY valid for the NON-PCRel case.
This change agrees with what GNU as does, and makes the PIC code run.
MC/ARM/elf-movt.s covers this case.
llvm-svn: 131674
ours compatible with GAS.
In retrospect, I should have emailed binutils about this earlier. Thanks to
Kai Tietz for pointing out that GAS already had SEH directives.
llvm-svn: 131652
piclabel operand. The operand in the tablegen definition doesn't actually turn
into an MI operand, so it just confuses anything checking the TargetInstrDesc
for the number of operands. It suffices to just have an implicit def of LR.
llvm-svn: 131626
on CodeGen/X86/2007-05-07-InvokeSRet.ll. There is probably a bug here that was
fixed by r128961, but since there is no test or reference to a source file I have
to revert it.
llvm-svn: 131618
- StartChained and EndChained delimit a chained unwind area, which can contain
additional operations to be undone if an exception occurs inside of it.
- UnwindOnly declares that this function doesn't handle any exceptions. If it
has a handler, it's an unwind handler instead of an exception handler.
- Lsda declares the location and size of the LSDA, which in the Win64 EH
scheme is kept inside the UNWIND_INFO struct. Windows itself ignores the
LSDA; it's used by the Language-Specific Handler (the "Personality Function"
from DWARF).
llvm-svn: 131572
than either the primitive size or the element primitive size (in the case
of vectors), simplify the vector logic. No functionality change. There
is some distracting churn in the patch because I lined up comments better
while there - sorry about that.
llvm-svn: 131533
happily accept things like "sext <2 x i32> to <999 x i64>". It would
also accept "sext <2 x i32> to i64", though the verifier would catch
that later. Fixed by having castIsValid check that vector lengths match
except when doing a bitcast. (2) When creating a cast instruction, check
that the cast is valid (this was already done when creating constexpr
casts). While there, replace getScalarSizeInBits (used to allow more
vector casts) with getPrimitiveSizeInBits in getCastOpcode and isCastable
since vector to vector casts are now handled explicitly by passing to the
element types; i.e. this bit should result in no functional change.
llvm-svn: 131532
can be used to turn a <4 x i64> into a <4 x i32> but getCastOpcode would assert
if you passed these types to it. Note that this strictly extends the previous
functionality: if getCastOpcode previously accepted two vector types (i.e. didn't
assert) then it still will and returns the same opcode (BitCast). That's because
before it would only accept vectors with the same bitwidth, and the new code only
touches vectors with the same length. However if two vectors have both the same
bitwidth and the same length then their element types have the same bitwidth, so
the new logic will return BitCast as before.
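A hedged C++ sketch of the case described above; VectorType::get with a plain
element count is the spelling from trees of this era (newer trees use
FixedVectorType::get), and the expectation that the answer is Trunc follows
from the element types:
#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/LLVMContext.h"
using namespace llvm;
static Instruction::CastOps exampleVectorCastOpcode(LLVMContext &Ctx) {
  Type *SrcTy = VectorType::get(Type::getInt64Ty(Ctx), 4); // <4 x i64>
  Type *DstTy = VectorType::get(Type::getInt32Ty(Ctx), 4); // <4 x i32>
  Value *Src = UndefValue::get(SrcTy);
  // Same number of lanes, narrower elements: the decision is now made per
  // element, so this returns Instruction::Trunc instead of asserting.
  return CastInst::getCastOpcode(Src, /*SrcIsSigned=*/false,
                                 DstTy, /*DstIsSigned=*/false);
}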
llvm-svn: 131530
GAS has no such directives (not even mingw-w64 GAS has them), so I took
creative license with their names in assembly. I prefixed them all with
"w64_" to avoid namespace collisions, for example. If I discover that GAS
has taken a different approach, I'll change ours to match.
llvm-svn: 131525
the purposes of the Win64 EH tables, I realized we had no way to tell where
the function ends. (MASM bounds functions with PROC and ENDP keywords.)
Add a directive to delimit the end of the function, and rename the 'frame'
directive to more accurately reflect its duality with the new directive.
llvm-svn: 131522
LiveInterval::shrinkToUses recomputes the live range from scratch instead of
removing snippets. This should avoid the problem with dangling live ranges.
Leave physreg identity copies alone. They can be created when joining a virtreg
with a physreg. They don't affect register allocation, and they will be removed
by the rewriter.
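A minimal sketch of the call this relies on, assuming the entry point on
LiveIntervals and the newer LiveIntervals.h header name (trees of this era
spell it LiveIntervalAnalysis.h):
#include "llvm/CodeGen/LiveIntervals.h"
using namespace llvm;
// After deleting a spilled value's instructions, rebuild its live range from
// the remaining uses instead of carving pieces out of the old one; dead
// components simply disappear, so no dangling ranges are left behind.
static void recomputeAfterSpill(LiveIntervals &LIS, LiveInterval &LI) {
  LIS.shrinkToUses(&LI);
}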
llvm-svn: 131521
As an example, the change to InstCombineCalls catches a common case where a call to a bitcast of a function is rewritten.
Chris, does this approach look reasonable?
llvm-svn: 131516
The greedy register allocator has live range splitting and register class
inflation, so it can actually fully undo this join, including restoring the
original register classes.
We still don't want to do this for long live ranges, mostly because of the high
register pressure when many constrained live ranges overlap.
llvm-svn: 131466
When instructions are deleted, they leave tombstone SlotIndex entries.
The isZeroLength method should ignore these null indexes.
This causes RABasic to sometimes spill a callee-saved register in the
abi-isel.ll test, so don't run that test with -regalloc=basic. Prioritizing
register allocation according to spill weight can cause more registers to be
used.
llvm-svn: 131436