for calls to weak symbols with a definition has the appearance of working
with LLVM-generated code because weak symbol definitions are put in their
own sections.
llvm-svn: 126933
There are probably much larger speedups to be had by renumbering locally instead
of looping over the whole function. For now, the greedy register allocator is
25% faster.
llvm-svn: 126926
This is much faster than using a pointer to a ManagedStatic object accessed with
a function call. The greedy register allocator is 5% faster overall just from
the SlotIndex default constructor savings.
llvm-svn: 126925
The SlotIndex created by the default construction does not represent a position
in the function, and it doesn't make sense to compare it to other indexes.
llvm-svn: 126924
We need to wait until we meet a PHIDef in its defining block before resurrecting
PHIKills in the predecessors.
This should unbreak the llvm-gcc-build-x86_64-darwin10-x-mingw32-x-armeabi bot.
llvm-svn: 126905
David Greene changed CannotYetSelect() to print the full DAG including multiple
copies of operands reached through different paths in the DAG. Unfortunately
this blows up exponentially in some cases. The depth limit of 100 is way too
high to prevent this -- I'm seeing a message string of 150MB with a depth of
only 40 in one particularly bad case, even though the DAG has less than 200
nodes. Part of the problem is that the printing code is following chain
operands, so if you fail to select an operation with a chain, the printer will
follow all the chained operations back to the entry node.
llvm-svn: 126899
Values that map to a single new value in a new interval after splitting don't
need new PHIDefs, and if the parent value was never rematerialized the live
range will be the same.
llvm-svn: 126894
uses.
The result produced by the streamer is used to give the linker more accurate
information and to add to llvm.compiler.used. The second improvement removes
the need for the user to add __attribute__((used)) to functions only used in
inline asm. The first one lets us build firefox with LTO on Darwin :-)
llvm-svn: 126830
- Allow i16, i32, i64, float, and double types, using the native .u16,
.u32, .u64, .f32, and .f64 PTX types.
- Allow loading/storing of all primitive types.
- Allow primitive types to be passed as parameters.
- Allow selection of PTX Version and Shader Model as sub-target attributes.
- Merge integer/floating-point test cases for load/store.
- Use .u32 instead of .s32 to conform to output from NVidia nvcc compiler.
Patch by Justin Holewinski
llvm-svn: 126824
Extract the updateSSA() method from the too long extendRange().
LiveOutCache can be shared among all the new intervals since there is at most
one of the new ranges live out from each basic block.
llvm-svn: 126818
This method could probably be used by LiveIntervalAnalysis::shrinkToUses, and
now it can use extendIntervalEndTo() which coalesces ranges.
llvm-svn: 126803
The value map is currently not used, all values are 'complex mapped' and
LiveIntervalMap::mapValue is used to dig them out.
This is the first step in a series changes leading to the removal of
LiveIntervalMap. Its data structures can be shared among all the live intervals
created by a split, so it is wasteful to create a copy for each.
llvm-svn: 126800
Use 8 bits from line number field to keep track of argument ordering while encoding debug info for an argument. That leaves 24 bit for line no, DebugLoc also allocates 24 bit for line numbers. If a function has more than 255 arguments then rest of the arguments will be ordered by llvm.dbg.* intrinsics' ordering in IR.
llvm-svn: 126793
addressing code. On 403.gcc this almost halves CodeGenPrepare time and reduces
total llc time by 9.5%. Unfortunately, getNumUses() is still the hottest function
in llc.
llvm-svn: 126782
This effectively disables the 'turbo' functionality of the greedy register
allocator where all new live ranges created by splitting would be reconsidered
as if they were originals.
There are two reasons for doing this, 1. It guarantees that the algorithm
terminates. Early versions were prone to infinite looping in certain corner
cases. 2. It is a 2x speedup. We can skip a lot of unnecessary interference
checks that won't lead to good splitting anyway.
The problem is that region splitting only gets one shot, so it should probably
be changed to target multiple physical registers at once.
Local live range splitting is still 'turbo' enabled. It only accounts for a
small fraction of compile time, so it is probably not necessary to do anything
about that.
llvm-svn: 126781
intersection of the LHS and RHS ConstantRanges and return "false" when
the range is empty.
This simplifies some code and catches some extra cases.
llvm-svn: 126744
and 256-bit forms. Because the number of elements in a vector
does not determine the vector type (4 elements could be v4f32 or
v4f64), pass the full type of the vector to decode routines.
llvm-svn: 126664
- Add appropriate TableGen patterns for fadd, fsub, fmul.
- Add .f32 as the PTX type for the LLVM float type.
- Allow parameters, return values, and global variable declarations
to accept the float type.
- Add appropriate test cases.
Patch by Justin Holewinski
llvm-svn: 126636
1. Inform users of ADDEs with two 0 operands that it never sets carry
2. Fold other ADDs or ADDCs into the ADDE if possible
It would be neat if we could do the same thing for SETCC+ADD eventually, but we can't do that in target independent code.
llvm-svn: 126557
Yes, there are other types than i8* and GEPs on them can produce an add+multiply.
We don't consider that cheap enough to be speculatively executed.
llvm-svn: 126481
is possible to do better if the high bit is set in either KnownZero/KnownOne, but
in practice NumSignBits is always 1 when we are zero extending because nothing
is known about that register.
llvm-svn: 126465
D registers since the vpush list may not have gaps. Make sure the stack
adjustment instruction isn't moved between them. Ditto for vpop in
epilogues.
Sorry, can't reduce a small test case.
rdar://9043312
llvm-svn: 126457
New live ranges are assigned in long -> short order, but live ranges that have
been evicted at least once are deferred and assigned in short -> long order.
Also disable splitting and spilling for live ranges seen for the first time.
The intention is to create a realistic interference pattern from the heavy live
ranges before starting splitting and spilling around it.
llvm-svn: 126451
Introduce a variable in the AsmParserExtension whether [] is valid in an
expression. If it is true, parse them like (). Enable this for ELF only.
llvm-svn: 126443
Limit the folding of any_ext and sext into the load operation to scalars.
Limit the active-bits trunc optimization to scalars.
Document vector trunc and vector sext in LangRef.
Similar to commit 126080 (for enabling zext).
llvm-svn: 126424
The problem was codegen guessing the wrong values and printing
.section .eh_frame,"aMS",@progbits,4
It is not clear at all if Codegen should try to guess, MC is the
one that should know the default flags.
llvm-svn: 126421
registers at phis. This enables us to eliminate a lot of pointless zexts during
the DAGCombine phase. This fixes <rdar://problem/8760114>.
llvm-svn: 126380
function prototype into a call to a varargs prototype. We do
allow the xform if we have a definition, but otherwise we don't
want to risk that we're changing the abi in a subtle way. On
X86-64, for example, varargs require passing stuff in %al.
llvm-svn: 126363
enabled for all targets. Non-X86 targets should not have this behavior
enabled by default.
Joerg, if you would like to resubmit with the behavior conditionalized to be
X86-ELF only, that's fine.
llvm-svn: 126336
The previous codegen for the slow path (when values are in VFP / NEON
registers) was incorrect if the source is NaN.
The new codegen uses NEON vbsl instruction to copy the sign bit. e.g.
vmov.i32 d1, #0x80000000
vbsl d1, d2, d0
If NEON is not available, it uses integer instructions to copy the sign bit.
rdar://9034702
llvm-svn: 126295
When a large live range is evicted, it will usually be split when it comes
around again. By deferring evicted live ranges, the splitting happens at a time
when the interference pattern is more realistic. This prevents repeated
splitting and evictions.
llvm-svn: 126282
Use interval sizes instead of spill weights to determine if it is legal to evict
interference. A smaller interval can evict interference if all interfering live
ranges are larger.
Allow multiple interferences to be evicted as along as they are all larger than
the live range being allocated.
Spill weights are still used to select the preferred eviction candidate.
llvm-svn: 126276