In some cases such as interpreters using indirectbr, the CFG can be very
complicated, and live range splitting may be forced to insert a large
number of phi-defs. When that happens, traceSiblingValue can spend a
lot of time zipping around in the CFG looking for defs and reloads.
This patch causes more information to be cached in SibValues, and the
cached values are used to terminate searches early. This speeds up
spilling by 20x in one interpreter test case. For more typical code,
this is just a 10% speedup of spilling.
llvm-svn: 139247
(The fix for the related failures on x86 is going to be nastier because we actually need Acquire memoperands attached to the atomic load instrs, etc.)
llvm-svn: 139221
with a vector condition); such selects become VSELECT codegen nodes.
This patch also removes VSETCC codegen nodes, unifying them with SETCC
nodes (codegen was actually often using SETCC for vector SETCC already).
This ensures that various DAG combiner optimizations kick in for vector
comparisons. Passes dragonegg bootstrap with no testsuite regressions
(nightly testsuite as well as "make check-all"). Patch mostly by
Nadav Rotem.
llvm-svn: 139159
init.trampoline and adjust.trampoline intrinsics, into two intrinsics
like in GCC. While having one combined intrinsic is tempting, it is
not natural because typically the trampoline initialization needs to
be done in one function, and the result of adjust trampoline is needed
in a different (nested) function. To get around this llvm-gcc hacks the
nested function lowering code to insert an additional parent variable
holding the adjust.trampoline result that can be accessed from the child
function. Dragonegg doesn't have the luxury of tweaking GCC code, so it
stored the result of adjust.trampoline in the memory GCC set aside for
the trampoline itself (this is always available in the child function),
and set up some new memory (using an alloca) to hold the trampoline.
Unfortunately this breaks Go which allocates trampoline memory on the
heap and wants to use it even after the parent has exited (!). Rather
than doing even more hacks to get Go working, it seemed best to just use
two intrinsics like in GCC. Patch mostly by Sanjoy Das.
llvm-svn: 139140
If we have a chain of zext -> assert_zext -> zext -> use, the first zext would get simplified away because of the later zext, and then the later zext would get simplified away because of the assert. The solution is to teach SimplifyDemandedBits that assert_zext demands all of the high bits of its input, rather than only those demanded by its users. No testcase because the only example I have manifests as llvm-gcc miscompiling LLVM, and I haven't found a smaller case that reproduces this problem.
Fixes <rdar://problem/10063365>.
llvm-svn: 139059
to be unreliable on platforms which require memcpy calls, and it is
complicating broader legalize cleanups. It is hoped that these cleanups
will make memcpy byval easier to implement in the future.
llvm-svn: 138977
- On COFF the .lcomm directive has an alignment argument.
- On ELF we fall back to .local + .comm
Based on a patch by NAKAMURA Takumi.
Fixes PR9337, PR9483 and PR10128.
llvm-svn: 138976
An instruction may define part of a register where the other bits are
undefined. In that case, it is safe to rematerialize the instruction.
For example:
%vreg2:ssub_0<def> = VLDRS <cp#0>, 0, pred:14, pred:%noreg, %vreg2<imp-def>
The extra <imp-def> operand indicates that the instruction does not read
the other parts of the virtual register, so a remat is safe.
This patch simply allows multiple def operands for the virtual register.
It is MI->readsVirtualRegister() that determines if we depend on a
previous value so remat is impossible.
llvm-svn: 138953
An instruction that redefines only part of a larger register can never
be rematerialized since the virtual register value depends on the old
value in other parts of the register.
This was fixed for the inline spiller in r138794. This patch fixes the
problem for all register allocators, and includes a small test case.
<rdar://problem/10032939>
llvm-svn: 138944
Added canClobberReachingPhysRegUse() to handle a particular pattern in
which a two-address instruction could be forced to interfere with
EFLAGS, causing a compare to be unnecessarilly cloned.
Fixes rdar://problem/5875261
llvm-svn: 138924
X86. Modify the pass added in the previous patch to call this new
code.
This new prologues generated will call a libgcc routine (__morestack)
to allocate more stack space from the heap when required
Patch by Sanjoy Das.
llvm-svn: 138812
Add a instruction flag: hasPostISelHook which tells the pre-RA scheduler to
call a target hook to adjust the instruction. For ARM, this is used to
adjust instructions which may be setting the 's' flag. ADC, SBC, RSB, and RSC
instructions have implicit def of CPSR (required since it now uses CPSR physical
register dependency rather than "glue"). If the carry flag is used, then the
target hook will *fill in* the optional operand with CPSR. Otherwise, the hook
will remove the CPSR implicit def from the MachineInstr.
llvm-svn: 138810
I don't really like the patterns, but I'm having trouble coming up with a
better way to handle them.
I plan on making other targets use the same legalization
ARM-without-memory-barriers is using... it's not especially efficient, but
if anyone cares, it's not that hard to fix for a given target if there's
some better lowering.
llvm-svn: 138621
A value of -1 at a call site tells the personality function that this call isn't
handled by the current function. Since the ResumeInsts are converted to calls to
_Unwind_SjLj_Resume, add a (volatile) store of -1 to its 'call site'.
llvm-svn: 138416
This is not necessarily the first or dominating use of the EH values. The IR
breaks if it's not. So replace the specific value in the instruction with the
new value.
llvm-svn: 138406
The invoke could be at the end of the entry block. If it's the only one, then we
won't process all of the landingpad instructions correctly. This code is
currently ugly, but should be made much nicer once the new EH switch is thrown.
llvm-svn: 138397
value, we insert a load of the exception object and selector object from memory,
which is where it actually resides. If it's used by a PHI node, we follow that
to where it is being used. Eventually, all landingpad instructions should have
no uses. Any PHI nodes that were associated with those landingpads should be
removed.
llvm-svn: 138302
the intent seems to be to terminate even in Release builds, just use abort()
directly.
If program flow ever reaches a __builtin_unreachable (which llvm_unreachable is
#define'd to on newer GCCs) then the program is undefined.
llvm-svn: 138068
Normally, a partial register def is treated as reading the
super-register unless it also defines the full register like this:
%vreg110:sub_32bit<def> = COPY %vreg77:sub_32bit, %vreg110<imp-def>
This patch also uses the <undef> flag on partial defs to recognize
non-reading operands:
%vreg110:sub_32bit<def,undef> = COPY %vreg77:sub_32bit
This fixes a subtle bug in RegisterCoalescer where LIS->shrinkToUses
would treat a coalesced copy as still reading the register, extending
the live range artificially.
My test case only works when I disable DCE so a dead copy is left for
RegisterCoalescer, so I am not including it.
<rdar://problem/9967101>
llvm-svn: 138018
The landingpad instruction is lowered into the EXCEPTIONADDR and EHSELECTION
SDNodes. The information from the landingpad instruction is harvested by the
'AddLandingPadInfo' function. The new EH uses the current EH scheme in the
back-end. This will change once we switch over to the new scheme. (Reviewed by
Jakob!)
llvm-svn: 137880
This generates the SDNodes for the new exception handling scheme. It takes the
two values coming from the landingpad instruction and assigns them to the
EXCEPTIONADDR and EHSELECTION nodes.
llvm-svn: 137873
Things are much saner now. We no longer need to modify the laning pads, because
of the invariants we impose upon them. The only thing DwarfEHPrepare needs to do
is convert the 'resume' instruction into a call to '_Unwind_Resume'.
llvm-svn: 137855
MDNodes graph structure such that compiler unit keeps track of important MDNodes and update dwarf writer to process mdnodes top-down instead of bottom up.
llvm-svn: 137778
When a variable is inlined multiple places, abstract variable keeps name, location, type etc.. info and all other concreate instances of the variable directly refers to abstract variable.
llvm-svn: 137637
This implements the 'landingpad' instruction. It's used to indicate that a basic
block is a landing pad. There are several restrictions on its use (see
LangRef.html for more detail). These restrictions allow the exception handling
code to gather the information it needs in a much more sane way.
This patch has the definition, implementation, C interface, parsing, and bitcode
support in it.
llvm-svn: 137501
The Query class now holds two iterators instead of an InterferenceResult
instance. The iterators are used as bookmarks for repeated
collectInterferingVRegs calls.
llvm-svn: 137380
The InterferenceResult iterator turned out to be less important than we
thought it would be. LiveIntervalUnion clients want higher level
information, like the list of interfering virtual registers.
llvm-svn: 137346
Coalescing can remove copy-like instructions with sub-register operands
that constrained the register class. Examples are:
x86: GR32_ABCD:sub_8bit_hi -> GR32
arm: DPR_VFP2:ssub0 -> DPR
Recompute the register class of any virtual registers that are used by
less instructions after coalescing.
This affects code generation for the Cortex-A8 where we use NEON
instructions for f32 operations, c.f. fp_convert.ll:
vadd.f32 d16, d1, d0
vcvt.s32.f32 d0, d16
The register allocator is now free to use d16 for the temporary, and
that comes first in the allocation order because it doesn't interfere
with any s-registers.
llvm-svn: 137133
This function doesn't have anything to do with spill weights, and MRI
already has functions for manipulating the register class of a virtual
register.
llvm-svn: 137123
The local ranges created get to stay in the RS_New stage, just like for
local and region splitting.
This gives tryLocalSplit a bit more freedom the first time it sees one
of these new local ranges.
llvm-svn: 137001
Normally, we don't create a live range for a single instruction in a
basic block, the spiller does that anyway. However, when splitting a
live range that belongs to a proper register sub-class, inserting these
extra COPY instructions completely remove the constraints from the
remainder interval, and it may be allocated from the larger super-class.
The spiller will mop up these small live ranges if we end up spilling
anyway. It calls them snippets.
llvm-svn: 136989
Some instructions require restricted register classes, but most of the
time that doesn't affect register allocation. For example, some
instructions don't work with the stack pointer, but that is a reserved
register anyway.
Sometimes it matters, GR32_ABCD only has 4 allocatable registers. For
such a proper sub-class, the register allocator should try to enable
register class inflation since that makes more registers available for
allocation.
Make sure only legal super-classes are considered. For example, tGPR is
not a proper sub-class in Thumb mode, but in ARM mode it is.
llvm-svn: 136981
The old code would look at kills and defs in one pass over the
instruction operands, causing problems with this code:
%R0<def>, %CPSR<def,dead> = tLSLri %R5<kill>, 2, pred:14, pred:%noreg
%R0<def>, %CPSR<def,dead> = tADDrr %R4<kill>, %R0<kill>, pred:14, %pred:%noreg
The last instruction kills and redefines %R0, so it is still live after
the instruction.
This caused a register scavenger crash when compiling 483.xalancbmk for
armv6. I am not including a test case because it requires too much bad
luck to expose this old bug.
First you need to convince the register allocator to use %R0 twice on
the tADDrr instruction, then you have to convince BranchFolding to do
something that causes it to run the register scavenger on he bad block.
<rdar://problem/9898200>
llvm-svn: 136973
inlined variable, based on the discussion in PR10542.
This explodes the runtime of several passes down the pipeline due to
a large number of "copies" remaining live across a large function. This
only shows up with both debug and opt, but when it does it creates
a many-minute compile when self-hosting LLVM+Clang. There are several
other cases that show these types of regressions.
All of this is tracked in PR10542, and progress is being made on fixing
the issue. Once its addressed, the re-instated, but until then this
restores the performance for self-hosting and other opt+debug builds.
Devang, let me know if this causes any trouble, or impedes fixing it in
any way, and thanks for working on this!
llvm-svn: 136953
It is possible to have multiple DBG_VALUEs for the same variable:
32L TEST32rr %vreg0<kill>, %vreg0, %EFLAGS<imp-def>; GR32:%vreg0
DBG_VALUE 2, 0, !"i"
DBG_VALUE %noreg, %0, !"i"
When that happens, keep the last one instead of the first.
llvm-svn: 136842
This helps generate better code in functions with high register
pressure.
The previous version of compact region splitting caused regressions
because the regions were a bit too large. A stronger negative bias
applied in r136832 fixed this problem.
llvm-svn: 136836
Apply twice the negative bias on transparent blocks when computing the
compact regions. This excludes loop backedges from the region when only
one of the loop blocks uses the register.
Previously, we would include the backedge in the region if the loop
preheader and the loop latch both used the register, but the loop header
didn't.
When both the header and latch blocks use the register, we still keep it
live on the backedge.
llvm-svn: 136832
This is either an invalid SlotIndex, or valno->def for the first value
defined inside the block. PHI values are not counted as defined inside
the block.
The FirstDef field will be used when estimating the cost of spilling
around a block.
llvm-svn: 136736
The PrefBoth constraint is used for blocks that ideally want a live-in
value both on the stack and in a register. This would be used by a block
that has a use before interference forces a spill.
Secondly, add the ChangesValue flag to BlockConstraint. This tells
SpillPlacement if a live-in value on the stack can be reused as a
live-out stack value for free. If the block redefines the virtual
register, a spill would be required for that.
This extra information will be used by SpillPlacement to more accurately
calculate spill costs when a value can exist both on the stack and in a
register.
The simplest example is a basic block that reads the virtual register,
but doesn't change its value. Spilling around such a block requires a
reload, but no spill in the block.
The spiller already knows this, but the spill placer doesn't. That can
sometimes lead to suboptimal regions.
llvm-svn: 136731
This adds the 'resume' instruction class, IR parsing, and bitcode reading and
writing. The 'resume' instruction resumes propagation of an existing (in-flight)
exception whose unwinding was interrupted with a 'landingpad' instruction (to be
added later).
llvm-svn: 136589
This includes registers like EFLAGS and ST0-ST7. We don't check for
liveness issues in the verifier and scavenger because registers will
never be allocated from these classes.
While in SSA form, we do care about the liveness of unallocatable
unreserved registers. Liveness of EFLAGS and ST0 neds to be correct for
MachineDCE and MachineSinking.
llvm-svn: 136541
This flag is true from isel to register allocation when the machine
function is required to be in SSA form. The TwoAddressInstructionPass
and PHIElimination passes clear the flag.
The SSA flag wil be used by the machine code verifier to check for SSA
form, and eventually an assertion can enforce it in +Asserts builds.
This will catch the common target error of creating machine code with
multiple defs of a virtual register.
llvm-svn: 136532
working on x86 (at least for trivial testcases); other architectures will
need more work so that they actually emit the appropriate instructions for
orderings stricter than 'monotonic'. (As far as I can tell, the ARM, PPC,
Mips, and Alpha backends need such changes.)
llvm-svn: 136457
specified in the same file that the library itself is created. This is
more idiomatic for CMake builds, and also allows us to correctly specify
dependencies that are missed due to bugs in the GenLibDeps perl script,
or change from compiler to compiler. On Linux, this returns CMake to
a place where it can relably rebuild several targets of LLVM.
I have tried not to change the dependencies from the ones in the current
auto-generated file. The only places I've really diverged are in places
where I was seeing link failures, and added a dependency. The goal of
this patch is not to start changing the dependencies, merely to move
them into the correct location, and an explicit form that we can control
and change when necessary.
This also removes a serialization point in the build because we don't
have to scan all the libraries before we begin building various tools.
We no longer have a step of the build that regenerates a file inside the
source tree. A few other associated cleanups fall out of this.
This isn't really finished yet though. After talking to dgregor he urged
switching to a single CMake macro to construct libraries with both
sources and dependencies in the arguments. Migrating from the two macros
to that style will be a follow-up patch.
Also, llvm-config is still generated with GenLibDeps.pl, which means it
still has slightly buggy dependencies. The internal CMake
'llvm-config-like' macro uses the correct explicitly specified
dependencies however. A future patch will switch llvm-config generation
(when using CMake) to be based on these deps as well.
This may well break Windows. I'm getting a machine set up now to dig
into any failures there. If anyone can chime in with problems they see
or ideas of how to solve them for Windows, much appreciated.
llvm-svn: 136433
This generates the correct SDNodes for the landingpad instruction. It makes an
assumption that the result of the landingpad instruction has at least two
values. And that the first value is a pointer to the exception object and the
second value is the "selector."
llvm-svn: 136430
'atomicrmw' instructions, which allow representing all the current atomic
rmw intrinsics.
The allowed operands for these instructions are heavily restricted at the
moment; we can probably loosen it a bit, but supporting general
first-class types (where it makes sense) might get a bit complicated,
given how SelectionDAG works.
As an initial cut, these operations do not support specifying an alignment,
but it would be possible to add if we think it's useful. Specifying an
alignment lower than the natural alignment would be essentially
impossible to support on anything other than x86, but specifying a greater
alignment would be possible. I can't think of any useful optimizations which
would use that information, but maybe someone else has ideas.
Optimizer/codegen support coming soon.
llvm-svn: 136404
Code like that would only be produced by bugpoint, but we should still
handle it correctly.
When a register is defined by a REG_SEQUENCE of undefs, the register
itself is undef. Previously, we would create a register with uses but no
defs.
Fixes part of PR10520.
llvm-svn: 136401
There are two conflicting strategies in play:
- Under high register pressure, we want to assign large live ranges
first. Smaller live ranges are easier to place afterwards.
- Live range splitting is guided by interference, so splitting should be
deferred until interference is as realistic as possible.
With the recent changes to the live range stages, and with compact
regions enabled, it is less traumatic to split a live range too early.
If some of the split products were too big, they can often be split
again.
By reversing the RS_Split order, we get this queue order:
1. Normal live ranges, large to small.
2. RS_Split live ranges, large to small.
The large-to-small order improves RAGreedy's puzzle solving skills under
high register pressure. It may cause a bit more iterated splitting, but
we handle that better now.
With this change, -compact-regions is mostly an improvement on SPEC.
llvm-svn: 136388
When splitting global live ranges, it is now possible to split for
multiple destination intervals at once. Previously, we only had the main
and stack intervals.
Each edge bundle is assigned to a split candidate, and splitAroundRegion
will insert copies between the candidate intervals and the stack
interval as needed.
The multi-way splitting is used to split around compact regions when
enabled with -compact-regions. The best candidate register still gets
all the bundles it wants, but everything outside the main interval is
first split around compact regions before we create single-block
intervals.
Compact region splitting still causes some regressions, so it is not
enabled by default.
llvm-svn: 136186
These copies would coalesce easily, but the resulting value would be
defined by a deleted instruction. Now we also remove the undefined value
number from the destination register.
This fixes PR10503.
llvm-svn: 136174
When dead code elimination deletes a PHI value, the virtual register may
split into multiple connected components. In that case, revert each
component to the RS_Assign stage.
The new components are guaranteed to be smaller (the original value
numbers are distributed among the components), so this will always be
making progress. The components are now allowed to evict other live
ranges or be split again.
llvm-svn: 136034
This mechanism already exists, but the RS_Split2 stage makes it clearer.
When live range splitting creates ranges that may not be making
progress, they are marked RS_Split2 instead of RS_New. These ranges may
be split again, but only in a way that can be proven to make progress.
For local ranges, that means they must be split into ranges used by
strictly fewer instructions.
For global ranges, region splitting is bypassed and the RS_Split2
ranges go straight to per-block splitting.
llvm-svn: 135912
The stage is used to control where a live range is going, not where it
is coming from. Live ranges created by splitting will usually be marked
RS_New, but some are marked RS_Spill to avoid wasting time trying to
split them again.
The old RS_Global and RS_Local stages are merged - they are really the
same thing for local and global live ranges.
llvm-svn: 135911
This fixes PR10463. A two-address instruction with an <undef> use
operand was incorrectly rewritten so the def and use no longer used the
same register, violating the tie constraint.
Fix this by always rewriting <undef> operands with the register a def
operand would use.
llvm-svn: 135885
This method computes the edge bundles that should be live when splitting
around a compact region. This is independent of interference.
The function returns false if the live range was already a compact
region, or the compact region doesn't have any live bundles - it would
be the same as splitting around basic blocks.
Compact regions are computed using the normal spill placement code. We
pretend there is interference in all live-through blocks that don't use
the live range. This removes all edges from the Hopfield network used
for spill placement, so it converges instantly.
llvm-svn: 135847
If there is no interference and no last split point, we cannot
enterIntvBefore(Stop) - that function needs a real instruction.
Use enterIntvAtEnd instead for that very easy case.
This code doesn't currently run, it is needed by multi-way splitting.
llvm-svn: 135846
A split candidate can have a null PhysReg which means that it doesn't
map to a real interference pattern. Instead, pretend that all through
blocks have interference.
This makes it possible to generate compact regions where the live range
doesn't go through blocks that don't use it. The live range will still
be live between directly connected blocks with uses.
Splitting around a compact region tends to produce a live range with a
high spill weight, so it may evict a less dense live range.
llvm-svn: 135845
This method matches addLinks - All the listed blocks are considered to
have interference, so they add a negative bias to their bundles.
This could also be done by addConstraints, but that requires building a
separate BlockConstraint array.
llvm-svn: 135844
- Introduce JITDefault code model. This tells targets to set different default
code model for JIT. This eliminates the ugly hack in TargetMachine where
code model is changed after construction.
llvm-svn: 135580
(including compilation, assembly). Move relocation model Reloc::Model from
TargetMachine to MCCodeGenInfo so it's accessible even without TargetMachine.
llvm-svn: 135468
to MCRegisterInfo. Also initialize the mapping at construction time.
This patch eliminate TargetRegisterInfo from TargetAsmInfo. It's another step
towards fixing the layering violation.
llvm-svn: 135424
When splitting a live range immediately before an LDR_POST instruction
that redefines the address register, make sure to use the correct value
number in leaveIntvBefore.
We need the value number entering the instruction.
<rdar://problem/9793765>
llvm-svn: 135413
When trying to rematerialize a value before an instruction that has an
early-clobber redefine of the virtual register, make sure to look up the
correct value number.
Early-clobber defs are moved one slot back, so getBaseIndex is needed to
find the used value number.
Bugpoint was unable to reduce the test case for this, see PR10388.
llvm-svn: 135378
This gets rid of some of the gory splitting details in RAGreedy and
makes them available to future SplitKit clients.
Slightly generalize the functionality to support multi-way splitting.
Specifically, SplitEditor::splitLiveThroughBlock() supports switching
between different register intervals in a block.
llvm-svn: 135307
when determining validity of matching constraint. Allow i1
types access to the GR8 reg class for x86.
Fixes PR10352 and rdar://9777108
llvm-svn: 135180
During type legalization we often use the SIGN_EXTEND_INREG SDNode.
When this SDNode is legalized during the LegalizeVector phase, it is
scalarized because non-simple types are automatically marked to be expanded.
In this patch we add support for lowering SIGN_EXTEND_INREG manually.
This fixes CodeGen/X86/vec_sext.ll when running with the '-promote-elements'
flag.
llvm-svn: 135144
Original commit message:
Count references to interference cache entries.
Each InterferenceCache::Cursor instance references a cache entry. A
non-zero reference count guarantees that the entry won't be reused for a
new register.
This makes it possible to have multiple live cursors examining
interference for different physregs.
The total number of live cursors into a cache must be kept below
InterferenceCache::getMaxCursors().
Code generation should be unaffected by this change, and it doesn't seem
to affect the cache replacement strategy either.
llvm-svn: 135130
Each InterferenceCache::Cursor instance references a cache entry. A
non-zero reference count guarantees that the entry won't be reused for a
new register.
This makes it possible to have multiple live cursors examining
interference for different physregs.
The total number of live cursors into a cache must be kept below
InterferenceCache::getMaxCursors().
Code generation should be unaffected by this change, and it doesn't seem
to affect the cache replacement strategy either.
llvm-svn: 135121
The cache entry referenced by the best split candidate could become
clobbered by an unsuccessful candidate.
The correct fix here is to use reference counts on the cache entries.
Coming up.
llvm-svn: 135113
Some pysical registers create split solutions that would spill anywhere.
They should not even be considered in future multi-way global splits.
This does not affect code generation (yet).
llvm-svn: 135080
an assert on Darwin llvm-gcc builds.
Assertion failed: (castIsValid(op, S, Ty) && "Invalid cast!"), function Create, file /Users/buildslave/zorg/buildbot/smooshlab/slave-0.8/build.llvm-gcc-i386-darwin9-RA/llvm.src/lib/VMCore/Instructions.cpp, li\
ne 2067.
etc.
http://smooshlab.apple.com:8013/builders/llvm-gcc-i386-darwin9-RA/builds/2354
--- Reverse-merging r134893 into '.':
U include/llvm/Target/TargetData.h
U include/llvm/DerivedTypes.h
U tools/bugpoint/ExtractFunction.cpp
U unittests/Support/TypeBuilderTest.cpp
U lib/Target/ARM/ARMGlobalMerge.cpp
U lib/Target/TargetData.cpp
U lib/VMCore/Constants.cpp
U lib/VMCore/Type.cpp
U lib/VMCore/Core.cpp
U lib/Transforms/Utils/CodeExtractor.cpp
U lib/Transforms/Instrumentation/ProfilingUtils.cpp
U lib/Transforms/IPO/DeadArgumentElimination.cpp
U lib/CodeGen/SjLjEHPrepare.cpp
--- Reverse-merging r134888 into '.':
G include/llvm/DerivedTypes.h
U include/llvm/Support/TypeBuilder.h
U include/llvm/Intrinsics.h
U unittests/Analysis/ScalarEvolutionTest.cpp
U unittests/ExecutionEngine/JIT/JITTest.cpp
U unittests/ExecutionEngine/JIT/JITMemoryManagerTest.cpp
U unittests/VMCore/PassManagerTest.cpp
G unittests/Support/TypeBuilderTest.cpp
U lib/Target/MBlaze/MBlazeIntrinsicInfo.cpp
U lib/Target/Blackfin/BlackfinIntrinsicInfo.cpp
U lib/VMCore/IRBuilder.cpp
G lib/VMCore/Type.cpp
U lib/VMCore/Function.cpp
G lib/VMCore/Core.cpp
U lib/VMCore/Module.cpp
U lib/AsmParser/LLParser.cpp
U lib/Transforms/Utils/CloneFunction.cpp
G lib/Transforms/Utils/CodeExtractor.cpp
U lib/Transforms/Utils/InlineFunction.cpp
U lib/Transforms/Instrumentation/GCOVProfiling.cpp
U lib/Transforms/Scalar/ObjCARC.cpp
U lib/Transforms/Scalar/SimplifyLibCalls.cpp
U lib/Transforms/Scalar/MemCpyOptimizer.cpp
G lib/Transforms/IPO/DeadArgumentElimination.cpp
U lib/Transforms/IPO/ArgumentPromotion.cpp
U lib/Transforms/InstCombine/InstCombineCompares.cpp
U lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
U lib/Transforms/InstCombine/InstCombineCalls.cpp
U lib/CodeGen/DwarfEHPrepare.cpp
U lib/CodeGen/IntrinsicLowering.cpp
U lib/Bitcode/Reader/BitcodeReader.cpp
llvm-svn: 134949
and MCSubtargetInfo.
- Added methods to update subtarget features (used when targets automatically
detect subtarget features or switch modes).
- Teach X86Subtarget to update MCSubtargetInfo features bits since the
MCSubtargetInfo layer can be shared with other modules.
- These fixes .code 16 / .code 32 support since mode switch is updated in
MCSubtargetInfo so MC code emitter can do the right thing.
llvm-svn: 134884
patch brings numerous advantages to LLVM. One way to look at it
is through diffstat:
109 files changed, 3005 insertions(+), 5906 deletions(-)
Removing almost 3K lines of code is a good thing. Other advantages
include:
1. Value::getType() is a simple load that can be CSE'd, not a mutating
union-find operation.
2. Types a uniqued and never move once created, defining away PATypeHolder.
3. Structs can be "named" now, and their name is part of the identity that
uniques them. This means that the compiler doesn't merge them structurally
which makes the IR much less confusing.
4. Now that there is no way to get a cycle in a type graph without a named
struct type, "upreferences" go away.
5. Type refinement is completely gone, which should make LTO much MUCH faster
in some common cases with C++ code.
6. Types are now generally immutable, so we can use "Type *" instead
"const Type *" everywhere.
Downsides of this patch are that it removes some functions from the C API,
so people using those will have to upgrade to (not yet added) new API.
"LLVM 3.0" is the right time to do this.
There are still some cleanups pending after this, this patch is large enough
as-is.
llvm-svn: 134829
CPU, and feature string. Parsing some asm directives can change
subtarget state (e.g. .code 16) and it must be reflected in other
modules (e.g. MCCodeEmitter). That is, the MCSubtargetInfo instance
must be shared.
llvm-svn: 134795
Spills should be hoisted out of loops, but we don't want to hoist them
to dominating blocks at the same loop depth. That could cause the spills
to be executed more often.
llvm-svn: 134782
Try to move spills as early as possible in their basic block. This can
help eliminate interferences by shortening the live range being
spilled.
This fixes PR10221.
llvm-svn: 134776
RAGreedy::tryAssign will now evict interference from the preferred
register even when another register is free.
To support this, add the EvictionCost struct that counts how many hints
are broken by an eviction. We don't want to break one hint just to
satisfy another.
Rename canEvict to shouldEvict, and add the first bit of eviction policy
that doesn't depend on spill weights: Always make room in the preferred
register as long as the evictees can be split and aren't already
assigned to their preferred register.
Also make the CSR avoidance more accurate. When looking for a cheaper
register it is OK to use a new volatile register. Only CSR aliases that
have never been used before should be avoided.
llvm-svn: 134735
We have to do this in DAGBuilder instead of DAGCombiner, because the exact bit is lost after building.
struct foo { char x[24]; };
long bar(struct foo *a, struct foo *b) { return a-b; }
is now compiled into
movl 4(%esp), %eax
subl 8(%esp), %eax
sarl $3, %eax
imull $-1431655765, %eax, %eax
instead of
movl 4(%esp), %eax
subl 8(%esp), %eax
movl $715827883, %ecx
imull %ecx
movl %edx, %eax
shrl $31, %eax
sarl $2, %edx
addl %eax, %edx
movl %edx, %eax
llvm-svn: 134695
- Each target asm parser now creates its own MCSubtatgetInfo (if needed).
- Changed AssemblerPredicate to take subtarget features which tablegen uses
to generate asm matcher subtarget feature queries. e.g.
"ModeThumb,FeatureThumb2" is translated to
"(Bits & ModeThumb) != 0 && (Bits & FeatureThumb2) != 0".
llvm-svn: 134678
DBG_VALUE 3.310000e+02, 0, !"ds"; dbg:sse.stepfft.c:138:18 @[ sse.stepfft.c:32:10 ]
DBG_VALUE 3.310000e+02, 0, !"ds"; dbg:sse.stepfft.c:138:18 @[ sse.stepfft.c:31:10 ]
These two MIs represent identical value, 3.31..., for one variable, ds, but they are not identical because the represent two separate instances of inlined variable "ds".
llvm-svn: 134620
hasPredecessorHelper function allows predecessors to be cached to speed up
repeated invocations. This fixes PR10186.
X.isPredecessorOf(Y) now just calls Y.hasPredecessor(X)
Y.hasPredecessor(X) calls Y.hasPredecessorHelper(X, Visited, Worklist) with
empty Visited and Worklist sets (i.e. no caching over invocations).
Y.hasPredecessorHelper(X, Visited, Worklist) caches search state in Visited
and Worklist to speed up repeated calls. The Visited set is searched for X
before going to the worklist to further search the DAG if necessary.
llvm-svn: 134592
Unfortunately, the testcase I have is large and confidential, so I don't have a test to commit at the moment; I'll see if I can come up with something smaller where this issue reproduces.
<rdar://problem/9716278>
llvm-svn: 134565
This is impossible in theory, I can prove it. In practice, our near-zero
threshold can cause the network to oscillate between equally good
solutions.
<rdar://problem/9720596>
llvm-svn: 134428
Remat during spilling triggers dead code elimination. If a phi-def
becomes unused, that may also cause live ranges to split into separate
connected components.
This type of splitting is different from normal live range splitting. In
particular, there may not be a common original interval.
When the split range is its own original, make sure that the new
siblings are also their own originals. The range being split cannot be
used as an original since it doesn't cover the new siblings.
llvm-svn: 134413