The Query class now holds two iterators instead of an InterferenceResult
instance. The iterators are used as bookmarks for repeated
collectInterferingVRegs calls.
llvm-svn: 137380
The InterferenceResult iterator turned out to be less important than we
thought it would be. LiveIntervalUnion clients want higher level
information, like the list of interfering virtual registers.
llvm-svn: 137346
Coalescing can remove copy-like instructions with sub-register operands
that constrained the register class. Examples are:
x86: GR32_ABCD:sub_8bit_hi -> GR32
arm: DPR_VFP2:ssub0 -> DPR
Recompute the register class of any virtual registers that are used by
less instructions after coalescing.
This affects code generation for the Cortex-A8 where we use NEON
instructions for f32 operations, c.f. fp_convert.ll:
vadd.f32 d16, d1, d0
vcvt.s32.f32 d0, d16
The register allocator is now free to use d16 for the temporary, and
that comes first in the allocation order because it doesn't interfere
with any s-registers.
llvm-svn: 137133
This function doesn't have anything to do with spill weights, and MRI
already has functions for manipulating the register class of a virtual
register.
llvm-svn: 137123
The local ranges created get to stay in the RS_New stage, just like for
local and region splitting.
This gives tryLocalSplit a bit more freedom the first time it sees one
of these new local ranges.
llvm-svn: 137001
Normally, we don't create a live range for a single instruction in a
basic block, the spiller does that anyway. However, when splitting a
live range that belongs to a proper register sub-class, inserting these
extra COPY instructions completely remove the constraints from the
remainder interval, and it may be allocated from the larger super-class.
The spiller will mop up these small live ranges if we end up spilling
anyway. It calls them snippets.
llvm-svn: 136989
Some instructions require restricted register classes, but most of the
time that doesn't affect register allocation. For example, some
instructions don't work with the stack pointer, but that is a reserved
register anyway.
Sometimes it matters, GR32_ABCD only has 4 allocatable registers. For
such a proper sub-class, the register allocator should try to enable
register class inflation since that makes more registers available for
allocation.
Make sure only legal super-classes are considered. For example, tGPR is
not a proper sub-class in Thumb mode, but in ARM mode it is.
llvm-svn: 136981
The old code would look at kills and defs in one pass over the
instruction operands, causing problems with this code:
%R0<def>, %CPSR<def,dead> = tLSLri %R5<kill>, 2, pred:14, pred:%noreg
%R0<def>, %CPSR<def,dead> = tADDrr %R4<kill>, %R0<kill>, pred:14, %pred:%noreg
The last instruction kills and redefines %R0, so it is still live after
the instruction.
This caused a register scavenger crash when compiling 483.xalancbmk for
armv6. I am not including a test case because it requires too much bad
luck to expose this old bug.
First you need to convince the register allocator to use %R0 twice on
the tADDrr instruction, then you have to convince BranchFolding to do
something that causes it to run the register scavenger on he bad block.
<rdar://problem/9898200>
llvm-svn: 136973
inlined variable, based on the discussion in PR10542.
This explodes the runtime of several passes down the pipeline due to
a large number of "copies" remaining live across a large function. This
only shows up with both debug and opt, but when it does it creates
a many-minute compile when self-hosting LLVM+Clang. There are several
other cases that show these types of regressions.
All of this is tracked in PR10542, and progress is being made on fixing
the issue. Once its addressed, the re-instated, but until then this
restores the performance for self-hosting and other opt+debug builds.
Devang, let me know if this causes any trouble, or impedes fixing it in
any way, and thanks for working on this!
llvm-svn: 136953
It is possible to have multiple DBG_VALUEs for the same variable:
32L TEST32rr %vreg0<kill>, %vreg0, %EFLAGS<imp-def>; GR32:%vreg0
DBG_VALUE 2, 0, !"i"
DBG_VALUE %noreg, %0, !"i"
When that happens, keep the last one instead of the first.
llvm-svn: 136842
This helps generate better code in functions with high register
pressure.
The previous version of compact region splitting caused regressions
because the regions were a bit too large. A stronger negative bias
applied in r136832 fixed this problem.
llvm-svn: 136836
Apply twice the negative bias on transparent blocks when computing the
compact regions. This excludes loop backedges from the region when only
one of the loop blocks uses the register.
Previously, we would include the backedge in the region if the loop
preheader and the loop latch both used the register, but the loop header
didn't.
When both the header and latch blocks use the register, we still keep it
live on the backedge.
llvm-svn: 136832
This is either an invalid SlotIndex, or valno->def for the first value
defined inside the block. PHI values are not counted as defined inside
the block.
The FirstDef field will be used when estimating the cost of spilling
around a block.
llvm-svn: 136736
The PrefBoth constraint is used for blocks that ideally want a live-in
value both on the stack and in a register. This would be used by a block
that has a use before interference forces a spill.
Secondly, add the ChangesValue flag to BlockConstraint. This tells
SpillPlacement if a live-in value on the stack can be reused as a
live-out stack value for free. If the block redefines the virtual
register, a spill would be required for that.
This extra information will be used by SpillPlacement to more accurately
calculate spill costs when a value can exist both on the stack and in a
register.
The simplest example is a basic block that reads the virtual register,
but doesn't change its value. Spilling around such a block requires a
reload, but no spill in the block.
The spiller already knows this, but the spill placer doesn't. That can
sometimes lead to suboptimal regions.
llvm-svn: 136731
This adds the 'resume' instruction class, IR parsing, and bitcode reading and
writing. The 'resume' instruction resumes propagation of an existing (in-flight)
exception whose unwinding was interrupted with a 'landingpad' instruction (to be
added later).
llvm-svn: 136589
This includes registers like EFLAGS and ST0-ST7. We don't check for
liveness issues in the verifier and scavenger because registers will
never be allocated from these classes.
While in SSA form, we do care about the liveness of unallocatable
unreserved registers. Liveness of EFLAGS and ST0 neds to be correct for
MachineDCE and MachineSinking.
llvm-svn: 136541
This flag is true from isel to register allocation when the machine
function is required to be in SSA form. The TwoAddressInstructionPass
and PHIElimination passes clear the flag.
The SSA flag wil be used by the machine code verifier to check for SSA
form, and eventually an assertion can enforce it in +Asserts builds.
This will catch the common target error of creating machine code with
multiple defs of a virtual register.
llvm-svn: 136532
working on x86 (at least for trivial testcases); other architectures will
need more work so that they actually emit the appropriate instructions for
orderings stricter than 'monotonic'. (As far as I can tell, the ARM, PPC,
Mips, and Alpha backends need such changes.)
llvm-svn: 136457
specified in the same file that the library itself is created. This is
more idiomatic for CMake builds, and also allows us to correctly specify
dependencies that are missed due to bugs in the GenLibDeps perl script,
or change from compiler to compiler. On Linux, this returns CMake to
a place where it can relably rebuild several targets of LLVM.
I have tried not to change the dependencies from the ones in the current
auto-generated file. The only places I've really diverged are in places
where I was seeing link failures, and added a dependency. The goal of
this patch is not to start changing the dependencies, merely to move
them into the correct location, and an explicit form that we can control
and change when necessary.
This also removes a serialization point in the build because we don't
have to scan all the libraries before we begin building various tools.
We no longer have a step of the build that regenerates a file inside the
source tree. A few other associated cleanups fall out of this.
This isn't really finished yet though. After talking to dgregor he urged
switching to a single CMake macro to construct libraries with both
sources and dependencies in the arguments. Migrating from the two macros
to that style will be a follow-up patch.
Also, llvm-config is still generated with GenLibDeps.pl, which means it
still has slightly buggy dependencies. The internal CMake
'llvm-config-like' macro uses the correct explicitly specified
dependencies however. A future patch will switch llvm-config generation
(when using CMake) to be based on these deps as well.
This may well break Windows. I'm getting a machine set up now to dig
into any failures there. If anyone can chime in with problems they see
or ideas of how to solve them for Windows, much appreciated.
llvm-svn: 136433
This generates the correct SDNodes for the landingpad instruction. It makes an
assumption that the result of the landingpad instruction has at least two
values. And that the first value is a pointer to the exception object and the
second value is the "selector."
llvm-svn: 136430
'atomicrmw' instructions, which allow representing all the current atomic
rmw intrinsics.
The allowed operands for these instructions are heavily restricted at the
moment; we can probably loosen it a bit, but supporting general
first-class types (where it makes sense) might get a bit complicated,
given how SelectionDAG works.
As an initial cut, these operations do not support specifying an alignment,
but it would be possible to add if we think it's useful. Specifying an
alignment lower than the natural alignment would be essentially
impossible to support on anything other than x86, but specifying a greater
alignment would be possible. I can't think of any useful optimizations which
would use that information, but maybe someone else has ideas.
Optimizer/codegen support coming soon.
llvm-svn: 136404
Code like that would only be produced by bugpoint, but we should still
handle it correctly.
When a register is defined by a REG_SEQUENCE of undefs, the register
itself is undef. Previously, we would create a register with uses but no
defs.
Fixes part of PR10520.
llvm-svn: 136401
There are two conflicting strategies in play:
- Under high register pressure, we want to assign large live ranges
first. Smaller live ranges are easier to place afterwards.
- Live range splitting is guided by interference, so splitting should be
deferred until interference is as realistic as possible.
With the recent changes to the live range stages, and with compact
regions enabled, it is less traumatic to split a live range too early.
If some of the split products were too big, they can often be split
again.
By reversing the RS_Split order, we get this queue order:
1. Normal live ranges, large to small.
2. RS_Split live ranges, large to small.
The large-to-small order improves RAGreedy's puzzle solving skills under
high register pressure. It may cause a bit more iterated splitting, but
we handle that better now.
With this change, -compact-regions is mostly an improvement on SPEC.
llvm-svn: 136388
When splitting global live ranges, it is now possible to split for
multiple destination intervals at once. Previously, we only had the main
and stack intervals.
Each edge bundle is assigned to a split candidate, and splitAroundRegion
will insert copies between the candidate intervals and the stack
interval as needed.
The multi-way splitting is used to split around compact regions when
enabled with -compact-regions. The best candidate register still gets
all the bundles it wants, but everything outside the main interval is
first split around compact regions before we create single-block
intervals.
Compact region splitting still causes some regressions, so it is not
enabled by default.
llvm-svn: 136186
These copies would coalesce easily, but the resulting value would be
defined by a deleted instruction. Now we also remove the undefined value
number from the destination register.
This fixes PR10503.
llvm-svn: 136174
When dead code elimination deletes a PHI value, the virtual register may
split into multiple connected components. In that case, revert each
component to the RS_Assign stage.
The new components are guaranteed to be smaller (the original value
numbers are distributed among the components), so this will always be
making progress. The components are now allowed to evict other live
ranges or be split again.
llvm-svn: 136034
This mechanism already exists, but the RS_Split2 stage makes it clearer.
When live range splitting creates ranges that may not be making
progress, they are marked RS_Split2 instead of RS_New. These ranges may
be split again, but only in a way that can be proven to make progress.
For local ranges, that means they must be split into ranges used by
strictly fewer instructions.
For global ranges, region splitting is bypassed and the RS_Split2
ranges go straight to per-block splitting.
llvm-svn: 135912
The stage is used to control where a live range is going, not where it
is coming from. Live ranges created by splitting will usually be marked
RS_New, but some are marked RS_Spill to avoid wasting time trying to
split them again.
The old RS_Global and RS_Local stages are merged - they are really the
same thing for local and global live ranges.
llvm-svn: 135911
This fixes PR10463. A two-address instruction with an <undef> use
operand was incorrectly rewritten so the def and use no longer used the
same register, violating the tie constraint.
Fix this by always rewriting <undef> operands with the register a def
operand would use.
llvm-svn: 135885
This method computes the edge bundles that should be live when splitting
around a compact region. This is independent of interference.
The function returns false if the live range was already a compact
region, or the compact region doesn't have any live bundles - it would
be the same as splitting around basic blocks.
Compact regions are computed using the normal spill placement code. We
pretend there is interference in all live-through blocks that don't use
the live range. This removes all edges from the Hopfield network used
for spill placement, so it converges instantly.
llvm-svn: 135847
If there is no interference and no last split point, we cannot
enterIntvBefore(Stop) - that function needs a real instruction.
Use enterIntvAtEnd instead for that very easy case.
This code doesn't currently run, it is needed by multi-way splitting.
llvm-svn: 135846
A split candidate can have a null PhysReg which means that it doesn't
map to a real interference pattern. Instead, pretend that all through
blocks have interference.
This makes it possible to generate compact regions where the live range
doesn't go through blocks that don't use it. The live range will still
be live between directly connected blocks with uses.
Splitting around a compact region tends to produce a live range with a
high spill weight, so it may evict a less dense live range.
llvm-svn: 135845
This method matches addLinks - All the listed blocks are considered to
have interference, so they add a negative bias to their bundles.
This could also be done by addConstraints, but that requires building a
separate BlockConstraint array.
llvm-svn: 135844
- Introduce JITDefault code model. This tells targets to set different default
code model for JIT. This eliminates the ugly hack in TargetMachine where
code model is changed after construction.
llvm-svn: 135580
(including compilation, assembly). Move relocation model Reloc::Model from
TargetMachine to MCCodeGenInfo so it's accessible even without TargetMachine.
llvm-svn: 135468
to MCRegisterInfo. Also initialize the mapping at construction time.
This patch eliminate TargetRegisterInfo from TargetAsmInfo. It's another step
towards fixing the layering violation.
llvm-svn: 135424
When splitting a live range immediately before an LDR_POST instruction
that redefines the address register, make sure to use the correct value
number in leaveIntvBefore.
We need the value number entering the instruction.
<rdar://problem/9793765>
llvm-svn: 135413
When trying to rematerialize a value before an instruction that has an
early-clobber redefine of the virtual register, make sure to look up the
correct value number.
Early-clobber defs are moved one slot back, so getBaseIndex is needed to
find the used value number.
Bugpoint was unable to reduce the test case for this, see PR10388.
llvm-svn: 135378
This gets rid of some of the gory splitting details in RAGreedy and
makes them available to future SplitKit clients.
Slightly generalize the functionality to support multi-way splitting.
Specifically, SplitEditor::splitLiveThroughBlock() supports switching
between different register intervals in a block.
llvm-svn: 135307
when determining validity of matching constraint. Allow i1
types access to the GR8 reg class for x86.
Fixes PR10352 and rdar://9777108
llvm-svn: 135180
During type legalization we often use the SIGN_EXTEND_INREG SDNode.
When this SDNode is legalized during the LegalizeVector phase, it is
scalarized because non-simple types are automatically marked to be expanded.
In this patch we add support for lowering SIGN_EXTEND_INREG manually.
This fixes CodeGen/X86/vec_sext.ll when running with the '-promote-elements'
flag.
llvm-svn: 135144
Original commit message:
Count references to interference cache entries.
Each InterferenceCache::Cursor instance references a cache entry. A
non-zero reference count guarantees that the entry won't be reused for a
new register.
This makes it possible to have multiple live cursors examining
interference for different physregs.
The total number of live cursors into a cache must be kept below
InterferenceCache::getMaxCursors().
Code generation should be unaffected by this change, and it doesn't seem
to affect the cache replacement strategy either.
llvm-svn: 135130
Each InterferenceCache::Cursor instance references a cache entry. A
non-zero reference count guarantees that the entry won't be reused for a
new register.
This makes it possible to have multiple live cursors examining
interference for different physregs.
The total number of live cursors into a cache must be kept below
InterferenceCache::getMaxCursors().
Code generation should be unaffected by this change, and it doesn't seem
to affect the cache replacement strategy either.
llvm-svn: 135121
The cache entry referenced by the best split candidate could become
clobbered by an unsuccessful candidate.
The correct fix here is to use reference counts on the cache entries.
Coming up.
llvm-svn: 135113
Some pysical registers create split solutions that would spill anywhere.
They should not even be considered in future multi-way global splits.
This does not affect code generation (yet).
llvm-svn: 135080
an assert on Darwin llvm-gcc builds.
Assertion failed: (castIsValid(op, S, Ty) && "Invalid cast!"), function Create, file /Users/buildslave/zorg/buildbot/smooshlab/slave-0.8/build.llvm-gcc-i386-darwin9-RA/llvm.src/lib/VMCore/Instructions.cpp, li\
ne 2067.
etc.
http://smooshlab.apple.com:8013/builders/llvm-gcc-i386-darwin9-RA/builds/2354
--- Reverse-merging r134893 into '.':
U include/llvm/Target/TargetData.h
U include/llvm/DerivedTypes.h
U tools/bugpoint/ExtractFunction.cpp
U unittests/Support/TypeBuilderTest.cpp
U lib/Target/ARM/ARMGlobalMerge.cpp
U lib/Target/TargetData.cpp
U lib/VMCore/Constants.cpp
U lib/VMCore/Type.cpp
U lib/VMCore/Core.cpp
U lib/Transforms/Utils/CodeExtractor.cpp
U lib/Transforms/Instrumentation/ProfilingUtils.cpp
U lib/Transforms/IPO/DeadArgumentElimination.cpp
U lib/CodeGen/SjLjEHPrepare.cpp
--- Reverse-merging r134888 into '.':
G include/llvm/DerivedTypes.h
U include/llvm/Support/TypeBuilder.h
U include/llvm/Intrinsics.h
U unittests/Analysis/ScalarEvolutionTest.cpp
U unittests/ExecutionEngine/JIT/JITTest.cpp
U unittests/ExecutionEngine/JIT/JITMemoryManagerTest.cpp
U unittests/VMCore/PassManagerTest.cpp
G unittests/Support/TypeBuilderTest.cpp
U lib/Target/MBlaze/MBlazeIntrinsicInfo.cpp
U lib/Target/Blackfin/BlackfinIntrinsicInfo.cpp
U lib/VMCore/IRBuilder.cpp
G lib/VMCore/Type.cpp
U lib/VMCore/Function.cpp
G lib/VMCore/Core.cpp
U lib/VMCore/Module.cpp
U lib/AsmParser/LLParser.cpp
U lib/Transforms/Utils/CloneFunction.cpp
G lib/Transforms/Utils/CodeExtractor.cpp
U lib/Transforms/Utils/InlineFunction.cpp
U lib/Transforms/Instrumentation/GCOVProfiling.cpp
U lib/Transforms/Scalar/ObjCARC.cpp
U lib/Transforms/Scalar/SimplifyLibCalls.cpp
U lib/Transforms/Scalar/MemCpyOptimizer.cpp
G lib/Transforms/IPO/DeadArgumentElimination.cpp
U lib/Transforms/IPO/ArgumentPromotion.cpp
U lib/Transforms/InstCombine/InstCombineCompares.cpp
U lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
U lib/Transforms/InstCombine/InstCombineCalls.cpp
U lib/CodeGen/DwarfEHPrepare.cpp
U lib/CodeGen/IntrinsicLowering.cpp
U lib/Bitcode/Reader/BitcodeReader.cpp
llvm-svn: 134949
and MCSubtargetInfo.
- Added methods to update subtarget features (used when targets automatically
detect subtarget features or switch modes).
- Teach X86Subtarget to update MCSubtargetInfo features bits since the
MCSubtargetInfo layer can be shared with other modules.
- These fixes .code 16 / .code 32 support since mode switch is updated in
MCSubtargetInfo so MC code emitter can do the right thing.
llvm-svn: 134884
patch brings numerous advantages to LLVM. One way to look at it
is through diffstat:
109 files changed, 3005 insertions(+), 5906 deletions(-)
Removing almost 3K lines of code is a good thing. Other advantages
include:
1. Value::getType() is a simple load that can be CSE'd, not a mutating
union-find operation.
2. Types a uniqued and never move once created, defining away PATypeHolder.
3. Structs can be "named" now, and their name is part of the identity that
uniques them. This means that the compiler doesn't merge them structurally
which makes the IR much less confusing.
4. Now that there is no way to get a cycle in a type graph without a named
struct type, "upreferences" go away.
5. Type refinement is completely gone, which should make LTO much MUCH faster
in some common cases with C++ code.
6. Types are now generally immutable, so we can use "Type *" instead
"const Type *" everywhere.
Downsides of this patch are that it removes some functions from the C API,
so people using those will have to upgrade to (not yet added) new API.
"LLVM 3.0" is the right time to do this.
There are still some cleanups pending after this, this patch is large enough
as-is.
llvm-svn: 134829
CPU, and feature string. Parsing some asm directives can change
subtarget state (e.g. .code 16) and it must be reflected in other
modules (e.g. MCCodeEmitter). That is, the MCSubtargetInfo instance
must be shared.
llvm-svn: 134795
Spills should be hoisted out of loops, but we don't want to hoist them
to dominating blocks at the same loop depth. That could cause the spills
to be executed more often.
llvm-svn: 134782
Try to move spills as early as possible in their basic block. This can
help eliminate interferences by shortening the live range being
spilled.
This fixes PR10221.
llvm-svn: 134776
RAGreedy::tryAssign will now evict interference from the preferred
register even when another register is free.
To support this, add the EvictionCost struct that counts how many hints
are broken by an eviction. We don't want to break one hint just to
satisfy another.
Rename canEvict to shouldEvict, and add the first bit of eviction policy
that doesn't depend on spill weights: Always make room in the preferred
register as long as the evictees can be split and aren't already
assigned to their preferred register.
Also make the CSR avoidance more accurate. When looking for a cheaper
register it is OK to use a new volatile register. Only CSR aliases that
have never been used before should be avoided.
llvm-svn: 134735
We have to do this in DAGBuilder instead of DAGCombiner, because the exact bit is lost after building.
struct foo { char x[24]; };
long bar(struct foo *a, struct foo *b) { return a-b; }
is now compiled into
movl 4(%esp), %eax
subl 8(%esp), %eax
sarl $3, %eax
imull $-1431655765, %eax, %eax
instead of
movl 4(%esp), %eax
subl 8(%esp), %eax
movl $715827883, %ecx
imull %ecx
movl %edx, %eax
shrl $31, %eax
sarl $2, %edx
addl %eax, %edx
movl %edx, %eax
llvm-svn: 134695
- Each target asm parser now creates its own MCSubtatgetInfo (if needed).
- Changed AssemblerPredicate to take subtarget features which tablegen uses
to generate asm matcher subtarget feature queries. e.g.
"ModeThumb,FeatureThumb2" is translated to
"(Bits & ModeThumb) != 0 && (Bits & FeatureThumb2) != 0".
llvm-svn: 134678
DBG_VALUE 3.310000e+02, 0, !"ds"; dbg:sse.stepfft.c:138:18 @[ sse.stepfft.c:32:10 ]
DBG_VALUE 3.310000e+02, 0, !"ds"; dbg:sse.stepfft.c:138:18 @[ sse.stepfft.c:31:10 ]
These two MIs represent identical value, 3.31..., for one variable, ds, but they are not identical because the represent two separate instances of inlined variable "ds".
llvm-svn: 134620
hasPredecessorHelper function allows predecessors to be cached to speed up
repeated invocations. This fixes PR10186.
X.isPredecessorOf(Y) now just calls Y.hasPredecessor(X)
Y.hasPredecessor(X) calls Y.hasPredecessorHelper(X, Visited, Worklist) with
empty Visited and Worklist sets (i.e. no caching over invocations).
Y.hasPredecessorHelper(X, Visited, Worklist) caches search state in Visited
and Worklist to speed up repeated calls. The Visited set is searched for X
before going to the worklist to further search the DAG if necessary.
llvm-svn: 134592
Unfortunately, the testcase I have is large and confidential, so I don't have a test to commit at the moment; I'll see if I can come up with something smaller where this issue reproduces.
<rdar://problem/9716278>
llvm-svn: 134565
This is impossible in theory, I can prove it. In practice, our near-zero
threshold can cause the network to oscillate between equally good
solutions.
<rdar://problem/9720596>
llvm-svn: 134428
Remat during spilling triggers dead code elimination. If a phi-def
becomes unused, that may also cause live ranges to split into separate
connected components.
This type of splitting is different from normal live range splitting. In
particular, there may not be a common original interval.
When the split range is its own original, make sure that the new
siblings are also their own originals. The range being split cannot be
used as an original since it doesn't cover the new siblings.
llvm-svn: 134413
This fixes the issue noted in PR10251 where early tail dup of bbs with
indirectbr would cause a bb to be duplicated into a loop preheader
and then into its predecessors, creating phi nodes with identical
operands just before register allocation.
This helps with jsinterp.o size (__TEXT goes from 163568 to 126656)
and a bit with performance 1.005x faster on sunspider (jits still enabled).
The result on webkit with the jit disabled is more significant: 1.021x faster.
llvm-svn: 134372
A split point inserted in a block with a landing pad successor may be
hoisted above the call to ensure that it dominates all successors. The
code that handles the rest of the basic block must take this into
account.
I am not including a test case, it would be very fragile. PR10244 comes
from building clang with exceptions enabled.
llvm-svn: 134369
Add a MI->emitError() method that the backend can use to report errors
related to inline assembly. Call it from X86FloatingPoint.cpp when the
constraints are wrong.
This enables proper clang diagnostics from the backend:
$ clang -c pr30848.c
pr30848.c:5:12: error: Inline asm output regs must be last on the x87 stack
__asm__ ("" : "=u" (d)); /* { dg-error "output regs" } */
^
1 error generated.
llvm-svn: 134307
Every live range is assigned a cascade number the first time it is
involved in an eviction. As the evictor, it gets a new cascade number.
Every evictee is assigned the same cascade number as the evictor.
Eviction is prohibited if the evictor has a lower assigned cascade
number than the evictee.
This means that assigned cascade numbers are monotonically increasing
with every eviction, yet they are bounded by NextCascade which can only
be incremented by new live ranges. Thus, infinite loops cannot happen,
but eviction cascades can still be triggered by new live ranges as we
want.
Thanks to Andy for explaining this to me.
llvm-svn: 134303
copy is a kill") to see if it fixes the i386 dragonegg buildbot, which is timing out
because gcc built with dragonegg is going into an infinite loop.
llvm-svn: 134237
The constraints are represented by the register class of the original
virtual register created for the inline asm. If the register class were
included in the operand descriptor, we might be able to do this.
For now, just give up on regclass inflation when inline asm is involved.
No test case, this bug hasn't happened yet.
llvm-svn: 134226
This patch will sometimes choose live range split points next to
interference instead of always splitting next to a register point. That
means spill code can now appear almost anywhere, and it was necessary
to fix code that didn't expect that.
The difficult places were:
- Between a CALL returning a value on the x87 stack and the
corresponding FpPOP_RETVAL (was FpGET_ST0). Probably also near x87
inline assembly, but that didn't actually show up in testing.
- Between a CALL popping arguments off the stack and the corresponding
ADJCALLSTACKUP.
Both are fixed now. The only place spill code can't appear is after
terminators, see SplitAnalysis::getLastSplitPoint.
Original commit message:
Rewrite RAGreedy::splitAroundRegion, now with cool ASCII art.
This function has to deal with a lot of special cases, and the old
version got it wrong sometimes. In particular, it would sometimes leave
multiple uses in the stack interval in a single block. That causes bad
code with multiple reloads in the same basic block.
The new version handles block entry and exit in a single pass. It first
eliminates all the easy cases, and then goes on to create a local
interval for the blocks with difficult interference. Previously, we
would only create the local interval for completely isolated blocks.
It can happen that the stack interval becomes completely empty because
we could allocate a register in all edge bundles, and the new local
intervals deal with the interference. The empty stack interval is
harmless, but we need to remove a SplitKit assertion that checks for
empty intervals.
llvm-svn: 134125
This function has to deal with a lot of special cases, and the old
version got it wrong sometimes. In particular, it would sometimes leave
multiple uses in the stack interval in a single block. That causes bad
code with multiple reloads in the same basic block.
The new version handles block entry and exit in a single pass. It first
eliminates all the easy cases, and then goes on to create a local
interval for the blocks with difficult interference. Previously, we
would only create the local interval for completely isolated blocks.
It can happen that the stack interval becomes completely empty because
we could allocate a register in all edge bundles, and the new local
intervals deal with the interference. The empty stack interval is
harmless, but we need to remove a SplitKit assertion that checks for
empty intervals.
llvm-svn: 134047
sink them into MC layer.
- Added MCInstrInfo, which captures the tablegen generated static data. Chang
TargetInstrInfo so it's based off MCInstrInfo.
llvm-svn: 134021
Removed the check that peeks past EXTRA_SUBREG, which I don't think
makes sense any more. Intead treat it as a normal register def. No
significant affect on x86 or ARM benchmarks.
llvm-svn: 133917
Both become <earlyclobber> defs on the INLINEASM MachineInstr, but we
now use two different asm operand kinds.
The new Kind_Clobber is treated identically to the old
Kind_RegDefEarlyClobber for now, but x87 floating point stack inline
assembly does care about the difference.
This will pop a register off the stack:
asm("fstp %st" : : "t"(x) : "st");
While this will pop the input and push an output:
asm("fst %st" : "=&t"(r) : "t"(x));
We need to know if ST0 was a clobber or an output operand, and we can't
depend on <dead> flags for that.
llvm-svn: 133902
The INLINEASM MachineInstrs have an immediate operand describing each
original inline asm operand. Decode the bits in MachineInstr::print() so
it is easier to read:
INLINEASM <es:rorq $1,$0>, $0:[regdef], %vreg0<def>, %vreg1<def>, $1:[imm], 1, $2:[reguse] [tiedto:$0], %vreg2, %vreg3, $3:[regdef-ec], %EFLAGS<earlyclobber,imp-def>
llvm-svn: 133901
target machine from those that are only needed by codegen. The goal is to
sink the essential target description into MC layer so we can start building
MC based tools without needing to link in the entire codegen.
First step is to refactor TargetRegisterInfo. This patch added a base class
MCRegisterInfo which TargetRegisterInfo is derived from. Changed TableGen to
separate register description from the rest of the stuff.
llvm-svn: 133782
register allocation if it has a indirectbr or if we can duplicate it to
every predecessor.
This fixes the SingleSource/Benchmarks/Shootout-C++/matrix.cpp regression but
keeps the previous improvements to sunspider.
llvm-svn: 133682
If the linker supports it, this will hold the CIE and FDE information in a
compact format. The implementation of the compact unwinding emission is coming
soon.
llvm-svn: 133658
be one with only one unconditional branch and no phis. Duplicating the phis in this case
is possible, but requeres liveness analysis or breaking edges.
llvm-svn: 133607
1. (((x) & 0xFF00) >> 8) | (((x) & 0x00FF) << 8)
=> (bswap x) >> 16
2. ((x&0xff)<<8)|((x&0xff00)>>8)|((x&0xff000000)>>8)|((x&0x00ff0000)<<8))
=> (rotl (bswap x) 16)
This allows us to eliminate most of the def : Pat patterns for ARM rev16
revsh instructions. It catches many more cases for ARM and x86.
rdar://9609108
llvm-svn: 133503
* Don't introduce a duplicated bb in the CFG
* When making a branch unconditional, clear the PredCond array so that it
is really unconditional.
llvm-svn: 133432
dragonegg buildbots back to life. Original commit message:
Teach early dup how to duplicate basic blocks with one successor and only phi instructions
into more complex blocks.
llvm-svn: 133430
all over the place in different styles and variants. Standardize on two
preferred entrypoints: one that takes a StructType and ArrayRef, and one that
takes StructType and varargs.
In cases where there isn't a struct type convenient, we now add a
ConstantStruct::getAnon method (whose name will make more sense after a few
more patches land).
It would be "really really nice" if the ConstantStruct::get and
ConstantVector::get methods didn't make temporary std::vectors.
llvm-svn: 133412