This allows me to begin enabling (or backing out) misched by default
for one subtarget at a time. To run misched we typically want to:
- Disable SelectionDAG scheduling (use the source order scheduler)
- Enable more aggressive coalescing (until we decide to always run the coalescer this way)
- Enable MachineScheduler pass itself.
Disabling PostRA sched may follow for some subtargets.
llvm-svn: 167826
This adds the -join-globalcopies option which can be enabled by
default once misched is also enabled.
Ideally, the register coalescer would be able to split local live
ranges in a way that produces copies that can be easily resolved by
the scheduler. Until then, this heuristic should be good enough to at
least allow the scheduler to run after coalescing.
llvm-svn: 167825
This teaches the register coalescer to be less prone to split critical
edges. I am currently benchmarking this with the new (post-coalescer)
scheduler. I plan to enable this by default and remove the option as
soon as misched is enabled.
llvm-svn: 167758
Partial copies can show up even when CoalescerPair.isPartial() returns
false. For example:
%vreg24:dsub_0<def> = COPY %vreg31:dsub_0; QPR:%vreg24,%vreg31
Such a partial-partial copy is not good enough for the transformation
adjustCopiesBackFrom() needs to do.
llvm-svn: 166944
The new coalescer can merge a dead def into an unused lane of an
otherwise live vector register.
Clear the <dead> flag when that happens since the flag refers to the
full virtual register which is still live after the partial dead def.
This fixes PR14079.
llvm-svn: 165877
PHIElimination inserts IMPLICIT_DEF instructions to guarantee that all
PHI predecessors have a live-out value. These IMPLICIT_DEF values are
not considered to be real interference when coalescing virtual
registers:
%vreg1 = IMPLICIT_DEF
%vreg2 = MOV32r0
When joining %vreg1 and %vreg2, the IMPLICIT_DEF instruction and its
value number should simply be erased since the %vreg2 value number now
provides a live-out value for the PHI predecesor block.
llvm-svn: 165813
JoinVals::pruneValues() calls LIS->pruneValue() to avoid conflicts when
overlapping two different values. This produces a set of live range end
points that are used to reconstruct the live range (with SSA update)
after joining the two registers.
When a value is pruned twice, the set of end points was insufficient:
v1 = DEF
v1 = REPLACE1
v1 = REPLACE2
KILL v1
The end point at KILL would only reconstruct the live range from
REPLACE2 to KILL, leaving the range REPLACE1-REPLACE2 dead.
Add REPLACE2 as an end point in this case so the full live range is
reconstructed.
This fixes PR13999.
llvm-svn: 165056
The new coalescer can turn a full virtual register definition into a
partial redef by merging another value into an unused vector lane.
Make sure to clear the <read-undef> flag on such defs.
llvm-svn: 164807
A PHI can't create interference on its own. If two live ranges interfere
at a PHI, they must also interfere when leaving one of the PHI
predecessors.
llvm-svn: 164330
The old-fashioned many-to-one value mapping doesn't always work when
merging vector lanes. A value can map to multiple different values, and
it can even be necessary to insert new PHIs.
When a value number is defined by a copy from a value number that
required SSa update, include the live range of the copied value number
in the SSA update as well. It is not necessarily a copy of the original
value number any longer.
llvm-svn: 164329
A common coalescing conflict in vector code is lane insertion:
%dst = FOO
%src = BAR
%dst:ssub0 = COPY %src
The live range of %src interferes with the ssub0 lane of %dst, but that
lane is never read after %src would have clobbered it. That makes it
safe to merge the live ranges and eliminate the COPY:
%dst = FOO
%dst:ssub0 = BAR
This patch teaches the new coalescer to resolve conflicts where dead
vector lanes would be clobbered, at least as long as the clobbered
vector lanes don't escape the basic block.
llvm-svn: 164250
Add LIS::pruneValue() and extendToIndices(). These two functions are
used by the register coalescer when merging two live ranges requires
more than a trivial value mapping as supported by LiveInterval::join().
The pruneValue() function can remove the part of a value number that is
going to conflict in join(). Afterwards, extendToIndices can restore the
live range, using any new dominating value numbers and updating the SSA
form.
Use this complex value mapping to support merging a register into a
vector lane that has a conflicting value, but the clobbered lane is
undef.
llvm-svn: 164074
The live range of an SSA value forms a sub-tree of the dominator tree.
That means the live ranges of two values overlap if and only if the def
of one value lies within the live range of the other.
This can be used to simplify the interference checking a bit: Visit each
def in the two registers about to be joined. Check for interference
against the value that is live in the other register at the def point
only. It is not necessary to scan the set of overlapping live ranges,
this interference check can be done while computing the value mapping
required for the final live range join.
The new algorithm is prepared to handle more complicated conflict
resolution - We can allow overlapping live ranges with different values
as long as the differing lanes are undef or unused in the other
register.
The implementation in this patch doesn't do that yet, it creates code
that is nearly identical to the old algorithm's, except:
- The new stripCopies() function sees through multiple copies while
the old RegistersDefinedFromSameValue() only can handle one.
- There are a few rare cases where the new algorithm can erase an
IMPLICIT_DEF instuction that RegistersDefinedFromSameValue() couldn't
handle.
llvm-svn: 163991
Kill flags are removed more and more aggressively during the register
allocation passes, it is better to get information from LiveIntervals.
llvm-svn: 163972
Based on CR feedback from r162301 and Craig Topper's refactoring in r162347
here are a few other places that could use the same API (& in one instance drop
a Function.h dependency).
llvm-svn: 162367
The only real user of the flag was removeCopyByCommutingDef(), and it
has been switched to LiveIntervals::hasPHIKill().
All the code changed by this patch was only concerned with computing and
propagating the flag.
llvm-svn: 161255
The VNInfo::HAS_PHI_KILL is only half supported. We precompute it in
LiveIntervalAnalysis, but it isn't properly updated by live range
splitting and functions like shrinkToUses().
It is only used in one place: RegisterCoalescer::removeCopyByCommutingDef().
This patch changes that function to use a new LiveIntervals::hasPHIKill()
function that computes the flag for a given value number.
llvm-svn: 161254
We are extending live ranges, so kill flags are not accurate. They
aren't needed until they are recomputed after RA anyway.
<rdar://problem/11950722>
llvm-svn: 161023
implicit_def, the other instruction can be anything, including instructions
that define multiple values. Be careful about that and don't assume what operand
0 is.
Fixes pr13249.
llvm-svn: 159509
With regunit liveness permanently enabled, this function would always
return true.
Also remove now obsolete code for checking physreg interference.
llvm-svn: 159006
There are some that I didn't remove this round because they looked like
obvious stubs. There are dead variables in gtest too, they should be
fixed upstream.
llvm-svn: 158090
Don't print out the register number and spill weight, making the TRI
argument unnecessary.
This allows callers to interpret the reg field. It can currently be a
virtual register, a physical register, a spill slot, or a register unit.
llvm-svn: 158031
No functional change intended.
Sorry for the churn. The iterator classes are supposed to help avoid
giant commits like this one in the future. The TableGen-produced
register lists are getting quite large, and it may be necessary to
change the table representation.
This makes it possible to do so without changing all clients (again).
llvm-svn: 157854
Now that the coalescer keeps live intervals and machine code in sync at
all times, it needs to deal with identity copies differently.
When merging two virtual registers, all identity copies are removed
right away. This means that other identity copies must come from
somewhere else, and they are going to have a value number.
Deal with such copies by merging the value numbers before erasing the
copy instruction. Otherwise, we leave dangling value numbers in the live
interval.
This fixes PR12927.
llvm-svn: 157340
With physreg joining out of the way, it is easy to recognize the
instructions that need their kill flags cleared while testing for
interference.
This allows us to skip the final scan of all instructions for an 11%
speedup of the coalescer pass.
llvm-svn: 157169
Dead code elimination during coalescing could cause a virtual register
to be split into connected components. The following rewriting would be
confused about the already joined copies present in the code, but
without a corresponding value number in the live range.
Erase all joined copies instantly when joining intervals such that the
MI and LiveInterval representations are always in sync.
llvm-svn: 157135
Dead code and joined copies are now eliminated on the fly, and there is
no need for a post pass.
This makes the coalescer work like other modern register allocator
passes: Code is changed on the fly, there is no pending list of changes
to be committed.
llvm-svn: 157132
The late dead code elimination is no longer necessary.
The test changes are cause by a register hint that can be either %rdi or
%rax. The choice depends on the use list order, which this patch changes.
llvm-svn: 157131
Before rewriting uses of one value in A to register B, check that there
are no tied uses. That would require multiple A values to be rewritten.
This bug can't bite in the current version of the code for a fairly
subtle reason: A tied use would have caused 2-addr to insert a copy
before the use. If the copy has been coalesced, it will be found by the
same loop changed by this patch, and the optimization is aborted.
This was exposed by 400.perlbench and lua after applying a patch that
deletes joined copies aggressively.
llvm-svn: 157130
Remaining virtreg->physreg copies were rematerialized during
updateRegDefsUses(), but we already do the same thing in joinCopy() when
visiting the physreg copy instruction.
Eliminate the preserveSrcInt argument to reMaterializeTrivialDef(). It
is now always true.
llvm-svn: 157103
Dead copies cause problems because they are trivial to coalesce, but
removing them gived the live range a dangling end point. This patch
enables full dead code elimination which trims live ranges to their uses
so end points don't dangle.
DCE may erase multiple instructions. Put the pointers in an ErasedInstrs
set so we never risk visiting erased instructions in the work list.
There isn't supposed to be any dead copies entering RegisterCoalescer,
but they do slip by as evidenced by test/CodeGen/X86/coalescer-dce.ll.
llvm-svn: 157101
It is no longer necessary to separate VirtCopies, PhysCopies, and
ImpDefCopies. Implicitly defined copies are extremely rare after we
added the ProcessImplicitDefs pass, and physical register copies are not
joined any longer.
llvm-svn: 157059
This has been disabled for a while, and it is not a feature we want to
support. Copies between physical and virtual registers are eliminated by
good hinting support in the register allocator. Joining virtual and
physical registers is really a form of register allocation, and the
coalescer is not properly equipped to do that. In particular, it cannot
backtrack coalescing decisions, and sometimes that would cause it to
create programs that were impossible to register allocate, by exhausting
a small register class.
It was also very difficult to keep track of the live ranges of aliasing
registers when extending the live range of a physreg. By disabling
physreg joining, we can let fixed physreg live ranges remain constant
throughout the register allocator super-pass.
One type of physreg joining remains: A virtual register that has a
single value which is a copy of a reserved register can be merged into
the reserved physreg. This always lowers register pressure, and since we
don't compute live ranges for reserved registers, there are no problems
with aliases.
llvm-svn: 157055
RegisterCoalescer set <undef> flags on all operands of copy instructions
that are scheduled to be removed. This is so they won't affect
shrinkToUses() by introducing false register reads.
Make sure those <undef> flags are never cleared, or shrinkToUses() could
cause live intervals to end at instructions about to be deleted.
This would be a lot simpler if RegisterCoalescer could just erase joined
copies immediately instead of keeping all the to-be-deleted instructions
around.
This fixes PR12862. Unfortunately, bugpoint can't create a sane test
case for this. Like many other coalescer problems, this failure depends
of a very fragile series of events.
<rdar://problem/11474428>
llvm-svn: 157001
When widening an existing <def,reads-undef> operand to a super-register,
it may be necessary to clear the <undef> flag because the wider register
is now read-modify-write through the instruction.
Conversely, it may be necessary to add an <undef> flag when the
coalescer turns a full-register def into a sub-register def, but the
larger register wasn't live before the instruction.
This happens in test/CodeGen/ARM/coalesce-subregs.ll, but the test
is too small for the <undef> flags to affect the generated code.
llvm-svn: 156951
It is now possible to coalesce weird skewed sub-register copies by
picking a super-register class larger than both original registers. The
included test case produces code like this:
vld2.32 {d16, d17, d18, d19}, [r0]!
vst2.32 {d18, d19, d20, d21}, [r0]
We still perform interference checking as if it were a normal full copy
join, so this is still quite conservative. In particular, the f1 and f2
functions in the included test case still have remaining copies because
of false interference.
llvm-svn: 156878
It is possible to coalesce two overlapping registers to a common
super-register that it larger than both of the original registers.
The important difference is that it may be necessary to rewrite DstReg
operands as well as SrcReg operands because the sub-register index has
changed.
This behavior is still disabled by CoalescerPair.
llvm-svn: 156869
Now both SrcReg and DstReg can be sub-registers of the final coalesced
register.
CoalescerPair::setRegisters still rejects such copies because
RegisterCoalescer doesn't yet handle them.
llvm-svn: 156848
At least some of them:
%vreg1:sub_16bit = COPY %vreg2:sub_16bit; GR64:%vreg1, GR32: %vreg2
Previously, we couldn't figure out that the above copy could be
eliminated by coalescing %vreg2 with %vreg1:sub_32bit.
The new getCommonSuperRegClass() hook makes it possible.
This is not very useful yet since the unmodified part of the destination
register usually interferes with the source register. The coalescer
needs to understand sub-register interference checking first.
llvm-svn: 156334
The getPointerRegClass() hook can return register classes that depend on
the calling convention of the current function (ptr_rc_tailcall).
So far, we have been able to infer the calling convention from the
subtarget alone, but as we add support for multiple calling conventions
per target, that no longer works.
Patch by Yiannis Tsiouris!
llvm-svn: 156328
Cross-class joins have been normal and fully supported for a while now.
With TableGen generating the getMatchingSuperRegClass() hook, they are
unlikely to cause problems again.
llvm-svn: 155552
Remove the heuristic for disabling cross-class joins. The greedy
register allocator can handle the narrow register classes, and when it
splits a live range, it can pick a larger register class.
Benchmarks were unaffected by this change.
<rdar://problem/11302212>
llvm-svn: 155551
Creates a configurable regalloc pipeline.
Ensure specific llc options do what they say and nothing more: -reglloc=... has no effect other than selecting the allocator pass itself. This patch introduces a new umbrella flag, "-optimize-regalloc", to enable/disable the optimizing regalloc "superpass". This allows for example testing coalscing and scheduling under -O0 or vice-versa.
When a CodeGen pass requires the MachineFunction to have a particular property, we need to explicitly define that property so it can be directly queried rather than naming a specific Pass. For example, to check for SSA, use MRI->isSSA, not addRequired<PHIElimination>.
CodeGen transformation passes are never "required" as an analysis
ProcessImplicitDefs does not require LiveVariables.
We have a plan to massively simplify some of the early passes within the regalloc superpass.
llvm-svn: 150226
A live range that has an early clobber tied redef now looks like a
normal tied redef, except the early clobber def uses the early clobber
slot.
This is enough to handle any strange interference problems.
llvm-svn: 149769
If a value is defined by a COPY, that instuction can easily and cheaply
be found by getInstructionFromIndex(VNI->def).
This reduces the size of VNInfo from 24 to 16 bytes, and improves
llc compile time by 3%.
llvm-svn: 149763
around within a basic block while maintaining live-intervals.
Updated ScheduleTopDownLive in MachineScheduler.cpp to use the moveInstr API
when reordering MIs.
llvm-svn: 149147
Reserved registers don't have proper live ranges, their LiveInterval
simply has a snippet of liveness for each def. Virtual registers with a
single value that is a copy of a reserved register (typically %esp) can
be coalesced with the reserved register if the live range doesn't
overlap any reserved register defs.
When coalescing with a reserved register, don't modify the reserved
register live range. Just leave it as a bunch of dead defs. This
eliminates quadratic coalescer behavior in i386 functions with many
function calls.
PR11699
llvm-svn: 147726
generator to it. For non-bundle instructions, these behave exactly the same
as the MC layer API.
For properties like mayLoad / mayStore, look into the bundle and if any of the
bundled instructions has the property it would return true.
For properties like isPredicable, only return true if *all* of the bundled
instructions have the property.
For properties like canFoldAsLoad, isCompare, conservatively return false for
bundles.
llvm-svn: 146026
1. Added opcode BUNDLE
2. Taught MachineInstr class to deal with bundled MIs
3. Changed MachineBasicBlock iterator to skip over bundled MIs; added an iterator to walk all the MIs
4. Taught MachineBasicBlock methods about bundled MIs
llvm-svn: 145975