SimplifyCFG tends to form a lot of 2-3 case switches when merging branches. Move
the most likely condition to the front so it is checked first and the others can
be skipped. This is currently not as effective as it could be because SimplifyCFG
destroys profiling metadata when merging branches and switches. Merging branch
weight metadata is tricky though.
This code touches at most 3 cases so I didn't use a proper sorting algorithm.
llvm-svn: 157521
then it doesn't alter the instructions composing it, however it would continue
to move the instructions to just before the expression root. Ensure it doesn't
move them either, so now it really does nothing if there is nothing to do. That
commit also ensured that nsw etc flags weren't cleared if the expression was not
being changed. Tweak this a bit so that it doesn't clear flags on the initial
part of a computation either if that part didn't change but later bits did.
llvm-svn: 157518
are passed in. However, those arguments may be in a write-protected area, as far
as the runtime library is concerned. For instance, the data could be placed into
a 'linkedit' section, which isn't writable. Emit the code from
llvm_gcda_increment_indirect_counter directly into the function instead.
Note: The code for this is ugly, and can lead to bloat. We should look into
simplifying this code instead of having all of these branches.
<rdar://problem/11181370>
llvm-svn: 157505
to pass around a struct instead of a large set of individual values. This
cleans up the interface and allows more information to be added to the struct
for future targets without requiring changes to each and every target.
NV_CONTRIB
llvm-svn: 157479
with arbitrary topologies (previously it would give up when hitting a diamond
in the use graph for example). The testcase from PR12764 is now reduced from
a pile of additions to the optimal 1617*%x0+208. In doing this I changed the
previous strategy of dropping all uses for expression leaves to one of dropping
all but one use. This works out more neatly (but required a bunch of tweaks)
and is also safer: some recently fixed bugs during recursive linearization were
because the linearization code thinks it completely owns a node if it has no uses
outside the expression it is linearizing. But if the node was also in another
expression that had been linearized (and thus all uses of the node from that
expression dropped) then the conclusion that it is completely owned by the
expression currently being linearized is wrong. Keeping one use from within each
linearized expression avoids this kind of mistake.
llvm-svn: 157467
The Hazard checker implements in-order contraints, or interlocked
resources. Ready instructions with hazards do not enter the available
queue and are not visible to other heuristics.
The major code change is the addition of SchedBoundary to encapsulate
the state at the top or bottom of the schedule, including both a
pending and available queue.
The scheduler now counts cycles in sync with the hazard checker. These
are minimum cycle counts based on known hazards.
Targets with no itinerary (x86_64) currently remain at cycle 0. To fix
this, we need to provide some maximum issue width for all targets. We
also need to add the concept of expected latency vs. minimum latency.
llvm-svn: 157427
I'm not sure it's really worth expressing this as a range rather than 3 specific equalities, but it doesn't seem fundamentally wrong either.
llvm-svn: 157398
LowerSwitch::Clusterify : main functinality was replaced with CRSBuilder::optimize, so big part of Clusterify's code was reduced.
test/Transform/LowerSwitch/feature.ll - this test was refactored: grep + count was replaced with FileCheck usage.
llvm-svn: 157384
Live ranges with a constrained register class may benefit from splitting
around individual uses. It allows the remaining live range to use a
larger register class where it may allocate. This is like spilling to a
different register class.
This is only attempted on constrained register classes.
<rdar://problem/11438902>
llvm-svn: 157354
CHECK. The latter error was hidden by the former, and the test harness
used by e.g. "make check" silently ignored that opt was printing an
error message about an unknown flag instead of running on the test file.
llvm-svn: 157341
Now that the coalescer keeps live intervals and machine code in sync at
all times, it needs to deal with identity copies differently.
When merging two virtual registers, all identity copies are removed
right away. This means that other identity copies must come from
somewhere else, and they are going to have a value number.
Deal with such copies by merging the value numbers before erasing the
copy instruction. Otherwise, we leave dangling value numbers in the live
interval.
This fixes PR12927.
llvm-svn: 157340
leader table. That's because it wasn't expecting instructions to turn up as
leader for a value number that is not its own, but equality propagation could
create this situation. One solution is to have the leader table use a WeakVH
but this slows down GVN by about 5%. Instead just have equality propagation not
add instructions to the leader table, only constants and arguments. In theory
this might cause GVN to run more (each time it changes something it runs again)
but it doesn't seem to occur enough to cause a slow down.
llvm-svn: 157251
instruction encodings can be excluded during mips16 processing.
This revision fixes the issue raised by Jim Grosbach.
bool hasStandardEncoding() const { return !inMips16Mode(); }
When micromips is added it will be
bool StandardEncoding() const { return !inMips16Mode()&& !inMicroMipsMode(); }
No additional testing is needed other than to assure that there is no regression
from this patch.
Patch by Reed Kotler.
llvm-svn: 157234
32-bit offset jump tables just use real branch instructions and so aren't
marked as data regions. We were still emitting the .end_data_region
marker though, which assert()ed.
rdar://11499158
llvm-svn: 157221
This helps compile time when the greedy register allocator splits live
ranges in giant functions. Without the bias, we would try to grow
regions through the giant edge bundles, usually to find out that the
region became too big and expensive.
If a live range has many uses in blocks near the giant bundle, the small
negative bias doesn't make a big difference, and we still consider
regions including the giant edge bundle.
Giant edge bundles are usually connected to landing pads or indirect
branches.
llvm-svn: 157174
With physreg joining out of the way, it is easy to recognize the
instructions that need their kill flags cleared while testing for
interference.
This allows us to skip the final scan of all instructions for an 11%
speedup of the coalescer pass.
llvm-svn: 157169
may be RAUW'd by the recursive call to LegalizeOps; instead, retrieve
the other operands when calling UpdateNodeOperands. Fixes PR12889.
llvm-svn: 157162
There should be no difference in the resulting binary, given a sufficiently
smart compiler. However we already had compiler timeouts on the generated
code in Intrinsics.gen, this hopefully makes the lives of slow buildbots a
little easier.
llvm-svn: 157161
This class is meant to be the primary interface for examining a live
range in the vicinity on a given instruction. It avoids all the messy
dealings with iterators and early clobbers.
This is a more abstract interface to live ranges, hiding the
implementation as a vector of segments.
llvm-svn: 157141
Dead code elimination during coalescing could cause a virtual register
to be split into connected components. The following rewriting would be
confused about the already joined copies present in the code, but
without a corresponding value number in the live range.
Erase all joined copies instantly when joining intervals such that the
MI and LiveInterval representations are always in sync.
llvm-svn: 157135
The current code will generate a prologue which starts with something like:
mflr 0
stw 31, -4(1)
stw 0, 4(1)
stwu 1, -16(1)
But under the PPC32 SVR4 ABI, access to negative offsets from R1 is not allowed.
This was pointed out by Peter Bergner.
llvm-svn: 157133
Dead code and joined copies are now eliminated on the fly, and there is
no need for a post pass.
This makes the coalescer work like other modern register allocator
passes: Code is changed on the fly, there is no pending list of changes
to be committed.
llvm-svn: 157132
The late dead code elimination is no longer necessary.
The test changes are cause by a register hint that can be either %rdi or
%rax. The choice depends on the use list order, which this patch changes.
llvm-svn: 157131
Before rewriting uses of one value in A to register B, check that there
are no tied uses. That would require multiple A values to be rewritten.
This bug can't bite in the current version of the code for a fairly
subtle reason: A tied use would have caused 2-addr to insert a copy
before the use. If the copy has been coalesced, it will be found by the
same loop changed by this patch, and the optimization is aborted.
This was exposed by 400.perlbench and lua after applying a patch that
deletes joined copies aggressively.
llvm-svn: 157130
Remaining virtreg->physreg copies were rematerialized during
updateRegDefsUses(), but we already do the same thing in joinCopy() when
visiting the physreg copy instruction.
Eliminate the preserveSrcInt argument to reMaterializeTrivialDef(). It
is now always true.
llvm-svn: 157103
Dead copies cause problems because they are trivial to coalesce, but
removing them gived the live range a dangling end point. This patch
enables full dead code elimination which trims live ranges to their uses
so end points don't dangle.
DCE may erase multiple instructions. Put the pointers in an ErasedInstrs
set so we never risk visiting erased instructions in the work list.
There isn't supposed to be any dead copies entering RegisterCoalescer,
but they do slip by as evidenced by test/CodeGen/X86/coalescer-dce.ll.
llvm-svn: 157101
getUDivExpr attempts to simplify by checking for overflow.
isLoopEntryGuardedByCond then evaluates the loop predicate which
may lead to the same getUDivExpr causing endless recursion.
Fixes PR12868: clang 3.2 segmentation fault.
llvm-svn: 157092
Use a dedicated MachO load command to annotate data-in-code regions.
This is the same format the linker produces for final executable images,
allowing consistency of representation and use of introspection tools
for both object and executable files.
Data-in-code regions are annotated via ".data_region"/".end_data_region"
directive pairs, with an optional region type.
data_region_directive := ".data_region" { region_type }
region_type := "jt8" | "jt16" | "jt32" | "jta32"
end_data_region_directive := ".end_data_region"
The previous handling of ARM-style "$d.*" labels was broken and has
been removed. Specifically, it didn't handle ARM vs. Thumb mode when
marking the end of the section.
rdar://11459456
llvm-svn: 157062
It is no longer necessary to separate VirtCopies, PhysCopies, and
ImpDefCopies. Implicitly defined copies are extremely rare after we
added the ProcessImplicitDefs pass, and physical register copies are not
joined any longer.
llvm-svn: 157059
This has been disabled for a while, and it is not a feature we want to
support. Copies between physical and virtual registers are eliminated by
good hinting support in the register allocator. Joining virtual and
physical registers is really a form of register allocation, and the
coalescer is not properly equipped to do that. In particular, it cannot
backtrack coalescing decisions, and sometimes that would cause it to
create programs that were impossible to register allocate, by exhausting
a small register class.
It was also very difficult to keep track of the live ranges of aliasing
registers when extending the live range of a physreg. By disabling
physreg joining, we can let fixed physreg live ranges remain constant
throughout the register allocator super-pass.
One type of physreg joining remains: A virtual register that has a
single value which is a copy of a reserved register can be merged into
the reserved physreg. This always lowers register pressure, and since we
don't compute live ranges for reserved registers, there are no problems
with aliases.
llvm-svn: 157055
SelectionDAGBuilder::Clusterify : main functinality was replaced with CRSBuilder::optimize, so big part of Clusterify's code was reduced.
llvm-svn: 157046
non-profitable commute using outdated info. The test case would still fail
because of poor pre-RA schedule. That will be fixed by MI scheduler.
rdar://11472010
llvm-svn: 157038
This is the same as the other tests: Clever tricks are required to make
the arguments and return value line up in a single-instruction function.
It rarely happens in real life.
We have plenty other examples of this behavior.
llvm-svn: 157030
This option has been disabled for a while, and it is going away so I can
clean up the coalescer code.
The tests that required physreg joining to be enabled were almost all of
the form "tiny function with interference between arguments and return
value". Such functions are usually inlined in the real world.
The problem exposed by phys_subreg_coalesce-3.ll is real, but fairly
rare.
llvm-svn: 157027
the 0b10 mask encoding bits. Make MSR APSR writes without a _<bits> qualifier
an alias for MSR APSR_nzcvq even though ARM as deprecated it use. Also add
support for suffixes (_nzcvq, _g, _nzcvqg) for APSR versions. Some FIXMEs in
the code for better error checking when versions shouldn't be used.
rdar://11457025
llvm-svn: 157019
- Added HOST_ARCH to Makefile.config.in
The HOST_ARCH will be used by MCJIT tests filter, because MCJIT supported only x86 and ARM architectures now.
llvm-svn: 157015
Introduce the basic strategy for register pressure scheduling.
1) Respect target limits at all times.
2) Indentify critical register classes (pressure sets).
Track pressure within the scheduled region.
Avoid increasing scheduled pressure for critical registers.
3) Avoid exceeding the max pressure of the region prior to scheduling.
Added logic for picking between the top and bottom ready Q's based on
regpressure heuristics.
Status: functional but needs to be asjusted to achieve good results.
llvm-svn: 157006
RegisterCoalescer set <undef> flags on all operands of copy instructions
that are scheduled to be removed. This is so they won't affect
shrinkToUses() by introducing false register reads.
Make sure those <undef> flags are never cleared, or shrinkToUses() could
cause live intervals to end at instructions about to be deleted.
This would be a lot simpler if RegisterCoalescer could just erase joined
copies immediately instead of keeping all the to-be-deleted instructions
around.
This fixes PR12862. Unfortunately, bugpoint can't create a sane test
case for this. Like many other coalescer problems, this failure depends
of a very fragile series of events.
<rdar://problem/11474428>
llvm-svn: 157001
separate side table, using the handy SequenceToOffsetTable class. This encodes all
these weird things into another 256 bytes, allowing all intrinsics to be encoded this way.
llvm-svn: 156995
TableGen already computes register units as the basic unit of
interference. We can use that to compute the set of overlapping
registers.
This means that we can easily compute overlap sets for one register at a
time. There is no benefit to computing all registers at once.
llvm-svn: 156960
llc to recognize MIPS16 as a MIPS ASE extension. -mips16 will mean the
mips16 ASE for mips32 by default.
As part of fixing of adding this we discovered some small changes that
need to be made to MipsInstrInfo::storeRegToStackSLot and
MipsInstrInfo::loadRegFromStackSlot. We were using some "==" equality tests
where in fact we should have been using Mips::<regclas>.hasSubClassEQ instead,
per suggestion of Jakob Stoklund Olesen.
Patch by Reed Kotler.
llvm-svn: 156958
When widening an existing <def,reads-undef> operand to a super-register,
it may be necessary to clear the <undef> flag because the wider register
is now read-modify-write through the instruction.
Conversely, it may be necessary to add an <undef> flag when the
coalescer turns a full-register def into a sub-register def, but the
larger register wasn't live before the instruction.
This happens in test/CodeGen/ARM/coalesce-subregs.ll, but the test
is too small for the <undef> flags to affect the generated code.
llvm-svn: 156951
It's more flexible for MCJIT tasks, in addition it's provides a invalidation instruction cache for code sections which will be used before JIT code will be executed.
llvm-svn: 156933
options, to enable easier testing of the innards of LLVM that are
enabled by such optimization strategies.
Note that this doesn't provide the (much needed) function attribute
support for -Oz (as opposed to -Os), but still seems like a positive
step to better test the logic that Clang currently relies on.
Patch by Patrik Hägglund.
llvm-svn: 156913
generated code (for Intrinsic::getType) into a table. This handles common cases right now,
but I plan to extend it to handle all cases and merge in type verification logic as well
in follow-on patches.
llvm-svn: 156905
It is now possible to coalesce weird skewed sub-register copies by
picking a super-register class larger than both original registers. The
included test case produces code like this:
vld2.32 {d16, d17, d18, d19}, [r0]!
vst2.32 {d18, d19, d20, d21}, [r0]
We still perform interference checking as if it were a normal full copy
join, so this is still quite conservative. In particular, the f1 and f2
functions in the included test case still have remaining copies because
of false interference.
llvm-svn: 156878
It is possible to coalesce two overlapping registers to a common
super-register that it larger than both of the original registers.
The important difference is that it may be necessary to rewrite DstReg
operands as well as SrcReg operands because the sub-register index has
changed.
This behavior is still disabled by CoalescerPair.
llvm-svn: 156869
Now both SrcReg and DstReg can be sub-registers of the final coalesced
register.
CoalescerPair::setRegisters still rejects such copies because
RegisterCoalescer doesn't yet handle them.
llvm-svn: 156848
This feature avoids creating edges in the scheduler's dependence graph
for non-aliasing memory operations according to whichever alias
analysis is available. It has been fully tested in Hexagon. Before
making this default, it needs to be extended to handle multiple
MachineMemOperands, compile time needs more evaluation, and
benchmarking on X86 and ARM is needed.
Patch by Sergei Larin!
llvm-svn: 156842
Many targets always use the same bitwise encoding value for physical
registers in all (or most) instructions. Add this mapping to the
.td files and TableGen'erate the information and expose an accessor
in MCRegisterInfo.
patch by Tom Stellard.
llvm-svn: 156829