This fixes PR7479 and PR7485. The test cases from those PRs are big, so not
included. However, PR7485 comes from self hosting on FreeBSD, so we will surely
hear about any regression.
llvm-svn: 106811
which don't have a catch-all associated with them not just clean-ups. This fixes
the SingleSource/Benchmarks/Shootout-C++/except.cpp testcase that broke because
of my change r105902.
llvm-svn: 106772
CoalescerPair can determine if a copy can be coalesced, and which register gets
merged away. The old logic in SimpleRegisterCoalescing had evolved into
something a bit too convoluted.
This second attempt fixes some crashes that only occurred Linux.
llvm-svn: 106769
[L]oad, [u]se, [d]ef, or [S]tore slots.
This makes it easier to see if two indices refer to the same instruction,
avoiding mental mod 4 calculations.
llvm-svn: 106766
In this case it is essential that the kill is real because the spiller will
decide to omit a spill if it thinks there is a later kill.
llvm-svn: 106751
when the condition is constant. This optimization shouldn't be
necessary, because codegen shouldn't be able to find dead control
paths that the IR-level optimizer can't find. And it's undesirable,
because it encourages bugpoint to leave "br i1 false" branches
in its output. And it wasn't updating the CFG.
I updated all the tests I could, but some tests are too reduced
and I wasn't able to meaningfully preserve them.
llvm-svn: 106748
CoalescerPair can determine if a copy can be coalesced, and which register gets
merged away. The old logic in SimpleRegisterCoalescing had evolved into
something a bit too convoluted.
llvm-svn: 106701
atomic intrinsics, either because the use locking instructions for the
atomics, or because they perform the locking directly. Add support in the
DAG combiner to fold away the fences.
llvm-svn: 106630
instructions.
This does not affect codegen much because SUBREG_TO_REG is only used by X86 and
X86 does not use the register scavenger, but it prevents verifier errors.
llvm-svn: 106583
Measurements show that it does not speed up coalescing, so there is no reason
the keep the added complexity around.
Also clean out some unused methods and static functions.
llvm-svn: 106548
opportunities. For example, this lets it emit this:
movq (%rax), %rcx
addq %rdx, %rcx
instead of this:
movq %rdx, %rcx
addq (%rax), %rcx
in the case where %rdx has subsequent uses. It's the same number
of instructions, and usually the same encoding size on x86, but
it appears faster, and in general, it may allow better scheduling
for the load.
llvm-svn: 106493
Split the code for materializing a value out of
SelectionDAGBuilder::getValue into a helper function, so that it can
be used in other ways. Add a new getNonRegisterValue function which
uses it, for use in code which doesn't want a CopyFromReg even
when FuncMap.ValueMap already has an entry for it.
llvm-svn: 106422
- This fixed a number of bugs in if-converter, tail merging, and post-allocation
scheduler. If-converter now runs branch folding / tail merging first to
maximize if-conversion opportunities.
- Also changed the t2IT instruction slightly. It now defines the ITSTATE
register which is read by instructions in the IT block.
- Added Thumb2 specific hazard recognizer to ensure the scheduler doesn't
change the instruction ordering in the IT block (since IT mask has been
finalized). It also ensures no other instructions can be scheduled between
instructions in the IT block.
This is not yet enabled.
llvm-svn: 106344
instructions, but it doesn't really understand live ranges, so the first
INSERT_SUBREG uses an implicitly defined register.
Fix it in LiveVariableAnalysis by adding the <undef> flag.
llvm-svn: 106333
entries used by llvm-gcc. *_[U]MIN and such can be added later if needed.
This enables the front ends to simplify handling of the atomic intrinsics by
removing the target-specific decision about which targets can handle the
intrinsics.
llvm-svn: 106321
so when IfConverter::CopyAndPredicateBlock checks to see if it should ignore
an instruction because it is a branch, it should not check if the branch is
predicated.
This case (when IgnoreBr is true) is only relevant from IfConvertTriangle,
where new branches are inserted after the block has been copied and predicated.
If the original branch is not removed, we end up with multiple conditional
branches (possibly conflicting) at the end of the block. Aside from any
immediate errors resulting from that, this confuses the AnalyzeBranch functions
so that the branches are not analyzable. That in turn causes the IfConverter to
think that the "Simple" pattern can be applied, and things go downhill fast
because the "Simple" pattern does _not_ apply if the block can fall through.
This is pretty fragile. If there are other degenerate cases where AnalyzeBranch
fails, but where the block may still fall through, the IfConverter should not
perform its "Simple" if-conversion. But, I don't know how to do that with the
current AnalyzeBranch interface, so for now, the best thing seems to be to
avoid creating branches that AnalyzeBranch cannot handle.
Evan, please review!
llvm-svn: 106291
switch from this:
if (TimePassesIsEnabled) {
NamedRegionTimer T(Name, GroupName);
do_something();
} else {
do_something(); // duplicate the code, this time without a timer!
}
to this:
{
NamedRegionTimer T(Name, GroupName, TimePassesIsEnabled);
do_something();
}
llvm-svn: 106285
addresses a longstanding deficiency noted in many FIXMEs scattered
across all the targets.
This effectively moves the problem up one level, replacing eleven
FIXMEs in the targets with eight FIXMEs in CodeGen, plus one path
through FastISel where we actually supply a DebugLoc, fixing Radar
7421831.
llvm-svn: 106243
for the moment. The implementation of the libcall will follow.
Currently, the llvm-gcc knows when the intrinsics can be correctly handled by
the back end and only generates them in those cases, issuing libcalls directly
otherwise. That's too much coupling. The intrinsics should always be
generated and the back end decide how to handle them, be it with a libcall,
inline code, or whatever. This patch is a step in that direction.
rdar://8097623
llvm-svn: 106227
LiveVariableAnalysis was a bit picky about a register only being redefined once,
but that really isn't necessary.
Here is an example of chained INSERT_SUBREGs that we can handle now:
68 %reg1040<def> = INSERT_SUBREG %reg1040, %reg1028<kill>, 14
register: %reg1040 +[70,134:0)
76 %reg1040<def> = INSERT_SUBREG %reg1040, %reg1029<kill>, 13
register: %reg1040 replace range with [70,78:1) RESULT: %reg1040,0.000000e+00 = [70,78:1)[78,134:0) 0@78-(134) 1@70-(78)
84 %reg1040<def> = INSERT_SUBREG %reg1040, %reg1030<kill>, 12
register: %reg1040 replace range with [78,86:2) RESULT: %reg1040,0.000000e+00 = [70,78:1)[78,86:2)[86,134:0) 0@86-(134) 1@70-(78) 2@78-(86)
92 %reg1040<def> = INSERT_SUBREG %reg1040, %reg1031<kill>, 11
register: %reg1040 replace range with [86,94:3) RESULT: %reg1040,0.000000e+00 = [70,78:1)[78,86:2)[86,94:3)[94,134:0) 0@94-(134) 1@70-(78) 2@78-(86) 3@86-(94)
rdar://problem/8096390
llvm-svn: 106152
will conflict with another live range. The place which creates this scenerio is
the code in X86 that lowers a select instruction by splitting the MBBs. This
eliminates the need to check from the bottom up in an MBB for live pregs.
llvm-svn: 106066
SimpleRegisterCoalescing::JoinIntervals() uses CoalescerPair to determine if a
copy is coalescable, and in very rare cases it can return true where LHS is not
live - the coalescable copy can come from an alias of the physreg in LHS.
llvm-svn: 106021
combined to an insert_subreg, i.e., where the destination register is larger
than the source. We need to check that the subregs can be composed for that
case in a symmetrical way to the case when the destination is smaller.
llvm-svn: 106004
Early clobbers defining a virtual register were first alocated to a physreg and
then processed as a physreg EC, spilling the virtreg.
This fixes PR7382.
llvm-svn: 105998
Given a copy instruction, CoalescerPair can determine which registers to
coalesce in order to eliminate the copy. It deals with all the subreg fun to
determine a tuple (DstReg, SrcReg, SubIdx) such that:
- SrcReg is a virtual register that will disappear after coalescing.
- DstReg is a virtual or physical register whose live range will be extended.
- SubIdx is 0 when DstReg is a physical register.
- SrcReg can be joined with DstReg:SubIdx.
CoalescerPair::isCoalescable() determines if another copy instruction is
compatible with the same tuple. This fixes some NEON miscompilations where
shuffles are getting coalesced as if they were copies.
The CoalescerPair class will replace a lot of the spaghetti logic in JoinCopy
later.
llvm-svn: 105997
replacing the overly conservative checks that I had introduced recently to
deal with correctness issues. This makes a pretty noticable difference
in our testcases where reg_sequences are used. I've updated one test to
check that we no longer emit the unnecessary subreg moves.
llvm-svn: 105991
clean-up to a catch-all after inlining, take into account that there could be
filter IDs as well. The presence of filters don't mean that the selector catches
anything. It's just metadata information.
llvm-svn: 105872
This is a bit of a hack to make inline asm look more like call instructions.
It would be better to produce correct dead flags during isel.
llvm-svn: 105749
%reg1025 = <sext> %reg1024
...
%reg1026 = SUBREG_TO_REG 0, %reg1024, 4
into this:
%reg1025 = <sext> %reg1024
...
%reg1027 = EXTRACT_SUBREG %reg1025, 4
%reg1026 = SUBREG_TO_REG 0, %reg1027, 4
The problem here is that SUBREG_TO_REG is there to assert that an implicit zext
occurs. It doesn't insert a zext instruction. If we allow the EXTRACT_SUBREG
here, it will give us the value after the <sext>, not the original value of
%reg1024 before <sext>.
llvm-svn: 105741
register allocation.
Process all of the clobber lists at the end of the function, marking the
registers as used in MachineRegisterInfo.
This is necessary in case the calls clobber callee-saved registers (sic).
llvm-svn: 105473
replace an OpA with a widened OpB, it is possible to get new uses of OpA due to CSE
when recursively updating nodes. Since OpA has been processed, the new uses are
not examined again. The patch checks if this occurred and it it did, updates the
new uses of OpA to use OpB.
llvm-svn: 105453
Check that all the instructions are in the same basic block, that the
EXTRACT_SUBREGs write to the same subregs that are being extracted, and that
the source and destination registers are in the same regclass. Some of
these constraints can be relaxed with a bit more work. Jakob suggested
that the loop that checks for subregs when NewSubIdx != 0 should use the
"nodbg" iterator, so I made that change here, too.
llvm-svn: 105437
registers it defines then interfere with an existing preg live range.
For instance, if we had something like these machine instructions:
BB#0
... = imul ... EFLAGS<imp-def,dead>
test ..., EFLAGS<imp-def>
jcc BB#2 EFLAGS<imp-use>
BB#1
... ; fallthrough to BB#2
BB#2
... ; No code that defines EFLAGS
jcc ... EFLAGS<imp-use>
Machine sink will come along, see that imul implicitly defines EFLAGS, but
because it's "dead", it assumes that it can move imul into BB#2. But when it
does, imul's "dead" imp-def of EFLAGS is raised from the dead (a zombie) and
messes up the condition code for the jump (and pretty much anything else which
relies upon it being correct).
The solution is to know which pregs are live going into a basic block. However,
that information isn't calculated at this point. Nor does the LiveVariables pass
take into account non-allocatable physical registers. In lieu of this, we do a
*very* conservative pass through the basic block to determine if a preg is live
coming out of it.
llvm-svn: 105387
expansion is the same as that used by LegalizeDAG.
The resulting code sucks in terms of performance/codesize on x86-32 for a
64-bit operation; I haven't looked into whether different expansions might be
better in general.
llvm-svn: 105378
spills and reloads.
This means that a partial define of a register causes a reload so the other
parts of the register are preserved.
The reload can be prevented by adding an <imp-def> operand for the full
register. This is already done by the coalescer and live interval analysis where
relevant.
llvm-svn: 105369
register updates.
These operands tell the spiller that the other parts of the partially defined
register are don't-care, and a reload is not necessary.
llvm-svn: 105361
instruction defines subregisters.
Any existing subreg indices on the original instruction are preserved or
composed with the new subreg index.
Also substitute multiple operands mentioning the original register by using the
new MachineInstr::substituteRegister() function. This is necessary because there
will soon be <imp-def> operands added to non read-modify-write partial
definitions. This instruction:
%reg1234:foo = FLAP %reg1234<imp-def>
will reMaterialize(%reg3333, bar) like this:
%reg3333:bar-foo = FLAP %reg333:bar<imp-def>
Finally, replace the TargetRegisterInfo pointer argument with a reference to
indicate that it cannot be NULL.
llvm-svn: 105358
that are too large. This causes the freebsd bootloader to be too
large apparently.
It's unclear if this should be an -Os or -Oz thing. Thoughts welcome.
llvm-svn: 105228
implementation that is correct for most targets. Tablegen will override where
needed.
Add MachineOperand::subst{Virt,Phys}Reg methods that correctly handle existing
subreg indices when sustituting registers.
llvm-svn: 104985
optimization level.
This only really affects llc for now because both the llvm-gcc and clang front
ends override the default register allocator. I intend to remove that code later.
llvm-svn: 104904
implementing pop with a linear search for a "best" element. The priority
queue was a neat idea, but in practice the comparison functions depend
on dynamic information.
llvm-svn: 104718
If you have a setjmp/longjmp situation, it's possible for stack slot coloring to
reuse a stack slot before it's really dead. For instance, if we have something
like this:
1: y = g;
x = sigsetjmp(env, 0);
switch (x) {
case 1:
/* ... */
goto run;
case 0:
run:
do_run(); /* marked as "no return" */
break;
case 3:
if (...) {
/* ... */
goto run;
}
/* ... */
break;
}
2: g = y;
"y" may be put onto the stack, so the expression "g = y" is relying upon the
fact that the stack slot containing "y" isn't modified between (1) and (2). But
it can be, because of the "no return" calls in there. A longjmp might come back
with 3, modify the stack slot, and then go to case 0. And it's perfectly
acceptable to reuse the stack slot there because there's no CFG flow from case 3
to (2).
The fix is to disable certain optimizations in these situations. Ideally, we'd
disable them for all "returns twice" functions. But we don't support that
attribute. Check for "setjmp" and "sigsetjmp" instead.
llvm-svn: 104640
Mon Ping provided; unfortunately bugpoint failed to
reduce it, but I think it's important to have a test for
this in the suite. 8023512.
llvm-svn: 104624
so that it will continue to test what it was meant to test when I commit a
separate change for better support of BUILD_VECTOR and VECTOR_SHUFFLE for Neon.
Fix a DAG combiner crash exposed by this test change.
llvm-svn: 104380
that are aliases of the specified register.
- Rename modifiesRegister to definesRegister since it's looking a def of the
specific register or one of its super-registers. It's not looking for def of a
sub-register or alias that could change the specified register.
- Added modifiesRegister to look for defs of aliases.
llvm-svn: 104377
reads or writes a register.
This takes partial redefines and undef uses into account.
Don't actually use it yet. That caused miscompiles.
llvm-svn: 104372
definitions of the virtual register.
This happens when spilling the registers produced by REG_SEQUENCE:
%reg1047:5<def>, %reg1047:6<def>, %reg1047:7<def> = VLD3d8 %reg1033, 0, pred:14, pred:%reg0
The rewriter would spill the register multiple times, dead store elimination
tried to keep up, but ended up cutting the branch it was sitting on.
llvm-svn: 104321
pipeline stall. It's useful for targets like ARM cortex-a8. NEON has a lot
of long latency instructions so a strict register pressure reduction
scheduler does not work well.
Early experiments show this speeds up some NEON loops by over 30%.
llvm-svn: 104216
A partial redef now triggers a reload if required. Also don't add
<imp-def,dead> operands for physical superregisters.
Kill flags are still treated as full register kills, and <imp-use,kill> operands
are added for physical superregisters as before.
llvm-svn: 104167
partial redefines.
We are going to treat a partial redefine of a virtual register as a
read-modify-write:
%reg1024:6 = OP
Unless the register is fully clobbered:
%reg1024:6 = OP, %reg1024<imp-def>
MachineInstr::readsVirtualRegister() knows the difference. The first case is a
read, the second isn't.
llvm-svn: 104149
need to be promoted. The BUILD_VECTOR and EXTRACT_VECTOR_ELT nodes generated
here already allow the promoted type to be used without further changes, so
just do the promotion. This fixes part of pr7167.
llvm-svn: 104141
The trouble arises when the result of a vector cmp + sext is then and'ed with all ones. Instcombine will turn it into a vector cmp + zext, dag combiner will miss turning it into a vsetcc and hell breaks loose after that.
Teach dag combine to turn a vector cpm + zest into a vsetcc + and 1. This fixes rdar://7923010.
llvm-svn: 104094
This fixes the miscompilations of MultiSource/Applications/JM/l{en,de}cod.
Clang now successfully self hosts in a debug build with the fast register allocator.
llvm-svn: 103975
While that approach works wonders for register pressure, it tends to break
everything.
This should unbreak the arm-linux builder and fix a number of miscompilations.
llvm-svn: 103946
out aliases when allocating. Clean up allocVirtReg().
Use calcSpillCost() to allow more aggressive hinting. Now the hint is always
taken unless blocked by a reserved register. This leads to more coalescing,
lower register pressure, and less spilling.
llvm-svn: 103939
This is safe to do because the physreg has been marked UsedInInstr and the kill flag will be set on the last operand using the virtreg if there are more then one.
llvm-svn: 103933
Debug code doesn't use callee saved registers anyway, and the code is simpler this way. Now spillVirtReg always kills, and the isKill parameter is not needed.
llvm-svn: 103927
The implementation in LegalizeIntegerTypes to handle this as
sint64->float + appropriate power of 2 is subject to double rounding,
considered incorrect by numerics people. Use this implementation only
when it is safe. This leads to using library calls in some cases
that produced inline code before, but it's correct now.
(EVTToAPFloatSemantics belongs somewhere else, any suggestions?)
Add a correctly rounding (though not particularly fast) conversion
that uses X87 80-bit computations for x86-32.
7885399, 5901940. This shows up in gcc.c-torture/execute/ieee/rbug.c
in the gcc testsuite on some platforms.
llvm-svn: 103883
a condition's grouping. Every other use of Allocatable.test(Hint) groups it the
same way as it is indented, so move the parentheses to agree with that
grouping.
llvm-svn: 103869
When working top-down in a basic block, substituting physregs for virtregs, the use-def chains are kept up to date. That means we can recognize a virtreg kill by the use-def chain becoming empty.
This makes the fast allocator independent of incoming kill flags.
llvm-svn: 103866
instructions.
e.g.
%reg1026<def> = VLDMQ %reg1025<kill>, 260, pred:14, pred:%reg0
%reg1027<def> = EXTRACT_SUBREG %reg1026, 6
%reg1028<def> = EXTRACT_SUBREG %reg1026<kill>, 5
...
%reg1029<def> = REG_SEQUENCE %reg1028<kill>, 5, %reg1027<kill>, 6, %reg1028, 7, %reg1027, 8, %reg1028, 9, %reg1027, 10, %reg1030<kill>, 11, %reg1032<kill>, 12
After REG_SEQUENCE is eliminated, we are left with:
%reg1026<def> = VLDMQ %reg1025<kill>, 260, pred:14, pred:%reg0
%reg1029:6<def> = EXTRACT_SUBREG %reg1026, 6
%reg1029:5<def> = EXTRACT_SUBREG %reg1026<kill>, 5
The regular coalescer will not be able to coalesce reg1026 and reg1029 because it doesn't
know how to combine sub-register indices 5 and 6. Now 2-address pass will consult the
target whether sub-registers 5 and 6 of reg1026 can be combined to into a larger
sub-register (or combined to be reg1026 itself as is the case here). If it is possible,
it will be able to replace references of reg1026 with reg1029 + the larger sub-register
index.
llvm-svn: 103835
the variable actually tracks.
N.B., several back-ends are using "HasCalls" as being synonymous for something
that adjusts the stack. This isn't 100% correct and should be looked into.
llvm-svn: 103802
- Kill is implicit when use and def registers are identical.
- Only virtual registers can differ.
Add a -verify-fast-regalloc to run the verifier before the fast allocator.
llvm-svn: 103797
-filetype=obj test, and -filetype=obj leaks a few objects. Added a FIXME, we
need to sort out the ownership model for the various MC objects.
llvm-svn: 103769
This loop is quadratic in the capacity for a DenseMap:
while(!map.empty())
map.erase(map.begin());
Instead we now do a normal begin() - end() iteration followed by map.clear().
That also has the nice sideeffect of shrinking the map capacity on demand.
llvm-svn: 103747