This reverts commit cf1c774d6a.
This change caused several regressions in the gdb test suite - at least
a sample of which was due to line zero instructions making breakpoints
un-lined. I think they're worth investigating/understanding more (&
possibly addressing) before moving forward with this change.
Revert "[FastISel] NFC: Clean up unnecessary bookkeeping"
This reverts commit 3fd39d3694.
Revert "[FastISel] NFC: Remove obsolete -fast-isel-sink-local-values option"
This reverts commit a474657e30.
Revert "Remove static function unused after cf1c774."
This reverts commit dc35368ccf.
Revert "[lldb] Fix TestThreadStepOut.py after "Flush local value map on every instruction""
This reverts commit 53a14a47ee.
This rewrites big parts of the fast register allocator. The basic
strategy of doing block-local allocation hasn't changed but I tweaked
several details:
Track register state on register units instead of physical
registers. This simplifies and speeds up handling of register aliases.
Process basic blocks in reverse order: Definitions are known to end
register livetimes when walking backwards (contrary when walking
forward then uses may or may not be a kill so we need heuristics).
Check register mask operands (calls) instead of conservatively
assuming everything is clobbered. Enhance heuristics to detect
killing uses: In case of a small number of defs/uses check if they are
all in the same basic block and if so the last one is a killing use.
Enhance heuristic for copy-coalescing through hinting: We check the
first k defs of a register for COPYs rather than relying on there just
being a single definition. When testing this on the full llvm
test-suite including SPEC externals I measured:
average 5.1% reduction in code size for X86, 4.9% reduction in code on
aarch64. (ranging between 0% and 20% depending on the test) 0.5%
faster compiletime (some analysis suggests the pass is slightly slower
than before, but we more than make up for it because later passes are
faster with the reduced instruction count)
Also adds a few testcases that were broken without this patch, in
particular bug 47278.
Patch mostly by Matthias Braun
This seems to have caused incorrect register allocation in some cases,
breaking tests in the Zig standard library (PR47278).
As discussed on the bug, revert back to green for now.
> Record internal state based on register units. This is often more
> efficient as there are typically fewer register units to update
> compared to iterating over all the aliases of a register.
>
> Original patch by Matthias Braun, but I've been rebasing and fixing it
> for almost 2 years and fixed a few bugs causing intermediate failures
> to make this patch independent of the changes in
> https://reviews.llvm.org/D52010.
This reverts commit 66251f7e1d, and
follow-ups 931a68f26b
and 0671a4c508. It also adjust some
test expectations.
Record internal state based on register units. This is often more
efficient as there are typically fewer register units to update
compared to iterating over all the aliases of a register.
Original patch by Matthias Braun, but I've been rebasing and fixing it
for almost 2 years and fixed a few bugs causing intermediate failures
to make this patch independent of the changes in
https://reviews.llvm.org/D52010.
Trace through multiple COPYs when looking for a physreg source. Add
hinting for vregs that will be copied into physregs (we only hinted
for vregs getting copied to a physreg previously). Give hinted a
register a bonus when deciding which value to spill. This is part of
my rewrite regallocfast series. In fact this one doesn't even have an
effect unless you also flip the allocation to happen from back to
front of a basic block. Nonetheless it helps to split this up to ease
review of D52010
Patch by Matthias Braun
llvm-svn: 360887
Set `LiveReg::PhysReg` to zero when freeing a register instead of
removing it from the entry from `LiveRegMap`. This way no iterators get
invalidated and we can avoid passing around and updating iterators all
over the place.
This does not change any allocator decisions. It is not completely NFC
because the arbitrary iteration order through `LiveRegMap` in
`spillAll()` changes so we may get a different order in those spill
sequences (the amount of spills does not change).
This is in preparation of https://reviews.llvm.org/D52010.
llvm-svn: 346298
This is causing compilation timeouts on code with long sequences of
local values and calls (i.e. foo(1); foo(2); foo(3); ...). It turns out
that code coverage instrumentation is a great way to create sequences
like this, which how our users ran into the issue in practice.
Intel has a tool that detects these kinds of non-linear compile time
issues, and Andy Kaylor reported it as PR37010.
The current sinking code scans the whole basic block once per local
value sink, which happens before emitting each call. In theory, local
values should only be introduced to be used by instructions between the
current flush point and the last flush point, so we should only need to
scan those instructions.
llvm-svn: 329822
Summary:
Local values are constants, global addresses, and stack addresses that
can't be folded into the instruction that uses them. For example, when
storing the address of a global variable into memory, we need to
materialize that address into a register.
FastISel doesn't want to materialize any given local value more than
once, so it generates all local value materialization code at
EmitStartPt, which always dominates the current insertion point. This
allows it to maintain a map of local value registers, and it knows that
the local value area will always dominate the current insertion point.
The downside is that local value instructions are always emitted without
a source location. This is done to prevent jumpy line tables, but it
means that the local value area will be considered part of the previous
statement. Consider this C code:
call1(); // line 1
++global; // line 2
++global; // line 3
call2(&global, &local); // line 4
Today we end up with assembly and line tables like this:
.loc 1 1
callq call1
leaq global(%rip), %rdi
leaq local(%rsp), %rsi
.loc 1 2
addq $1, global(%rip)
.loc 1 3
addq $1, global(%rip)
.loc 1 4
callq call2
The LEA instructions in the local value area have no source location and
are treated as being on line 1. Stepping through the code in a debugger
and correlating it with the assembly won't make much sense, because
these materializations are only required for line 4.
This is actually problematic for the VS debugger "set next statement"
feature, which effectively assumes that there are no registers live
across statement boundaries. By sinking the local value code into the
statement and fixing up the source location, we can make that feature
work. This was filed as https://bugs.llvm.org/show_bug.cgi?id=35975 and
https://crbug.com/793819.
This change is obviously not enough to make this feature work reliably
in all cases, but I felt that it was worth doing anyway because it
usually generates smaller, more comprehensible -O0 code. I measured a
0.12% regression in code generation time with LLC on the sqlite3
amalgamation, so I think this is worth doing.
There are some special cases worth calling out in the commit message:
1. local values materialized for phis
2. local values used by no-op casts
3. dead local value code
Local values can be materialized for phis, and this does not show up as
a vreg use in MachineRegisterInfo. In this case, if there are no other
uses, this patch sinks the value to the first terminator, EH label, or
the end of the BB if nothing else exists.
Local values may also be used by no-op casts, which adds the register to
the RegFixups table. Without reversing the RegFixups map direction, we
don't have enough information to sink these instructions.
Lastly, if the local value register has no other uses, we can delete it.
This comes up when fastisel tries two instruction selection approaches
and the first materializes the value but fails and the second succeeds
without using the local value.
Reviewers: aprantl, dblaikie, qcolombet, MatzeB, vsk, echristo
Subscribers: dotdash, chandlerc, hans, sdardis, amccarth, javed.absar, zturner, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D43093
llvm-svn: 327581
I had to drop fast-isel-abort from a test because we can't fast isel some of the mask stuff. When we used intrinsics we implicitly fell back to SelectionDAG for the intrinsic call without triggering the abort error. But with native IR that doesn't happen the same way.
llvm-svn: 322050
As part of the unification of the debug format and the MIR format, print
MBB references as '%bb.5'.
The MIR printer prints the IR name of a MBB only for block definitions.
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
* find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
* grep -nr 'BB#' and fix
Differential Revision: https://reviews.llvm.org/D40422
llvm-svn: 319665
Summary:
This suppresses the generation of .Lcfi labels in our textual assembler.
It was annoying that this generated cascading .Lcfi labels:
llc foo.ll -o - | llvm-mc | llvm-mc
After three trips through MCAsmStreamer, we'd have three labels in the
output when none are necessary. We should only bother creating the
labels and frame data when making a real object file.
This supercedes D38605, which moved the entire .seh_ implementation into
MCObjectStreamer.
This has the advantage that we do more checking when emitting textual
assembly, as a minor efficiency cost. Outputting textual assembly is not
performance critical, so this shouldn't matter.
Reviewers: majnemer, MatzeB
Subscribers: qcolombet, nemanjai, javed.absar, eraman, hiraditya, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D38638
llvm-svn: 315259
Summary:
ZExt and SExt from i8 to i16 aren't implemented in the autogenerated fast isel table because normal isel does a zext/sext to 32-bits and a subreg extract to avoid a partial register write or false dependency on the upper bits of the destination. This means without handling in fast isel we end up triggering a fast isel abort.
We had no custom sign extend handling at all so while I was there I went ahead and implemented sext i1->i8/i16/i32/i64 which was also missing. This generates an i1->i8 sign extend using a mask with 1, then an 8-bit negate, then continues with a sext from i8. A better sequence would be a wider and/negate, but would require more custom code.
Fast isel tests are a mess and I couldn't find a good home for the tests so I created a new one.
The test pr34381.ll had to have fast-isel removed because it was relying on a fast isel abort to hit the bug. The test case still seems valid with fast-isel disabled though some of the instructions changed.
Reviewers: spatel, zvi, igorb, guyblank, RKSimon
Reviewed By: guyblank
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37320
llvm-svn: 312422