This is a subset of the original commit from rL359879
which was reverted because it could crash when using the 'RemovedInstructions'
structure that enables delayed deletion of dead instructions. The motivating
compile-time win does not require that change though. We should get most of
that win from this change alone.
Using/updating a dominator tree to match math overflow patterns may be very
expensive in compile-time (because of the way CGP uses a DT), so just handle
the single-block case.
See post-commit thread for rL354298 for more details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html
Differential Revision: https://reviews.llvm.org/D61075
llvm-svn: 359969
Using/updating a dominator tree to match math overflow patterns may be very
expensive in compile-time (because of the way CGP uses a DT), so just handle
the single-block case.
Also, we were restarting the iterator loops when doing the overflow intrinsic
transforms by marking the dominator tree for update. That was done to prevent
iterating over a removed instruction. But we can postpone the deletion using
the existing "RemovedInsts" structure, and that means we don't need to update
the DT.
See post-commit thread for rL354298 for more details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html
Differential Revision: https://reviews.llvm.org/D61075
llvm-svn: 359879
As it's causing some bot failures (and per request from kbarton).
This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.
llvm-svn: 358546
This is probably a bigger limitation than necessary, but since we don't have any evidence yet
that this transform led to real-world perf improvements rather than regressions, I'm making a
quick, blunt fix.
In the motivating x86 example from:
https://bugs.llvm.org/show_bug.cgi?id=41129
...and shown in the regression test, we want to avoid an extra instruction in the dominating
block because that could be costly.
The x86 LSR test diff is reversing the changes from D57789. There's no evidence that 1 version
is any better than the other yet.
Differential Revision: https://reviews.llvm.org/D59602
llvm-svn: 356665
This is almost the same as:
rL355345
...and should prevent any potential crashing from examples like:
https://bugs.llvm.org/show_bug.cgi?id=41064
...although the bug was masked by:
rL355823
...and I'm not sure how to repro the problem after that change.
llvm-svn: 356218
The test is reduced from an example in the post-commit thread for:
rL354746
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190304/632396.html
While we must avoid dying here, the real question should be:
Why is non-canonical and/or degenerate code making it to CGP when
using the new pass manager?
llvm-svn: 355345
There's likely a missed IR canonicalization for at least 1 of these
patterns. Otherwise, we wouldn't have needed the pattern-matching
enhancement in D57516.
Note that -- unlike usubo added with D57789 -- the TLI hook for
this transform defaults to 'on'. So if there's any perf fallout
from this, targets should look at how they're lowering the uaddo
node in SDAG and/or override that hook.
The x86 diffs suggest that there's some missing pattern-matching
for forming inc/dec.
This should fix the remaining known problems in:
https://bugs.llvm.org/show_bug.cgi?id=40486https://bugs.llvm.org/show_bug.cgi?id=31754
llvm-svn: 354746
The motivating x86 cases for forming the intrinsic are shown in PR31754 and PR40487:
https://bugs.llvm.org/show_bug.cgi?id=31754https://bugs.llvm.org/show_bug.cgi?id=40487
..and those are shown in the IR test file and x86 codegen file.
Matching the usubo pattern is harder than uaddo because we have 2 independent values rather than a def-use.
This adds a TLI hook that should preserve the existing behavior for uaddo formation, but disables usubo
formation by default. Only x86 overrides that setting for now although other targets will likely benefit
by forming usbuo too.
Differential Revision: https://reviews.llvm.org/D57789
llvm-svn: 354298
This is no-functional-change-intended although there could
be intermediate variations caused by a difference in the
debug info produced by setting that from the builder's
insertion point.
I'm updating the IR test file associated with this code just
to show that the naming differences from using the builder
are visible.
The motivation for adding a helper function is that we are
likely to extend this code to deal with other overflow ops.
llvm-svn: 353056
There are 2 changes visible here:
1. There's no reason to limit this transform based on number
of condition registers. That diff allows PPC to produce
slightly better (dot-instructions should be generally good)
code.
Note: someone that cares about PPC codegen might want to
look closer at that output because it seems like we could
still improve this.
2. We (probably?) should not bother trying to form uaddo (or
other overflow ops) when there's no target support for such
an op. This goes beyond checking whether the op is expanded
because both PPC and AArch64 show better codegen for standard
types regardless of whether the op is legal/custom.
llvm-svn: 353001
This is the most important uaddo problem mentioned in PR31754:
https://bugs.llvm.org/show_bug.cgi?id=31754
...but that was overcome in x86 codegen with D57637.
That patch also corrects the inc vs. add regressions seen with the previous attempt at this.
Still, we want to make this matcher complete, so we can potentially canonicalize the pattern
even if it's an 'add 1' operation.
Pattern matching, however, shouldn't assume that we have canonicalized IR, so we match 4
commuted variants of uaddo.
There's also a test with a crazy type to show that the existing CGP transform based on this
matcher is not limited by target legality checks.
I'm not sure if the Hexagon diff means the test is no longer testing what it intended to
test, but that should be solvable in a follow-up.
Differential Revision: https://reviews.llvm.org/D57516
llvm-svn: 352998