Commit Graph

21 Commits

Author SHA1 Message Date
Sanjay Patel 5ab41a7a05 [CodeGenPrepare] limit overflow intrinsic matching to a single basic block (2nd try)
This is a subset of the original commit from rL359879
which was reverted because it could crash when using the 'RemovedInstructions'
structure that enables delayed deletion of dead instructions. The motivating
compile-time win does not require that change though. We should get most of
that win from this change alone.

Using/updating a dominator tree to match math overflow patterns may be very
expensive in compile-time (because of the way CGP uses a DT), so just handle
the single-block case.

See post-commit thread for rL354298 for more details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html

Differential Revision: https://reviews.llvm.org/D61075

llvm-svn: 359969
2019-05-04 12:46:32 +00:00
Evgeniy Stepanov 46ec57e576 Revert "[CodeGenPrepare] limit overflow intrinsic matching to a single basic block"
This reverts commit r359879, which introduced a compiler crash.

llvm-svn: 359908
2019-05-03 17:31:49 +00:00
Sanjay Patel 8ff072e48e [CodeGenPrepare] limit overflow intrinsic matching to a single basic block
Using/updating a dominator tree to match math overflow patterns may be very
expensive in compile-time (because of the way CGP uses a DT), so just handle
the single-block case.

Also, we were restarting the iterator loops when doing the overflow intrinsic
transforms by marking the dominator tree for update. That was done to prevent
iterating over a removed instruction. But we can postpone the deletion using
the existing "RemovedInsts" structure, and that means we don't need to update
the DT.

See post-commit thread for rL354298 for more details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html

Differential Revision: https://reviews.llvm.org/D61075

llvm-svn: 359879
2019-05-03 13:09:18 +00:00
Eric Christopher cee313d288 Revert "Temporarily Revert "Add basic loop fusion pass.""
The reversion apparently deleted the test/Transforms directory.

Will be re-reverting again.

llvm-svn: 358552
2019-04-17 04:52:47 +00:00
Eric Christopher a863435128 Temporarily Revert "Add basic loop fusion pass."
As it's causing some bot failures (and per request from kbarton).

This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358546
2019-04-17 02:12:23 +00:00
Sanjay Patel d47eac59ef [CodeGenPrepare] limit formation of overflow intrinsics (PR41129)
This is probably a bigger limitation than necessary, but since we don't have any evidence yet
that this transform led to real-world perf improvements rather than regressions, I'm making a
quick, blunt fix.

In the motivating x86 example from:
https://bugs.llvm.org/show_bug.cgi?id=41129
...and shown in the regression test, we want to avoid an extra instruction in the dominating
block because that could be costly.

The x86 LSR test diff is reversing the changes from D57789. There's no evidence that 1 version
is any better than the other yet.

Differential Revision: https://reviews.llvm.org/D59602

llvm-svn: 356665
2019-03-21 13:57:07 +00:00
Sanjay Patel fb44f99b73 [CGP][x86] add tests for usubo regression (PR41129); NFC
llvm-svn: 356559
2019-03-20 15:02:35 +00:00
Sanjay Patel 2c9275a790 [CGP] add another bailout for degenerate code (PR41064)
This is almost the same as:
rL355345
...and should prevent any potential crashing from examples like:
https://bugs.llvm.org/show_bug.cgi?id=41064
...although the bug was masked by:
rL355823
...and I'm not sure how to repro the problem after that change.

llvm-svn: 356218
2019-03-14 23:14:31 +00:00
Sanjay Patel 3b2d0bc7c2 [CodeGenPrepare] avoid crashing on non-canonical/degenerate code
The test is reduced from an example in the post-commit thread for:
rL354746
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190304/632396.html

While we must avoid dying here, the real question should be:
Why is non-canonical and/or degenerate code making it to CGP when
using the new pass manager?

llvm-svn: 355345
2019-03-04 22:47:13 +00:00
Sanjay Patel cb04ba032f [CGP] add special-cases to form unsigned add with overflow (PR40486)
There's likely a missed IR canonicalization for at least 1 of these
patterns. Otherwise, we wouldn't have needed the pattern-matching
enhancement in D57516.

Note that -- unlike usubo added with D57789 -- the TLI hook for
this transform defaults to 'on'. So if there's any perf fallout
from this, targets should look at how they're lowering the uaddo
node in SDAG and/or override that hook.

The x86 diffs suggest that there's some missing pattern-matching
for forming inc/dec.

This should fix the remaining known problems in:
https://bugs.llvm.org/show_bug.cgi?id=40486
https://bugs.llvm.org/show_bug.cgi?id=31754

llvm-svn: 354746
2019-02-24 15:31:27 +00:00
Sanjay Patel 973143ab79 [CGP] add tests for uaddo increment/decrement; NFC
llvm-svn: 354699
2019-02-22 23:19:34 +00:00
Sanjay Patel 198cc305e9 [CGP] match a special-case of unsigned subtract overflow
This is the 'sub0' (negate) pattern from PR31754:
https://bugs.llvm.org/show_bug.cgi?id=31754

llvm-svn: 354519
2019-02-20 21:23:04 +00:00
Sanjay Patel 8d91faa481 [CGP][x86] add tests for usubo special-case; NFC
This is another example from PR31754:
https://bugs.llvm.org/show_bug.cgi?id=31754

llvm-svn: 354475
2019-02-20 15:40:58 +00:00
Sanjay Patel d8b4efcb6b [CGP] form usub with overflow from sub+icmp
The motivating x86 cases for forming the intrinsic are shown in PR31754 and PR40487:
https://bugs.llvm.org/show_bug.cgi?id=31754
https://bugs.llvm.org/show_bug.cgi?id=40487
..and those are shown in the IR test file and x86 codegen file.

Matching the usubo pattern is harder than uaddo because we have 2 independent values rather than a def-use.

This adds a TLI hook that should preserve the existing behavior for uaddo formation, but disables usubo
formation by default. Only x86 overrides that setting for now although other targets will likely benefit
by forming usbuo too.

Differential Revision: https://reviews.llvm.org/D57789

llvm-svn: 354298
2019-02-18 23:33:05 +00:00
Sanjay Patel b696b771bf [CGP] add test for unsigned subtract of 1 with overflow; NFC
llvm-svn: 353179
2019-02-05 15:27:40 +00:00
Sanjay Patel 738e17b5df [CGP] fix bogus test names/comments; NFC
Inverted operand 0 and operand 1.

llvm-svn: 353106
2019-02-04 22:37:05 +00:00
Sanjay Patel 1f9e23e3cc [CGP] add tests for usubo; NFC
llvm-svn: 353103
2019-02-04 22:27:08 +00:00
Sanjay Patel c00bdab4c8 [CGP] use IRBuilder to simplify code
This is no-functional-change-intended although there could
be intermediate variations caused by a difference in the
debug info produced by setting that from the builder's 
insertion point. 

I'm updating the IR test file associated with this code just
to show that the naming differences from using the builder
are visible.

The motivation for adding a helper function is that we are
likely to extend this code to deal with other overflow ops.

llvm-svn: 353056
2019-02-04 16:30:46 +00:00
Sanjay Patel 84ceae6048 [CGP] adjust target constraints for forming uaddo
There are 2 changes visible here:
1. There's no reason to limit this transform based on number
   of condition registers. That diff allows PPC to produce 
   slightly better (dot-instructions should be generally good) 
   code.
   Note: someone that cares about PPC codegen might want to 
   look closer at that output because it seems like we could
   still improve this.

2. We (probably?) should not bother trying to form uaddo (or
   other overflow ops) when there's no target support for such
   an op. This goes beyond checking whether the op is expanded
   because both PPC and AArch64 show better codegen for standard
   types regardless of whether the op is legal/custom.

llvm-svn: 353001
2019-02-03 17:53:09 +00:00
Sanjay Patel 837552fe9f [PatternMatch] add special-case uaddo matching for increment-by-one (2nd try)
This is the most important uaddo problem mentioned in PR31754:
https://bugs.llvm.org/show_bug.cgi?id=31754
...but that was overcome in x86 codegen with D57637.

That patch also corrects the inc vs. add regressions seen with the  previous attempt at this.

Still, we want to make this matcher complete, so we can potentially canonicalize the pattern 
even if it's an 'add 1' operation.
Pattern matching, however, shouldn't assume that we have canonicalized IR, so we match 4 
commuted variants of uaddo.

There's also a test with a crazy type to show that the existing CGP transform based on this 
matcher is not limited by target legality checks.

I'm not sure if the Hexagon diff means the test is no longer testing what it intended to
test, but that should be solvable in a follow-up.

Differential Revision: https://reviews.llvm.org/D57516

llvm-svn: 352998
2019-02-03 16:16:48 +00:00
Sanjay Patel b961bd68f0 [CGP] move test file to prevent bot failures
The test specifiies the triple, so it needs to be in the
x86 directory in case a bot has been configured without
the x86 target.

llvm-svn: 352992
2019-02-03 14:19:45 +00:00