6537 Commits

Author SHA1 Message Date
Augie Fackler 5e4c75db3b InstructionCombining: avoid eliding mismatched alloc/free pairs
Prior to this change, LLVM would happily elide a call to any allocation
function and a call to any free function operating on the same unused
pointer. This can cause problems in some obscure cases, for example if
the body of operator new can be inlined but the body of
operator delete can't, as in this example from jyknight:

    #include <stdlib.h>
    #include <stdio.h>

    int allocs = 0;

    void *operator new(size_t n) {
        allocs++;
        void *mem = malloc(n);
        if (!mem) abort();
        return mem;
    }

    __attribute__((noinline)) void operator delete(void *mem) noexcept {
        allocs--;
        free(mem);
    }

    void deleteit(int*i) { delete i; }
    int main() {
        int*i = new int;
        deleteit(i);
        if (allocs != 0)
          printf("MEMORY LEAK! allocs: %d\n", allocs);
    }

This patch addresses the issue by introducing the concept of an
allocator function family and uses it to make sure that alloc/free
function pairs are only removed if they're in the same family.

Differential Revision: https://reviews.llvm.org/D117356
2022-03-04 10:41:10 -05:00
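The family check described in the commit above can be pictured with a toy model. This is only a sketch under stated assumptions: the enum, the name-to-family mapping, and the functions `familyOf`, `mayElidePair`, and `demo` are hypothetical illustrations, not LLVM's actual data structures or API.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical allocator families; the names are illustrative only.
enum class AllocFamily { Malloc, CppNew, CppNewArray };

// Toy lookup from a function name to its family (an assumed mapping,
// not LLVM's real table).
std::optional<AllocFamily> familyOf(std::string_view fn) {
    if (fn == "malloc" || fn == "calloc" || fn == "free")
        return AllocFamily::Malloc;
    if (fn == "operator new" || fn == "operator delete")
        return AllocFamily::CppNew;
    if (fn == "operator new[]" || fn == "operator delete[]")
        return AllocFamily::CppNewArray;
    return std::nullopt;  // unknown function: no family
}

// An unused alloc/free pair may be elided only when both calls belong to
// a known family and the two families are identical.
bool mayElidePair(std::string_view allocFn, std::string_view freeFn) {
    auto a = familyOf(allocFn);
    auto f = familyOf(freeFn);
    return a && f && *a == *f;
}

// Mirrors the scenario from the commit message: once operator new has been
// inlined, the optimizer sees malloc paired with operator delete.
void demo() {
    assert(mayElidePair("malloc", "free"));
    assert(!mayElidePair("malloc", "operator delete"));
}
```

In this model the mismatched pair from the example is kept, so the counter update inside the non-inlined operator delete still runs.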
Nikita Popov c1b9667148 [InstCombine] Support opaque pointers in callee bitcast fold
To make this actually trigger, we also need to check whether the
function types differ, which is a hidden cast under opaque pointers.
The transform is somewhat less relevant there because it is
primarily about pointer bitcasts, but it can also happen with other
bit- or pointer-castable types.

Byval handling is easier with opaque pointers: there is no
need to adjust the byval type; we only need to make sure that it's
still a pointer.
2022-03-03 11:07:39 +01:00
Nikita Popov 6c8adc5054 [InstCombine] Remove unnecessary byval check in callee cast fold
The logic for handling this was fixed in
8d7f118ab2, but the check for byval
on the callee was retained. This resulted in a weird situation
where the transform would work depending on whether the byval
was only on the call or on both the call and the function.
2022-03-03 10:55:14 +01:00
Nikita Popov 2555ed55a4 [InstCombine] Add callee bitcast test with byval on callee (NFC)
Same as the existing test, but the callee also has a byval
attribute.
2022-03-03 10:55:14 +01:00
Nikita Popov 61580d0949 Reapply [InstCombine] Remove one-use limitation from X-Y==0 fold
This is a recommit without changes. I originally reverted this
due to a significant code-size regression on tramp3d-v4, however
further investigation showed that in the tramp3d-v4 case this
change enables additional optimizations (in particular more
jump threading), which happens to reduce the size of a function
just enough to be eligible for inlining at hot callsites, which
results in the code size increase. As such, this was just bad
luck.

-----

This one-use limitation is artificial: we do not increase
instruction count if we perform the fold with multiple uses. The
motivating case is shown in @sub_eq_zero_select, where the one-use
limitation causes us to miss a subsequent select fold.

I believe the backend is pretty good about reusing flag-producing
subs for cmps with same operands, so I think doing this is fine.

Differential Revision: https://reviews.llvm.org/D120337
2022-03-02 16:43:33 +01:00
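The equivalence behind the X-Y==0 fold can be checked exhaustively on a small bit width. The following is a minimal sketch, assuming 8-bit wrapping arithmetic; the function name `checkSubEqZero` is made up for illustration and is not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// For every pair of 8-bit values, (X - Y) == 0 must agree with X == Y
// under wrapping (modular) subtraction.
// Returns the number of cases checked; asserts on the first mismatch.
int checkSubEqZero() {
    int checked = 0;
    for (unsigned x = 0; x < 256; ++x) {
        for (unsigned y = 0; y < 256; ++y) {
            auto diff = static_cast<std::uint8_t>(x - y);  // wraps modulo 256
            assert((diff == 0) == (x == y));
            ++checked;
        }
    }
    return checked;
}
```

This only shows that the compare can be rewritten; whether doing so with multiple uses is profitable is the judgment call discussed in the commit message.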
Nikita Popov 5555252b52 [InstCombine] Add additional test for phi to switch cond fold (NFC)
This test exposes a bug in the edge dominance implementation.
2022-03-02 14:33:15 +01:00
Nikita Popov 5cf06d10f8 Revert "[InstCombine] Support switch in phi to cond fold"
This reverts commit 0817ce86b5.

Seeing some ppc64le stage2 failures, reverting to investigate.
2022-03-02 12:49:47 +01:00
Nikita Popov 0817ce86b5 [InstCombine] Support switch in phi to cond fold
For conditional branches, we know the value is i1 0 or i1 1 along
the outgoing edges. For switches we can apply exactly the same
optimization, just with the known values determined by the switch
cases.
2022-03-02 12:16:32 +01:00
Nikita Popov 85491fb6e4 [InstCombine] Add tests for phi to cond with switch (NFC)
Currently we only handle br but not switch in this fold.
2022-03-02 11:06:15 +01:00
Nikita Popov a1f442b278 [InstCombine] Support phi to cond fold with more than two preds
This transform can still be applied if there are more than two
phi inputs, as long as phi inputs with the same value are dominated
by the same idom edge.
2022-03-01 16:31:49 +01:00
Nikita Popov 0bb698a2fb [InstCombine] Add additional test for phi to condition fold (NFC)
This one does not have an intermediate block for the true branch,
and demonstrates the importance of using edge dominance.
2022-03-01 16:08:47 +01:00
Nikita Popov a968bee093 [InstCombine] Add more tests for phi to cond fold (NFC)
These have more than two predecessors.
2022-03-01 15:47:55 +01:00
Nikita Popov 26748bb15a [InstCombine] Slightly relax one-use check in abs canonicalization
Treat the icmp and sub symmetrically, and require that one of them
has one use, not the icmp in particular. This could be further
relaxed in the abs (but not nabs) case to not check one-use at
all.
2022-03-01 15:06:41 +01:00
Sanjay Patel 84812b9b07 [InstCombine] drop FMF in select->copysign transform
It is not correct to propagate flags from the select
to the new instructions:
https://alive2.llvm.org/ce/z/tNATrd
https://alive2.llvm.org/ce/z/VwcVzn

Fixes #54077
2022-03-01 08:51:41 -05:00
Sanjay Patel 53dbedcd18 [InstCombine] add test for copysign with FMF propagation; NFC
This is a miscompile as noted in #54077.
2022-03-01 08:51:40 -05:00
Sanjay Patel 278b407a30 [InstCombine] fold mul-with-overflow intrinsic with -1 operand
extractvalue (any_mul_with_overflow X, -1), 0 --> -X

There are similar other potential transforms that we could do as
noted by the last TODO in the test diffs.

Fixes #54053
2022-02-28 14:13:48 -05:00
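The mul-with-overflow fold above relies on the low bits of X * -1 being the wrapping negation of X, for signed and unsigned multiplies alike. A small sketch, assuming an 8-bit width and hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// Field 0 of a mul-with-overflow result is the wrapped product. Here the
// product X * (-1) is formed in wider arithmetic and truncated to 8 bits.
std::uint8_t mulByMinusOneLowBits(std::uint8_t x) {
    return static_cast<std::uint8_t>(static_cast<unsigned>(x) * 0xFFu);
}

// Wrapping negation of an 8-bit value.
std::uint8_t wrappingNeg(std::uint8_t x) {
    return static_cast<std::uint8_t>(0u - x);
}

// Returns the number of cases checked; asserts on the first mismatch.
int checkMulMinusOne() {
    int checked = 0;
    for (unsigned i = 0; i < 256; ++i) {
        auto x = static_cast<std::uint8_t>(i);
        assert(mulByMinusOneLowBits(x) == wrappingNeg(x));
        ++checked;
    }
    return checked;
}
```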
Sanjay Patel f422c5d871 [InstCombine] fold select-of-zero-or-ones with negated op
(X u< 2) ? -X : -1 --> sext (X != 0)
(X u> 1) ? -1 : -X --> sext (X != 0)

https://alive2.llvm.org/ce/z/U3y5Bb
https://alive2.llvm.org/ce/z/hgi-4p

This is part of solving:
2022-02-28 12:07:49 -05:00
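Both select patterns in the commit above can be compared against sext (X != 0) for every 8-bit input. This is a sketch with hypothetical helper names, using unsigned 8-bit values to stand in for the IR integers:

```cpp
#include <cassert>
#include <cstdint>

// Wrapping negation of an 8-bit value.
std::uint8_t neg8(std::uint8_t x) { return static_cast<std::uint8_t>(0u - x); }

// (X u< 2) ? -X : -1
std::uint8_t selectUlt(std::uint8_t x) { return x < 2 ? neg8(x) : std::uint8_t{0xFF}; }

// (X u> 1) ? -1 : -X
std::uint8_t selectUgt(std::uint8_t x) { return x > 1 ? std::uint8_t{0xFF} : neg8(x); }

// sext (X != 0): all ones when X is nonzero, zero otherwise.
std::uint8_t sextNonZero(std::uint8_t x) { return x != 0 ? 0xFF : 0x00; }

// Returns the number of inputs checked; asserts on the first mismatch.
int checkSelectFolds() {
    int checked = 0;
    for (unsigned i = 0; i < 256; ++i) {
        auto x = static_cast<std::uint8_t>(i);
        assert(selectUlt(x) == sextNonZero(x));
        assert(selectUgt(x) == sextNonZero(x));
        ++checked;
    }
    return checked;
}
```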
Sanjay Patel 614f36fd38 [InstCombine] add tests for select of zero or all ones; NFC
See #54053
2022-02-28 11:20:02 -05:00
Sanjay Patel 2dc90eee46 [InstCombine] add tests for mul-with-overflow by -1; NFC 2022-02-28 10:29:19 -05:00
Nikita Popov e1608a9df8 [InstCombine] Remove SPF min/max canonicalization
Now that we canonicalize SPF min/max to intrinsics, there's no
need to canonicalize the structure of the SPF min/max itself
anymore. This is conceptually NFC, but in practice does slightly
impact results due to folding order differences.
2022-02-25 11:24:09 +01:00
Sanjay Patel 5379f76e63 [InstCombine] try harder to preserve 'nsz' in fneg-of-select transform
The corner case where 'nsz' needs to be removed is very narrow
as discussed here:
https://reviews.llvm.org/rG3cdd05e519dd

If the select condition is not undef, there's no problem with
propagating 'nsz':
https://alive2.llvm.org/ce/z/4GWJdq
2022-02-24 10:43:53 -05:00
Sanjay Patel 788b08a58c [InstCombine] add test for fneg of select with FMF; NFC 2022-02-24 10:42:25 -05:00
Nikita Popov a266af7211 [InstCombine] Canonicalize SPF to min/max intrinsics
Now that integer min/max intrinsics have good support in both
InstCombine and other passes, start canonicalizing SPF min/max
to intrinsic min/max.

Once this sticks, we can stop matching SPF min/max in various
places, and can remove hacks we have for preventing infinite loops
and breaking of SPF canonicalization.

Differential Revision: https://reviews.llvm.org/D98152
2022-02-24 09:01:20 +01:00
Nikita Popov aa551ad198 Revert "[InstCombine] Remove one-use limitation from X-Y==0 fold"
This reverts commit 65dc78d63e.

This caused a major code-size regression on tramp3d-v4, revert
until I can investigate.
2022-02-24 08:50:40 +01:00
Nikita Popov 587c7ff15c [InstCombine] Support min/max intrinsics in udiv->lshr fold
This complements the existing fold for selects. This fold is a bit
more conservative, requiring one-use. The other folds here should
probably also be subjected to a one-use restriction.

https://alive2.llvm.org/ce/z/Q9eCDU
https://alive2.llvm.org/ce/z/8YK2CJ
2022-02-23 15:51:36 +01:00
Nikita Popov 2824a65c1f [InstCombine] Add tests for udiv->lshr fold with min/max intrinsics (NFC) 2022-02-23 15:51:36 +01:00
Nikita Popov 5ccb0582c2 [InstCombine] Simplify udiv -> lshr folding
What we're really doing here is converting Op0 udiv Op1 into
Op0 lshr log2(Op1), so phrase it in that way. Actually pushing
the lshr into the log2(Op1) expression should be seen as a separate
transform.
2022-02-23 14:55:23 +01:00
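The rewrite of Op0 udiv Op1 into Op0 lshr log2(Op1) is valid when Op1 is a power of two. A quick exhaustive check, assuming 8-bit operands and a made-up function name:

```cpp
#include <cassert>

// For every 8-bit X and every shift amount k in [0, 8), dividing by 2^k
// must equal a logical right shift by k.
// Returns the number of cases checked; asserts on the first mismatch.
int checkUdivAsLshr() {
    int checked = 0;
    for (unsigned k = 0; k < 8; ++k) {
        unsigned divisor = 1u << k;  // Op1, with log2(Op1) == k
        for (unsigned x = 0; x < 256; ++x) {
            assert(x / divisor == x >> k);
            ++checked;
        }
    }
    return checked;
}
```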
Nikita Popov 6777ec9e4d [ValueTracking] Support signed intrinsic clamp
This is the same special logic we apply for SPF signed clamps
when computing the number of sign bits, just for intrinsics.

This just uses the same logic as the select case, but there are
multiple directions in which this could be improved: we could also use
the num sign bits from the clamped value, we could do this during
constant range calculation, and there are probably unsigned analogues
for the constant range case at least.
2022-02-23 12:45:16 +01:00
Nikita Popov d6e008089c [InstCombine] Add tests for add of clamp pattern (NFC)
Add intrinsic versions of existing SPF tests.
2022-02-23 12:45:16 +01:00
Nikita Popov e2f627e5e3 [InstCombine] Fold sub of umin to usub.sat
We were handling sub of umax, but not the conjugated umin case.

https://alive2.llvm.org/ce/z/4fdZfy
https://alive2.llvm.org/ce/z/BhUQBM
2022-02-23 12:00:34 +01:00
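The commit above does not spell out the exact pattern. Assuming it has the shape X - umin(X, Y) --> usub.sat(X, Y), the identity can be checked for all 8-bit inputs. The helper names below are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Unsigned saturating subtraction: clamps at zero instead of wrapping.
std::uint8_t usubSat(std::uint8_t a, std::uint8_t b) {
    return a > b ? static_cast<std::uint8_t>(a - b) : std::uint8_t{0};
}

// Checks X - umin(X, Y) == usub.sat(X, Y) for all 8-bit pairs.
// Returns the number of cases checked; asserts on the first mismatch.
int checkSubOfUmin() {
    int checked = 0;
    for (unsigned i = 0; i < 256; ++i) {
        for (unsigned j = 0; j < 256; ++j) {
            auto x = static_cast<std::uint8_t>(i);
            auto y = static_cast<std::uint8_t>(j);
            auto sub = static_cast<std::uint8_t>(x - std::min(x, y));
            assert(sub == usubSat(x, y));
            ++checked;
        }
    }
    return checked;
}
```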
Nikita Popov 4b5261e10f [InstCombine] Add tests for sub of umin intrinsic (NFC)
We should be converting these into usub.sat.
2022-02-23 12:00:34 +01:00
Nikita Popov 65dc78d63e [InstCombine] Remove one-use limitation from X-Y==0 fold
This one-use limitation is artificial: we do not increase
instruction count if we perform the fold with multiple uses. The
motivating case is shown in @sub_eq_zero_select, where the one-use
limitation causes us to miss a subsequent select fold.

I believe the backend is pretty good about reusing flag-producing
subs for cmps with same operands, so I think doing this is fine.

Differential Revision: https://reviews.llvm.org/D120337
2022-02-23 09:37:30 +01:00
Philip Reames a9861d3c85 [instcombine] Avoid binops for comparison consistency tests
It turns out that instcombine is smarter than I am, and several of these ended up folded for the wrong reasons.
2022-02-22 17:26:03 -08:00
Philip Reames ea31442279 [NFC] Add a bit more coverage for an upcoming patch 2022-02-22 16:36:15 -08:00
Philip Reames 9030d90aeb [instcombine] Add coverage for consistent use of unescaped malloc case 2022-02-22 16:21:56 -08:00
Philip Reames 8b9f42b61b [instcombine] Autogen a test for ease of update 2022-02-22 16:02:27 -08:00
Philip Reames 2cca2c7d18 [instcombine] Extend test coverage for a tricky bit of reasoning about unescaped mallocs 2022-02-22 16:01:39 -08:00
Philip Reames 57a6d92163 [instcombine] Add test coverage for a tricky bit of reasoning about unescaped mallocs 2022-02-22 15:52:42 -08:00
Nikita Popov f4e9df22b5 [InstCombine] Add test for missed select fold due to one use limitation (NFC)
The eq sub zero fold currently has an artificial one-use limitation,
causing us to miss this fold.
2022-02-22 17:57:00 +01:00
David Sherwood 47eff645d8 [InstCombine] Bail out of load-store forwarding for scalable vector types
This patch fixes an invalid TypeSize->uint64_t implicit conversion in
FoldReinterpretLoadFromConst. If the size of the constant is scalable
we bail out of the optimisation for now.

Tests added here:

  Transforms/InstCombine/load-store-forward.ll

Differential Revision: https://reviews.llvm.org/D120240
2022-02-22 09:26:04 +00:00
Philip Reames 357b18e282 [instcombine] Add/cleanup attributes in a test 2022-02-18 19:01:55 -08:00
Arthur Eubanks 4a26abc0b9 [InstCombine][OpaquePtr] Check store type in DSE implementation 2022-02-17 10:01:14 -08:00
Sanjay Patel 58df2da054 [InstCombine] push constant operand down/outside in sequence of min/max intrinsics
A generalization like this was suggested in D119754.
This is the inverse direction of D119851,
and we get all of the folds there plus the one that was missed.

There is precedent for this kind of transform in instcombine
with "or" instructions (but strangely only with that one opcode AFAICT).

Similar justification as in the other patch:
The line between instcombine and reassociate for these kinds of folds
is blurry. This doesn't appear to have much cost and gives us the
expected wins from repeated folds as seen in the last set of test diffs.

Differential Revision: https://reviews.llvm.org/D119955
2022-02-17 10:36:37 -05:00
Sanjay Patel 234a8422c9 [InstCombine] add test for min/max intrinsic with constant expression; NFC 2022-02-17 10:36:37 -05:00
Sanjay Patel f150d295da [InstCombine] add tests for min/max reassociation; NFC
D119851
2022-02-16 12:16:24 -05:00
Sanjay Patel 483ae099f0 [InstCombine] add test for min/max intrinsic reassociation; NFC
D119851
2022-02-16 09:16:16 -05:00
Chuanqi Xu a2609be0b2 [ValueTracking] Checking haveNoCommonBitsSet for (x & y) and ~(x | y)
This patch tries to fix:
https://github.com/llvm/llvm-project/issues/53357.

It teaches haveNoCommonBitsSet to recognize (x & y) and ~(x | y).
These two values never have common bits set (this can be verified by
enumerating the cases), so their sum can be converted to
(x & y) | ~(x | y), which InstCombineAndOrXor can then handle.
Furthermore, since ((x & y) + (~x & ~y)) is converted to
((x & y) + ~(x | y)), this patch fixes that case too.

https://alive2.llvm.org/ce/z/qsKzRS

Reviewed By: spatel, xbolva00, RKSimon, lebedev.ri

Differential Revision: https://reviews.llvm.org/D118094
2022-02-16 13:42:52 +08:00
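The disjointness claim in the commit above can be confirmed by brute force. The sketch below assumes 8-bit values and a hypothetical function name; it checks that (x & y) and ~(x | y) share no set bits and that, as a consequence, adding them gives the same result as or-ing them.

```cpp
#include <cassert>

// Returns the number of cases checked; asserts on the first mismatch.
int checkDisjointOperands() {
    int checked = 0;
    for (unsigned x = 0; x < 256; ++x) {
        for (unsigned y = 0; y < 256; ++y) {
            unsigned a = x & y;
            unsigned b = ~(x | y) & 0xFFu;  // keep only the low 8 bits
            assert((a & b) == 0);                  // no common bits set
            assert(((a + b) & 0xFFu) == (a | b));  // so add behaves like or
            ++checked;
        }
    }
    return checked;
}
```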
Sanjay Patel 6357ccf57f [InstCombine] reassociate min/max intrinsics with constant operands
Integer min/max operations are associative:
  max (max X, C0), C1 --> max X, (max C0, C1) --> max X, NewC

https://alive2.llvm.org/ce/z/wW5HVM

This would avoid a regression when we canonicalize to min/max intrinsics
(see D98152).

Differential Revision: https://reviews.llvm.org/D119754
2022-02-15 08:31:23 -05:00
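The associativity that justifies this reassociation can be spot-checked over a small signed range. This is a sketch only: the range [-8, 7] and the function name are arbitrary choices for illustration, and `std::max`/`std::min` stand in for the smax/smin intrinsics.

```cpp
#include <algorithm>
#include <cassert>

// Checks max(max(X, C0), C1) == max(X, max(C0, C1)), and the same for min,
// over a small signed range.
// Returns the number of cases checked; asserts on the first mismatch.
int checkMinMaxReassociation() {
    int checked = 0;
    for (int x = -8; x <= 7; ++x) {
        for (int c0 = -8; c0 <= 7; ++c0) {
            for (int c1 = -8; c1 <= 7; ++c1) {
                assert(std::max(std::max(x, c0), c1) == std::max(x, std::max(c0, c1)));
                assert(std::min(std::min(x, c0), c1) == std::min(x, std::min(c0, c1)));
                ++checked;
            }
        }
    }
    return checked;
}
```

Because C0 and C1 are constants, the inner max (or min) folds to a single new constant, which is the NewC in the commit message.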
Simon Pilgrim 9606c69087 [InstCombine] Fold sub(Y,and(lshr(X,C),1)) --> add(ashr(shl(X,(BW-1)-C),BW-1),Y) (PR53610)
As noted on PR53610, we can fold a 'bit splat' negation of a shifted bitmask pattern into a pair of shifts.

https://alive2.llvm.org/ce/z/eGrsoN

Differential Revision: https://reviews.llvm.org/D119715
2022-02-15 13:24:20 +00:00
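The 'bit splat' fold above can be checked for a fixed bit width. The sketch below assumes BW = 8 and models the arithmetic shift by BW-1 as a sign-bit test, so it does not depend on how the host compiler shifts negative values. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned BW = 8;  // assumed bit width for this sketch

// sub(Y, and(lshr(X, C), 1))
std::uint8_t subOfShiftedBit(std::uint8_t x, std::uint8_t y, unsigned c) {
    assert(c < BW);
    unsigned bit = (x >> c) & 1u;
    return static_cast<std::uint8_t>(y - bit);
}

// add(ashr(shl(X, (BW-1)-C), BW-1), Y)
std::uint8_t addOfSplat(std::uint8_t x, std::uint8_t y, unsigned c) {
    assert(c < BW);
    auto shifted = static_cast<std::uint8_t>(x << ((BW - 1) - c));
    // ashr by BW-1 replicates the sign bit into every bit position.
    std::uint8_t splat = (shifted & 0x80u) ? 0xFF : 0x00;
    return static_cast<std::uint8_t>(splat + y);
}

// Returns the number of cases checked; asserts on the first mismatch.
int checkBitSplatFold() {
    int checked = 0;
    for (unsigned c = 0; c < BW; ++c) {
        for (unsigned i = 0; i < 256; ++i) {
            for (unsigned j = 0; j < 256; ++j) {
                auto x = static_cast<std::uint8_t>(i);
                auto y = static_cast<std::uint8_t>(j);
                assert(subOfShiftedBit(x, y, c) == addOfSplat(x, y, c));
                ++checked;
            }
        }
    }
    return checked;
}
```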
Sanjay Patel d1f32a2021 [InstCombine] add tests for min/max intrinsics; NFC 2022-02-15 07:48:28 -05:00