There are a pair of folds that try to merge fneg into fsub
with an intervening cast, but as shown in the FIXME tests,
they can create extra instructions.
llvm-svn: 329501
D45344 is proposing to remove the use restriction that made the calloc
transform safe, but it doesn't currently address the problematic example
given inD16337. Add a test to make sure that doesn't break.
llvm-svn: 329412
This restores what was lost with rL73243 but without
re-introducing the bug that was present in the old code.
Note that we already have these transforms if the ops are
marked 'fast' (and I assume that's happening somewhere in
the code added with rL170471), but we clearly don't need
all of 'fast' for these transforms.
llvm-svn: 329362
A fold for this pattern was removed at rL73243 to fix PR4374:
https://bugs.llvm.org/show_bug.cgi?id=4374
...and apparently there were no tests that went with that fold.
llvm-svn: 329360
Summary:
More tests for D45108:
* One use tests
* allow shift to be a variable, too
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45329
llvm-svn: 329348
Summary:
This is a fix to PR37005.
Essentially, rL328539 ([InstCombine] reassociate loop invariant GEP chains to enable LICM) contains a bug
whereby it will convert:
%src = getelementptr inbounds i8, i8* %base, <2 x i64> %val
%res = getelementptr inbounds i8, <2 x i8*> %src, i64 %val2
into:
%src = getelementptr inbounds i8, i8* %base, i64 %val2
%res = getelementptr inbounds i8, <2 x i8*> %src, <2 x i64> %val
By swapping the index operands if the GEPs are in a loop, and %val is loop variant while %val2
is loop invariant.
This fix recreates new GEP instructions if the index operand swap would result in the type
of %src changing from vector to scalar, or vice versa.
Reviewers: sebpop, spatel
Reviewed By: sebpop
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45287
llvm-svn: 329331
There used to be a fold that would handle this case more generally,
but it was removed at rL73243 to fix PR4374:
https://bugs.llvm.org/show_bug.cgi?id=4374
llvm-svn: 329322
Using cstfp_pred_ty in the definition allows us to match vectors with undef elements.
This replicates the change for m_Not from D44076 / rL326823 and continues
towards making all pattern matchers allow undef elements in vectors.
llvm-svn: 329303
Summary:
Clang's __builtin_operator_new/delete was recently taught about the aligned allocation overloads (r328134). This patch makes LLVM aware of them as well.
This allows the compiler to perform certain optimizations including eliding new/delete calls.
Reviewers: rsmith, majnemer, dblaikie, vsk, bkramer
Reviewed By: bkramer
Subscribers: ckennelly, llvm-commits
Differential Revision: https://reviews.llvm.org/D44769
llvm-svn: 329218
Summary:
Clang's __builtin_operator_new/delete was recently taught about the aligned allocation overloads (r328134). This patch makes LLVM aware of them as well.
This allows the compiler to perform certain optimizations including eliding new/delete calls.
Reviewers: rsmith, majnemer, dblaikie, vsk, bkramer
Reviewed By: bkramer
Subscribers: ckennelly, llvm-commits
Differential Revision: https://reviews.llvm.org/D44769
llvm-svn: 329215
The tests marked with 'FIXME' require loosening the check
in SimplifyAssociativeOrCommutative() to optimize completely;
that's still checking isFast() in Instruction::isAssociative().
llvm-svn: 329121
Summary:
Folding patterns like:
%vec = shufflevector <4 x i8> %insvec, <4 x i8> undef, <4 x i32> zeroinitializer
%cast = bitcast <4 x i8> %vec to i32
%cond = icmp eq i32 %cast, 0
into:
%ext = extractelement <4 x i8> %insvec, i32 0
%cond = icmp eq i32 %ext, 0
Combined with existing rules, this allows us to fold patterns like:
%insvec = insertelement <4 x i8> undef, i8 %val, i32 0
%vec = shufflevector <4 x i8> %insvec, <4 x i8> undef, <4 x i32> zeroinitializer
%cast = bitcast <4 x i8> %vec to i32
%cond = icmp eq i32 %cast, 0
into:
%cond = icmp eq i8 %val, 0
When we construct a splat vector via a shuffle, and bitcast the vector into an integer type for comparison against an integer constant. Then we can simplify the the comparison to compare the splatted value against the integer constant.
Reviewers: spatel, anna, mkazantsev
Reviewed By: spatel
Subscribers: efriedma, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D44997
llvm-svn: 329087
Summary:
The cast simplifications that instcombine does here do not make any
attempt to obey the verifier rules for musttail calls. Therefore we have
to disable them.
Reviewers: efriedma, majnemer, pcc
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D45186
llvm-svn: 329027
This change brings performance of zlib up by 10%. The example below is from a
hot loop in longest_match() from zlib.
do.body:
%cur_match.addr.0 = phi i32 [ %cur_match, %entry ], [ %2, %do.cond ]
%idx.ext = zext i32 %cur_match.addr.0 to i64
%add.ptr = getelementptr inbounds i8, i8* %win, i64 %idx.ext
%add.ptr2 = getelementptr inbounds i8, i8* %add.ptr, i64 %idx.ext1
%add.ptr3 = getelementptr inbounds i8, i8* %add.ptr2, i64 -1
In this example %idx.ext1 is a loop invariant. It will be moved above the use of
loop induction variable %idx.ext such that it can be hoisted out of the loop by
LICM. The operands that have dependences carried by the loop will be sinked down
in the GEP chain. This patch will produce the following output:
do.body:
%cur_match.addr.0 = phi i32 [ %cur_match, %entry ], [ %2, %do.cond ]
%idx.ext = zext i32 %cur_match.addr.0 to i64
%add.ptr = getelementptr inbounds i8, i8* %win, i64 %idx.ext1
%add.ptr2 = getelementptr inbounds i8, i8* %add.ptr, i64 -1
%add.ptr3 = getelementptr inbounds i8, i8* %add.ptr2, i64 %idx.ext
llvm-svn: 328539
This replaces a large chunk of code that was looking for compound
patterns that include these sub-patterns. Existing tests ensure that
all of the previous examples are still folded as expected.
We still need to loosen the FMF check.
llvm-svn: 328502
We have thorough coverage of predicates and scalar types,
so we just need a sampling of vector tests to show that
things are working or not with vectors types.
llvm-svn: 328449
There are at least 3 problems:
1. We're distributing across large patterns, but fail to do that for the minimal patterns.
2. We're not checking uses, so we may create more instructions than we eliminate.
3. We should be able to do these transforms with less than full 'fast' fmuls.
llvm-svn: 328152
This is complicated by -0.0 and nan. This is based on the DAG patterns
as shown in D44091. I'm hoping that we can just remove those DAG folds
and always rely on IR canonicalization to handle the matching to fabs.
We would still need to delete the broken code from DAGCombiner to fix
PR36600:
https://bugs.llvm.org/show_bug.cgi?id=36600
Differential Revision: https://reviews.llvm.org/D44550
llvm-svn: 327858
Summary:
This pattern came up in PR36682 / D44390
https://bugs.llvm.org/show_bug.cgi?id=36682https://reviews.llvm.org/D44390https://godbolt.org/g/oKvT5H
Looking at the IR pattern in question, as per [[ https://github.com/rutgers-apl/alive-nj | alive-nj ]], for all the type combinations i checked
(input: `i16`, `i32`, `i64`; intermediate: `half`/`i16`, `float`/`i32`, `double`/`i64`)
for the following `icmp` comparisons the `uitofp`+`bitcast` can be dropped:
* `eq 0`
* `ne 0`
I did not check vectors, but i'm guessing it's the same there.
{F5889189}
Thus all these cases are in the testcase (along with the vector variant with additional `undef` element in the middle).
There are no negative patterns here (unless alive-nj lied/is broken), all of these should be optimized.
Generated with
{F5889196}
Reviewers: spatel, majnemer, efriedma, arsenm
Reviewed By: spatel
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D44416
llvm-svn: 327534
This was supposed to be an NFC refactoring that will eventually allow
eliminating the isFast() predicate, but there's a rare possibility
that we would pessimize the code as shown in the test case because
we failed to check 'hasOneUse()' properly. This version also removes
an inefficiency of the old code; we would look for:
(X * C) * C1 --> X * (C * C1)
...but that pattern is always handled by
SimplifyAssociativeOrCommutative().
llvm-svn: 327404
Summary:
This pattern came up in PR36682:
https://bugs.llvm.org/show_bug.cgi?id=36682https://godbolt.org/g/LhuD9A
Tests for proposed fix in D44367.
Looking at the IR pattern in question, as per [[ https://github.com/rutgers-apl/alive-nj | alive-nj ]], for all the type combinations i checked
(input: `i16`, `i32`, `i64`; intermediate: `half`/`i16`, `float`/`i32`, `double`/`i64`)
for the following `icmp` comparisons the `sitofp`+`bitcast` can be dropped:
* `eq 0`
* `ne 0`
* `slt 0`
* `sle 0`
* `sge 0`
* `sgt 0`
* `slt 1`
* `sge 1`
* `sle -1`
* `sgt -1`
I did not check vectors, but i'm guessing it's the same there.
{F5887419}
Thus all these cases are in the testcase (along with the vector variant with additional `undef` element in the middle).
There are no negative patterns here (unless alive-nj lied/is broken), all of these should be optimized.
Generated with {F5887551}
Reviewers: spatel, majnemer, efriedma, arsenm
Reviewed By: spatel
Subscribers: nlopes, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D44390
llvm-svn: 327301
With the updated LangRef ( D44216 / rL327138 ) in place, we can proceed with more constant folding.
I'm intentionally taking the conservative path here: no matter what the constant or the FMF, we can
always fold to NaN. This is because the undef operand can be chosen as NaN, and in our simplified
default FP env, nothing else happens - NaN just propagates to the result. If we find some way/need
to propagate undef instead, that can be added subsequently.
The tests show that we always choose the same quiet NaN constant (0x7FF8000000000000 in IR text).
There were suggestions to improve that with a 'NaN' string token or not always print a 64-bit hex
value, but those are independent changes. We might also consider setting/propagating the payload of
NaN constants as an enhancement.
Differential Revision: https://reviews.llvm.org/D44308
llvm-svn: 327208
These are uncontroversial and independent of a proposed LangRef edits (D44216).
I tried to fix tests that would fold away:
rL327004
rL327028
rL327030
rL327034
I'm not sure if the Reassociate tests are meaningless yet, but they probably will be
as we add more folds, so if anyone has suggestions or wants to fix those, please do.
Differential Revision: https://reviews.llvm.org/D44258
llvm-svn: 327058
These are based on:
https://bugs.llvm.org/show_bug.cgi?id=35875
It's not clear if/how instcombine can reduce these,
but we should have the tests here either way to
document current behavior.
llvm-svn: 327039
Using cst_pred_ty in the definition allows us to match vectors with undef elements.
This is a continuation of an effort to make all pattern matchers allow undef elements in vectors:
rL325437
rL325466
D43792
Differential Revision: https://reviews.llvm.org/D44076
llvm-svn: 326823
Summary:
Presently, InstCombiner::foldICmpWithCastAndCast() implicitly assumes that it is
only invoked with icmp instructions of integer type. If that assumption is broken,
and it is called with an icmp of vector type, then it fails (asserts/crashes).
This patch addresses the deficiency. It allows it to simplify
icmp (ptrtoint x), (ptrtoint/c) of vector type into a compare of the inputs,
much as is done when the type is integer.
Reviewers: apilipenko, fedor.sergeev, mkazantsev, anna
Reviewed By: anna
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44063
llvm-svn: 326730
This patch teaches getMinimumFPType to support shrinking a vector of ConstantFPs. This should improve our ability to combine vector fptrunc with fp binops.
Differential Revision: https://reviews.llvm.org/D43774
llvm-svn: 326729
The code was checking that all of the instructions in the
sequence are 'fast', but that's not necessary. The final
multiply is all that we need to check (tests adjusted).
The fmul doesn't need to be fully 'fast' either, but that
can be another patch.
llvm-svn: 326608
This narrow fold was added with no motivation or test cases
a bit over 5 years ago. Removing a constant operand is a
good canonicalization? We should handle Y*2.0 too then?
llvm-svn: 326606
If we are only truncating bits from the extend we should be able to just use a smaller extend.
If we are truncating more than the extend we should be able to just use a fptrunc since the presense of the fpextend shouldn't affect rounding.
Differential Revision: https://reviews.llvm.org/D43970
llvm-svn: 326595
This includes the test cases from D43970 and additional tests for combining (fptrunc (binop (fpext), (fpext))) where the pre-extended types don't match the trunc and therefore can't be completely removed.
llvm-svn: 326528
The loads and stores were getting the data and storing the results. There's no reason we can't just use function arguments and return.
llvm-svn: 326515
This is a retry of r326502 with updates to the reassociate
test file that I missed the first time.
@test15_reassoc in the supposed -reassociate test file
(except that it tests 2 other passes too...) shows that
there's no clear responsiblity for reassociation transforms.
Instcombine now gets that case, but only because the
constant values are identical. Otherwise, it would still
miss that pattern.
Reassociate doesn't get that case because it hasn't been
updated to use less than 'fast' FMF.
llvm-svn: 326513
I forgot that I added tests for 'reassoc' to -reassociate, but
suprisingly that file calls -instcombine too, so it is affected.
I'll update that file and try again.
llvm-svn: 326510
Note: gcc appears to allow this fold with -freciprocal-math alone,
but clang/llvm require more than that with this patch. The wording
in the definitions seems fuzzy enough that it could go either way,
but we'll err on the conservative side of FMF interpretation.
This patch also changes the newly created fmul to have FMF propagated
by the last fdiv rather than intersecting the FMF of the fdivs. This
matches the behavior of other folds near here. The new fmul is only
used to produce an intermediate op for the final fdiv result, so it
shouldn't be any stricter than that result. The previous behavior
could result in dropping FMF via other folds in instcombine or CSE.
Differential Revision: https://reviews.llvm.org/D43398
llvm-svn: 326098
This is usually not a problem because this code's main purpose is
eliminating unused new/delete pairs. We got deletes of nullptr or
nobuiltin deletes of builtin new wrong though.
llvm-svn: 325630