We assumed the constant was a scalar when creating the replacement operand.
Also, improve tests for this fold and move the tests for this fold to their own file.
I'll move the related and missing tests to this file as a follow-up.
llvm-svn: 306985
I noticed this missed bswap optimization in the CGP memcmp() expansion,
and then I saw that we don't have the fold in InstCombine.
Differential Revision: https://reviews.llvm.org/D34763
llvm-svn: 306980
Summary:
I came across this while thinking about what would happen if one of the operands in this xor pattern was itself a inverted (A & ~B) ^ (~A & B)-> (A^B).
The patterns here assume that the (~a | ~b) will be demorganed to ~(a & b) first. Though I wonder if there's a multiple use case that would prevent the demorgan.
Reviewers: spatel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34870
llvm-svn: 306967
There are two conditions ORed here with similar checks and each contain two matches that must be true for the if to succeed. With the commutable match on the first half of the OR then both ifs basically have the same first part and only the second part distinguishs. With this change we move the commutable match to second half and make the first half unique.
This caused some tests to change because we now produce a commuted result, but this shouldn't matter in practice.
llvm-svn: 306800
Summary:
The original intent of test/Transforms/InstCombine/memset.ll was to test for lowering of llvm.memset into stores when the size of the memset is 1, 2, 4, or 8. Sometime between then and now the test has stopped testing for that, but remained passing due to testing for the absence of llvm.memset calls rather than the presence of store instructions. Right now this test ends up with an empty function body because the alloca is eliminated as safe-to-remove, which results in the llvm.memset calls's being eliminated due to their pointer args being undef; so it is not testing for conversion of llvm.memset into store instructions at all.
This change alters the test to verify that store instructions are created, and moves the target of the memset to an arg of the proc to avoid it being eliminated as unused.
Reviewers: anna, efriedma
Reviewed By: efriedma
Subscribers: efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D34642
llvm-svn: 306681
Summary:
Rather than testing for expected results, test/Transforms/InstCombine/memmove.ll is testing for the absence of calls to llvm.memmove.
In the case of test3, the test has stopped testing for materialization of loads/stores, but remained passing due to testing for the absence of llvm.memset calls rather than the presence of load/store instructions. Right now this test ends up with an empty function body because the alloca is eliminated as safe-to-remove, which results in the llvm.memmove calls being eliminated due to a pointer arg being undef; so it is not testing for conversion of llvm.memmove into load/store instructions at all.
Reviewers: eli.friedman, anna, efriedma
Reviewed By: efriedma
Subscribers: efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D34645
llvm-svn: 306679
Summary:
As discussed on the mailing list it is legal to propagate TBAA to loads/stores
from/to smaller regions of a larger load tagged with TBAA. Do so for
(load->extractvalue)=>(gep->load) and similar foldings.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D31954
llvm-svn: 306615
I think we only need to make sure the value fits in 64-bits not that bit width is 64-bit.
This helps places that use this for shift amounts since the shift amount needs to be the same bitwidth as the LHS, but can't be larger than the bit width.
Differential Revision: https://reviews.llvm.org/D34737
llvm-svn: 306577
Summary:
This commit allows matchSelectPattern to recognize clamp of float
arguments in the presence of FMF the same way as already done for
integers.
This case is a little different though. With integers, given the
min/max pattern is recognized, DAGBuilder starts selecting MIN/MAX
"automatically". That is not the case for float, because for them only
full FMINNAN/FMINNUM/FMAXNAN/FMAXNUM ISD nodes exist and they do care
about NaNs. On the other hand, some backends (e.g. X86) have only
FMIN/FMAX nodes that do not care about NaNS and the former NAN/NUM
nodes are illegal thus selection is not happening. So I decided to do
such kind of transformation in IR (InstCombiner) instead of
complicating the logic in the backend.
Reviewers: spatel, jmolloy, majnemer, efriedma, craig.topper
Reviewed By: efriedma
Subscribers: hiraditya, javed.absar, n.bozhenov, llvm-commits
Patch by Andrei Elovikov <andrei.elovikov@intel.com>
Differential Revision: https://reviews.llvm.org/D33186
llvm-svn: 306525
Summary:
This commit adds the tests for clamp pattern as a prerequisite of
D33186 to make the impact of that fix more clear and also to document
current behavior.
Reviewers: spatel, jmolloy
Reviewed By: spatel
Subscribers: n.bozhenov, llvm-commits
Patch by Andrei Elovikov <andrei.elovikov@intel.com>
Differential Revision: https://reviews.llvm.org/D34350
llvm-svn: 306524
The check to see if we can propagate the nsw flag used m_ConstantInt(uint64_t*&) which doesn't work with splat vectors and has a restriction that the bitwidth of the ConstantInt must be 64-bits are less.
This patch changes it to use m_APInt to remove both these issues
Differential Revision: https://reviews.llvm.org/D34699
llvm-svn: 306457
This canonicalization was suggested in D33172 as a way to make InstCombine behavior more uniform.
We have this transform for icmp+br, so unless there's some reason that icmp+select should be
treated differently, we should do the same thing here.
The benefit comes from increasing the chances of creating identical instructions. This is shown in
the tests in logical-select.ll (PR32791). InstCombine doesn't fold those directly, but EarlyCSE
can simplify the identical cmps, and then InstCombine can fold the selects together.
The possible regression for the tests in select.ll raises questions about poison/undef:
http://lists.llvm.org/pipermail/llvm-dev/2017-May/113261.html
...but that transform is just as likely to be triggered by this canonicalization as it is to be
missed, so we're just pointing out a commutation deficiency in the pattern matching:
https://reviews.llvm.org/rL228409
Differential Revision: https://reviews.llvm.org/D34242
llvm-svn: 306435
Not sure why this restriction existed, but it seems like we should support any size Constant here.
The particular pattern in the tests is not the only use of this matcher in the tree. There's one in CodeGenPrepare and one in InstSimplify as well.
Differential Revision: https://reviews.llvm.org/D34666
llvm-svn: 306417
http://rise4fun.com/Alive/i8Q
A narrow bitwise logic op is obviously better than math for value tracking,
and zext is better than sext. Typically, the 'not' will be folded into an
icmp predicate.
The IR difference would even survive through codegen for x86, so we would see
worse code:
https://godbolt.org/g/C14HMF
one_or_zero(int, int): # @one_or_zero(int, int)
xorl %eax, %eax
cmpl %esi, %edi
setle %al
retq
one_or_zero_alt(int, int): # @one_or_zero_alt(int, int)
xorl %ecx, %ecx
cmpl %esi, %edi
setg %cl
movl $1, %eax
subl %ecx, %eax
retq
llvm-svn: 306243
Summary:
InstCombine replaces large allocas with small globals consts causing buffer overflows
on valid code, see PR33372.
This fix permits this optimization only if the global is dereference for alloca size.
Fixes PR33372
Reviewers: eugenis, majnemer, chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34311
llvm-svn: 306194
Summary:
Many languages have a three way comparison idiom where comparing two values
produces not a boolean, but a tri-state value. Typical values (e.g. as used in
the lcmp/fcmp bytecodes from Java) are -1 for less than, 0 for equality, and +1
for greater than.
We actually do a great job already of converting three way comparisons into
binary comparisons when the result produced has one a single use. Unfortunately,
such values can have more than one use, and in that case, our existing
optimizations break down.
The patch adds a peephole which converts a three-way compare + test idiom into a
binary comparison on the original inputs. It focused on replacing the test on
the result of the three way compare and does nothing about removing the three
way compare itself. That's left to other optimizations (which do actually kick
in commonly.)
We currently recognize one idiom on signed integer compare. In the future, we
plan to recognize and simplify other comparison idioms on
other signed/unsigned datatypes such as floats, vectors etc.
This is a resurrection of Philip Reames' original patch:
https://reviews.llvm.org/D19452
Reviewers: majnemer, apilipenko, reames, sanjoy, mkazantsev
Reviewed by: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34278
llvm-svn: 306100
Summary:
InstCombine likes to turn (icmp eq (and X, C1), 0) into (icmp slt (trunc (X)), 0) sometimes. This breaks foldSelectICmpAndOr's ability to recognize (select (icmp eq (and X, C1), 0), Y, (or Y, C2))->(or (shl (and X, C1), C3), y).
This patch tries to recover this. I had to flip around some of the early out checks so that I could create a new And instruction during the compare processing without it possibly never getting used.
Reviewers: spatel, majnemer, davide
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34184
llvm-svn: 306029
If the components of the and/or had multiple uses, this transform created an additional instruction.
This patch makes sure we remove one of the components.
Differential Revision: https://reviews.llvm.org/D34498
llvm-svn: 306027
There are 2 parts to this patch made simultaneously to avoid a regression.
We're reversing the canonicalization that moves bitwise vector ops before bitcasts.
We're moving bitwise vector ops *after* bitcasts instead. That's the 1st and 3rd hunks
of the patch. The motivation is that there's only one fold that currently depends on
the existing canonicalization (see next), but there are many folds that would
automatically benefit from the new canonicalization.
PR33138 ( https://bugs.llvm.org/show_bug.cgi?id=33138 ) shows why/how we have these
patterns in IR.
There's an or(and,andn) pattern that requires an adjustment in order to continue matching
to 'select' because the bitcast changes position. This match is unfortunately complicated
because it requires 4 logic ops with optional bitcast and sext ops.
Test diffs:
1. The bitcast.ll and bitcast-bigendian.ll changes show the most basic difference -
bitcast comes before logic.
2. There are also tests with no diffs in bitcast.ll that verify that we're still doing
folds that were enabled by the previous canonicalization.
3. icmp-xor-signbit.ll shows the payoff. We don't need to adjust existing icmp patterns
to look through bitcasts.
4. logical-select.ll contains several tests for the or(and,andn) --> select fold to
verify that we are still handling those cases. The lone diff shows the movement of
the bitcast from the new canonicalization rule.
Differential Revision: https://reviews.llvm.org/D33517
llvm-svn: 306011
Summary:
I noticed that passing known bits across these intrinsics isn't great at capturing the information we really know. Turning known bits of the input into known bits of a count output isn't able to convey a lot of what we really know.
This patch adds range metadata to these intrinsics based on the known bits.
Currently the patch punts if we already have range metadata present.
Reviewers: spatel, RKSimon, davide, majnemer
Reviewed By: RKSimon
Subscribers: sanjoy, hfinkel, llvm-commits
Differential Revision: https://reviews.llvm.org/D32582
llvm-svn: 305927
Summary:
Previously this folding had no checks to see if it was going to result in less instructions. This was pointed out during the review of D34184
This patch adds code to count how many instructions its going to create vs how many its going to remove so we can make a proper decision.
Reviewers: spatel, majnemer
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34437
llvm-svn: 305926
We have a large portfolio of folds for and-of-icmps and or-of-icmps in InstSimplify and InstCombine,
but hardly anything for xor-of-icmps. Rather than trying to rethink and translate all of those folds,
we can use the truth table definition of xor:
X ^ Y --> (X | Y) & !(X & Y)
...to see if we can convert the xor to and/or and then use the existing folds.
http://rise4fun.com/Alive/J9v
Differential Revision: https://reviews.llvm.org/D33342
llvm-svn: 305792
Summary:
Some optimizations in AddReachableCodeToWorklist did not update
the MadeIRChange state. This could happen both when removing
trivially dead instructions (DCE) and at constant folds.
It is essential that changes to the IR is reported correctly,
since for example InstCombinePass::run() will indicate that all
analyses are preserved otherwise.
And the CGPassManager determines if the CallGraph is up-to-date
based on status from InstructionCombiningPass::runOnFunction().
The new test case early_dce_clobbers_callgraph.ll is a reproducer
for some asserts that started to trigger after changes in the
inliner in r305245. With this patch the test case passes again.
Reviewers: sanjoy, craig.topper, dblaikie
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34346
llvm-svn: 305725
Summary:
These 4 patterns have the same one use check repeated twice for each. Once without a cast and one with. But the cast has no effect on what method is called.
For the OR case I believe it is always profitable regardless of the number of uses since we'll never increase the instruction count.
For the AND case I believe it is profitable if the pair of xors has one use such that we'll get rid of it completely. Or if the C value is something freely invertible, in which case the not doesn't cost anything.
Reviewers: spatel, majnemer
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34308
llvm-svn: 305705
Summary:
When we fold vector constants that are operands of phi's that feed into select,
we need to set the correct insertion point for the *new* selects that get generated.
The correct insertion point is the incoming block for the phi.
Such cases can occur with patch r298845, which fixed folding of
vector constants, but the new selects could be inserted incorrectly (as the added
test case shows).
Reviewers: majnemer, spatel, sanjoy
Reviewed by: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34162
llvm-svn: 305591
Summary:
Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html
This change is to alter the prototype for the atomic memcpy intrinsic. The prototype itself is being changed to more closely resemble the semantics and parameters of the llvm.memcpy intrinsic -- to ease later combination of the llvm.memcpy and atomic memcpy intrinsics. Furthermore, the name of the atomic memcpy intrinsic is being changed to make it clear that it is not a generic atomic memcpy, but specifically a memcpy is unordered atomic.
Reviewers: reames, sanjoy, efriedma
Reviewed By: reames
Subscribers: mzolotukhin, anna, llvm-commits, skatkov
Differential Revision: https://reviews.llvm.org/D33240
llvm-svn: 305558
Summary: This is the demorganed version of the case we already handle for the OR of iszero.
Reviewers: spatel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34244
llvm-svn: 305548
Currently we expect A to be on the same side in both Ands but nothing guarantees that.
While there also switch to using matchers for some of the code.
Differential Revision: https://reviews.llvm.org/D34230
llvm-svn: 305487
There's an early out that's trying to detect when we don't know any bits that make up the legal range of a shift. The code subtracts one from BitWidth which creates a mask in the lower bits for power of 2 bit widths. This is then ANDed with the known bits to see if any of those bits are known. If the bit width isn't a power of 2 this creates a non-sensical mask.
This patch corrects this by rounding up to a power of 2 before doing the subtract and mask.
Differential Revision: https://reviews.llvm.org/D34165
llvm-svn: 305400