Summary: I noticed that the select folding code copies fast math flags, but the similar handling for phi nodes did not do the same. This patch fixes the phi folding to do the same thing as the select folding.
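A minimal sketch of the kind of pattern affected (function and values invented here); when the fdiv below is folded into the phi, the instructions created for each incoming value should keep the 'fast' flag, just as they already do when folding into a select:
define float @example(i1 %c, float %x) {
entry:
  br i1 %c, label %then, label %join
then:
  br label %join
join:
  %p = phi float [ 1.0, %entry ], [ 2.0, %then ]
  %r = fdiv fast float %x, %p
  ret float %r
}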
Reviewers: spatel, davide, majnemer, hfinkel
Reviewed By: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31690
llvm-svn: 299838
Currently we only fold with ConstantInt RHS. This generalizes to any Constant RHS.
Differential Revision: https://reviews.llvm.org/D31610
llvm-svn: 299466
A common way to implement nearbyint is by fiddling with the floating
point environment and calling rint. This is used at least by the BSD
libm and musl. As such, canonicalizing the latter to the former will
create infinite loops for libm and generally pessimize performance, at
least when the generic C versions are used.
This change preserves the rint in the libcall translation and also
handles the domain truncation logic, so that rint with float argument
will be reduced to rintf etc.
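A rough sketch of the intended result (hypothetical snippet): the double-precision call is shrunk to the float variant instead of being rewritten to nearbyint:
  %ext = fpext float %x to double
  %call = call double @rint(double %ext)
  %res = fptrunc double %call to float
; -->
  %res = call float @rintf(float %x)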
llvm-svn: 299247
Summary: Currently the VP metadata is dropped when InstCombine converts an indirect call to a direct call. This patch converts the VP metadata to branch_weights so that the call's hotness is recorded.
Reviewers: eraman, davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31344
llvm-svn: 299228
Some of the GEP combines (e.g., descaling) can't handle vector GEPs. We have an
existing check that attempts to bail out if given a vector GEP. However, the
check only tests the GEP's pointer operand. A GEP results in a vector of
pointers if at least one of its operands is vector-typed (e.g., its pointer
operand could be a scalar, but its index could be a vector). We should just
check the type of the GEP itself. This should fix PR32414.
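For example (hypothetical operands), a GEP with a scalar pointer operand but a vector index is itself vector-typed:
  %gep = getelementptr i32, i32* %base, <2 x i64> %offsets   ; result type is <2 x i32*>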
Reference: https://bugs.llvm.org/show_bug.cgi?id=32414
Differential Revision: https://reviews.llvm.org/D31470
llvm-svn: 299017
Summary:
We are incorrectly folding selects into phi nodes when the incoming value of a phi
node is a constant vector. This optimization is done in `FoldOpIntoPhi` when the
select condition is a phi node with constant incoming values.
Without the fix, we are miscompiling (i.e. incorrectly folding the
select into the phi node) when the vector contains non-zero
elements.
This patch fixes the miscompile and we will correctly fold based on the
select vector operand (see added test cases).
Reviewers: majnemer, sanjoy, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31189
llvm-svn: 298845
insertelement (insertelement X, Y, IdxC1), ScalarC, IdxC2 -->
insertelement (insertelement X, ScalarC, IdxC2), Y, IdxC1
As noted in the code comment and seen in the test changes, the motivation is that by pulling
constant insertion up, we may be able to constant fold some insertelement instructions.
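A made-up example where the reordering lets the inner insertelement constant fold:
  %ins1 = insertelement <4 x i32> zeroinitializer, i32 %y, i32 1
  %ins2 = insertelement <4 x i32> %ins1, i32 7, i32 0
; -->
  %ins2 = insertelement <4 x i32> <i32 7, i32 0, i32 0, i32 0>, i32 %y, i32 1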
Differential Revision: https://reviews.llvm.org/D31196
llvm-svn: 298520
Summary: Subtracts can have constants on the left-hand side, but we don't shrink them based on demanded bits. This patch fixes that to match the handling of constants on the right-hand side.
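A rough sketch of the idea (not taken from the tests): when only the low 8 bits of the result are demanded, the left-hand constant can be shrunk to its low 8 bits, since the high bits of a subtraction never affect the low bits:
  %sub = sub i32 65535, %x
  %res = and i32 %sub, 255
; the constant LHS can be shrunk to 255 without changing %res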
Reviewers: davide, majnemer, spatel, sanjoy, hfinkel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31119
llvm-svn: 298478
This adds a parameter to @llvm.objectsize that makes it return
conservative values if it's given null.
This fixes PR23277.
Differential Revision: https://reviews.llvm.org/D28494
llvm-svn: 298430
Summary:
The reverse of an arbitrary bitpattern is also an arbitrary
bitpattern.
Reviewers: trentxintong, arsenm, majnemer
Reviewed By: majnemer
Subscribers: majnemer, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D31118
llvm-svn: 298201
[Reapplies r297971 and punting on finding a better API for findDbgValues()]
This patch improves debug info quality in InstCombine by looking at
values that are about to be deleted, checking whether there are any
dbg.value intrinsics referring to them, and potentially encoding the
semantics of the deleted instruction into the dbg.value's
DIExpression.
In the example in the testcase (which was extracted from XNU) there is a sequence of
%4 = load %struct.entry*, %struct.entry** %next2, align 8, !dbg !41
%5 = bitcast %struct.entry* %4 to i8*, !dbg !42
%add.ptr4 = getelementptr inbounds i8, i8* %5, i64 -8, !dbg !43
%6 = bitcast i8* %add.ptr4 to %struct.entry*, !dbg !44
call void @llvm.dbg.value(metadata %struct.entry* %6, i64 0, metadata !20, metadata !21), !dbg !34
When these instructions are eliminated by instcombine one after
another, we can still salvage the otherwise dead debug info:
- Bitcasts have no effect, so have the dbg.value point to operand(0)
- Loads can be expressed via a DW_OP_deref
- Constant gep instructions can be replaced by DWARF expression arithmetic
The API introduced by this patch is not specific to instcombine and
can be useful in other places, too.
rdar://problem/30725338
Differential Revision: https://reviews.llvm.org/D30919
llvm-svn: 297994
As the related tests show, we're not canonicalizing to this form for scalars or vectors yet,
but this solves the immediate problem in:
https://bugs.llvm.org/show_bug.cgi?id=32306
llvm-svn: 297989
This patch improves debug info quality in InstCombine by looking at
values that are about to be deleted, checking whether there are any
dbg.value intrinsics referring to them, and potentially encoding the
semantics of the deleted instruction into the dbg.value's
DIExpression.
In the example in the testcase (which was extracted from XNU) there is a sequence of
%4 = load %struct.entry*, %struct.entry** %next2, align 8, !dbg !41
%5 = bitcast %struct.entry* %4 to i8*, !dbg !42
%add.ptr4 = getelementptr inbounds i8, i8* %5, i64 -8, !dbg !43
%6 = bitcast i8* %add.ptr4 to %struct.entry*, !dbg !44
call void @llvm.dbg.value(metadata %struct.entry* %6, i64 0, metadata !20, metadata !21), !dbg !34
When these instructions are eliminated by instcombine one after
another, we can still salvage the otherwise dead debug info:
- Bitcasts have no effect, so have the dbg.value point to operand(0)
- Loads can be expressed via a DW_OP_deref
- Constant gep instructions can be replaced by DWARF expression arithmetic
The API introduced by this patch is not specific to instcombine and
can be useful in other places, too.
rdar://problem/30725338
Differential Revision: https://reviews.llvm.org/D30919
llvm-svn: 297971
If it is possible for the RHS of a shift operation to be greater than or equal
to the bit-width, then the result might be undef, and we can't report any known
bits.
In some cases, this was allowing a transformation in instcombine which widened
an undef value from i1 to i32, increasing the range of values that a function
could return.
Differential revision: https://reviews.llvm.org/D30781
llvm-svn: 297724
This was committed at r297155 and reverted at r297166 because of an
over-reaching clang test. That should be fixed with r297189.
This is one part of solving a recent bug report:
http://lists.llvm.org/pipermail/llvm-dev/2017-February/110293.html
This keeps with our general approach: changing arbitrary shuffles is off-limits,
but changing splat is ok. The transform is very similar to the existing
shrinkBitwiseLogic() canonicalization.
Differential Revision: https://reviews.llvm.org/D30123
llvm-svn: 297232
Summary:
When InstCombine is optimizing certain select-cmp-br patterns
it replaces the result of the select in uses outside of the
basic block containing the select. This is only legal if the
path from the select to the outside use is disjoint from all
other paths out from the originating basic block.
The problem found was that InstCombiner::replacedSelectWithOperand
did not consider the case when both edges out from the br pointed
to the same label. In that case the paths aren't disjoint and the
transformation is illegal. This patch avoids the faulty rewrites
by verifying that there is a single flow to the successor where
we want to replace uses.
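A reduced sketch of the problematic shape (hypothetical function): both edges of the br target the same successor, so the use of %sel in %next is reached along both paths and must not be replaced with either select operand:
define i32 @example(i32 %x) {
entry:
  %cmp = icmp eq i32 %x, 0
  %sel = select i1 %cmp, i32 1, i32 2
  br i1 %cmp, label %next, label %next
next:
  ret i32 %sel
}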
Reviewers: llvm-commits, spatel, majnemer
Differential Revision: https://reviews.llvm.org/D30455
llvm-svn: 296752
Summary:
Solves PR 31990.
The bad rewrite could replace a memcpy of one word with
store i4 -1
while it should actually be
store i8 -1
Hopefully opt and llc have improved enough that the original optimization
done by the code isn't needed anymore.
One existing test case is affected. It originally tested that
the memcpy was replaced with
load double
but since we now remove that rewrite it will be
load i64
instead.
Patch suggestion by Eli Friedman.
Reviewers: eli.friedman, majnemer, efriedma
Reviewed By: efriedma
Subscribers: efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D30254
llvm-svn: 296585
This optimisation was crashing when there was a chain of more than one bitcast
instruction to replace, as a result of the changes in D27283.
Patch by James Price.
Differential Revision: https://reviews.llvm.org/D30347
llvm-svn: 296163
This is part of trying to clean up our handling of min/max patterns in IR.
By converting these to canonical form, we're more likely to recognize them
because there are various places in InstCombine that don't use
matchSelectPattern or m_SMax and friends.
The backend fixups referenced in the now deleted TODO comment were added with:
https://reviews.llvm.org/rL291392
https://reviews.llvm.org/rL289738
If there's any codegen fallout from this change, we should be able to address
it in DAGCombiner or target-specific lowering.
llvm-svn: 295758
Summary:
This is a fix for assertion failure in
`getInverseMinMaxSelectPattern` when ABS is passed in as a select pattern.
We should not be invoking the simplification rule for
ABS(MIN(~x, y)) or ABS(MAX(~x, y)) combinations.
Added a test case which would cause an assertion failure without the patch.
Reviewers: sanjoy, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30051
llvm-svn: 295719
Changing to 'or' (rather than 'xor' when no wrapping flags are set)
allows icmp simplifications to happen as expected.
Differential Revision: https://reviews.llvm.org/D29729
llvm-svn: 295574
The change to InstCombine in:
https://reviews.llvm.org/D29729
...exposes this missing fold in InstSimplify, so adding this
first to avoid a regression.
llvm-svn: 295573
For function-scope variables with large initializer lists, the frontend usually
generates a global variable to hold the initializer, then emits a memcpy
intrinsic to initialize the alloca. InstCombiner::visitAllocaInst
identifies such allocas which are accessed only by reading and replaces
them with the global variable. This is done by casting the global variable
to the type of the alloca and replacing all references.
However, when the global variable is in a different address space that
is disjoint from addr space 0 (e.g. for IR generated from OpenCL, a global
variable cannot be in the private addr space, i.e. addr space 0), casting
the global variable to addr space 0 results in invalid IR for certain
targets (e.g. amdgpu).
To fix this issue, when the global variable is not in addr space 0,
instead of casting it to addr space 0, this patch chases down the uses
of the alloca until reaching the load instructions, then replaces loads from
the alloca with loads from the global variable. If bitcasts and GEPs are
encountered during the chasing, new bitcasts and GEPs based on the global
variable are generated and used in the load instructions.
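As a hedged sketch (address space number and names invented), a load through the alloca ends up loading directly from the global instead of going through an addrspacecast to addr space 0:
  %gep = getelementptr inbounds [4 x i32], [4 x i32]* %alloca, i64 0, i64 1
  %val = load i32, i32* %gep
; becomes, with @init being the addrspace(2) global that holds the initializer:
  %gep = getelementptr inbounds [4 x i32], [4 x i32] addrspace(2)* @init, i64 0, i64 1
  %val = load i32, i32 addrspace(2)* %gep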
Differential Revision: https://reviews.llvm.org/D27283
llvm-svn: 294786
This fold already existed for vectors but only when 'C1' was a splat
constant (but 'C2' could be any constant).
There were no tests for any vector constants, so I'm adding a test
that shows non-splat constants for both operands.
llvm-svn: 294650
This patch is based on the llvm-dev discussion here:
http://lists.llvm.org/pipermail/llvm-dev/2017-January/109631.html
Folding to i1 should always be desirable because that's better for value tracking
and we have special folds for i1 types.
I checked for other users of shouldChangeType() where this might have an effect,
but we already handle the i1 case differently than other types in all of those cases.
Side note: the default datalayout includes i1, so it seems we only find this gap in
shouldChangeType + phi folding for the case when there is (1) an explicit datalayout
without i1, (2) casting to i1 from a legal type, and (3) a phi with exactly 2 incoming
casted operands (as Björn mentioned).
Differential Revision: https://reviews.llvm.org/D29336
llvm-svn: 294066
The code comments didn't match the code logic, and we didn't actually distinguish the fake unary (not/neg/fneg)
operators from arguments. Adding another level to the weighting scheme provides more structure and can help
simplify the pattern matching in InstCombine and other places.
I fixed regressions that would have shown up from this change in:
rL290067
rL290127
But that doesn't mean there are no pattern-matching logic holes left; some combines may just be missing regression tests.
Should fix:
https://llvm.org/bugs/show_bug.cgi?id=28296
Differential Revision: https://reviews.llvm.org/D27933
llvm-svn: 294049
Although this is 'no-functional-change-intended', I'm adding tests
for shl-shl and lshr-lshr pairs because there is no existing test
coverage for those folds.
It seems like we should be able to remove some code from foldShiftedShift()
at this point because we're handling those patterns on the general path.
llvm-svn: 293814
Summary:
If there are two adjacent guards with different conditions, we can
remove one of them and include its condition into the condition of
another one. This patch allows InstCombine to merge them by the
following pattern:
guard(a); guard(b) -> guard(a & b).
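In IR terms, a minimal sketch (empty deopt bundles used as placeholders):
  call void (i1, ...) @llvm.experimental.guard(i1 %a) [ "deopt"() ]
  call void (i1, ...) @llvm.experimental.guard(i1 %b) [ "deopt"() ]
; -->
  %cond = and i1 %a, %b
  call void (i1, ...) @llvm.experimental.guard(i1 %cond) [ "deopt"() ]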
Reviewers: reames, apilipenko, igor-laevsky, anna, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29378
llvm-svn: 293778
transformToIndexedCompare
If the two sides don't have the same type, the size of the constant
index would need to be adjusted (and this wouldn't always be
possible).
Alternatively we could try the analysis with the initial
RHS value, which would guarantee that the two sides have
the same type. However it is unlikely that in practice this
would pass our transformation requirements.
Fixes PR31808 (https://llvm.org/bugs/show_bug.cgi?id=31808).
llvm-svn: 293629
The original shift is bigger, so this may qualify as 'obvious',
but here's an attempt at an Alive-based proof:
Name: exact
Pre: (C1 u< C2)
%a = shl i8 %x, C1
%b = lshr exact i8 %a, C2
=>
%c = lshr exact i8 %x, C2 - C1
%b = and i8 %c, ((1 << width(C1)) - 1) u>> C2
Optimization is correct!
llvm-svn: 293498
This is a minimal patch to avoid the infinite loop in:
https://llvm.org/bugs/show_bug.cgi?id=31751
But the general problem is bigger: we're not canonicalizing all of the min/max forms reported
by value tracking's matchSelectPattern(), and we don't define min/max consistently. Some code
uses matchSelectPattern(), other code uses matchers like m_Umax, and others have their own
inline definitions which may be subtly different from any of the above.
The reason that the test cases in this patch need a cast op to trigger is because we don't
(yet) canonicalize all min/max forms based on matchSelectPattern() in
canonicalizeMinMaxWithConstant(), but we do make min/max+cast transforms based on
matchSelectPattern() in visitSelectInst().
The location of the icmp transforms that trigger the inf-loop seems arbitrary at best, so
I'm moving those behind the min/max fence in visitICmpInst() as the quick fix.
llvm-svn: 293345
Summary:
There are many NVVM intrinsics that we can't entirely get rid of, but
that nonetheless often correspond to target-generic LLVM intrinsics.
For example, if flush denormals to zero (ftz) is enabled, we can convert
@llvm.nvvm.ceil.ftz.f to @llvm.ceil.f32. On the other hand, if ftz is
disabled, we can't do this, because @llvm.ceil.f32 will be lowered to a
non-ftz PTX instruction. In this case, we can, however, simplify the
non-ftz nvvm ceil intrinsic, @llvm.nvvm.ceil.f, to @llvm.ceil.f32.
These transformations are particularly useful because they let us
constant fold instructions that appear in libdevice, the bitcode library
that ships with CUDA and essentially functions as its libm.
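For example (a sketch; the intrinsic signature is paraphrased rather than copied from libdevice):
  %r = call float @llvm.nvvm.ceil.ftz.f(float %x)
; with ftz enabled -->
  %r = call float @llvm.ceil.f32(float %x)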
Reviewers: tra
Subscribers: hfinkel, majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D28794
llvm-svn: 293244
This change reverts:
r293061: "[InstCombine] Canonicalize guards for NOT OR condition"
r293058: "[InstCombine] Canonicalize guards for AND condition"
They miscompile cases like:
```
declare void @llvm.experimental.guard(i1, ...)
define void @test_guard_not_or(i1 %A, i1 %B) {
%C = or i1 %A, %B
%D = xor i1 %C, true
call void(i1, ...) @llvm.experimental.guard(i1 %D, i32 20, i32 30)[ "deopt"() ]
ret void
}
```
because they do not transfer the `i32 20, i32 30` parameters to the newly
created guard intrinsics.
llvm-svn: 293227
We already have this fold when the lshr has one use, but it doesn't need that
restriction. We may be able to remove some code from foldShiftedShift().
Also, move the similar:
(X << C) >>u C --> X & (-1 >>u C)
...directly into visitLShr to help clean up foldShiftByConstOfShiftByConst().
That whole function seems questionable since it is called by commonShiftTransforms(),
but there's really not much in common if we're checking the shift opcodes for every
fold.
llvm-svn: 293215
This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine
Reviewed By: apilipenko
Differential Revision: https://reviews.llvm.org/D29075
Patch by Maxim Kazantsev.
llvm-svn: 293061
This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine
Reviewed By: apilipenko
Differential Revision: https://reviews.llvm.org/D29074
Patch by Maxim Kazantsev.
llvm-svn: 293058
This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine
Reviewed By: majnemer, apilipenko
Differential Revision: https://reviews.llvm.org/D29071
Patch by Maxim Kazantsev.
llvm-svn: 293056
We may be able to assert that no shl-shl or lshr-lshr pairs ever get here
because we should have already handled those in foldShiftedShift().
llvm-svn: 292726
Summary:
Currently we return undef, but we're in the process of changing the
LangRef so that llvm.sqrt behaves like the other math intrinsics,
matching the return value of the standard libcall but not setting errno.
This change is legal even without the LangRef change because currently
calling llvm.sqrt(x) where x is negative is spec'ed to be UB. But in
practice it's also safe because we're simply constant-folding fewer
inputs: Inputs >= -0 get constant-folded as before, but inputs < -0 now
aren't constant-folded, because ConstantFoldFP aborts if the host math
function raises an fp exception.
Reviewers: hfinkel, efriedma, sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28929
llvm-svn: 292692
Unfortunately, recognizing these in value tracking may cause us to hit
a hack in InstCombiner::visitICmpInst() more often:
http://lists.llvm.org/pipermail/llvm-dev/2017-January/109340.html
...but besides being the obviously Right Thing To Do, there's a clear
codegen win from identifying these patterns for several targets.
llvm-svn: 292655
Simplify a packss/packus truncation based on the elements of the mask that are actually demanded.
Differential Revision: https://reviews.llvm.org/D28777
llvm-svn: 292591
Also, add the corresponding match to the AssumptionCache's 'Affected Values' list.
Differential Revision: https://reviews.llvm.org/D28485
llvm-svn: 292239
Add the missing fabs(fpext) optimization that worked with the call, and also
fix it so it no longer creates a second fpext when there are multiple
uses.
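A sketch of the fold for the intrinsic form (hypothetical snippet):
  %ext = fpext float %x to double
  %fabs = call double @llvm.fabs.f64(double %ext)
; -->
  %fabs = call float @llvm.fabs.f32(float %x)
  %ext = fpext float %fabs to double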
llvm-svn: 292172
Simplify a pshufb shuffle mask based on the elements of the mask that are actually demanded.
Differential Revision: https://reviews.llvm.org/D28745
llvm-svn: 292101
cover domtree and alias analysis. These are the pretty clear analyses
that we would always want to survive this pass.
To make these survive, we also need to preserve the assumption cache.
Added a test that verifies the important bits of this preservation.
llvm-svn: 292037
Allows LLVM to optimize sequences like the following:
%add = add nuw i32 %x, 1
%cmp = icmp ugt i32 %add, %y
Into:
%cmp = icmp uge i32 %x, %y
Previously, only signed comparisons were being handled.
Decrements could also be handled, but 'sub nuw %x, 1' is currently canonicalized to
'add %x, -1' in InstCombineAddSub, losing the nuw flag. Removing that canonicalization
seems like it might have far-reaching ramifications so I kept this simple for now.
Patch by Matti Niemenmaa!
Differential Revision: https://reviews.llvm.org/D24700
llvm-svn: 291975
fabs(x * x) is not generally safe to assume x is positive if x is a NaN.
This is also less general than it could be, so this will be replaced
with a transformation on the intrinsic.
llvm-svn: 291359
We can perform the following:
(add (zext (add nuw X, C1)), C2) -> (zext (add nuw X, C1+C2))
This is only possible if C2 is negative and C2 is greater than or equal to negative C1.
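With made-up constants (C1 = 10, C2 = -3):
  %inner = add nuw i8 %x, 10
  %ext = zext i8 %inner to i32
  %res = add i32 %ext, -3
; -->
  %inner = add nuw i8 %x, 7
  %res = zext i8 %inner to i32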
llvm-svn: 290927
The tests accidentally had trivially dead code. Also needed to adjust the rounding mode to not be CUR_DIRECTION so the intrinsics don't get converted to native operations before going through SimplifyDemandedVectorElts.
llvm-svn: 290702
This adds a combine that canonicalizes a chain of inserts which broadcasts
a value into a single insert + a splat shufflevector.
This fixes PR31286.
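For example (hypothetical 4-wide broadcast):
  %ins0 = insertelement <4 x float> undef, float %x, i32 0
  %ins1 = insertelement <4 x float> %ins0, float %x, i32 1
  %ins2 = insertelement <4 x float> %ins1, float %x, i32 2
  %ins3 = insertelement <4 x float> %ins2, float %x, i32 3
; -->
  %ins0 = insertelement <4 x float> undef, float %x, i32 0
  %splat = shufflevector <4 x float> %ins0, <4 x float> undef, <4 x i32> zeroinitializer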
Differential Revision: https://reviews.llvm.org/D27992
llvm-svn: 290641
We currently ignore the `allocsize` attribute on functions calls with
the `nobuiltin` attribute when trying to lower `@llvm.objectsize`. We
shouldn't care about `nobuiltin` here: `allocsize` is explicitly added
by the user, not inferred based on a function's symbol.
llvm-svn: 290588
PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use.
This builds on r290554 which added supported for 128 and 256-bit.
llvm-svn: 290582
An earlier commit added support for unmasked scalar operations. At that time isel wouldn't generate an optimal sequence for masked operations, but that has now been fixed.
llvm-svn: 290566
PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use.
Differential Revision: https://reviews.llvm.org/D28119
llvm-svn: 290554
Summary:
I only do this for unmasked cases for now because isel is failing to fold the mask. I'll try to fix that soon.
I'll do the same thing for packed add/sub/mul/div in a future patch.
Reviewers: delena, RKSimon, zvi, craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27879
llvm-svn: 290535
Summary:
This patch adds support for converting the masked vpermv intrinsics into shufflevector instructions if the indices are constants.
We also need to wrap a select instruction around the shuffle to take care of the masking part. InstCombine will take care of optimizing the select if the mask is constant so I didn't bother checking for that.
Reviewers: zvi, delena, spatel, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27825
llvm-svn: 290530
Background/motivation - I was circling back around to:
https://llvm.org/bugs/show_bug.cgi?id=28296
I made a simple patch for that and noticed some regressions, so added test cases for
those with rL281055, and this is hopefully the minimal fix for just those cases.
But as you can see from the surrounding untouched folds, we are missing commuted patterns
all over the place, and of course there are no regression tests to cover any of those cases.
We could sprinkle "m_c_" dust all over this file and catch most of the missing folds, but
then we still wouldn't have test coverage, and we'd still miss some fraction of commuted
patterns because they require adjustments to the match order.
I'm aware of the concern about the potential compile-time performance impact of adding
matches like this (currently being discussed on llvm-dev), but I don't think there's any
evidence yet to suggest that handling commutative pattern matching more thoroughly is not
a worthwhile goal of InstCombine.
Differential Revision: https://reviews.llvm.org/D24419
llvm-svn: 290067
Min/max canonicalization (r287585) exposes the fact that we're missing combines for min/max patterns.
This patch won't solve the example that was attached to that thread, so something else still needs fixing.
The line between InstCombine and InstSimplify gets blurry here because sometimes the icmp instruction that
we want to fold to already exists, but sometimes it's the swapped form of what we want.
Corresponding changes for smax/umin/umax to follow.
Differential Revision: https://reviews.llvm.org/D27531
llvm-svn: 289855
A number of new patterns for simplifying and/xor of icmp:
(icmp ne %x, 0) ^ (icmp ne %y, 0) => icmp ne %x, %y if the following is true:
1- (%x = and %a, %mask) and (%y = and %b, %mask)
2- %mask is a power of 2.
(icmp eq %x, 0) & (icmp ne %y, 0) => icmp ult %x, %y if the following is true:
1- (%x = and %a, %mask1) and (%y = and %b, %mask2)
2- Let %t be the smallest power of 2 where %mask1 & %t != 0. Then for any
%s that is a power of 2 and %s & %mask2 != 0, we must have %s <= %t.
For example if %mask1 = 24 and %mask2 = 16, setting %s = 16 and %t = 8
violates condition (2) above. So this optimization cannot be applied.
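A concrete instance of the first pattern, with %mask = 8 (illustrative only):
  %x = and i32 %a, 8
  %y = and i32 %b, 8
  %cmpx = icmp ne i32 %x, 0
  %cmpy = icmp ne i32 %y, 0
  %res = xor i1 %cmpx, %cmpy
; -->
  %res = icmp ne i32 %x, %y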
llvm-svn: 289813
There was an efficiency problem with how we processed @llvm.assume in
ValueTracking (and other places). The AssumptionCache tracked all of the
assumptions in a given function. In order to find assumptions relevant to
computing known bits, etc. we searched every assumption in the function. For
ValueTracking, that means that we did O(#assumes * #values) work in InstCombine
and other passes (with a constant factor that can be quite large because we'd
repeat this search at every level of recursion of the analysis).
Several of us discussed this situation at the last developers' meeting, and
this implements the discussed solution: Make the values that an assume might
affect operands of the assume itself. To avoid exposing this detail to
frontends and passes that need not worry about it, I've used the new
operand-bundle feature to add these extra call "operands" in a way that does
not affect the intrinsic's signature. I think this solution is relatively
clean. InstCombine adds these extra operands based on what ValueTracking, LVI,
etc. will need and then those passes need only search the users of the values
under consideration. This should fix the computational-complexity problem.
At this point, no passes depend on the AssumptionCache, and so I'll remove
that as a follow-up change.
Differential Revision: https://reviews.llvm.org/D27259
llvm-svn: 289755
Only the lower bits of the input element are used. And only the lower element can be undef since the upper bits are zeroed.
Have InstCombineCalls call SimplifyDemandedVectorElts for these intrinsics to reuse this support.
llvm-svn: 289523
We could truncate the condition and then try to fold the add into the
original condition value causing wrong case constants to be used.
Move the offset transform ahead of the truncate transform and return
after each transform, so there's no chance of getting confused values.
Fix for:
https://llvm.org/bugs/show_bug.cgi?id=31260
llvm-svn: 289442
Summary:
This change adds some verification in the IR verifier around struct path
TBAA metadata.
Other than some basic sanity checks (e.g. we get constant integers where
we expect constant integers), this checks:
- That by the time a struct access tuple `(base-type, offset)` is
"reduced" to a scalar base type, the offset is `0`. For instance, in
C++ you can't start from, say `("struct-a", 16)`, and end up with
`("int", 4)` -- by the time the base type is `"int"`, the offset
better be zero. In particular, a variant of this invariant is needed
for `llvm::getMostGenericTBAA` to be correct.
- That there are no cycles in a struct path.
- That struct type nodes have their offsets listed in an ascending
order.
- That when generating the struct access path, you eventually reach the
access type listed in the tbaa tag node.
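As a rough sketch of the metadata shape being checked (node numbers and type names invented):
!1 = !{!"Simple C/C++ TBAA"}
!2 = !{!"omnipotent char", !1, i64 0}
!3 = !{!"int", !2, i64 0}
!4 = !{!"struct-a", !3, i64 0, !3, i64 16}   ; member offsets must be listed in ascending order
!5 = !{!4, !3, i64 16}                       ; access tag: base "struct-a", access type "int", offset 16
Walking the access path here reduces ("struct-a", 16) to ("int", 0), so the offset is zero by the time the scalar base type is reached.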
Reviewers: dexonsmith, chandlerc, reames, mehdi_amini, manmanren
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D26438
llvm-svn: 289402
This teaches SimplifyDemandedElts that the FMA can be removed if the lower element isn't used. It also teaches it that if upper elements of the first operand aren't used then we can simplify them.
llvm-svn: 289377
The tests that already work are folded in InstSimplify, so those
tests should be redundant and we can remove them if they don't
seem worthwhile for completeness.
llvm-svn: 288957
This solves a secondary problem seen in PR6137:
https://llvm.org/bugs/show_bug.cgi?id=6137#c6
This is similar to the bitwise logic op fold added with:
https://reviews.llvm.org/rL287707
And like that patch, I'm artificially restricting the
transform from vector <-> scalar types until we're sure
that the backend can handle that.
llvm-svn: 288584
The instcombine code which folds loads and stores into their use types can trip up if the use is a bitcast to a type which we can't directly load or store in the IR. In principle, such types shouldn't exist, but in practice they do today. This is a workaround to avoid a bug while we work towards the long term goal.
Differential Revision: https://reviews.llvm.org/D24365
llvm-svn: 288415
In PR27925:
https://llvm.org/bugs/show_bug.cgi?id=27925
...we proposed adding this fold to eliminate a bitcast. In D20774, there was
some concern about changing the type of a bitwise op as well as creating
bitcasts that might not be free for a target. However, if we're strictly
eliminating an instruction (by limiting this to one-use ops), then we should
be able to do this in InstCombine.
But we're cautiously restricting the transform for now to vector types to
avoid possible backend problems. A transform to make sure the logic op is
legal for the target should be added to reverse this transform and improve
codegen.
Differential Revision: https://reviews.llvm.org/D26641
llvm-svn: 287707
This is a first step towards canonicalization and improved folding/codegen
for integer min/max as discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/106868.html
Here, we're just matching the simplest min/max patterns and adjusting the
icmp predicate while swapping the select operands.
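For illustration (made-up values), these two forms both compute smin(%a, %b); the canonicalization rewrites one into the other by flipping the icmp predicate and swapping the select operands (the direction shown is only illustrative):
  %cmp = icmp sgt i32 %a, %b
  %min = select i1 %cmp, i32 %b, i32 %a
; <=>
  %cmp = icmp slt i32 %a, %b
  %min = select i1 %cmp, i32 %a, i32 %b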
I've included FIXME tests in test/Transforms/InstCombine/select_meta.ll
so it's easier to see how this might be extended (corresponds to the TODO
comment in the code). That's also why I'm using matchSelectPattern()
rather than a simpler check; once the backend is patched, we can just
remove some of the restrictions to allow the obfuscated min/max patterns
in the FIXME tests to be matched.
Differential Revision: https://reviews.llvm.org/D26525
llvm-svn: 287585
Currently LLVM assumes that a pointer addrspacecasted to a different addr space is bitwise equivalent to a trunc or zext, which is not true. For example, in the amdgcn target, when a null pointer is addrspacecasted from addr space 4 to 0, its value is changed from i64 0 to i32 -1.
This patch teaches LLVM not to assume known bits of addrspacecast instruction to its operand.
Differential Revision: https://reviews.llvm.org/D26803
llvm-svn: 287545
This is a prerequisite patch for D26556:
https://reviews.llvm.org/D26556
...because there was no direct coverage for these folds (which in some cases are adding instructions).
llvm-svn: 287400
This is a straightforward extension of the existing support for 32/64-bit element types. Just needed to add the additional intrinsics to the switches.
llvm-svn: 287316
Summary: These intrinsics have been unused by clang for a while. This patch removes them. We auto-upgrade them to extractelements, a scalar operation, and then an insertelement. This matches the sequence used by clang's intrinsic file.
Reviewers: zvi, delena, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26660
llvm-svn: 287083
Note that the existing metadata checking was re-added by hand because the
script doesn't currently know how to generate checks for lines outside of
functions.
llvm-svn: 286460
Removing the limitation in visitInsertElementInst() causes several regressions
because we're not prepared to fold sequences of shuffles or inserts and extracts
separated by shuffles. Fixing that appears to be a difficult mission because we
are purposely trying to avoid creating shuffles with arbitrary shuffle masks
because some targets may choke on those.
https://llvm.org/bugs/show_bug.cgi?id=30923
llvm-svn: 286423
As the test change shows, we can increase the critical path by adding
a 'not' instruction, so make sure that we're actually removing an
instruction if we do this transform.
This transform could also cause us to miss folds of min/max pairs.
llvm-svn: 286315
For example, it invalidates the domtree, causing assertions
in later passes which need dominator info. Make it preserve
GlobalsAA, as suggested by Eli.
Differential Revision: https://reviews.llvm.org/D26381
llvm-svn: 286271
This was reverted at r285866 because there was a crash handling a scalar
select of vectors. I added a check for that pattern and a test case based
on the example provided in the post-commit thread for r285732.
llvm-svn: 286113
This reverts commit r285732.
This change introduced a new assertion failure in the following
testcase at -O2:
typedef short __v8hi __attribute__((__vector_size__(16)));
__v8hi foo(__v8hi &V1, __v8hi &V2, unsigned mask) {
__v8hi Result = V1;
if (mask & 0x80)
Result[0] = V2[0];
return Result;
}
llvm-svn: 285866
I think the former 'test50' had a typo making it functionally equivalent
to the former 'test49'; changed the predicate to provide more coverage.
llvm-svn: 285706
This patch introduces the combine:
(C1 shift (A add C2)) -> ((C1 shift C2) shift A)
iff A and C2 are both positive
If both A and C2 are known to be positive then we can safely split into 2 shifts, permitting the folding of the inner shift.
Fix for the spec benchmark case mentioned by @nadav on PR15141 (assuming we can prove that the inputs are positive).
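A sketch with concrete constants (C1 = 1, C2 = 3; assumes %a is known non-negative and small enough that the shift amount stays in range):
  %sum = add i32 %a, 3
  %shl = shl i32 1, %sum
; -->
  %shl = shl i32 8, %a   ; (1 shl 3) folded to 8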
Differential Revision: https://reviews.llvm.org/D26000
llvm-svn: 285696
Try harder to detect obfuscated min/max patterns: the initial pattern was added with D9352 / rL236202.
There was a bug fix for PR27137 at rL264996, but I think we can do better by folding the corresponding
smax pattern and commuted variants.
The codegen tests demonstrate the effect of ValueTracking on the backend via SelectionDAGBuilder. We
can't expose these differences minimally in IR because we don't have smin/smax intrinsics for IR.
Differential Revision: https://reviews.llvm.org/D26091
llvm-svn: 285499
The original patch of the A->B->A BitCast optimization was reverted by r274094 because it may cause an infinite loop inside the compiler: https://llvm.org/bugs/show_bug.cgi?id=27996.
The problem is with the following code:
xB = load (type B);
xA = load (type A);
+yA = (A)xB; B -> A
+zAn = PHI[yA, xA]; PHI
+zBn = (B)zAn; // A -> B
store zAn;
store zBn;
optimizeBitCastFromPhi generates
+zBn = (B)zAn; // A -> B
and expects it to be combined with the store instruction that uses it into another
store zAn
Unfortunately before combineStoreToValueType is called on the store instruction, optimizeBitCastFromPhi is called on the new BitCast again, and this pattern repeats indefinitely.
optimizeBitCastFromPhi only generates BitCast for load/store instructions, only the BitCast before store can cause the reexecution of optimizeBitCastFromPhi, and BitCast before store can easily be handled by InstCombineLoadStoreAlloca.cpp. So the solution to the problem is if all users of a CI are store instructions, we should not do optimizeBitCastFromPhi on it. Then optimizeBitCastFromPhi will not be called on the new BitCast instructions.
Differential Revision: https://reviews.llvm.org/D23896
llvm-svn: 285116
This test was apparently checking for 2 independent folds, but we have
plenty of tests for those individual folds already. We are lacking
vector tests, however, because we don't have the shift folds for vectors.
llvm-svn: 284243