Summary:
When salvaging a dbg.declare/dbg.addr we should not add
DW_OP_stack_value to the DIExpression
(see test/Transforms/InstCombine/salvage-dbg-declare.ll).
Consider this example
%vla = alloca i32, i64 2
call void @llvm.dbg.declare(metadata i32* %vla, metadata !1, metadata !DIExpression())
Instcombine will turn it into
%vla1 = alloca [2 x i32]
%vla1.sub = getelementptr inbounds [2 x i32], [2 x i32]* %vla, i64 0, i64 0
call void @llvm.dbg.declare(metadata [2 x i32]* %vla1.sub, metadata !19, metadata !DIExpression())
If the GEP can be eliminated, then the dbg.declare will be salvaged
and we should get
%vla1 = alloca [2 x i32]
call void @llvm.dbg.declare(metadata [2 x i32]* %vla1, metadata !19, metadata !DIExpression())
The problem was that salvageDebugInfo did not recognize dbg.declare
as being indirect (%vla1 points to the value, it does not hold the
value), so we incorrectly got
call void @llvm.dbg.declare(metadata [2 x i32]* %vla1, metadata !19, metadata !DIExpression(DW_OP_stack_value))
I also made sure that llvm::salvageDebugInfo and
DIExpression::prependOpcodes do not add DW_OP_stack_value to
the DIExpression in case no new operands are added to the
DIExpression. That way we avoid to, unneccessarily, turn a
register location expression into an implicit location expression
in some situations (see test11 in test/Transforms/LICM/sinking.ll).
Reviewers: aprantl, vsk
Reviewed By: aprantl, vsk
Subscribers: JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D48837
llvm-svn: 336191
This patch changes order of transform in InstCombineCompares to avoid
performing transforms based on ranges which produce complex bit arithmetics
before more simple things (like folding with constants) are done. See PR37636
for the motivating example.
Differential Revision: https://reviews.llvm.org/D48584
Reviewed By: spatel, lebedev.ri
llvm-svn: 336172
This extends D48485 to allow another pair of binops (add/or) to be combined either
with or without a leading shuffle:
or X, C --> add X, C (when X and C have no common bits set)
Here, we need value tracking to determine that the 'or' can be reversed into an 'add',
and we've added general infrastructure to allow extending to other opcodes or moving
to where other passes could use that functionality.
Differential Revision: https://reviews.llvm.org/D48662
llvm-svn: 336128
Due to current limitations in constant analysis, we need flags
on add or mul to show propagation for the potential transform
suggested in these tests (no other binops currently report
identity constants).
llvm-svn: 336101
Summary:
This patch introduce new intrinsic -
strip.invariant.group that was described in the
RFC: Devirtualization v2
Reviewers: rsmith, hfinkel, nlopes, sanjoy, amharc, kuhar
Subscribers: arsenm, nhaehnle, JDevlieghere, hiraditya, xbolva00, llvm-commits
Differential Revision: https://reviews.llvm.org/D47103
Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com>
llvm-svn: 336073
This was discussed in D48401 as another improvement for:
https://bugs.llvm.org/show_bug.cgi?id=37806
If we have 2 different variable values, then we shuffle (select) those lanes,
shuffle (select) the constants, and then perform the binop. This eliminates a binop.
The new shuffle uses the same shuffle mask as the existing shuffle, so there's no
danger of creating a difficult shuffle.
All of the earlier constraints still apply, but we also check for extra uses to
avoid creating more instructions than we'll remove.
Additionally, we're disallowing the fold for div/rem because that could expose a
UB hole.
Differential Revision: https://reviews.llvm.org/D48678
llvm-svn: 335974
This is an enhancement to D48401 that was discussed in:
https://bugs.llvm.org/show_bug.cgi?id=37806
We can convert a shift-left-by-constant into a multiply (we canonicalize IR in the other
direction because that's generally better of course). This allows us to remove the shuffle
as we do in the regular opcodes-are-the-same cases.
This requires a small hack to make sure we don't introduce any extra poison:
https://rise4fun.com/Alive/ZGv
Other examples of opcodes where this would work are add+sub and fadd+fsub, but we already
canonicalize those subs into adds, so there's nothing to do for those cases AFAICT. There
are planned enhancements for opcode transforms such or -> add.
Note that there's a different fold needed if we've already managed to simplify away a binop
as seen in the test based on PR37806, but we manage to get that one case here because this
fold is positioned above the demanded elements fold currently.
Differential Revision: https://reviews.llvm.org/D48485
llvm-svn: 335888
I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking builtin. When we reach that goal, we should have no intrinsics named "avx512.mask".
llvm-svn: 335744
This prevents InstCombine from creating mis-sized dbg.values when
replacing a sequence of casts with a simpler cast. For example, in:
(fptrunc (floor (fpext X))) -> (floorf X)
We no longer emit dbg.value(X) (with a 32-bit float operand) to describe
(fpext X) (which is a 64-bit float).
This was diagnosed by the debugify check added in r335682.
llvm-svn: 335696
Not sure why this logic seems to be repeated in 2 different places,
one called by the other.
On AMDGPU addrspace(3) globals start allocating at 0, so these
checks will be incorrect (not that real code actually tries
to compare these addresses)
llvm-svn: 335649
Similar to other patches in this series:
https://reviews.llvm.org/rL335512https://reviews.llvm.org/rL335527https://reviews.llvm.org/rL335597https://reviews.llvm.org/rL335616
...this is filling a gap in analysis that is exposed by an unrelated select-of-constants transform.
I didn't see a way to unify the sext cases because each div/rem opcode results in a different fold.
Note that in this case, the backend might want to convert the select into math:
Name: sext urem
%e = sext i1 %x to i32
%r = urem i32 %y, %e
=>
%c = icmp eq i32 %y, -1
%z = zext i1 %c to i32
%r = add i32 %z, %y
llvm-svn: 335622
Note: I didn't add a hasOneUse() check because the existing,
related fold doesn't have that check. I suspect that the
improved analysis and codegen make these some of the rare
canonicalization cases where we allow an increase in
instructions.
llvm-svn: 335597
Turn canonicalized subtraction back into (-1 - B) and combine it with (A + 1) into (A - B).
This is similar to the folding already done for (B ^ -1) + Const into (-1 + Const) - B.
Differential Revision: https://reviews.llvm.org/D48535
llvm-svn: 335579
We canonicalize to select with a zext-add and either zext-sub or sext-sub,
so this shows a pattern that's not conforming to the general trend.
llvm-svn: 335506
This one shows another pattern that we'll need to match
in some cases, but the current ordering of folds allows
us to match this as 2 binops before simplification takes
place.
llvm-svn: 335347
With non-commutative binops, we could be using the same
variable value as operand 0 in 1 binop and operand 1 in
the other, so we have to check for that possibility and
bail out.
llvm-svn: 335312
This is the simplest case from PR37806:
https://bugs.llvm.org/show_bug.cgi?id=37806
If we have a common variable operand used in a pair of binops with vector constants
that are vector selected together, then we can constant shuffle the constant vectors
to eliminate the shuffle instruction.
This has some tricky parts that are hopefully addressed in the tests and their
respective comments:
1. If the shuffle mask contains an undef element, then that lane of the result is
undef:
http://llvm.org/docs/LangRef.html#shufflevector-instruction
Therefore, we can replace the constant in that lane with an undef value except
for div/rem. With div/rem, an undef in the divisor would cause the whole op to
be undef. So I'm using the same hack as in D47686 - replace the undefs with '1'.
2. Intersect the wrapping and FMF of the original binops for the new binop. There
should be no extra poison or fast-math potential in the new binop that wasn't
possible in the original code.
3. Disregard other uses. Given that we're eliminating uses (shortening the
dependency chain), I think that's always the right IR canonicalization. But
I purposely chose the udiv test to demonstrate the scenario where both
intermediate values have other uses because that seems likely worse for
codegen with an expensive math op. This seems like a very rare possibility to
me, so I don't think it requires a backend patch first.
Differential Revision: https://reviews.llvm.org/D48401
llvm-svn: 335283
The previous code worked with vectors, but it failed when the
vector constants contained undef elements.
The matchers handle those cases.
llvm-svn: 335262
Summary:
Use the expanded features of the TableGen generic tables to avoid manually
adding the combinatorially exploded set of intrinsics. The
getAMDGPUImageDimIntrinsic lookup function is early-out,
i.e. non-AMDGPU intrinsics will never look at the underlying table.
Use a generic approach for getting the new intrinsic overload to keep the
code simple, and make the image dmask handling more generic:
- handle non-sampler image loads
- handle the case where the set of demanded elements is not a prefix
There is some overlap between this code and an optimization that happens
in the backend during code generation. They currently complement each other:
- only the codegen optimization can generate vec3 loads
- only the InstCombine optimization can handle D16
The InstCombine optimization also likely covers more cases since the
codegen optimization is fairly ad-hoc. Ideally, we'll remove the optimization
in codegen once the infrastructure for vec3 is in place (which will probably
take a long time).
Modify the test cases to use dimension-aware intrinsics. This makes it
easier to see that the test coverage for the new intrinsics is equivalent,
and the old style intrinsics will be removed in a follow-up commit anyway.
Change-Id: I4b91ea661413d13004956fe4ef7d13d41b8ce3ad
Reviewers: arsenm, rampitec, majnemer
Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D48165
llvm-svn: 335230
This patch replaces calls to X86-specific intrinsics with floor-ceil semantics
with calls to target-independent @llvm.floor.* and @llvm.ceil.* intrinsics. This
doesn't affect the resulting machine code, as those intrinsics are lowered to
the same instructions, but exposes these specific rounding cases to generic
optimizations.
Differential Revision: https://reviews.llvm.org/D48067
llvm-svn: 335039
Summary:
When iterating users of a multiply in processUMulZExtIdiom, the
call to setOperand in the truncation case may replace the use
being visited; make sure the iterator has been advanced before
doing that replacement.
Reviewers: majnemer, davide
Reviewed By: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D48192
llvm-svn: 334844
This is r334704 (which was reverted in r334732) with a fix for
types like x86_fp80. We need to use getTypeAllocSizeInBits and
not getTypeStoreSizeInBits to avoid dropping debug info for
such types.
Original commit msg:
> Summary:
> Do not convert a DbgDeclare to DbgValue if the store
> instruction only refer to a fragment of the variable
> described by the DbgDeclare.
>
> Problem was seen when for example having an alloca for an
> array or struct, and there were stores to individual elements.
> In the past we inserted a DbgValue intrinsics for each store,
> just as if the store wrote the whole variable.
>
> When handling store instructions we insert a DbgValue that
> indicates that the variable is "undefined", as we do not know
> which part of the variable that is updated by the store.
>
> When ConvertDebugDeclareToDebugValue is used with a load/phi
> instruction we assert that the referenced value is large enough
> to cover the whole variable. Afaict this should be true for all
> scenarios where those methods are used on trunk. If the assert
> blows in the future I guess we could simply skip to insert a
> dbg.value instruction.
>
> In the future I think we should examine which part of the variable
> that is accessed, and add a DbgValue instrinsic with an appropriate
> DW_OP_LLVM_fragment expression.
>
> Reviewers: dblaikie, aprantl, rnk
>
> Reviewed By: aprantl
>
> Subscribers: JDevlieghere, llvm-commits
>
> Tags: #debug-info
>
> Differential Revision: https://reviews.llvm.org/D48024
llvm-svn: 334830
Summary:
We already do it for splat constants, but not just values.
Also, undef cases are mostly non-functional.
The original commit was reverted because
it broke tests for amdgpu backend, which i didn't check.
Now, the backed was updated to recognize these new
patterns, so we are good.
https://bugs.llvm.org/show_bug.cgi?id=37603https://rise4fun.com/Alive/cplX
Reviewers: spatel, craig.topper, mareko, bogner, rampitec, nhaehnle, arsenm
Reviewed By: spatel, rampitec, nhaehnle
Subscribers: wdng, nhaehnle, llvm-commits
Differential Revision: https://reviews.llvm.org/D47980
llvm-svn: 334818
Summary:
Do not convert a DbgDeclare to DbgValue if the store
instruction only refer to a fragment of the variable
described by the DbgDeclare.
Problem was seen when for example having an alloca for an
array or struct, and there were stores to individual elements.
In the past we inserted a DbgValue intrinsics for each store,
just as if the store wrote the whole variable.
When handling store instructions we insert a DbgValue that
indicates that the variable is "undefined", as we do not know
which part of the variable that is updated by the store.
When ConvertDebugDeclareToDebugValue is used with a load/phi
instruction we assert that the referenced value is large enough
to cover the whole variable. Afaict this should be true for all
scenarios where those methods are used on trunk. If the assert
blows in the future I guess we could simply skip to insert a
dbg.value instruction.
In the future I think we should examine which part of the variable
that is accessed, and add a DbgValue instrinsic with an appropriate
DW_OP_LLVM_fragment expression.
Reviewers: dblaikie, aprantl, rnk
Reviewed By: aprantl
Subscribers: JDevlieghere, llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D48024
llvm-svn: 334704
Summary:
`%ret = add nuw i8 %x, C`
From [[ https://llvm.org/docs/LangRef.html#add-instruction | langref ]]:
nuw and nsw stand for “No Unsigned Wrap” and “No Signed Wrap”,
respectively. If the nuw and/or nsw keywords are present,
the result value of the add is a poison value if unsigned
and/or signed overflow, respectively, occurs.
So if `C` is `-1`, `%x` can only be `0`, and the result is always `-1`.
I'm not sure we want to use `KnownBits`/`LVI` here, because there is
exactly one possible value (all bits set, `-1`), so some other pass
should take care of replacing the known-all-ones with constant `-1`.
The `test/Transforms/InstCombine/set-lowbits-mask-canonicalize.ll` change *is* confusing.
What happening is, before this: (omitting `nuw` for simplicity)
1. First, InstCombine D47428/rL334127 folds `shl i32 1, %NBits`) to `shl nuw i32 -1, %NBits`
2. Then, InstSimplify D47883/rL334222 folds `shl nuw i32 -1, %NBits` to `-1`,
3. `-1` is inverted to `0`.
But now:
1. *This* InstSimplify fold `%ret = add nuw i32 %setbit, -1` -> `-1` happens first,
before InstCombine D47428/rL334127 fold could happen.
Thus we now end up with the opposite constant,
and it is all good: https://rise4fun.com/Alive/OA9https://rise4fun.com/Alive/sldC
Was mentioned in D47428 review.
Follow-up for D47883.
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47908
llvm-svn: 334298
Summary:
`%r = shl nuw i8 C, %x`
As per langref:
```
If the nuw keyword is present, then the shift produces
a poison value if it shifts out any non-zero bits.
```
Thus, if the sign bit is set on `C`, then `%x` can only be `0`,
which means that `%r` can only be `C`.
Or in other words, set sign bit means that the signed value
is negative, so the constant is `<= 0`.
https://rise4fun.com/Alive/WMkhttps://rise4fun.com/Alive/udv
Was mentioned in D47428 review.
We already handle the `0` constant, https://godbolt.org/g/UZq1sJ, so this only handles negative constants.
Could use computeKnownBits() / LazyValueInfo,
but the cost-benefit analysis (https://reviews.llvm.org/D47891)
suggests it isn't worth it.
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47883
llvm-svn: 334222
The bug report:
https://bugs.llvm.org/show_bug.cgi?id=36036
...requests a DAG change for this, but an IR canonicalization
probably handles most cases. If we still want to match this
pattern in the backend, there's a proposal for that too:
D47831
Alive proofs including nsw/nuw cases that were first noted in:
D46988
https://rise4fun.com/Alive/Kmp
This patch is largely copied from the existing code that was
initially added with:
D40984
...but I didn't see much gain from trying to share code.
llvm-svn: 334137
Summary:
This is [[ https://bugs.llvm.org/show_bug.cgi?id=37603 | PR37603 ]].
https://godbolt.org/g/VCMNpShttps://rise4fun.com/Alive/idM
When doing bit manipulations, it is quite common to calculate some bit mask,
and apply it to some value via `and`.
The typical C code looks like:
```
int mask_signed_add(int nbits) {
return (1 << nbits) - 1;
}
```
which is translated into (with `-O3`)
```
define dso_local i32 @mask_signed_add(int)(i32) local_unnamed_addr #0 {
%2 = shl i32 1, %0
%3 = add nsw i32 %2, -1
ret i32 %3
}
```
But there is a second, less readable variant:
```
int mask_signed_xor(int nbits) {
return ~(-(1 << nbits));
}
```
which is translated into (with `-O3`)
```
define dso_local i32 @mask_signed_xor(int)(i32) local_unnamed_addr #0 {
%2 = shl i32 -1, %0
%3 = xor i32 %2, -1
ret i32 %3
}
```
Since we created such a mask, it is quite likely that we will use it in `and` next.
And then we may get rid of `not` op by folding into `andn`.
But now that i have actually looked:
https://godbolt.org/g/VTUDmU
_some_ backend changes will be needed too.
We clearly loose `bzhi` recognition.
Reviewers: spatel, craig.topper, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47428
llvm-svn: 334127
These were added for D5603 / rL219542, and there's a proposal to
change one side in D47807.
These are tests of constant propagation, so they shouldn't have
ever been tested/housed under InstCombine.
llvm-svn: 334107
We should never get different CodeGen based on whether the code is being
compiled in debug mode so we must skip over @llvm.dbg.value (and similar)
calls.
Should fix at least the worst part of PR37690.
llvm-svn: 334090
When adjusting a cmp in order to canonicalize an abs/nabs select pattern we need
to use the type of the existing operand when creating a new operand not the
type of a select operand, as the two may be different.
This fixes PR37686.
llvm-svn: 334019
As noted in rL333782, we can be both better for optimization and
safer with this transform:
BinOp (shuffle V1, Mask), C --> shuffle (BinOp V1, NewC), Mask
The only potentially unsafe-to-speculate binops are integer div/rem.
All other binops are always safe (although I don't see a way to
assert that in code here).
For opcodes like shifts that can produce poison, it can't matter
here because we know the lanes with undef are dropped by the
subsequent shuffle.
Differential Revision: https://reviews.llvm.org/D47686
llvm-svn: 333962
After r333856, opt -debugify would just stop emitting debug value
intrinsics after encountering a musttail call. This wasn't sufficient to
avoid verifier failures.
Debug value intrinicss for all instructions preceding a musttail call
must also be emitted before the musttail call.
llvm-svn: 333866
When we optimize select basing on fact that div by 0 is undef
we should not traverse the instruction which are not guaranteed to
transfer execution to next instruction. Guard intrinsic is an example.
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47576
llvm-svn: 333864
There's a patchwork of existing transforms trying to handle
these cases, but as seen in the changed test, we weren't
catching them all.
llvm-svn: 333845
As noted in the review thread for rL333782, we could have
made a bug harder to hit if we were simplifying instructions
before trying other folds.
The shuffle transform in question isn't ever a simplification;
it's just a canonicalization. So I've renamed that to make that
clearer.
This is NFCI at this point, but I've regenerated the test file
to show the cosmetic value naming difference of using
instcombine's RAUW vs. the builder.
Possible follow-ups:
1. Move reassociation folds after simplifies too.
2. Refactor common code; we shouldn't have so much repetition.
llvm-svn: 333820
As noted in the review thread for rL333782, we're lacking coverage
for this transform, so add tests for each binop opcode with constant
operand.
llvm-svn: 333818
This bug:
https://bugs.llvm.org/show_bug.cgi?id=37648
...was created with the enhancement to this transform with rL332479.
The urem test shows the disaster potential: any undef divisor lane makes
the whole op undef.
The test diffs show that vector demanded elements turns some of the potential,
but not all, unused binop operands back into undef already.
llvm-svn: 333782
This is the planned enhancement to D47163 / rL333611.
We want to match cmp/select sizes because that will be recognized
as min/max more easily and lead to better codegen (especially for
vector types).
As mentioned in D47163, this improves some of the tests that would
also be folded by D46380, so we may want to adjust that patch to
match the new patterns where the extend op occurs after the select.
llvm-svn: 333689
Convert a vector load intrinsic into an llvm load instruction.
This is beneficial when the underlying object being addressed
comes from a constant, since we get constant-folding for free.
Differential Revision: https://reviews.llvm.org/D46273
llvm-svn: 333643
In post-commit review, Eric Christopher notes that many
new MSan warnings are being observed with this patch.
The probable reason is: if 'y' is undef here and we could
evaluate it twice and get different results.
We can't increase the number of uses of a value.
llvm-svn: 333631
Don't always:
cast (select (cmp x, y), z, C) --> select (cmp x, y), (cast z), C'
This is something that came up as far back as D26556, and I lost track of it.
I suspect that this transform is part of the underlying problem that is
inspiring some of the recent proposals that seek to match larger patterns
that include a cast op. Even if that's not true, this transform causes
problems for codegen (particularly with vector types).
A transform to actively match the size of cmp and select operand sizes should
follow. This patch just removes the harmful canonicalization in the other
direction.
Differential Revision: https://reviews.llvm.org/D47163
llvm-svn: 333611
Turning a table lookup intrinsic into a shuffle vector instruction
can be beneficial. If the mask used for the lookup is the constant
vector {7,6,5,4,3,2,1,0}, then the back-end generates byte reverse
instructions instead.
Differential Revision: https://reviews.llvm.org/D46133
llvm-svn: 333550
Libfuzzer tests have been fixed to prevent being optimized.
Original commit message:
If the nsw flag is used in the absolute value then it is undefined for INT_MIN. For all other value it will produce a positive number. So we can assume the result is positive.
This breaks some InstCombine abs/nabs combining tests because we simplify the second compare from known bits rather than as the whole pattern. Looks like we can probably fix it by adding a neg+abs/nabs combine to just swap the select operands. N
Differential Revision: https://reviews.llvm.org/D47041
llvm-svn: 333300
Setting the "Debug Info Version" module flag makes it possible to pipe
synthetic debug info into llc, which is useful for testing backends.
llvm-svn: 333237
If the nsw flag is used in the absolute value then it is undefined for INT_MIN. For all other value it will produce a positive number. So we can assume the result is positive.
This breaks some InstCombine abs/nabs combining tests because we simplify the second compare from known bits rather than as the whole pattern. Looks like we can probably fix it by adding a neg+abs/nabs combine to just swap the select operands. Need to check alive to make sure there are no corner cases.
Differential Revision: https://reviews.llvm.org/D47041
llvm-svn: 333226
Reassociation of math ops in some contexts (especially vector contexts)
has generally only been happening when the 'fast' FMF was set. This
enables reassoication when only the finer grained controls 'reassoc' and
'nsz' are set.
Differential Revision: https://reviews.llvm.org/D47335
llvm-svn: 333221
The ARM/ARM64 AESE and AESD instructions have a builtin XOR as the first step in
the instruction. Therefore, if the AES key is zero and the AES data was
previously XORed, it can be combined into a single instruction.
Differential Revision: https://reviews.llvm.org/D47239
Patch by Michael Brase!
llvm-svn: 333193
Summary:
Finally fixes [[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]].
Now that the backend is all done, we can finally fold it!
The canonical unfolded masked merge pattern is
```(x & m) | (y & ~m)```
There is a second, equivalent variant:
```(x | ~m) & (y | m)```
Only one of them (the or-of-and's i think) is canonical.
And if the mask is not a constant, we should fold it to:
```((x ^ y) & M) ^ y```
https://rise4fun.com/Alive/ndQw
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: nicholas, RKSimon, llvm-commits
Differential Revision: https://reviews.llvm.org/D46814
llvm-svn: 333106
Also, produce the canonical IR abs (s<0) to be more efficient.
This is the libcall equivalent of the clang builtin change from:
rL333038
Pasting from that commit message:
The stdlib functions are defined in section 7.20.6.1 of the C standard with:
"If the result cannot be represented, the behavior is undefined."
That lets us mark the negation with 'nsw' because "sub i32 0, INT_MIN" would
be UB/poison.
llvm-svn: 333042
Summary: Previous patch does not care if a value is changed between calloc and strlen. This needs to be removed from InstCombine and maybe moved to DSE later after some rework.
Reviewers: efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47218
llvm-svn: 333022
We can eliminate old value if bound_ctrl = 1 and row_mask = bank_mask = 0xf.
This is alternative implementation working with the intrinsic in InstCombine.
Original review for past-ISel optimization: D46570.
Differential Revision: https://reviews.llvm.org/D46596
llvm-svn: 332956
In all cases, we're pulling the cast above the select.
That's not a good canonicalization if we're creating
a select that then mismatches the operand size of its
condition.
llvm-svn: 332883
Summary:
This patch fixes PR37526 by simplifying the newly generated LoadInst
instructions. If the pointer address is a bitcast from the pointer to
the NewType, we can just remove this extra bitcast instead of creating
the new one. This fixes the PR37526 + may speed up the whole compilation
process.
Reviewers: spatel, RKSimon, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47144
llvm-svn: 332855
We already do this for min/max (see the blob above the diff),
so we should do the same for abs/nabs.
A sign-bit check (<s 0) is used as a predicate for other IR
transforms and it's likely the best for codegen.
This might solve the motivating cases for D47037 and D47041,
but I think those patches still make sense. We can't guarantee
this canonicalization if the icmp has more than one use.
Differential Revision: https://reviews.llvm.org/D47076
llvm-svn: 332819
According to alive this is valid. I'm hoping to use this to make an assumption that the sign bit is zero after this sequence. The only way it wouldn't be is if the input was INT__MIN, but by preserving the flags we can make doing this to INT_MIN UB.
The nuw flags is weird because it creates such a contradiction that the original number would have to be positive meaning we could remove the select entirely, but we don't get that far.
Differential Revision: https://reviews.llvm.org/D46988
llvm-svn: 332623
The existing comment said that the functions were available only
on GNU/Linux (and on certain Android versions), but only checked
T.isGNUEnvironment() which also is true on MinGW (for arch-windows-gnu
triplets), which doesn't have such functions.
Existing checks in the initialize function in TargetLibraryInfo.cpp
also use only T.isOSLinux() to check for glibc features.
This fixes use of stdio on MinGW.
Differential Revision: https://reviews.llvm.org/D47002
llvm-svn: 332581
The canonicalization was restricted to shuffle masks with
a 1-to-1 mapping to the constant vector, but that disqualifies
the common splat pattern. This is part of solving PR37463:
https://bugs.llvm.org/show_bug.cgi?id=37463
llvm-svn: 332479
Summary: If file stream arg is not captured and source is fopen, we could replace IO calls by unlocked IO ("_unlocked" function variants) to gain better speed,
Reviewers: efriedma, RKSimon, spatel, sanjoy, hfinkel, majnemer, lebedev.ri, rja
Reviewed By: rja
Subscribers: rja, srhines, efriedma, lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D45736
llvm-svn: 332452
Summary:
Part of the InstCombine code for simplifying GEPs looks through
addrspacecasts. However, this was done by updating a variable
also used by the next transformation, for marking GEPs as
inbounds. This led to replacing a GEP with a similar instruction
in a different addrspace, which caused an assertion failure in RAUW.
This caused julia issue https://github.com/JuliaLang/julia/issues/27055
Patch by Jeff Bezanson <jeff@juliacomputing.com>
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D46722
llvm-svn: 332302
Summary:
This change adds handling of the atomic memset intrinsic to the
code path that simplifies the regular memset. In practice this means
that we will now also expand a small constant-length atomic memset
into a single unordered atomic store.
Reviewers: apilipenko, skatkov, mkazantsev, anna, reames
Reviewed By: reames
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D46660
llvm-svn: 332132
If the sprintf function is static (as on mingw-w64, where many stdio
functions are static inline wrappers), earlier optimization passes
could optimize out the return value altogether, and make it void,
which could break optimizations of this libcall that touch the
return value.
This fixes the issue discussed in PR37408 for the sprintf function.
Differential Revision: https://reviews.llvm.org/D46752
llvm-svn: 332106
Summary:
This change reworks the handling of atomic memcpy within the instcombine pass.
Previously, a constant length atomic memcpy would be lowered into loads & stores
as long as no more than 16 load/store pairs are created. This is quite different
from the lowering done for a non-atomic memcpy; which only ever lowers into a single
load/store pair of no more than 8 bytes. Larger constant-sized memcpy calls are
expanded to load/stores in later passes, such as SelectionDAG lowering.
In this change the behaviour for atomic memcpy is unified with non-atomic memcpy;
atomic memcpy is now treated in the same was as non-atomic memcpy has always been.
We leave it to later passes to lower longer-length atomic memcpy calls.
Due to the structure of the pass's handling of memtransfer intrinsics, this change
also gives us handling of atomic memmove that we did not previously have.
Reviewers: apilipenko, skatkov, mkazantsev, anna, reames
Reviewed By: reames
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D46658
llvm-svn: 332093
These rotates take the form
(x << (n & mask)) | (x >> (-n & mask)) where mask is bitwidth - 1.
If x has been promoted to a wider type than its original bit width due to type promotion we fail to narrower it and therefore don't recognize it as a rotate.
llvm-svn: 332068
This reverts commit SVN r331889, which could trigger failed
assertions for cases where the snprintf function is declared
with a vaguely differing signature (e.g. being defined as
static inline), see PR37408.
llvm-svn: 332043
The previous handling for guard widening in InstCombine was extremely restrictive. In particular, it didn't handle the common case where we had two guards separated by a single icmp. Handle this by scanning through a small fixed window of instructions to find the next guard if needed.
Differential Revision: https://reviews.llvm.org/D46203
llvm-svn: 331935
This is safe as long as the udiv is not exact. The pattern is not common in
C++ code, but comes up all the time in code generated by XLA's GPU backend.
Differential Revision: https://reviews.llvm.org/D46647
llvm-svn: 331933
In order to set breakpoints on labels and list source code around
labels, we need collect debug information for labels, i.e., label
name, the function label belong, line number in the file, and the
address label located. In order to keep these information in LLVM
IR and to allow backend to generate debug information correctly.
We create a new kind of metadata for labels, DILabel. The format
of DILabel is
!DILabel(scope: !1, name: "foo", file: !2, line: 3)
We hope to keep debug information as much as possible even the
code is optimized. So, we create a new kind of intrinsic for label
metadata to avoid the metadata is eliminated with basic block.
The intrinsic will keep existing if we keep it from optimized out.
The format of the intrinsic is
llvm.dbg.label(metadata !1)
It has only one argument, that is the DILabel metadata. The
intrinsic will follow the label immediately. Backend could get the
label metadata through the intrinsic's parameter.
We also create DIBuilder API for labels to be used by Frontend.
Frontend could use createLabel() to allocate DILabel objects, and use
insertLabel() to insert llvm.dbg.label intrinsic in LLVM IR.
Differential Revision: https://reviews.llvm.org/D45024
Patch by Hsiangkai Wang.
llvm-svn: 331841
This pattern came up in D46494.
I'm pretty sure we want to canonicalize it from
(x | ~m) & (y & m)
to
(x & m) | (y & ~m)
https://rise4fun.com/Alive/TEM
llvm-svn: 331625
Add logic for the special case when a cmp+select can clearly be
reduced to just a bitwise logic instruction, and remove an
over-reaching chunk of general purpose bit magic. The primary goal
is to remove cases where we are not improving the IR instruction
count when doing these select transforms, and in all cases here that
is true.
In the motivating 3-way compare tests, there are further improvements
because we can combine/propagate select values (not sure if that
belongs in instcombine, but it's there for now).
DAGCombiner has folds to turn some of these selects into bit magic,
so there should be no difference in the end result in those cases.
Not all constant combinations are handled there yet, however, so it
is possible that some targets will see more cmov/csel codegen with
this change in IR canonicalization.
Ideally, we'll go further to *not* turn selects into multiple
logic/math ops in instcombine, and we'll canonicalize to selects.
But we should make sure that this step does not result in regressions
first (and if it does, we should fix those in the backend).
The general direction for this change was discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2016-September/105373.htmlhttp://lists.llvm.org/pipermail/llvm-dev/2017-July/114885.html
Alive proofs for the new bit magic:
https://rise4fun.com/Alive/XG7
Differential Revision: https://reviews.llvm.org/D46086
llvm-svn: 331486
instcombine should transform the relevant cases if the OverflowingBinaryOperator/PossiblyExactOperator can be proven to be safe.
Change-Id: I7aec62a31a894e465e00eb06aed80c3ea0c9dd45
llvm-svn: 331265
This test had values that differed in only in capitalization,
and that causes problems for the auto-generating check line
script. So I changed that in rL331226, but I accidentally
forgot to change a subsequent use of a param.
llvm-svn: 331228
Summary:
As discussed in D45733, we want to do this in InstCombine.
https://rise4fun.com/Alive/LGk
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: chandlerc, xbolva00, llvm-commits
Differential Revision: https://reviews.llvm.org/D45867
llvm-svn: 331205
Summary:
Masked merge has a pattern of: `((x ^ y) & M) ^ y`.
But, there is no difference between `((x ^ y) & M) ^ y` and `((x ^ y) & ~M) ^ x`,
We should canonicalize the pattern to non-inverted mask.
https://rise4fun.com/Alive/Yol
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45664
llvm-svn: 331112
Summary:
Masked merge has a pattern of: `((x ^ y) & M) ^ y`.
But, there is no difference between `((x ^ y) & M) ^ y` and `((x ^ y) & ~M) ^ x`,
We should canonicalize the pattern to non-inverted mask.
Differential Revision: https://reviews.llvm.org/D45663
llvm-svn: 331111
Summary:
Currently, we
1. match `LHS` matcher to the `first` operand of binary operator,
2. and then match `RHS` matcher to the `second` operand of binary operator.
If that does not match, we swap the `LHS` and `RHS` matchers:
1. match `RHS` matcher to the `first` operand of binary operator,
2. and then match `LHS` matcher to the `second` operand of binary operator.
This works ok.
But it complicates writing of commutative matchers, where one would like to match
(`m_Value()`) the value on one side, and use (`m_Specific()`) it on the other side.
This is additionally complicated by the fact that `m_Specific()` stores the `Value *`,
not `Value **`, so it won't work at all out of the box.
The last problem is trivially solved by adding a new `m_c_Specific()` that stores the
`Value **`, not `Value *`. I'm choosing to add a new matcher, not change the existing
one because i guess all the current users are ok with existing behavior,
and this additional pointer indirection may have performance drawbacks.
Also, i'm storing pointer, not reference, because for some mysterious-to-me reason
it did not work with the reference.
The first one appears trivial, too.
Currently, we
1. match `LHS` matcher to the `first` operand of binary operator,
2. and then match `RHS` matcher to the `second` operand of binary operator.
If that does not match, we swap the ~~`LHS` and `RHS` matchers~~ **operands**:
1. match ~~`RHS`~~ **`LHS`** matcher to the ~~`first`~~ **`second`** operand of binary operator,
2. and then match ~~`LHS`~~ **`RHS`** matcher to the ~~`second`~ **`first`** operand of binary operator.
Surprisingly, `$ ninja check-llvm` still passes with this.
But i expect the bots will disagree..
The motivational unittest is included.
I'd like to use this in D45664.
Reviewers: spatel, craig.topper, arsenm, RKSimon
Reviewed By: craig.topper
Subscribers: xbolva00, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D45828
llvm-svn: 331085