In r339636 the alias analysis rules were changed with regards to tail calls
and byval arguments. Previously, tail calls were assumed not to alias
allocas from the current frame. This has been updated, to not assume this
for arguments with the byval attribute.
This patch aligns TailCallElim with the new rule. Tail marking can now be
more aggressive and mark more calls as tails, e.g.:
define void @test() {
%f = alloca %struct.foo
call void @bar(%struct.foo* byval %f)
ret void
}
define void @test2(%struct.foo* byval %f) {
call void @bar(%struct.foo* byval %f)
ret void
}
define void @test3(%struct.foo* byval %f) {
%agg.tmp = alloca %struct.foo
%0 = bitcast %struct.foo* %agg.tmp to i8*
%1 = bitcast %struct.foo* %f to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 40, i1 false)
call void @bar(%struct.foo* byval %agg.tmp)
ret void
}
The problematic case where a byval parameter is captured by a call is still
handled correctly, and will not be marked as a tail (see PR7272).
llvm-svn: 343986
Summary:
If we have a symbol with (linkonce|weak)_odr linkage, we do not want
to dead strip it even it is not prevailing.
IR level (linkonce|weak)_odr symbol can become non-prevailing when we mix
ELF objects and IR objects where the (linkonce|weak)_odr symbol in the ELF
object is prevailing and the ones in the IR objects are not. Stripping
them will prevent us from doing optimizations with them.
By not dead stripping them, We will convert these symbols to
available_externally linkage as a result of non-prevailing and eventually
dropping them after inlining.
I modified cache-prevailing.ll to use linkonce linkage as it is
testing whether cache prevailing bit is effective or not, not
we should treat linkonce_odr alive or not
Reviewers: tejohnson, pcc
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D52893
llvm-svn: 343970
Currently running the @insertelem_after_gep function below through the InstCombine pass with opt produces invalid IR.
Input:
```
define void @insertelem_after_gep(<16 x i32>* %t0) {
%t1 = bitcast <16 x i32>* %t0 to [16 x i32]*
%t2 = addrspacecast [16 x i32]* %t1 to [16 x i32] addrspace(3)*
%t3 = getelementptr inbounds [16 x i32], [16 x i32] addrspace(3)* %t2, i64 0, i64 0
%t4 = insertelement <16 x i32 addrspace(3)*> undef, i32 addrspace(3)* %t3, i32 0
call void @extern_vec_pointers_func(<16 x i32 addrspace(3)*> %t4)
ret void
}
```
Output:
```
define void @insertelem_after_gep(<16 x i32>* %t0) {
%t3 = getelementptr inbounds <16 x i32>, <16 x i32>* %t0, i64 0, i64 0
%t4 = insertelement <16 x i32 addrspace(3)*> undef, i32 addrspace(3)* %t3, i32 0
call void @my_extern_func(<16 x i32 addrspace(3)*> %t4)
ret void
}
```
Which although causes no complaints when produced, isn't valid IR as the insertelement use of the %t3 GEP expects an address space.
```
opt: /tmp/bad.ll:52:73: error: '%t3' defined with type 'i32*' but expected 'i32 addrspace(3)*'
%t4 = insertelement <16 x i32 addrspace(3)*> undef, i32 addrspace(3)* %t3, i32 0
```
I've fixed this by adding an addrspacecast after the GEP in the InstCombine pass, and including a check for this type mismatch to the verifier.
Reviewers: spatel, lebedev.ri
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52294
llvm-svn: 343956
At the point when we perform `emitTransformedIndex`, we have a broken IR (in
particular, we have Phis for which not every incoming value is properly set). On
such IR, it is illegal to create SCEV expressions, because their internal
simplification process may try to prove some predicates and break when it
stumbles across some broken IR.
The only purpose of using SCEV in this particular place is attempt to simplify
the generated code slightly. It seems that the result isn't worth it, because
some trivial cases (like addition of zero and multiplication by 1) can be
handled separately if needed, but more generally InstCombine is able to achieve
the goals we want to achieve by using SCEV.
This patch fixes a functional crash described in PR39160, and as side-effect it
also generates a bit smarter code in some simple cases. It also may cause some
optimality loss (i.e. we will now generate `mul` by power of `2` instead of
shift etc), but there is nothing what InstCombine could not handle later. In
case of dire need, we can support more trivial cases just in place.
Note that this patch only fixes one particular case of the general problem that
LV misuses SCEV, attempting to create SCEVs or prove predicates on invalid IR.
The general solution, however, seems complex enough.
Differential Revision: https://reviews.llvm.org/D52881
Reviewed By: fhahn, hsaito
llvm-svn: 343954
This patch fixes PR39099.
When strided loads are predicated, each of them will form an interleaved-group
(with gaps). However, subsequent stages of vectorization (planning and
transformation) assume that if a load is part of an Interleave-Group it is not
predicated, resulting in wrong code - unmasked wide loads are created.
The Interleaving Analysis does take care not to have conditional interleave
groups of size > 1, but until we extend the planning and transformation stages
to support masked-interleave-groups we should also avoid having them for
size == 1.
Reviewers: Ayal, hsaito, dcaballe, fhahn
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D52682
llvm-svn: 343931
We established the (unfortunately complicated) rules for UB/poison
propagation with vector ops in:
D48893
D48987
D49047
It's clear from the affected tests that we are potentially creating
poison where none existed before the transforms. For add/sub/mul,
the answer is simple: just drop the flags because the extra undef
vector lanes are generally more valuable for analysis and codegen.
llvm-svn: 343819
This is a follow-up to rL343482 / D52439.
This was a pattern that initially caused the commit to be reverted because
the transform requires a bitcast as shown here.
llvm-svn: 343794
We're a long way from D50992 and D51553, but this is where we have to start.
We weren't back-propagating undefs into binop constant values for anything but
add/sub/mul/and/or/xor.
This is likely because we have to be careful about not introducing UB/poison
with div/rem/shift. But I suspect we already are getting the poison part wrong
for add/sub/mul (although it may not be possible to expose the bug currently
because we use SimplifyDemandedVectorElts from a limited set of opcodes).
See the discussion/implementation from D48987 and D49047.
This patch just enables functionality for FP ops because those do not have
UB/poison potential.
llvm-svn: 343727
Modified the testcases to use both pass managers
Use single commandline flag for both pass managers.
Differential Revision: https://reviews.llvm.org/D52708
Reviewers: sebpop, tejohnson, brzycki, SirishP
Reviewed By: tejohnson, brzycki
llvm-svn: 343662
These are candidates for the same fold that was implemented in
D52439, but FP types require bitcasting (and that changes the
extra uses profitability calculation).
llvm-svn: 343587
This is an attempt to get out of a local-minimum that instcombine currently
gets stuck in. We essentially combine two optimisations at once, ~a - ~b = b-a
and min(~a, ~b) = ~max(a, b), only doing the transform if the result is at
least neutral. This involves using IsFreeToInvert, which has been expanded a
little to include selects that can be easily inverted.
This is trying to fix PR35875, using the ideas from Sanjay. It is a large
improvement to one of our rgb to cmy kernels.
Differential Revision: https://reviews.llvm.org/D52177
llvm-svn: 343569
Summary:
This is a continuation of the fix for PR34627 "InstCombine assertion at vector gep/icmp folding". (I just realized bugpoint had fuzzed the original test for me, so I had fixed another trigger of the same assert in adjacent code in InstCombine.)
This patch avoids optimizing an icmp (to look only at the base pointers) when the resulting icmp would have a different type.
The patch adds a testcase and also cleans up and shrinks the pre-existing test for the adjacent assert trigger.
Reviewers: lebedev.ri, majnemer, spatel
Reviewed By: lebedev.ri
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52494
llvm-svn: 343486
This was originally committed at rL343407, but reverted at
rL343458 because it crashed trying to handle a case where
the destination type is FP. This version of the patch adds
a check for that possibility. Tests added at rL343480.
Original commit message:
This transform is requested for the backend in:
https://bugs.llvm.org/show_bug.cgi?id=39016
...but I figured it was worth doing in IR too, and it's probably
easier to implement here, so that's this patch.
In the simplest case, we are just truncating a scalar value. If the
extract index doesn't correspond to the LSBs of the scalar, then we
have to shift-right before the truncate. Endian-ness makes this tricky,
but hopefully the ASCII-art helps visualize the transform.
Differential Revision: https://reviews.llvm.org/D52439
llvm-svn: 343482
The first attempt at this transform:
rL343407
...was reverted:
rL343458
...because it did not handle the case where we bitcast to FP.
The patch was already limited to avoid the case where we
bitcast from FP, but we might want to transform that too.
llvm-svn: 343480
This caused Chromium builds to fail with "Illegal Trunc" assertion.
See https://crbug.com/890723 for repro.
> This transform is requested for the backend in:
> https://bugs.llvm.org/show_bug.cgi?id=39016
> ...but I figured it was worth doing in IR too, and it's probably
> easier to implement here, so that's this patch.
>
> In the simplest case, we are just truncating a scalar value. If the
> extract index doesn't correspond to the LSBs of the scalar, then we
> have to shift-right before the truncate. Endian-ness makes this tricky,
> but hopefully the ASCII-art helps visualize the transform.
>
> Differential Revision: https://reviews.llvm.org/D52439
llvm-svn: 343458
This transform is requested for the backend in:
https://bugs.llvm.org/show_bug.cgi?id=39016
...but I figured it was worth doing in IR too, and it's probably
easier to implement here, so that's this patch.
In the simplest case, we are just truncating a scalar value. If the
extract index doesn't correspond to the LSBs of the scalar, then we
have to shift-right before the truncate. Endian-ness makes this tricky,
but hopefully the ASCII-art helps visualize the transform.
Differential Revision: https://reviews.llvm.org/D52439
llvm-svn: 343407
As noted in post-commit comments for D52548, the limitation on
increasing vector length can be applied by opcode.
As a first step, this patch only allows insertelement to be
widened because that has no logical downsides for IR and has
little risk of pessimizing codegen.
This may cause PR39132 to go into hiding during a full compile,
but that bug is not fixed.
llvm-svn: 343406
InstCombine would propagate shufflevector insts that had wider output vectors onto
predecessors, which would sometimes push undef's onto the divisor of a div/rem and
result in bad codegen.
I've fixed this by just banning propagating shufflevector back if the result of
the shufflevector is wider than the input vectors.
Patch by: @sheredom (Neil Henning)
Differential Revision: https://reviews.llvm.org/D52548
llvm-svn: 343329
These are the updated baseline tests for D52548 -
I'm putting the tests next to the tests where the transform
functions as expected, so we can see the intended/unintended
consequences.
Patch by: @sheredom (Neil Henning)
llvm-svn: 343328
This shouldn't really happen in practice I hope, but we tried to handle other constant cases. We missed this one because we checked for ConstantVector without realizing that zero becomes ConstantAggregateZero instead.
So instead just check for Constant and use getAggregateElement which will do the dirty work for us.
llvm-svn: 343270
When C is not zero and infinites are not allowed (C / X) > 0 is a sign
test. Depending on the sign of C, the predicate must be swapped.
E.g.:
foo(double X) {
if ((-2.0 / X) <= 0) ...
}
=>
foo(double X) {
if (X >= 0) ...
}
Patch by: @marels (Martin Elshuber)
Differential Revision: https://reviews.llvm.org/D51942
llvm-svn: 343228
This patch extends LoopInterchange to move LCSSA to the right place
after interchanging. This is required for LoopInterchange to become a
function pass.
An alternative to the manual moving of the PHIs, we could also re-form
the LCSSA phis for a set of interchanged loops, but that's more
expensive.
Reviewers: efriedma, mcrosier, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D52154
llvm-svn: 343132
Adding NonNull as attributes to returned pointers has the unfortunate side
effect of disabling tail calls. This patch ignores the NonNull attribute when
we decide whether to tail merge, in the same way that we ignore the NoAlias
attribute, as it has no affect on the call sequence.
Differential Revision: https://reviews.llvm.org/D52238
llvm-svn: 343091
If the fsub in this pattern was replaced by an actual fneg
instruction, we would need to add a fold to recognize that
because fneg would not be a binop.
llvm-svn: 343041
Summary:
We are overly conservative in loop vectorizer with respect to stores to loop
invariant addresses.
More details in https://bugs.llvm.org/show_bug.cgi?id=38546
This is the first part of the fix where we start with vectorizing loop invariant
values to loop invariant addresses.
This also includes changes to ORE for stores to invariant address.
Reviewers: anemet, Ayal, mkuper, mssimpso
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50665
llvm-svn: 343028
Summary:
In D49565/r337503, the type id record writing was fixed so that only
referenced type ids were emitted into each per-module index for ThinLTO
distributed builds. However, this still left an efficiency issue: each
per-module index checked all type ids for membership in the referenced
set, yielding O(M*N) performance (M indexes and N type ids).
Change the TypeIdMap in the summary to be indexed by GUID, to facilitate
correlating with type identifier GUIDs referenced in the function
summary TypeIdInfo structures. This allowed simplifying other
places where a map from type id GUID to type id map entry was previously
being used to aid this correlation.
Also fix AsmWriter code to handle the rare case of type id GUID
collision.
For a large internal application, this reduced the thin link time by
almost 15%.
Reviewers: pcc, vitalybuka
Subscribers: mehdi_amini, inglorion, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D51330
llvm-svn: 343021
The motivating case from:
https://bugs.llvm.org/show_bug.cgi?id=33026
...has no shuffles now. This kind of pattern may occur during
vectorization when targets have lumpy ISAs like SSE/AVX.
llvm-svn: 342988
In this patch, I'm adding an extra check to the Latch's terminator in llvm::UnrollRuntimeLoopRemainder,
similar to how it is already done in the llvm::UnrollLoop.
The compiler would crash if this function is called with a malformed loop.
Patch by Rodrigo Caetano Rocha!
Differential Revision: https://reviews.llvm.org/D51486
llvm-svn: 342958