Lack of an attribute means that the function hasn't been checked for what vector width it requires. So if the caller or the callee doesn't have the attribute we should make sure the combined function after inlining does not have the attribute.
If the caller already doesn't have the attribute we can just avoid adding it. Otherwise if the callee doesn't have the attribute just remove the caller's attribute.
llvm-svn: 347841
Packing the flags into one bitcode word will save effort in
adding new flags in the future.
Differential Revision: https://reviews.llvm.org/D54755
llvm-svn: 347806
In PR39807 we incorrectly handle circumstances where calls are common'd
from conditional blocks into the parent BB. Calls that can be inlined
must always have DebugLocs, however we strip them during commoning, which
the IR verifier asserts on.
Fix this by using applyMergedLocation: it will perform the same DebugLoc
stripping of conditional Locs, but will also generate an unknown location
DebugLoc that satisfies the requirement for inlinable calls to always have
locations.
Some of the prior logic for selecting a DebugLoc is now likely redundant;
I'll generate a follow-up to remove it (involves editing more regression
tests).
Differential Revision: https://reviews.llvm.org/D54997
llvm-svn: 347782
This commit caused failures because it failed to correctly handle cases where
we hoist a phi, then hoist a use of that phi, then have to rehoist that use. We
need to make sure that we rehoist the use to _after_ the hoisted phi, which we
do by always rehoisting to the immediate dominator instead of just rehoisting
everything to the original preheader.
An option is also added to control whether control flow is hoisted, which is
off in this commit but will be turned on in a subsequent commit.
Differential Revision: https://reviews.llvm.org/D52827
llvm-svn: 347776
Combine
sat(sat(X + C1) + C2) -> sat(X + (C1+C2))
and
sat(sat(X - C1) - C2) -> sat(X - (C1+C2))
if the sign of C1 and C2 matches.
In the unsigned case we can compute C1+C2 with saturating arithmetic,
and InstSimplify will reduce this just to the saturation value. For
the signed case, we cannot perform the simplification if the result
of the addition overflows.
This change is part of https://reviews.llvm.org/D54534.
llvm-svn: 347773
Canonicalize ssub.sat(X, C) to ssub.sat(X, -C) if C is constant and
not signed minimum. This will help further optimizations to apply.
This change is part of https://reviews.llvm.org/D54534.
llvm-svn: 347772
Always-overflow was already determined for unsigned addition, but
not subtraction. This patch establishes parity.
This allows us to perform some additional simplifications for
signed saturating subtractions.
This change is part of https://reviews.llvm.org/D54534.
llvm-svn: 347771
If ValueTracking can determine that the add/sub can newer overflow,
replace it with the corresponding nuw/nsw add/sub.
Additionally, for the unsigned case, if ValueTracking determines
that the add/sub always overflows, replace the result with the
saturation value.
This change is part of https://reviews.llvm.org/D54534.
llvm-svn: 347770
If a saturating add intrinsic has one constant argument, make sure
it is on the RHS. This will simplify further transformations.
This change is part of https://reviews.llvm.org/D54534.
llvm-svn: 347769
Summary:
If the original reduction root instruction was vectorized, it might be
removed from the tree. It means that the insertion point may become
invalidated and the whole vectorization of the reduction leads to the
incorrect output result.
The ReductionRoot instruction must be marked as externally used so it
could not be removed. Otherwise it might cause inconsistency with the
cost model and we may end up with too optimistic optimization.
Reviewers: RKSimon, spatel, hfinkel, mkuper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54955
llvm-svn: 347759
InlineCost also treats them as free and the current implementation
can cause assertion failures if PHI nodes are moved outside the region
from entry BBs to the region.
It also updates the code to use the instructionsWithoutDebug iterator.
Reviewers: davidxl, davide, vsk, graham-yiu-huawei
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D54748
llvm-svn: 347683
It fixes a bug that doesn't update Phi inputs of the only live successor that
is in the list of block's successors more than once.
Thanks @uabelho for finding this.
Differential Revision: https://reviews.llvm.org/D54849
Reviewed By: anna
llvm-svn: 347640
Summary:
Removing ncompatible attributes at indirect-call promoted callsites, not removing it results in
at least a IR verification error.
Reviewers: davidxl, xur, mssimpso
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54913
llvm-svn: 347605
This reverts commit r346970.
It was causing PR39774, a crash in slp-vectorizer on a rather simple loop
with just a bunch of 'and's in the body.
llvm-svn: 347541
OriginalOp of a Predicate refers to the original IR value,
before renaming. While solving in IPSCCP, we have to use
the operand of the ssa_copy instead, to avoid missing
updates for nested conditions on the same IR value.
Fixes PR39772.
llvm-svn: 347524
Support funnel shifts in InstCombine demanded bits simplification.
If the shift amount is constant, we can determine both the demanded
bits of the operands, as well as the known bits of the result.
If one of the operands has no demanded bits, it will be replaced
by undef and the funnel shift will be simplified into a simple shift
due to the simplifications added in D54778.
Differential Revision: https://reviews.llvm.org/D54869
llvm-svn: 347515
This changeset is modeled after Intel's submission for SVML. It enables
trigonometry functions vectorization via SLEEF: http://sleef.org/.
* A new vectorization library enum is added to TargetLibraryInfo.h: SLEEF.
* A new option is added to TargetLibraryInfoImpl - ClVectorLibrary: SLEEF.
* A comprehensive test case is included in this changeset.
* In a separate changeset (for clang), a new vectorization library argument is
added to -fveclib: -fveclib=SLEEF.
Trigonometry functions that are vectorized by sleef:
acos
asin
atan
atanh
cos
cosh
exp
exp2
exp10
lgamma
log10
log2
log
sin
sinh
sqrt
tan
tanh
tgamma
Patch by Stefan Teleman
Differential Revision: https://reviews.llvm.org/D53927
llvm-svn: 347510
The following simplifications are implemented:
* `fshl(X, 0, C) -> shl X, C%BW`
* `fshl(X, undef, C) -> shl X, C%BW` (assuming undef = 0)
* `fshl(0, X, C) -> lshr X, BW-C%BW`
* `fshl(undef, X, C) -> lshr X, BW-C%BW` (assuming undef = 0)
* `fshr(X, 0, C) -> shl X, (BW-C%BW)`
* `fshr(X, undef, C) -> shl X, BW-C%BW` (assuming undef = 0)
* `fshr(0, X, C) -> lshr X, C%BW`
* `fshr(undef, X, C) -> lshr, X, C%BW` (assuming undef = 0)
The simplification is only performed if the shift amount C is constant,
because we can explicitly compute C%BW and BW-C%BW in this case.
Differential Revision: https://reviews.llvm.org/D54778
llvm-svn: 347505
When removing edges, we also update Phi inputs and may end up removing
a Phi if it has only one input. We should not do it for edges that leave the current
loop because these Phis are LCSSA Phis and need to be preserved.
Thanks @dmgreen for finding this!
Differential Revision: https://reviews.llvm.org/D54841
llvm-svn: 347484
The MergeFunctions pass was originally intended to emit aliases
instead of thunks where possible (unnamed_addr). However, for a
long time this functionality was behind a flag hardcoded to false,
bitrotted and was eventually removed in r309313.
Originally the functionality was first disabled in r108417 due to
lack of support for aliases in Mach-O. I believe that this is no
longer the case nowadays, but not really familiar with this area.
In the interest of being conservative, this patch reintroduces the
aliasing functionality behind a default disabled -mergefunc-use-aliases
flag.
Differential Revision: https://reviews.llvm.org/D53285
llvm-svn: 347407
LVI was symbolically executing binary operators only when the RHS was
constant, missing the case where we have a ConstantRange for the RHS,
but not an actual constant. Tested using check-all and by
bootstrapping. Compile time is not impacted measurably.
Differential Revision: https://reviews.llvm.org/D19859
llvm-svn: 347379
These are part of D54666, so adding them here before the patch to
show the baseline (currently unoptimized) results.
Patch by: @nikic (Nikita Popov)
llvm-svn: 347331
This patch fixes PR39695.
The original LoopSink only considers memory alias in loop body. But PR39695 shows that instructions following sink candidate in preheader should also be checked. This is a conservative patch, it simply adds whole preheader block to alias set. It may lose some optimization opportunity, but I think that is very rare because: 1 in the most common case st/ld to the same address, the load should already be optimized away. 2 usually preheader is not very large.
Differential Revision: https://reviews.llvm.org/D54659
llvm-svn: 347325
This patch fixes the issue noticed in D54532.
The problem is that cst_pred_ty-based matchers like m_Zero() currently do not match
scalar undefs (as expected), but *do* match vector undefs. This may lead to optimization
inconsistencies in rare cases.
There is only one existing test for which output changes, reverting the change from D53205.
The reason here is that vector fsub undef, %x is no longer matched as an m_FNeg(). While I
think that the new output is technically worse than the previous one, it is consistent with
scalar, and I don't think it's really important either way (generally that undef should have
been folded away prior to reassociation.)
I've also added another test case for this issue based on InstructionSimplify. It took some
effort to find that one, as in most cases undef folds are either checked first -- and in the
cases where they aren't it usually happens to not make a difference in the end. This is the
only case I was able to come up with. Prior to this patch the test case simplified to undef
in the scalar case, but zeroinitializer in the vector case.
Patch by: @nikic (Nikita Popov)
Differential Revision: https://reviews.llvm.org/D54631
llvm-svn: 347318
The initial version of patch lacked Phi nodes updates in destinations of removed
edges. This version contains this update and tests on this situation.
Differential Revision: https://reviews.llvm.org/D54021
llvm-svn: 347289
Summary:
Currently, when vectorizing stores to uniform addresses, the only
instance we prevent vectorization is if there are multiple stores to the
same uniform address causing an unsafe dependency.
This patch teaches LAA to avoid vectorizing loops that have an unsafe
cross-iteration dependency between a load and a store to the same uniform address.
Fixes PR39653.
Reviewers: Ayal, efriedma
Subscribers: rkruppe, llvm-commits
Differential Revision: https://reviews.llvm.org/D54538
llvm-svn: 347220
The general approach taken is to make note of loop invariant branches, then when
we see something conditional on that branch, such as a phi, we create a copy of
the branch and (empty versions of) its successors and hoist using that.
This has no impact by itself that I've been able to see, as LICM typically
doesn't see such phis as they will have been converted into selects by the time
LICM is run, but once we start doing phi-to-select conversion later it will be
important.
Differential Revision: https://reviews.llvm.org/D52827
llvm-svn: 347190
This patch introduces infrastructure and the simplest case for constant-folding
of branch and switch instructions within loop into unconditional branches.
It is useful as a cleanup for such passes as loop unswitching that sometimes
produce such branches.
Only the simplest case supported in this patch: after the folding, no block
should become dead or stop being part of the loop. Support for more
sophisticated cases will go separately in follow-up patches.
Differential Revision: https://reviews.llvm.org/D54021
Reviewed By: anna
llvm-svn: 347183