Summary:
r284533 added hot and cold section prefixes based on profile
information, to enable grouping of hot/cold functions at link time.
However, it used "cold" as the prefix for cold sections, but gold only
recognizes "unlikely" (which is used by gcc for cold sections).
Therefore, cold sections were not properly being grouped. Switch to
using "unlikely"
Reviewers: danielcdh, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32983
llvm-svn: 302502
This is another step towards getting rid of dyn_castNotVal,
so we can recommit:
https://reviews.llvm.org/rL300977
As the tests show, we were missing the lshr case for constants
and both ashr/lshr vector splat folds. The ashr case with constant
was being performed inefficiently in 2 steps. It's also possible
there was a latent bug in that case because we can't do that fold
if the constant is positive:
http://rise4fun.com/Alive/Bge
llvm-svn: 302465
Transforms/IndVarSimplify/2011-10-27-lftrnull will fail if this regresses.
Transforms/GVN/PRE/2011-06-01-NonLocalMemdepMiscompile.ll has been changed to still test what it was
trying to test.
llvm-svn: 302446
This patch uses KnownOnes of the input of ctlz/cttz to bound the value that can be returned from these intrinsics. This makes these intrinsics more similar to the handling for ctpop which already uses known bits to produce a similar bound.
Differential Revision: https://reviews.llvm.org/D32521
llvm-svn: 302444
Summary:
Re-applying r301766 with a fix to a typo and a regression test.
The log message for r301766 was:
==================================================================================
InstructionSimplify: Canonicalize shuffle operands. NFC-ish.
Summary:
Apply canonicalization rules:
1. Input vectors with no elements selected from can be replaced with undef.
2. If only one input vector is constant it shall be the second one.
This allows constant-folding to cover more ad-hoc simplifications that
were in place and avoid duplication for RHS and LHS checks.
There are more rules we may want to add in the future when we see a
justification. e.g. mask elements that select undef elements can be
replaced with undef.
==================================================================================
Reviewers: spatel, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32863
llvm-svn: 302373
We can simplify (or (icmp X, C1), (icmp X, C2)) to 'true' or one of the icmps in many cases.
I had to check some of these with Alive to prove to myself it's right, but everything seems
to check out. Eg, the deleted code in instcombine was completely ignoring predicates with
mismatched signedness.
This is a follow-up to:
https://reviews.llvm.org/rL301260https://reviews.llvm.org/D32143
llvm-svn: 302370
Loop Idiom recognition was generating memset in a case that
would result generating a division operation to an unsafe location.
Differential Revision: https://reviews.llvm.org/D32674
llvm-svn: 302238
The sibling folds for 'and' with casts were added with https://reviews.llvm.org/rL273200.
This is a preliminary step for adding the 'or' variants for the folds added with https://reviews.llvm.org/rL301260.
The reason for the strange form with constant LHS in the 1st test is because there's another missing fold in that
case for the inverted predicate. That should be fixed when we add the ConstantRange functionality for 'or-of-icmps'
that already exists for 'and-of-icmps'.
I'm hoping to share more code for the and/or cases, so we won't have these differences. This will allow us to remove
code from InstCombine. It's also possible that we can remove some code here in InstSimplify. I think we have some
duplicated folds because patterns are not matched in a general way.
Differential Revision: https://reviews.llvm.org/D32876
llvm-svn: 302189
Change checkRippleForAdd from a heuristic to a full check -
if it is provable that the add does not overflow return true, otherwise false.
Patch by Yoav Ben-Shalom
Differential Revision: https://reviews.llvm.org/D32686
llvm-svn: 302093
Fixes PR31789 - When loop-vectorize tries to use these intrinsics for a
non-default address space pointer we fail with a "Calling a function with a
bad singature!" assertion. This patch solves this by adding the 'vector of
pointers' argument as an overloaded type which will determine the address
space.
Differential revision: https://reviews.llvm.org/D31490
llvm-svn: 302018
Summary:
Currently, loop deletion deletes loop where the only values
that are used outside the loop are loop-invariant.
This patch adds logic to delete loops where the loop is proven to be
never executed (i.e. the only predecessor of the loop preheader has a
constant conditional branch as terminator, and the preheader is not the
taken target). This will remove loops that become dead after
loop-unswitching generates constant conditional branches.
The next steps are:
1. moving the loop deletion implementation to LoopUtils.
2. Add logic in loop-simplifyCFG which will support changing conditional
constant branches to unconditional branches. If loops become unreachable in this
process, they can be removed using `deleteDeadLoop` function.
Reviewers: chandlerc, efriedma, sanjoy, reames
Reviewed by: sanjoy
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D32494
llvm-svn: 302015
We should always expect values to be named before running the module summary
analysis (see NameAnonGlobals pass), so it's fine if we crash in that case.
llvm-svn: 301991
This was originally checked in here:
https://reviews.llvm.org/rL301923
And reverted here:
https://reviews.llvm.org/rL301924
Because there's a clang test that would fail after this. I fixed/removed the
offending CHECK lines in:
https://reviews.llvm.org/rL301928
So let's try this again. Original commit message:
This is the fold that causes the infinite loop in BoringSSL
(https://github.com/google/boringssl/blob/master/crypto/cipher/e_rc2.c)
when we fix instcombine demanded bits to prefer 'not' ops as in https://reviews.llvm.org/D32255.
There are 2 or 3 problems with dyn_castNotVal, and I don't think we can
reinstate https://reviews.llvm.org/D32255 until dyn_castNotVal is completely eliminated.
1. As shown here, it transforms 'not' into random xor. This transform is harmful to SCEV and codegen because 'not' can often be folded while random xor cannot.
2. It does not transform vector constants. This is actually a good thing, but if you don't believe the above argument, then we shouldn't have excluded vectors.
3. It tries to avoid transforming not(not(X)). That's nice, but it doesn't match the greedy nature of instcombine. If we DeMorganize a pattern that has an extra 'not' in it: ~(~(~X) & Y) --> (~X | ~Y)
That's just another case of DeMorgan, so we should trust that we'll fold that pattern too: (~X | ~ Y) --> ~(X & Y)
Differential Revision: https://reviews.llvm.org/D32665
llvm-svn: 301929
This is the fold that causes the infinite loop in BoringSSL
(https://github.com/google/boringssl/blob/master/crypto/cipher/e_rc2.c)
when we fix instcombine demanded bits to prefer 'not' ops as in D32255.
There are 2 or 3 problems with dyn_castNotVal, and I don't think we can
reinstate D32255 until dyn_castNotVal is completely eliminated.
1. As shown here, it transforms 'not' into random xor. This transform is
harmful to SCEV and codegen because 'not' can often be folded while
random xor cannot.
2. It does not transform vector constants. This is actually a good thing,
but if you don't believe the above argument, then we shouldn't have
excluded vectors.
3. It tries to avoid transforming not(not(X)). That's nice, but it doesn't
match the greedy nature of instcombine. If we DeMorganize a pattern
that has an extra 'not' in it:
~(~(~X) & Y) --> (~X | ~Y)
That's just another case of DeMorgan, so we should trust that we'll fold
that pattern too:
(~X | ~ Y) --> ~(X & Y)
Differential Revision: https://reviews.llvm.org/D32665
llvm-svn: 301923
This change caused buildbot failures, apparently because we're not
passing around types that InstSimplify is used to seeing. I'm not overly
familiar with InstSimplify, so I'm reverting this until I can figure out
what exactly is wrong.
llvm-svn: 301885
In particular (since it wouldn't fit nicely in the summary):
(select (icmp eq V 0) P (getelementptr P V)) -> (getelementptr P V)
Differential Revision: https://reviews.llvm.org/D31435
llvm-svn: 301880
In the testcase attached, we believe %tmp1 implies %tmp4.
where:
br i1 %tmp1, label %bb2, label %bb7
br i1 %tmp4, label %bb5, label %bb7
because Wwhile looking at PredicateInfo stuffs we end up calling
isImpliedTrueByMatchingCmp() with the arguments backwards.
Differential Revision: https://reviews.llvm.org/D32718
llvm-svn: 301849
If we have ~(~X & Y), it only makes sense to transform it to (X | ~Y) when we do not need
the intermediate (~X & Y) value. In that case, we would need an extra instruction to
generate ~Y + 'or' (as shown in the test changes).
It's ok if we have multiple uses of ~X or Y, however. In those cases, we may not reduce the
instruction count or critical path, but we might improve throughput because we can generate
~X and ~Y in parallel. Whether that actually makes perf sense or not for a target is something
we can't answer in IR.
Differential Revision: https://reviews.llvm.org/D32703
llvm-svn: 301848
We may not be able to rewrite indirect branch target, but we also want to take it into
account when folding, i.e. if it and all its successor's predecessors go to the same
destination, we can fold, i.e. no need to thread.
llvm-svn: 301816
Summary: [JumpThread] Do RAUW in case Cond folds to a constant in the CFG
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32407
llvm-svn: 301804
retainAutoreleasedReturnValue that retains the returned value.
This commit fixes a bug in ARC optimizer where it moves a release
between a call and a retainAutoreleasedReturnValue, causing the returned
object to be released before the retainAutoreleasedReturnValue can
retain it.
This commit accomplishes that by doing a lookahead and checking whether
the call prevents the release from moving upwards. In the long term, we
should treat the region between the retainAutoreleasedReturnValue and
the call as a critical section and disallow moving anything there
(possibly using operand bundles).
rdar://problem/20449878
llvm-svn: 301724
Eliminates some more cases where some subset of the addressing
computation remains flat. Some cases with addrspacecasts
in nested constant expressions are still left behind however.
llvm-svn: 301704