As discussed in D13348 - the INSERTQI range combining code is wrong in that it confuses the insertion bit index with an extraction bit index.
The remaining legal combines are very unlikely (especially once we've converted to shuffles in D13348) so I'm removing the optimization.
llvm-svn: 250160
In JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB).
Differential revision: http://reviews.llvm.org/D10979
llvm-svn: 250089
GlobalOpt currently merges stores into the initialisers of internal,
externally_initialized globals, but should not do so as the value of the global
may change between the initialiser and any code in the module being run.
llvm-svn: 250035
C semantics force sub-int-sized values (e.g. i8, i16) to be promoted to int
type (e.g. i32) whenever arithmetic is performed on them.
For targets with native i8 or i16 operations, usually InstCombine can shrink
the arithmetic type down again. However InstCombine refuses to create illegal
types, so for targets without i8 or i16 registers, the lengthening and
shrinking remains.
Most SIMD ISAs (e.g. NEON) however support vectors of i8 or i16 even when
their scalar equivalents do not, so during vectorization it is important to
remove these lengthens and truncates when deciding the profitability of
vectorization.
The algorithm this uses starts at truncs and icmps, trawling their use-def
chains until they terminate or instructions outside the loop are found (or
unsafe instructions like inttoptr casts are found). If the use-def chains
starting from different root instructions (truncs/icmps) meet, they are
unioned. The demanded bits of each node in the graph are ORed together to form
an overall mask of the demanded bits in the entire graph. The minimum bitwidth
that graph can be truncated to is the bitwidth minus the number of leading
zeroes in the overall mask.
The intention is that this algorithm should "first do no harm", so it will
never insert extra cast instructions. This is why the use-def graphs are
unioned, so that subgraphs with different minimum bitwidths do not need casts
inserted between them.
This algorithm works hard to reduce compile time impact. DemandedBits are only
queried if there are extends of illegal types and if a truncate to an illegal
type is seen. In the general case, this results in a simple linear scan of the
instructions in the loop.
No non-noise compile time impact was seen on a clang bootstrap build.
llvm-svn: 250032
Doing so could cause the post-unswitching convergent ops to be
control-dependent on the unswitch condition where they were not before.
This check could be refined to allow unswitching where the convergent
operation was already control-dependent on the unswitch condition.
llvm-svn: 249874
With this patch we can now read and write inline stacks in sample
profiles to the binary encoded profiles.
In a subsequent patch, I will add a string table to the binary encoding.
Right now function names are emitted as strings every time we find them.
This is too bloated and will produce large files in applications with
lots of inlining.
llvm-svn: 249861
Pass MemCpyOpt doesn't check if a store instruction is nontemporal.
As a consequence, adjacent nontemporal stores are always merged into a
memset call.
Example:
;;;
define void @foo(<4 x float>* nocapture %p) {
entry:
store <4 x float> zeroinitializer, <4 x float>* %p, align 16, !nontemporal !0
%p1 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 1
store <4 x float> zeroinitializer, <4 x float>* %p1, align 16, !nontemporal !0
ret void
}
!0 = !{i32 1}
;;;
In this example, the two nontemporal stores are combined to a memset of zero
which does not preserve the nontemporal hint. Later on the backend (tested on a
x86-64 corei7) expands that memset call into a sequence of two normal 16-byte
aligned vector stores.
opt -memcpyopt example.ll -S -o - | llc -mcpu=corei7 -o -
Before:
xorps %xmm0, %xmm0
movaps %xmm0, 16(%rdi)
movaps %xmm0, (%rdi)
With this patch, we no longer merge nontemporal stores into calls to memset.
In this example, llc correctly expands the two stores into two movntps:
xorps %xmm0, %xmm0
movntps %xmm0, 16(%rdi)
movntps %xmm0, (%rdi)
In theory, we could extend the usage of !nontemporal metadata to memcpy/memset
calls. However a change like that would only have the effect of forcing the
backend to expand !nontemporal memsets back to sequences of store instructions.
A memset library call would not have exactly the same semantic of a builtin
!nontemporal memset call. So, SelectionDAG will have to conservatively expand
it back to a sequence of !nontemporal stores (effectively undoing the merging).
Differential Revision: http://reviews.llvm.org/D13519
llvm-svn: 249820
Summary:
These non-semantic changes will help make a later change adding
support for deopt operand bundles more streamlined.
Reviewers: reames, swaroop.sridhar
Subscribers: sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D13491
llvm-svn: 249779
Summary:
`getConstantEvolutionLoopExitValue` and `ComputeExitCountExhaustively`
assumed all phi nodes in the loop header have the same order of incoming
values. This is not correct, and this commit changes
`getConstantEvolutionLoopExitValue` and `ComputeExitCountExhaustively`
to lookup the backedge value of a phi node using the loop's latch block.
Unfortunately, there is still some code duplication
`getConstantEvolutionLoopExitValue` and `ComputeExitCountExhaustively`.
At some point in the future we should extract out a helper class /
method that can evolve constant evolution phi nodes across iterations.
Fixes 25060. Thanks to Mattias Eriksson for the spot-on analysis!
Depends on D13457.
Reviewers: atrick, hfinkel
Subscribers: materi, llvm-commits
Differential Revision: http://reviews.llvm.org/D13458
llvm-svn: 249712
This is a partial fix for PR24886:
https://llvm.org/bugs/show_bug.cgi?id=24886
Without this IR transform, the backend (x86 at least) was producing inefficient code.
This patch is making 2 assumptions:
1. The canonical form of a fabs() operation is, in fact, the LLVM fabs() intrinsic.
2. The high bit of an FP value is always the sign bit; as noted in the bug report, this isn't specified by the LangRef.
Differential Revision: http://reviews.llvm.org/D13076
llvm-svn: 249702
This was requested in D13076: if we're going to canonicalize to fabs(), ValueTracking
should know that fabs() clears sign bits.
In this patch (as in D13076), we're not handling vectors yet even though computeKnownBits'
fabs() case itself should be vector-ready via the splat in this patch.
Fixing this will require follow-on patches to correct other logic that uses 'getScalarType'.
Differential Revision: http://reviews.llvm.org/D13222
llvm-svn: 249701
Summary:
After r249211, SCEV can see through some LCSSA phis. Add a
`replacementPreservesLCSSAForm` check before replacing uses of these phi
nodes with a simplified use of the induction variable to avoid breaking
LCSSA.
Fixes 25047.
Depends on D13460.
Reviewers: atrick, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13461
llvm-svn: 249575
Summary:
Some target intrinsics can access multiple elements, using the pointer as a
base address (e.g. AArch64 ld4). When trying to CSE such instructions,
it must be checked the available value comes from a compatible instruction
because the pointer is not enough to discriminate whether the value is
correct.
Reviewers: ssijaric
Subscribers: mcrosier, llvm-commits, aemerson
Differential Revision: http://reviews.llvm.org/D13475
llvm-svn: 249523
This will allow us to optimize code such as:
int f(int *p) {
int x;
return p == &x;
}
as well as:
int *allocate(void);
int f() {
int x;
int *p = allocate();
return p == &x;
}
The folding can only be done under certain circumstances. Even though p and &x
cannot alias, the comparison must still return true if the pointer
representations are equal. If a user successfully generates a p that's a
correct guess for &x, comparison should return true even though p is an invalid
pointer.
This patch argues that if the address of the alloca isn't observable outside the
function, the function can act as-if the address is impossible to guess from the
outside. The tricky part is keeping the act consistent: if we fold p == &x to
false in one place, we must make sure to fold any other comparisons based on
those pointers similarly. To ensure that, we only fold when &x is involved
exactly once in comparison instructions.
Differential Revision: http://reviews.llvm.org/D13358
llvm-svn: 249490
Summary:
After r249211, `getSCEV(X) == getSCEV(Y)` does not guarantee that X and
Y are related in the dominator tree, even if X is an operand to Y (I've
included a toy example in comments, and a real example as a test case).
This commit changes `SimplifyIndVar` to require a `DominatorTree`. I
don't think this is a problem because `ScalarEvolution` requires it
anyway.
Fixes PR25051.
Depends on D13459.
Reviewers: atrick, hfinkel
Subscribers: joker.eph, llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D13460
llvm-svn: 249471
This is a cleaned up patch from the one written by John Regehr based on the findings of the Souper superoptimizer.
When writing tests, I was surprised to find that instsimplify apparently doesn't know how to collapse bit test sequences based purely on known bits. This required me to split my tests across both instsimplify and instcombine.
Differential Revision: http://reviews.llvm.org/D13250
llvm-svn: 249453
As mentioned in the bug, I'd missed the presence of a getScalarType in the caller of the new implies method. As a result, when we ended up with a implication over two vectors, we'd trip an assert and crash.
Differential Revision: http://reviews.llvm.org/D13441
llvm-svn: 249442
If the mask of a select instruction is a ConstantVector, method
SimplifyDemandedVectorElts iterates over the mask elements to identify which
values are selected from the select inputs.
Before this patch, method SimplifyDemandedVectorElts always used method
Constant::isNullValue() to check if a value in the mask was zero. Unfortunately
that method always returns false when called on a ConstantExpr.
This patch fixes the problem in SimplifyDemandedVectorElts by adding an explicit
check for ConstantExpr values. Now, if a value in the mask is a ConstantExpr, we
avoid calling isNullValue() on it.
Fixes PR24922.
Differential Revision: http://reviews.llvm.org/D13219
llvm-svn: 249390
Otherwise, the map will observe changes as long as MergeFunctions is alive. This
is bad because follow-up passes could replace-all-uses-with on the key of an
entry in the map. The value handle callback of ValueMap however asserts that the
key type matches.
rdar://22971893
llvm-svn: 249327
The most important part required to make clang
devirtualization works ( ͡°͜ʖ ͡°).
The code is able to find non local dependencies, but unfortunatelly
because the caller can only handle local dependencies, I had to add
some restrictions to look for dependencies only in the same BB.
http://reviews.llvm.org/D12992
llvm-svn: 249196
Summary:
This change teaches SCEV that to prove `A u< B` it is sufficient to
prove each of these facts individually:
- B >= 0
- A s< B
- A >= 0
In practice, SCEV sometimes finds it easier to prove these facts
individually than to prove `A u< B` as one atomic step.
Reviewers: reames, atrick, nlewycky, hfinkel
Subscribers: sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D13042
llvm-svn: 249168
When trying to optimize fortified library functions use the right
location to insert new instructions in order to preserve correct
def-use order.
This fixes an issue where a misplaced instruction definition would
happen to be *after* one of its use after a RAUW, forming invalid IR.
This behavior was introduced by r227250.
Differential Revision: http://reviews.llvm.org/D13301
rdar://problem/22802369
llvm-svn: 249092
Summary:
Some passes may open up opportunities for optimizations, leaving empty
lifetime start/end ranges. For example, with the following code:
void foo(char *, char *);
void bar(int Size, bool flag) {
for (int i = 0; i < Size; ++i) {
char text[1];
char buff[1];
if (flag)
foo(text, buff); // BBFoo
}
}
the loop unswitch pass will create 2 versions of the loop, one with
flag==true, and the other one with flag==false, but always leaving
the BBFoo basic block, with lifetime ranges covering the scope of the for
loop. Simplify CFG will then remove BBFoo in the case where flag==false,
but will leave the lifetime markers.
This patch teaches InstCombine to remove trivially empty lifetime marker
ranges, that is ranges ending right after they were started (ignoring
debug info or other lifetime markers in the range).
This fixes PR24598: excessive compile time after r234581.
Reviewers: reames, chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13305
llvm-svn: 249018
Summary:
The instructions SeenExprs records may be deleted during rewriting.
FindClosestMatchingDominator should ignore these deleted instructions.
Fixes PR24301.
Reviewers: grosser
Subscribers: grosser, llvm-commits
Differential Revision: http://reviews.llvm.org/D13315
llvm-svn: 248983
Summary:
Given an array of i2 elements, 4 consecutive scalar loads will be lowered to
i8-sized loads and thus will access 4 consecutive bytes in memory. If we
vectorize these loads into a single <4 x i2> load, it'll access only 1 byte in
memory. Hence, we should prohibit vectorization in such cases.
PS: Initial patch was proposed by Arnold.
Reviewers: aschwaighofer, nadav, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13277
llvm-svn: 248943
Usually large blocks are not a problem. But if a large block (> 10k instructions)
contains many (potential) chains of vector instructions, and those chains are
spread over a wide range of instructions, then scheduling becomes a compile time problem.
This change introduces a limit for the accumulate scheduling region size of a block.
For real-world functions this limit will never be exceeded (it's about 10x larger than
the maximum value seen in the test-suite and external test suite).
llvm-svn: 248917
This patch teaches InstCombiner how to convert a SSSE3/AVX2 byte shuffle to a
builtin shuffle if the mask is constant.
Converting byte shuffle intrinsic calls to builtin shuffles can help finding
more opportunities for combining shuffles later on in selection dag.
We may end up with byte shuffles with constant masks as the result of inlining.
Differential Revision: http://reviews.llvm.org/D13252
llvm-svn: 248913
This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234],
vst[234]lane ARM neon intrinsics and associates an address space with the
pointer that these intrinsics take. This changes, e.g.,
<2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32)
to
<2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32)
This change ensures that address spaces are fully taken into account in the ARM
target during lowering of interleaved loads and stores.
Differential Revision: http://reviews.llvm.org/D12985
llvm-svn: 248887
Currently SimplifyDemandedVectorElts can only peek through bitcasts if the vectors have the same number of elements.
This patch fixes and enables some existing (disabled) code to support bitcasting to vectors with more/fewer elements. It currently only accepts cases when vectors alias cleanly (i.e. number of elements are an exact multiple of the other vector).
This was added to improve the demanded vector elements support for SSE vector shifts which require the __m128i (<2 x i64>) argument type to be bitcast to the vector type for the builtin shift. I've added extra tests for various additional bitcasts.
Differential Revision: http://reviews.llvm.org/D12935
llvm-svn: 248784
Summary: This patch adds block frequency analysis to LoopUnswitch pass to recognize hot/cold regions. For cold regions the pass only performs trivial unswitches since they do not increase code size, and for hot regions everything works as before. This helps to minimize code growth in cold regions and be more aggressive in hot regions. Currently the default cold regions are blocks with frequencies below 20% of function entry frequency, and it can be adjusted via -loop-unswitch-cold-block-frequency flag. The entire feature is controlled via -loop-unswitch-with-block-frequency flag and it is off by default.
Reviewers: broune, silvas, dnovillo, reames
Subscribers: davidxl, llvm-commits
Differential Revision: http://reviews.llvm.org/D11605
llvm-svn: 248777
Place new and update dbg.declare calls immediately after the
corresponding alloca.
Current code in replaceDbgDeclareForAlloca puts the new dbg.declare
at the end of the basic block. LLVM codegen has problems emitting
debug info in a situation when dbg.declare appears after all uses of
the variable. This usually kinda works for inlining and ASan (two
users of this function) but not for SafeStack (see the pending change
in http://reviews.llvm.org/D13178).
llvm-svn: 248769
`ScalarEvolution::isImpliedCondOperandsViaNoOverflow` tries to cast the
operand type of the comparison it is given to an `IntegerType`. This is
incorrect because it could actually be simplifying a comparison between
two pointers. Switch it to using `getTypeSizeInBits` instead, which
does the right thing for both pointers and integers.
Fixed PR24956.
llvm-svn: 248743
Patch by Jake VanAdrighem!
Summary:
Fix the way we sort the llvm.used and llvm.compiler.used members.
This bug seems to have been introduced in rL183756 through a set of improper casts to GlobalValue*. In subsequent patches this problem was missed and transformed into a getName call on a ConstantExpr.
Reviewers: silvas
Subscribers: silvas, llvm-commits
Differential Revision: http://reviews.llvm.org/D12851
llvm-svn: 248728
This was split off of http://reviews.llvm.org/D13040 to make it easier to test the correctness of the implication logic. For the moment, this only handles a single easy case which shows up when eliminating and combining range checks. In the (near) future, I plan to extend this for other cases which show up in range checks, but I wanted to make those changes incrementally once the framework was in place.
At the moment, the implication logic will be used by three places. One in InstSimplify (this review) and two in SimplifyCFG (http://reviews.llvm.org/D13040 & http://reviews.llvm.org/D13070). Can anyone think of other locations this style of reasoning would make sense?
Differential Revision: http://reviews.llvm.org/D13074
llvm-svn: 248719
Originally, debug intrinsics and annotation intrinsics may prevent
the loop to be rerolled, now they are ignored.
Differential Revision: http://reviews.llvm.org/D13150
llvm-svn: 248718
Before this change `HasSameValue` would return true for distinct
`alloca` instructions if they happened to be allocating the same
type (`alloca` instructions are not specified as reading memory). This
change adds an explicit whitelist of instruction types for which
"identical" instructions compute the same value.
Fixes PR24952.
llvm-svn: 248690
This is one step towards solving PR24766:
https://llvm.org/bugs/show_bug.cgi?id=24766
We were not producing the same IR for these two C functions because the store
to the temp bool causes extra zexts:
#include <stdbool.h>
bool switchy(char x1, char x2, char condition) {
bool conditionMet = false;
switch (condition) {
case 0: conditionMet = (x1 == x2); break;
case 1: conditionMet = (x1 <= x2); break;
}
return conditionMet;
}
bool switchy2(char x1, char x2, char condition) {
switch (condition) {
case 0: return (x1 == x2);
case 1: return (x1 <= x2);
}
return false;
}
As noted in the code comments, this test case manages to avoid the more general existing
phi optimizations where there are only 2 phi inputs or where there are no constant phi
args mixed in with the casts ops. It seems like a corner case, but if we don't catch it,
then I don't think we can get SimplifyCFG to further optimize towards the canonical form
for this function shown in the bug report.
Differential Revision: http://reviews.llvm.org/D12866
llvm-svn: 248689
Summary:
Factor the code that rewrites invokes to calls and rewrites WinEH
terminators to their "unwind to caller" equivalents into a helper in
Utils/Local, and use it in the three places I'm aware of that need to do
this.
Reviewers: andrew.w.kaylor, majnemer, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13152
llvm-svn: 248677
Summary:
This is the second part of fixing bug 24848 https://llvm.org/bugs/show_bug.cgi?id=24848.
If both operands of a comparison have range metadata, they should be used to constant fold the comparison.
Reviewers: sanjoy, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13177
llvm-svn: 248650
Summary:
If the trip count of a specific backedge is `N`, then we know that
backedge is effectively guarded by the condition `{0,+,1} u< N`. This
change teaches SCEV to use this condition to prove things in
`isLoopBackedgeGuardedByCond`.
Depends on D12948
Depends on D12949
The original checkin, r248608 had to be backed out due to an issue with
a ObjCXX unit test. That issue is now fixed, so re-landing.
Reviewers: atrick, reames, majnemer, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12950
llvm-svn: 248638
Summary:
This change teaches SCEV's `isImpliedCond` two new identities:
A u< B u< -C => (A + C) u< (B + C)
A s< B s< INT_MIN - C => (A + C) s< (B + C)
While these are useful on their own, they're really intended to support
D12950.
The original checkin, r248606 had to be backed out due to an issue with
a ObjCXX unit test. That issue is now fixed, so re-landing.
Reviewers: atrick, reames, majnemer, nlewycky, hfinkel
Subscribers: aadg, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D12948
llvm-svn: 248637
This is a fix for PR22723:
https://llvm.org/bugs/show_bug.cgi?id=22723
My first attempt at this was to change what I thought was the root problem:
xor (zext i1 X to i32), 1 --> zext (xor i1 X, true) to i32
...but we create the opposite pattern in InstCombiner::visitZExt(), so infinite loop!
My next idea was to fix the matchIfNot() implementation in PatternMatch, but that would
mean potentially returning a different size for the match than what was input. I think
this would require all users of m_Not to check the size of the returned match, so I
abandoned that idea.
I settled on just fixing the exact case presented in the PR. This patch does allow the
2 functions in PR22723 to compile identically (x86):
bool test(bool x, bool y) { return !x | !y; }
bool test(bool x, bool y) { return !x || !y; }
...
andb %sil, %dil
xorb $1, %dil
movb %dil, %al
retq
Differential Revision: http://reviews.llvm.org/D12705
llvm-svn: 248634
BranchProbability now is represented by its numerator and denominator in uint32_t type. This patch changes this representation into a fixed point that is represented by the numerator in uint32_t type and a constant denominator 1<<31. This is quite similar to the representation of BlockMass in BlockFrequencyInfoImpl.h. There are several pros and cons of this change:
Pros:
1. It uses only a half space of the current one.
2. Some operations are much faster like plus, subtraction, comparison, and scaling by an integer.
Cons:
1. Constructing a probability using arbitrary numerator and denominator needs additional calculations.
2. It is a little less precise than before as we use a fixed denominator. For example, 1 - 1/3 may not be exactly identical to 1 / 3 (this will lead to many BranchProbability unit test failures). This should not matter when we only use it for branch probability. If we use it like a rational value for some precise calculations we may need another construct like ValueRatio.
One important reason for this change is that we propose to store branch probabilities instead of edge weights in MachineBasicBlock. We also want clients to use probability instead of weight when adding successors to a MBB. The current BranchProbability has more space which may be a concern.
Differential revision: http://reviews.llvm.org/D12603
llvm-svn: 248633
Summary:
If the trip count of a specific backedge is `N`, then we know that
backedge is effectively guarded by the condition `{0,+,1} u< N`. This
change teaches SCEV to use this condition to prove things in
`isLoopBackedgeGuardedByCond`.
Depends on D12948
Depends on D12949
Reviewers: atrick, reames, majnemer, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12950
llvm-svn: 248608
Summary:
This change teaches SCEV's `isImpliedCond` two new identities:
A u< B u< -C => (A + C) u< (B + C)
A s< B s< INT_MIN - C => (A + C) s< (B + C)
While these are useful on their own, they're really intended to support
D12950.
Reviewers: atrick, reames, majnemer, nlewycky, hfinkel
Subscribers: aadg, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D12948
llvm-svn: 248606
...because that's what the cost model was intended to do.
As discussed in D12882, this fix has a temporary unintended consequence for
SimplifyCFG: it causes us to not speculate an fdiv. However, two wrongs make
PR24818 right, and two wrongs make PR24343 act right even though it's really
still wrong.
I intend to correct SimplifyCFG and add to CodeGenPrepare to account for this
cost model change and preserve the righteousness for the bug report cases.
https://llvm.org/bugs/show_bug.cgi?id=24818https://llvm.org/bugs/show_bug.cgi?id=24343
Differential Revision: http://reviews.llvm.org/D12882
llvm-svn: 248439
Patch by: simoncook
Unlike BitCasts, AddrSpaceCasts do not always produce an output the same size as its input, which was previously assumed. This fixes cases where two address spaces do not have the same size pointer, as an assertion failure would occur when trying to prove deferenceability. LoopUnswitch is used in the particular test, but LICM also exhibits the same problem.
Differential Revision: http://reviews.llvm.org/D13008
llvm-svn: 248422
Add two new ways of accessing the unsafe stack pointer:
* At a fixed offset from the thread TLS base. This is very similar to
StackProtector cookies, but we plan to extend it to other backends
(ARM in particular) soon. Bionic-side implementation here:
https://android-review.googlesource.com/170988.
* Via a function call, as a fallback for platforms that provide
neither a fixed TLS slot, nor a reasonable TLS implementation (i.e.
not emutls).
This is a re-commit of a change in r248357 that was reverted in
r248358.
llvm-svn: 248405
Summary:
This is the first part of fixing bug 24848 https://llvm.org/bugs/show_bug.cgi?id=24848.
When range metadata is provided, it should be used to constant fold comparisons with constant values.
Reviewers: sanjoy, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12988
llvm-svn: 248402
This changes the behavior of AddAligntmentAssumptions to match its
comment. I.e, prove the asserted alignment in the context of the caller,
not the callee.
Thanks to Mehdi Amini for seeing the issue here! Also to Artur Pilipenko
who also saw a fix for the issue.
rdar://22521387
Differential Revision: http://reviews.llvm.org/D12997
llvm-svn: 248390
Invoking a function which returns an aggregate can sometimes be
transformed to return a scalar value. However, this means that we need
to create an insertvalue instruction(s) to recreate the correct
aggregate type. We achieved this by inserting an insertvalue
instruction at the invoke's normal successor. However, this is not
feasible if the normal successor uses the invoke's return value inside a
PHI node.
Instead, split the edge between the invoke and the unwind successor and
create the insertvalue instruction in the new basic block. The new
basic block's successor will be the old invoke successor which leaves
us with IR which is well behaved.
This fixes PR24906.
llvm-svn: 248387
This change allows dead store elimination to remove zero and null stores into memory freshly allocated with calloc-like function.
Differential Revision: http://reviews.llvm.org/D13021
llvm-svn: 248374
Add two new ways of accessing the unsafe stack pointer:
* At a fixed offset from the thread TLS base. This is very similar to
StackProtector cookies, but we plan to extend it to other backends
(ARM in particular) soon. Bionic-side implementation here:
https://android-review.googlesource.com/170988.
* Via a function call, as a fallback for platforms that provide
neither a fixed TLS slot, nor a reasonable TLS implementation (i.e.
not emutls).
llvm-svn: 248357
Apart from checking that GlobalVariable is a constant, we should check
that it's not a weak constant, in which case we can't propagate its
value.
llvm-svn: 248327
ARM counterpart to r248291:
In the comparison failure block of a cmpxchg expansion, the initial
ldrex/ldxr will not be followed by a matching strex/stxr.
On ARM/AArch64, this unnecessarily ties up the execution monitor,
which might have a negative performance impact on some uarchs.
Instead, release the monitor in the failure block.
The clrex instruction was designed for this: use it.
Also see ARMARM v8-A B2.10.2:
"Exclusive access instructions and Shareable memory locations".
Differential Revision: http://reviews.llvm.org/D13033
llvm-svn: 248294
We know that an argmemonly function can only access memory pointed to by it's pointer arguments. Rather than needing to consider all possible stores as aliasing (as we do for a readonly function), we can only consider the aliasing of the pointer arguments.
Note that this change only addresses hoisting. I'm thinking about how to address speculation safety as well, but that will be a different change.
FYI, argmemonly disallows accessing memory through non-pointer typed arguments.
Differential Revision: http://reviews.llvm.org/D12771
llvm-svn: 248220
We're currently losing any fast-math flags when synthesizing fcmps for
min/max reductions. In LV, make sure we copy over the scalar inst's
flags. In LoopUtils, we know we only ever match patterns with
hasUnsafeAlgebra, so apply that to any synthesized ops.
llvm-svn: 248201
Because -indvars widens induction variables through arithmetic,
`NeverNegative` cannot be a property of the `WidenIV` (a `WidenIV`
manages information for all transitive uses of an IV being widened,
including uses of `-1 * IV`). Instead it must live on `NarrowIVDefUse`
which manages information for a specific def-use edge in the transitive
use list of an induction variable.
This change also adds a test case that demonstrates the problem with
r248045.
llvm-svn: 248107
Summary:
If an induction variable is provably non-negative, its sign extension is
equal to its zero extension. This means narrow uses like
icmp slt iNarrow %indvar, %rhs
can be widened into
icmp slt iWide zext(%indvar), sext(%rhs)
Reviewers: atrick, mcrosier, hfinkel
Subscribers: hfinkel, reames, llvm-commits
Differential Revision: http://reviews.llvm.org/D12745
llvm-svn: 248045
Currently LazyValueInfo will report only alloca's as having nonnull range.
For loads with !nonnull metadata it will bailout with no additional information.
Same is true for calls returning nonnull pointers.
This change extends LazyValueInfo to handle additional nonnull instructions.
Differential Revision: http://reviews.llvm.org/D12932
llvm-svn: 247985
The SSE4A instructions EXTRQ/INSERTQ only use the lower 64-bits (or less) for many of their input vector operands and all of them have undefined upper 64-bits results.
Differential Revision: http://reviews.llvm.org/D12680
llvm-svn: 247934
This test uses a gcov file generated in a little-endian host. The gcov
reader does not allow different endianness, so the test fails on big
endian hosts.
XFAILing for now.
llvm-svn: 247920
This adds enough machinery to support reading simple GCC AutoFDO
profiles. It now supports reading flat profiles (no function calls).
Subsequent patches will add support for:
- Inlined calls (in particular, the inline call stack is not traversed
to accumulate samples).
- Working sets and modules. These are used mostly for GCC's LIPO
optimizations, so they're not needed in LLVM atm. I'm not sure that
we will ever need them. For now, I've if0'd around the calls.
The patch also adds support in GCOV.h for gcov version V704 (generated
by GCC's profile conversion tool).
llvm-svn: 247874
Summary:
`signum(x)` is sometimes implemented as `(x >> 63) | (-x >>> 63)` (for
an `i64` `x`). This change adds a matcher for that pattern, and an
instcombine rule to optimize `signum(x) s< 1`.
Later, we can also consider optimizing:
icmp slt signum(x), 0 --> icmp slt x, 0
icmp sle signum(x), 1 --> true
etc.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12703
llvm-svn: 247846
Clang now passes the adjectives as an argument to catchpad.
Getting the CatchObj working is simply a matter of threading another
static alloca through codegen, first as an alloca, then as a frame
index, and finally as a frame offset.
llvm-svn: 247844
When building LLVM as a (potentially dynamic) library that can be linked against
by multiple compilers, the default triple is not really meaningful.
We allow to explicitely set it to an empty string when configuring LLVM.
In this case, said "target independent" tests in the test suite that are using
the default triple are disabled by matching the newly available feature
"default_triple".
Reviewers: probinson, echristo
Differential Revision: http://reviews.llvm.org/D12660
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 247775
We only checked that a global is initialized with constants, which is
incorrect. We should be checking that GlobalVariable *is* a constant,
not just initialized with it.
llvm-svn: 247769
In `IndVarSimplify::ExpandSCEVIfNeeded`,
`SCEVExpander::findExistingExpansion` may return an `llvm::Value` that
differs in type from the SCEV it was asked to find an expansion for (but
computes the same value). In such cases, we fall back on
`expandCodeFor`; and rely on LLVM to CSE the two equivalent
expressions (different only by a no-op cast) into a single computation.
I tried a few other approaches to fixing PR24783, all of which turned
out to be more complex than this current version:
1. Move the `ExpandSCEVIfNeeded` logic into `expandCodeFor`. This got
problematic because currently we do not pass in the `Loop *` into
`expandCodeFor`. Changing the interface to do this is a more
invasive change, and really does not make much semantic sense unless
the SCEV being passed in is an add recurrence.
There is also the problem of `expandCodeFor` being used in places
other than `indvars` -- there may be performance / correctness
issues elsewhere if `expandCodeFor` is moved from always generating
IR from scratch to cache-like model.
2. Have `findExistingExpansion` only return expression with the correct
type. This would make `isHighCostExpansionHelper` and thus
`isHighCostExpansion` more conservative than necessary.
3. Insert casts on the value returned by `findExistingExpansion` if
needed using `InsertNoopCastOfTo`. This is complicated because
`InsertNoopCastOfTo` depends on internal state of its
`SCEVExpander` (specifically `Builder.GetInserPoint()`), and this
may not be set up when `ExpandSCEVIfNeeded` is called.
4. Manually insert casts on the value returned by
`findExistingExpansion` if needed using `InsertNoopCastOfTo` via
`CastInst::Create`. This is probably workable, but figuring out the
location where the cast instruction needs to be inserted has enough
edge cases (arguments, constants, invokes, LCSSA must be preserved)
makes me feel what I have right now is simplest solution.
llvm-svn: 247749
The patch extends the optimization to cases where the constant's
magnitude is so small or large that the rounding of the conversion
is irrelevant. The "so small" case includes negative zero.
Differential review: http://reviews.llvm.org/D11210
llvm-svn: 247708
LazuValueInfo can prove that value is nonnull based on the context information.
Make use of this ability to infer nonnull attributes for the call arguments.
Differential Revision: http://reviews.llvm.org/D12836
llvm-svn: 247707
Summary:
This change lets a `PlaceSafepoints` client change how wide the trip
count of a loop has to be for the loop to be considerd "counted", via
`CountedLoopTripWidth`. It also removes the boolean `SkipCounted` flag
and the `upperTripBound` constant -- we can get the old behavior of
`SkipCounted` == `false` by setting `CountedLoopTripWidth` to `13` (2 ^
13 == 8192).
Reviewers: reames
Subscribers: llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D12789
llvm-svn: 247656
Summary: This patch replaces isKnownNonNull() with isKnownNonNullAt() when checking nullness of passing arguments at callsite. In this way it can handle cases where the argument does not have nonnull attribute but has a dominating null check from the CFG. It also adds assertions in isKnownNonNull() and isKnownNonNullFromDominatingCondition() to make sure the value checked is pointer type (as defined in LLVM document). These assertions might trip failures in things which are not covered under llvm/test, but fixes should be pretty obvious.
Reviewers: reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12779
llvm-svn: 247587
GetElementPointers must have the first argument's type compared
for structural equivalence. Previously the code erroneously compared the
pointer's type, but this code was dead because all pointer types (of the
same address space) are the same. The pointee must be compared instead
(using the type stored in the GEP, not from the pointer type which will
be erased anyway).
Author: jrkoenig
Reviewers: dschuff, nlewycky, jfb
Subscribers: nlewycky, llvm-commits
Differential revision: http://reviews.llvm.org/D12820
llvm-svn: 247570