This introduces new instructions neccessary to implement MSVC-compatible
exception handling support. Most of the middle-end and none of the
back-end haven't been audited or updated to take them into account.
Differential Revision: http://reviews.llvm.org/D11097
llvm-svn: 243766
This patch is a follow up from r240560 and is a step further into
mitigating the compile time performance issues in CaptureTracker.
By providing the CaptureTracker with a "cached ordered basic block"
instead of computing it every time, MemDepAnalysis can use this cache
throughout its calls to AA->callCapturesBefore, avoiding to recompute it
for every scanned instruction. In the same testcase used in r240560,
compile time is reduced from 2min to 30s.
This also fixes PR22348.
rdar://problem/19230319
Differential Revision: http://reviews.llvm.org/D11364
llvm-svn: 243750
Summary:
Make Scalar Evolution able to propagate NSW and NUW flags from instructions to SCEVs in some cases. This is based on reasoning about when poison from instructions with these flags would trigger undefined behavior. This gives a 13% speed-up on some Eigen3-based Google-internal microbenchmarks for NVPTX.
There does not seem to be clear agreement about when poison should be considered to propagate through instructions. In this analysis, poison propagates only in cases where that should be uncontroversial.
This change makes LSR able to create induction variables for expressions like &ptr[i + offset] for loops like this:
for (int i = 0; i < limit; ++i) {
sum += ptr[i + offset];
}
Here ptr is a 64 bit pointer and offset is a 32 bit integer. For NVPTX, LSR currently creates an induction variable for i + offset instead, which is not as fast. Improving this situation is what brings the 13% speed-up on some Eigen3-based Google-internal microbenchmarks for NVPTX.
There are more details in this discussion on llvmdev.
June: http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-June/thread.html#87234
July: http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-July/thread.html#87392
Patch by Bjarke Roune
Reviewers: eliben, atrick, sanjoy
Subscribers: majnemer, hfinkel, jingyue, meheff, llvm-commits
Differential Revision: http://reviews.llvm.org/D11212
llvm-svn: 243460
no-alias with non-addr-taken globals: they cannot alias a captured
pointer.
If the non-global underlying object would have been a capture were it to
alias the global, we can firmly conclude no-alias. It isn't reasonable
for a transformation to introduce a capture in a way observable by an
alias analysis. Consider, even if it were to temporarily capture one
globals address into another global and then restore the other global
afterward, there would be no way for the load in the alias query to
observe that capture event correctly. If it observes it then the
temporary capturing would have changed the meaning of the program,
making it an invalid transformation. Even instrumentation passes or
a pass which is synthesizing stores to global variables to expose race
conditions in programs could not trigger this unless it queried the
alias analysis infrastructure mid-transform, in which case it seems
reasonable to return results from before the transform started.
See the comments in the change for a more detailed outlining of the
theory here.
This should address the primary performance regression found when the
non-conservatively-correct path of the alias query was disabled.
Differential Revision: http://reviews.llvm.org/D11410
llvm-svn: 243405
out the per-function modref data structures when functions were deleted
or when globals were deleted.
I don't actually know how the global deletion side of this bug hasn't
been hit before, but for the other it just-so-happens that functions
aren't likely to be deleted in the particular part of the LTO pipeline
where we currently enable GMR, so we got lucky.
With this patch, I can self-host with GMR enabled in the normal pass
pipeline!
I was a bit concerned about the compile-time impact of this chang, which
is part of what motivated my prior string of patches to make the
per-function datastructure very dense and fast to walk. With those
changes in place, I can't measure a significant compile time difference
(the difference is around 0.1% which is *way* below the noise) before
and after this patch when building a linked bitcode for all of Clang.
Differential Revision: http://reviews.llvm.org/D11453
llvm-svn: 243385
This is effectively an NFC but we can no longer print the index of the
pointer group so instead I print its address. This still lets us
cross-check the section that list the checks against the section that
list the groups (see how I modified the test).
E.g. before we printed this:
Run-time memory checks:
Check 0:
Comparing group 0:
%arrayidxC = getelementptr inbounds i16, i16* %c, i64 %store_ind
%arrayidxC1 = getelementptr inbounds i16, i16* %c, i64 %store_ind_inc
Against group 1:
%arrayidxA = getelementptr i16, i16* %a, i64 %ind
%arrayidxA1 = getelementptr i16, i16* %a, i64 %add
...
Grouped accesses:
Group 0:
(Low: %c High: (78 + %c))
Member: {%c,+,4}<%for.body>
Member: {(2 + %c),+,4}<%for.body>
Now we print this (changes are underlined):
Run-time memory checks:
Check 0:
Comparing group (0x7f9c6040c320):
~~~~~~~~~~~~~~
%arrayidxC1 = getelementptr inbounds i16, i16* %c, i64 %store_ind_inc
%arrayidxC = getelementptr inbounds i16, i16* %c, i64 %store_ind
Against group (0x7f9c6040c358):
~~~~~~~~~~~~~~
%arrayidxA1 = getelementptr i16, i16* %a, i64 %add
%arrayidxA = getelementptr i16, i16* %a, i64 %ind
...
Grouped accesses:
Group 0x7f9c6040c320:
~~~~~~~~~~~~~~
(Low: %c High: (78 + %c))
Member: {(2 + %c),+,4}<%for.body>
Member: {%c,+,4}<%for.body>
llvm-svn: 243354
Summary:
This function is not used in this change but will be used in a
subsequent change.
Reviewers: mcrosier, chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9180
llvm-svn: 243347
Summary:
Was D9784: "Remove loop variant range check when induction variable is
strictly increasing"
This change re-implements D9784 with the two differences:
1. It does not use SCEVExpander and does not generate new
instructions. Instead, it does a quick local search for existing
`llvm::Value`s that it needs when modifying the `icmp`
instruction.
2. It is more general -- it deals with both increasing and decreasing
induction variables.
I've added all of the tests included with D9784, and two more.
As an example on what this change does (copied from D9784):
Given C code:
```
for (int i = M; i < N; i++) // i is known not to overflow
if (i < 0) break;
a[i] = 0;
}
```
This transformation produces:
```
for (int i = M; i < N; i++)
if (M < 0) break;
a[i] = 0;
}
```
Which can be unswitched into:
```
if (!(M < 0))
for (int i = M; i < N; i++)
a[i] = 0;
}
```
I went back and forth on whether the top level logic should live in
`SimplifyIndvar::eliminateIVComparison` or be put into its own
routine. Right now I've put it under `eliminateIVComparison` because
even though the `icmp` is not *eliminated*, it no longer is an IV
comparison. I'm open to putting it in its own helper routine if you
think that is better.
Reviewers: reames, nicholas, atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11278
llvm-svn: 243331
The pointer size of the addrspacecasted pointer might not have matched,
so this would have hit an assert in accumulateConstantOffset.
I think this was here to allow constant folding of a load of an
addrspacecasted constant. Accumulating the offset through the
addrspacecast doesn't make much sense, so something else is necessary
to allow folding the load through this cast.
llvm-svn: 243300
r243250 appeared to break clang/test/Analysis/dead-store.c on one of the build
slaves, but I couldn't reproduce this failure locally. Probably a false
positive as I saw this test was broken by r243246 or r243247 too but passed
later without people fixing anything.
llvm-svn: 243253
Summary:
This patch updates TargetTransformInfoImplCRTPBase::getGEPCost to consider
addressing modes. It now returns TCC_Free when the GEP can be completely folded
to an addresing mode.
I started this patch as I refactored SLSR. Function isGEPFoldable looks common
and is indeed used by some WIP of mine. So I extracted that logic to getGEPCost.
Furthermore, I noticed getGEPCost wasn't directly tested anywhere. The best
testing bed seems CostModel, but its getInstructionCost method invokes
getAddressComputationCost for GEPs which provides very coarse estimation. So
this patch also makes getInstructionCost call the updated getGEPCost for GEPs.
This change inevitably breaks some tests because the cost model changes, but
nothing looks seriously wrong -- if we believe the new cost model is the right
way to go, these tests should be updated.
This patch is not perfect yet -- the comments in some tests need to be updated.
I want to know whether this is a right approach before fixing those details.
Reviewers: chandlerc, hfinkel
Subscribers: aschwaighofer, llvm-commits, aemerson
Differential Revision: http://reviews.llvm.org/D9819
llvm-svn: 243250
Summary:
The goal is to start moving us closer to the model where
RuntimePointerChecking will compute and store the checks. Then a client
can filter the check according to its requirements and then use the
filtered list of checks with addRuntimeCheck.
Before the patch, this is all done in addRuntimeCheck. So the patch
starts to split up addRuntimeCheck while providing the old API under
what's more or less a wrapper now.
The new underlying addRuntimeCheck takes a collection of checks now,
expands the code for the bounds then generates the code for the checks.
I am not completely happy with making expandBounds static because now it
needs so many explicit arguments but I don't want to make the type
PointerBounds part of LAI. This should get fixed when addRuntimeCheck
is moved to LoopVersioning where it really belongs, IMO.
Audited the assembly diff of the testsuite (including externals). There
is a tiny bit of assembly churn that is due to the different order the
code for the bounds is expanded now
(MultiSource/Benchmarks/Prolangs-C/bison/conflicts.s and with LoopDist
on 456.hmmer/fast_algorithms.s).
Reviewers: hfinkel
Subscribers: klimek, llvm-commits
Differential Revision: http://reviews.llvm.org/D11205
llvm-svn: 243239
more dense datastructure. We actually only have 3 bits of information
and an often-null pointer here. This fits very nicely into a
pointer-size value in the DenseMap from Function -> Info. Then we take
one more pointer hop to get to a secondary DenseMap from GlobalValue ->
ModRefInfo when we actually have precise info for particular globals.
This is more code than I would really like to do this packing, but it
ended up reasonably cleanly laid out. It should ensure we don't hit
scaling limitations with more widespread use of GMR.
llvm-svn: 242991
This takes the operation of merging a callee's information into the
current information and embeds it into the FunctionInfo type itself.
This is much cleaner as now we don't need to expose iteration of the
globals, etc.
Also, switched all the uses of a raw integer two maintain the mod/ref
info during the SCC walk into just directly manipulating it in the
FunctionInfo object.
llvm-svn: 242976
typed interface as a precursor to rewriting how it is stored.
This way we know that the access paths are controlled and it should be
easy to store these bits in a different way.
No functionality changed.
llvm-svn: 242974
preparation for de-coupling the AA implementations.
In order to do this, they had to become fake-scoped using the
traditional LLVM pattern of a leading initialism. These can't be actual
scoped enumerations because they're bitfields and thus inherently we use
them as integers.
I've also renamed the behavior enums that are specific to reasoning
about the mod/ref behavior of functions when called. This makes it more
clear that they have a very narrow domain of applicability.
I think there is a significantly cleaner API for all of this, but
I don't want to try to do really substantive changes for now, I just
want to refactor the things away from analysis groups so I'm preserving
the exact original design and just cleaning up the names, style, and
lifting out of the class.
Differential Revision: http://reviews.llvm.org/D10564
llvm-svn: 242963
This replaces the next-to-last std::map with a DenseMap. While DenseMap
doesn't yet make tons of sense (there are 32 bytes or so in the value
type), my next change will reduce the value type to a single pointer --
we only need a pointer and 3 bits, and that is exactly what we can have.
llvm-svn: 242956
The MSVC ABI requires that we generate an alias for the vtable which
means looking through a GlobalAlias which cannot be overridden improves
our ability to devirtualize.
Found while investigating PR20801.
Patch by Andrew Zhogin!
Differential Revision: http://reviews.llvm.org/D11306
llvm-svn: 242955
efficient, NFC.
Previously, we built up vectors of function pointers to track readers
and writers. The primary problem here is that we would add the same
function to this vector every time we found an instruction that reads or
writes to the pointer. This could be a *lot* of redudant function
pointers. Instead of doing that, we can use a SmallPtrSet.
This does more than just reduce the size of the list of readers or
writers. We walk the entire lists of each and do a map lookup for each
one. By having sets, we will only do one map lookup per reader or writer
function.
But only one user of the pointer analyzer actually needs this
information, so we can also skip accumulating it (and doing a lot of
heap allocations) for all the other pointer analysis. This is
particularly useful because there are very many more pointers in some of
the other cases.
llvm-svn: 242950
This almost certainly doesn't matter in some deep sense, but std::set is
essentially always going to be slower here. Now the alias query should
be essentially constant time instead of having to chase the set tree
each time.
llvm-svn: 242893
it wasn't one of the indirect globals (which clearly cannot be an
allocation function call). Also only do a single lookup into this map
instead of two. NFC.
llvm-svn: 242892
Since we have to iterate this map not that infrequently, we should use
a map that is efficient for iteration. It is also almost certainly much
faster for lookups as well. There is more to do in terms of reducing the
wasted overhead of GMR's runtime though. Not sure how much is worthwhile
though.
The loop improvements should hopefully address the code review that
Duncan gave when he saw this code as I moved it around.
llvm-svn: 242891
part of simplifying its interface and usage in preparation for porting
to work with the new pass manager.
Note that this will likely expose that we have dead arguments, members,
and maybe even pass requirements for AA. I'll be cleaning those up in
seperate patches. This just zaps the actual update API.
Differential Revision: http://reviews.llvm.org/D11325
llvm-svn: 242881
GlobalsModRef) with CallbackVHs that trigger the same behavior.
This is technically more expensive, but in benchmarking some LTO runs,
it seems unlikely to even be above the noise floor. The only way I was
able to measure the performance of GMR at all was to run nothing else
but this one analysis on a linked clang bitcode file. The call graph
analysis still took 5x more time than GMR, and this change at most made
GMR 2% slower (this is well within the noise, so its hard for me to be
sure that this is an actual change). However, in a real LTO run over the
same bitcode, the GMR run takes so little time that the pass timers
don't measure it.
With this, I can remove the last update API from the AliasAnalysis
interface, but I'll actually remove the interface hook point in
a follow-up commit.
Differential Revision: http://reviews.llvm.org/D11324
llvm-svn: 242878
Summary:
In the benchmark (https://github.com/vetter/shoc) we are researching,
the duplicated load is not eliminated because MemoryDependenceAnalysis
hit the BlockScanLimit. This patch change it into a command line option
instead of a hardcoded value.
Patched by Xuetian Weng.
Test Plan: test/Analysis/MemoryDependenceAnalysis/memdep-block-scan-limit.ll
Reviewers: jingyue, reames
Subscribers: reames, llvm-commits
Differential Revision: http://reviews.llvm.org/D11366
llvm-svn: 242842
A patch by Chakshu Grover!
This patch allows constfolding of trunc,rint,nearbyint,ceil and floor intrinsics using APFloat class.
Differential Revision: http://reviews.llvm.org/D11144
llvm-svn: 242763
directly model in the new PM.
This also was an incredibly brittle and expensive update API that was
never fully utilized by all the passes that claimed to preserve AA, nor
could it reasonably have been extended to all of them. Any number of
places add uses of values. If we ever wanted to reliably instrument
this, we would want a callback hook much like we have with ValueHandles,
but doing this for every use addition seems *extremely* expensive in
terms of compile time.
The only user of this update mechanism is GlobalsModRef. The idea of
using this to keep it up to date doesn't really work anyways as its
analysis requires a symmetric analysis of two different memory
locations. It would be very hard to make updates be sufficiently
rigorous to *guarantee* symmetric analysis in this way, and it pretty
certainly isn't true today.
However, folks have been using GMR with this update for a long time and
seem to not be hitting the issues. The reported issue that the update
hook fixes isn't even a problem any more as other changes to
GetUnderlyingObject worked around it, and that issue stemmed from *many*
years ago. As a consequence, a prior patch provided a flag to control
the unsafe behavior of GMR, and this patch removes the update mechanism
that has questionable compile-time tradeoffs and is causing problems
with moving to the new pass manager. Note the lack of test updates --
not one test in tree actually requires this update, even for a contrived
case.
All of this was extensively discussed on the dev list, this patch will
just enact what that discussion decides on. I'm sending it for review in
part to show what I'm planning, and in part to show the *amazing* amount
of work this avoids. Every call to the AA here is something like three
to six indirect function calls, which in the non-LTO pipeline never do
any work! =[
Differential Revision: http://reviews.llvm.org/D11214
llvm-svn: 242605
basic changes to the IR such as folding pointers through PHIs, Selects,
integer casts, store/load pairs, or outlining.
This leaves the feature available behind a flag. This flag's default
could be flipped if necessary, but the real-world performance impact of
this particular feature of GMR may not be sufficiently significant for
many folks to want to run the risk.
Currently, the risk here is somewhat mitigated by half-hearted attempts
to update GlobalsModRef when the rest of the optimizer changes
something. However, I am currently trying to remove that update
mechanism as it makes migrating the AA infrastructure to a form that can
be readily shared between new and old pass managers very challenging.
Without this update mechanism, it is possible that this still unlikely
failure mode will start to trip people, and so I wanted to try to
proactively avoid that.
There is a lengthy discussion on the mailing list about why the core
approach here is flawed, and likely would need to look totally different
to be both reasonably effective and resilient to basic IR changes
occuring. This patch is essentially the first of two which will enact
the result of that discussion. The next patch will remove the current
update mechanism.
Thanks to lots of folks that helped look at this from different angles.
Especial thanks to Michael Zolotukhin for doing some very prelimanary
benchmarking of LTO without GlobalsModRef to get a rough idea of the
impact we could be facing here. So far, it looks very small, but there
are some concerns lingering from other benchmarking. The default here
may get flipped if performance results end up pointing at this as a more
significant issue.
Also thanks to Pete and Gerolf for reviewing!
Differential Revision: http://reviews.llvm.org/D11213
llvm-svn: 242512
Those new constructors make it more natural to construct an object for a function. For example, previously to build a LoopInfo for a function, we need four statements:
DominatorTree DT;
LoopInfo LI;
DT.recalculate(F);
LI.analyze(DT);
Now we only need one statement:
LoopInfo LI(DominatorTree(F));
http://reviews.llvm.org/D11274
llvm-svn: 242486
The benefit of turning the parameter of LoopInfo::analyze() to const& is that it now can accept a rvalue.
http://reviews.llvm.org/D11250
llvm-svn: 242426
Summary:
The checking pointer grouping algorithm assumes that the
starts/ends of the pointers are well formed (start <= end).
The runtime memory checking algorithm also assumes this by doing:
start0 < end1 && start1 < end0
to detect conflicts. This check only works if start0 <= end0 and
start1 <= end1.
This change correctly orders the interval ends by either checking
the stride (if it is constant) or by using min/max SCEV expressions.
Reviewers: anemet, rengolin
Subscribers: rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D11149
llvm-svn: 242400
This is made a static public member function to allow the transition of
this logic from LAA to LoopDistribution. (Technically, it could be an
implementation-local static function but then it would not be accessible
from LoopDistribution.)
llvm-svn: 242376
This new wrapper pass is useful when we want to do branch probability analysis conditionally (e.g. only in PGO mode) but don't want to add one more pass dependence.
http://reviews.llvm.org/D11241
llvm-svn: 242349
Summary:
This patch allows phi nodes like
%x = phi [ %incptr, ... ] [ %var, ... ]
%incptr = getelementptr %x, 1
to be analyzed by BasicAliasAnalysis.
In aliasPHI, we can detect incoming values that are recursive GEPs with a
constant offset. Instead of trying to analyze a recursive GEP (and failing),
we now ignore it and instead set the size of the memory referenced by
the PHINode to UnknownSize. This represents all the possible memory
locations the pointer represented by the PHINode could be advanced to
by the GEP.
For now, this new behavior is turned off by default to allow debugging of
performance degradations seen with SPEC/x86 and Hexagon benchmarks.
The flag -basicaa-recphi turns it on.
Reviewers: hfinkel, sanjoy
Subscribers: tobiasvk_caf, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D10368
llvm-svn: 242320
inspection.
While we want to handle calls specially in this code because they should
have been modeled by the call graph analysis that precedes it, we should
*not* be re-implementing the predicates for whether an instruction reads
or writes memory. Those are well defined already. Notably, at least the
following issues seem to be clearly missed before:
- Ordered atomic loads can "write" to memory by causing writes from other
threads to become visible. Similarly for ordered atomic stores.
- AtomicRMW instructions quite obviously both read and write to memory.
- AtomicCmpXchg instructions also read and write to memory.
- Fences read and write to memory.
- Invokes of intrinsics or memory allocation functions.
I don't have any test cases, and I suspect this has never really come up
in the real world. But there is no reason why it wouldn't, and it makes
the code simpler to do this the right way.
While here, I've tried to make the loops significantly simpler as well
and added helpful comments as to what is going on.
llvm-svn: 242281
This is useful when we want to do block frequency analysis
conditionally (e.g. only in PGO mode) but don't want to add
one more pass dependence.
Patch by congh.
Approved by dexonsmith.
Differential Revision: http://reviews.llvm.org/D11196
llvm-svn: 242248
I am planning to add more nested classes inside RuntimePointerCheck so
all these triple-nesting would be hard to follow.
Also rename it to RuntimePointerChecking (i.e. append 'ing').
llvm-svn: 242218
Summary:
The iteration order within a member of DepCands is deterministic
and therefore we don't have to sort the accesses within a member.
We also don't have to copy the indices of the pointers into a
vector, since we can iterate over the members of the class.
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11145
llvm-svn: 242033
Summary:
This at least saves compile time. I also encountered a case where
ephemeral values affect whether other variables are promoted, causing
performance issues. It may be a bug in LSR, but I didn't manage to
reduce it yet. Anyhow, I believe it's in general not worth considering
ephemeral values in LSR.
Reviewers: atrick, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11115
llvm-svn: 242011
r236894 caused PR23626 (Clang miscompiles webkit's base64 decoder), and was
reverted in r237984. This reapplies the patch with an additional test case for
PR23626 and the associated fix (both scales and offsets in the
BasicAliasAnalysis::constantOffsetHeuristic should initially be zero).
Patch by Nick White, thanks!
llvm-svn: 241981
The following functions are moved from the LoopVectorizer to VectorUtils:
- getGEPInductionOperand
- stripGetElementPtr
- getUniqueCastUse
- getStrideFromPointer
These used to be static functions in LoopVectorize, but will also be used by
the upcoming loop versioning LICM transformation.
Patch by Ashutosh Nema!
llvm-svn: 241980
This change adds new attribute called "argmemonly". Function marked with this attribute can only access memory through it's argument pointers. This attribute directly corresponds to the "OnlyAccessesArgumentPointees" ModRef behaviour in alias analysis.
Differential Revision: http://reviews.llvm.org/D10398
llvm-svn: 241979
No in-tree alias analysis used this facility, and it was not called in
any particularly rigorous way, so it seems unlikely to be correct.
Note that one of the only stateful AA implementations in-tree,
GlobalsModRef is completely broken currently (and any AA passes like it
are equally broken) because Module AA passes are not effectively
invalidated when a function pass that fails to update the AA stack runs.
Ultimately, it doesn't seem like we know how we want to build stateful
AA, and until then trying to support and maintain correctness for an
untested API is essentially impossible. To that end, I'm planning to rip
out all of the update API. It can return if and when we need it and know
how to build it on top of the new pass manager and as part of *tested*
stateful AA implementations in the tree.
Differential Revision: http://reviews.llvm.org/D10889
llvm-svn: 241975
Summary:
This introduces new instructions neccessary to implement MSVC-compatible
exception handling support. Most of the middle-end and none of the
back-end haven't been audited or updated to take them into account.
Reviewers: rnk, JosephTremoulet, reames, nlewycky, rjmccall
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11041
llvm-svn: 241888
Currently canCheckPtrAtRT returns two flags NeedRTCheck and CanDoRT.
NeedRTCheck says whether we need checks and CanDoRT whether we can
generate the checks. The idea is to encode three states with these:
Need/Can:
(1) false/dont-care: no checks are needed
(2) true/false: we need checks but can't generate them
(3) true/true: we need checks and we can generate them
This is pretty unnecessary since the caller (analyzeLoop) is only
interested in whether we can generate the checks if we actually need
them (i.e. 1 or 3).
So this change cleans up to return just that (CanDoRTIfNeeded) and pulls
all the underlying logic into canCheckPtrAtRT.
By doing all this, we simplify analyzeLoop which is the complex function
in LAA.
There is further room for improvement here by using RtCheck.Need
directly rather than a new local variable NeedRTCheck but that's for a
later patch.
llvm-svn: 241866
Summary:
The checking pointer group construction algorithm relied on the iteration on DepCands.
We would need the same leaders across runs and the same iteration order over the underlying std::set for determinism.
This changes the algorithm to process the pointers in the order in which they were added to the runtime check, which is deterministic.
We need to update the tests, since the order in which pointers appear has changed.
No new tests were added, since it is impossible to test for non-determinism.
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11064
llvm-svn: 241809
The original name was too close to NeedRTCheck which is what the actual
memcheck analysis returns. This flag, as the new name suggests, is only
used to whether to initiate that analysis.
Also a comment is added to answer one question I had about this code for
a long time. Namely, how does this flag differ from
isDependencyCheckNeeded since they are seemingly set at the same time.
llvm-svn: 241784
DataLayout is no longer optional. It was initialized with or without
a DataLayout, and the DataLayout when supplied could have been the
one from the TargetMachine.
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: jholewinski, llvm-commits, rafael, yaron.keren
Differential Revision: http://reviews.llvm.org/D11021
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241774
This commit ([LAA] Fix estimation of number of memchecks) regressed the
logic a bit. We shouldn't quit the analysis if we encounter a pointer
without known bounds *unless* we actually need to emit a memcheck for
it.
The original code was using NumComparisons which is now computed
differently. Instead I compute NeedRTCheck from NumReadPtrChecks and
NumWritePtrChecks.
As side note, I find the separation of NeedRTCheck and CanDoRT
confusing, so I will try to merge them in a follow-up patch.
llvm-svn: 241756
r239285 ([LoopAccessAnalysis] Teach LAA to check the memory dependence
between strided accesses.) introduced a new case under
MemoryDepChecker::isDependent. We normally have debug output for each
case.
llvm-svn: 241707
Summary:
Often filter-like loops will do memory accesses that are
separated by constant offsets. In these cases it is
common that we will exceed the threshold for the
allowable number of checks.
However, it should be possible to merge such checks,
sice a check of any interval againt two other intervals separated
by a constant offset (a,b), (a+c, b+c) will be equivalent with
a check againt (a, b+c), as long as (a,b) and (a+c, b+c) overlap.
Assuming the loop will be executed for a sufficient number of
iterations, this will be true. If not true, checking against
(a, b+c) is still safe (although not equivalent).
As long as there are no dependencies between two accesses,
we can merge their checks into a single one. We use this
technique to construct groups of accesses, and then check
the intervals associated with the groups instead of
checking the accesses directly.
Reviewers: anemet
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10386
llvm-svn: 241673
Summary:
Initially, these intrinsics seemed like part of a family of "frame"
related intrinsics, but now I think that's more confusing than helpful.
Initially, the LangRef specified that this would create a new kind of
allocation that would be allocated at a fixed offset from the frame
pointer (EBP/RBP). We ended up dropping that design, and leaving the
stack frame layout alone.
These intrinsics are really about sharing local stack allocations, not
frame pointers. I intend to go further and add an `llvm.localaddress()`
intrinsic that returns whatever register (EBP, ESI, ESP, RBX) is being
used to address locals, which should not be confused with the frame
pointer.
Naming suggestions at this point are welcome, I'm happy to re-run sed.
Reviewers: majnemer, nicholas
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11011
llvm-svn: 241633
From the linker's perspective, an available_externally global is equivalent
to an external declaration (per isDeclarationForLinker()), so it is incorrect
to consider it to be a weak definition.
Also clean up some logic in the dead argument elimination pass and clarify
its comments to better explain how its behavior depends on linkage,
introduce GlobalValue::isStrongDefinitionForLinker() and start using
it throughout the optimizers and backend.
Differential Revision: http://reviews.llvm.org/D10941
llvm-svn: 241413
The expressions we delinearize do not necessarily have to have a SCEVAddRecExpr
at the outermost level. At this moment, the additional flexibility is not
exploited in LLVM itself, but in Polly we will soon soonish use this
functionality. For LLVM, this change should not affect existing functionality
(which is covered by test/Analysis/Delinearization/)
llvm-svn: 240952
If we have a caller that knows a particular argument can never be null, we can exploit this fact while simplifying values in the inline cost analysis. This has the effect of reducing the cost for inlining when a null check is present in the callee, but the value is known non null in the caller. In particular, any dependent control flow can be discounted from the cost estimate.
Note that we use the parameter attributes at the call site to memoize the analysis within the caller's code. The setting of this attribute is done in InstCombine, the inline cost analysis just consumes it. This is intentional and important because we want the inline cost analysis results to be easily cachable themselves. We're not currently doing so, but initial results on LTO indicate this will quickly become important.
Differential Revision: http://reviews.llvm.org/D9129
llvm-svn: 240828
Summary:
Scalar evolution does not propagate the non-wrapping flags to values
that are derived from a non-wrapping induction variable because
the non-wrapping property could be flow-sensitive.
This change is a first attempt to establish the non-wrapping property in
some simple cases. The main idea is to look through the operations
defining the pointer. As long as we arrive to a non-wrapping AddRec via
a small chain of non-wrapping instruction, the pointer should not wrap
either.
I believe that this essentially is what Andy described in
http://article.gmane.org/gmane.comp.compilers.llvm.cvs/220731 as the way
forward.
Reviewers: aschwaighofer, nadav, sanjoy, atrick
Reviewed By: atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10472
llvm-svn: 240798
Summary:
Because LSR happens at a late stage where mul of a power of 2 is
typically canonicalized to shl, this canonicalization emits code that
can be better CSE'ed.
Test Plan:
Transforms/LoopStrengthReduce/shl.ll shows how this change makes GVN more
powerful. Fixes some existing tests due to this change.
Reviewers: sanjoy, majnemer, atrick
Reviewed By: majnemer, atrick
Subscribers: majnemer, llvm-commits
Differential Revision: http://reviews.llvm.org/D10448
llvm-svn: 240573
CaptureTracking becomes very expensive in large basic blocks while
calling PointerMayBeCaptured. PointerMayBeCaptured scans the BB the
number of times equal to the number of uses of 'BeforeHere', which is
currently capped at 20 and bails out with Tracker->tooManyUses().
The bottleneck here is the number of calls to PointerMayBeCaptured * the
basic block scan. In a testcase with a 82k instruction BB,
PointerMayBeCaptured is called 130k times, leading to 'shouldExplore'
taking 527k runs, this currently takes ~12min.
To fix this we locally (within PointerMayBeCaptured) number the
instructions in the basic block using a DenseMap to cache instruction
positions/numbers. We build the cache incrementally every time we need
to scan an unexplored part of the BB, improving compile time to only
take ~2min.
This triggers in the flow: DeadStoreElimination -> MepDepAnalysis ->
CaptureTracking.
Side note: after multiple runs in the test-suite I've seen no
performance nor compile time regressions, but could note a couple of
compile time improvements:
Performance Improvements - Compile Time Delta Previous Current StdDev
SingleSource/Benchmarks/Misc-C++/bigfib -4.48% 0.8547 0.8164 0.0022
MultiSource/Benchmarks/TSVC/LoopRerolling-dbl/LoopRerolling-dbl -1.47% 1.3912 1.3707 0.0056
Differential Revision: http://reviews.llvm.org/D7010
llvm-svn: 240560
This will allow classes to implement the AA interface without deriving
from the class or referencing an internal enum of some other class as
their return types.
Also, to a pretty fundamental extent, concepts such as 'NoAlias',
'MayAlias', and 'MustAlias' are first class concepts in LLVM and we
aren't saving anything by scoping them heavily.
My mild preference would have been to use a scoped enum, but that
feature is essentially completely broken AFAICT. I'm extremely
disappointed. For example, we cannot through any reasonable[1] means
construct an enum class (or analog) which has scoped names but converts
to a boolean in order to test for the possibility of aliasing.
[1]: Richard Smith came up with a "solution", but it requires class
templates, and lots of boilerplate setting up the enumeration multiple
times. Something like Boost.PP could potentially bundle this up, but
even that would be quite painful and it doesn't seem realistically worth
it. The enum class solution would probably work without the need for
a bool conversion.
Differential Revision: http://reviews.llvm.org/D10495
llvm-svn: 240255
accurately describe what is being tracked.
While these two enums do track mod/ref information and aliasing
information, they don't represent the exact same things as either the
mod/ref enums or the alias result enum in AA. They're definitions are
dominated by the structure of their lattice and the bit's various
semantics. This patch just calls them what they are and tries to spell
out usefully distinct names for these things.
This will clear the path for using a raw unscoped enum to represent some
of these concepts across LLVM's analysis library.
No functionality changed here.
Differential Revision: http://reviews.llvm.org/D10494
llvm-svn: 240254
Summary:
Since FunctionMap has llvm::Function pointers as keys, the order in
which the traversal happens can differ from run to run, causing spurious
FileCheck failures. Have CallGraph::print sort the CallGraphNodes by
name before printing them.
Reviewers: bogner, chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10575
llvm-svn: 240191
The patch is generated using this command:
tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
-checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
llvm/lib/
Thanks to Eugene Kosov for the original patch!
llvm-svn: 240137
Summary:
Currently intrinsics don't affect the creation of the call graph.
This is not accurate with respect to statepoint and patchpoint
intrinsics -- these do call (or invoke) LLVM level functions.
This change fixes this inconsistency by adding a call to the external
node for call sites that call these non-leaf intrinsics. This coupled
with the fact that these intrinsics also escape the function pointer
they call gives us a conservatively correct call graph.
Reviewers: reames, chandlerc, atrick, pgavlin
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10526
llvm-svn: 240039
The personality routine currently lives in the LandingPadInst.
This isn't desirable because:
- All LandingPadInsts in the same function must have the same
personality routine. This means that each LandingPadInst beyond the
first has an operand which produces no additional information.
- There is ongoing work to introduce EH IR constructs other than
LandingPadInst. Moving the personality routine off of any one
particular Instruction and onto the parent function seems a lot better
than have N different places a personality function can sneak onto an
exceptional function.
Differential Revision: http://reviews.llvm.org/D10429
llvm-svn: 239940
names for counts with the word 'Count' to make them less ambiguous.
This will be an actual error if we use unscoped enums for any of these,
and generally this seems much clearer to read.
Also, use clang-format to normalize the formatting of this code which
seems to have been needlessly odd.
No functionality changed here.
llvm-svn: 239887
This is now living in MemoryLocation, which is what it pertains to. It
is also an enum there rather than a static data member which is left
never defined.
llvm-svn: 239886
that it is its own entity in the form of MemoryLocation, and update all
the callers.
This is an entirely mechanical change. References to "Location" within
AA subclases become "MemoryLocation", and elsewhere
"AliasAnalysis::Location" becomes "MemoryLocation". Hope that helps
out-of-tree folks update.
llvm-svn: 239885
virtual interface on AliasAnalysis only deals with ModRef information.
This interface was both computing memory locations by using TLI and
other tricks to estimate the size of memory referenced by an operand,
and computing ModRef information through similar investigations. This
change narrows the scope of the virtual interface on AliasAnalysis
slightly.
Note that all of this code could live in BasicAA, and be done with
a single investigation of the argument, if it weren't for the fact that
the generic code in AliasAnalysis::getModRefBehavior for a callsite
calls into the virtual aspect of (now) getArgModRefInfo. But this
patch's arrangement seems a not terrible way to go for now.
The other interesting wrinkle is how we could reasonably extend LLVM
with support for custom memory location sizes and mod/ref behavior for
library routines. After discussions with Hal on the review, the
conclusion is that this would be best done by fleshing out the much
desired support for extensions to TLI, and support these types of
queries in that interface where we would likely be doing other library
API recognition and analysis.
Differential Revision: http://reviews.llvm.org/D10259
llvm-svn: 239884
Summary:
When propagating mass through irregular loops, the mass flowing through
each loop header may not be equal. This was causing wrong frequencies
to be computed for irregular loop headers.
Fixed by keeping track of masses flowing through each of the headers in
an irregular loop. To do this, we now keep track of per-header backedge
weights. After the loop mass is distributed through the loop, the
backedge weights are used to re-distribute the loop mass to the loop
headers.
Since each backedge will have a mass proportional to the different
branch weights, the loop headers will end up with a more approximate
weight distribution (as opposed to the current distribution that assumes
that every loop header is the same).
Reviewers: dexonsmith
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10348
llvm-svn: 239843
Any combination of +-inf/+-inf is NaN so it's already ignored with
nnan and we can skip checking for ninf. Also rephrase logic in comments
a bit.
llvm-svn: 239821
This change is hopefully NFC. The only tricky part is that I changed the context instruction being used to the branch rather than the comparison. I believe both to be correct, but the branch is strictly more powerful. With the moved code, using the branch instruction is required for the basic block comparison test to return the same result. The previous code was able to directly access both the branch and the comparison where the revised code is not.
Differential Revision: http://reviews.llvm.org/D9652
llvm-svn: 239797
Summary:
ValueTracking used to overwrite the analysis results computed from
assumes and dominating conditions. This patch fixes this issue.
Test Plan: test/Analysis/ValueTracking/assume.ll
Reviewers: hfinkel, majnemer
Reviewed By: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10283
llvm-svn: 239718
The CFLAA code currently calls ConstantExpr::getAsInstruction which creates an instruction from a constant expr.
We then pass that instruction to the InstVisitor to analyze it.
Its not necessary to create these instructions as we can just cast from Constant to Operator in the visitor. This is how other InstVisitor’s such as SelectionDAGBuilder handle ConstantExpr.
llvm-svn: 239616
Determining proper debug locations for instructions created in
PHITransAddr is tricky. We use a simple approach here and simply copy
debug locations from instructions computing load address to
"corresponding" instructions re-creating the address computation
in predecessor basic blocks.
This may not always be correct, given all the rearrangement and
simplification going on, and debug locations may jump around a lot,
as the basic blocks we copy locations between may be very far from
each other.
Still, this would work good in most simple cases (e.g. when chain
of address computing instruction is short, or our mapping turns out
to be 1-to-1), and we desire to have *some* reasonable debug locations
associated with newly inserted instructions.
See http://reviews.llvm.org/D10351 review thread for more details.
Test Plan: regression test suite
Reviewers: spatel, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10351
llvm-svn: 239479
For GEP instructions isDereferenceablePointer checks that all indices are constant and within bounds. Replace this index calculation logic to a call to accumulateConstantOffset. Separated from the http://reviews.llvm.org/D9791
Reviewed By: sanjoy
Differential Revision: http://reviews.llvm.org/D9874
llvm-svn: 239299
Summary:
We need to add a runtime memcheck for pair of accesses (x,y) where at least one of x and y
are writes.
Assuming we have w writes and r reads, currently this number is estimated as being
w* (w+r-1). This estimation will count (write,write) pairs twice and will overestimate
the number of checks required.
This change adds a getNumberOfChecks method to RuntimePointerCheck, which
will count the number of runtime checks needed (similar in implementation to
needsAnyChecking) and uses it to produce the correct number of runtime checks.
Test Plan:
llvm test suite
spec2k
spec2k6
Performance results: no changes observed (not surprising since the formula for 1 writer is basically the same, which would covers most cases - at least with the current check limit).
Reviewers: anemet
Reviewed By: anemet
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D10217
llvm-svn: 239295
Interleaved memory accesses are grouped and vectorized into vector load/store and shufflevector.
E.g. for (i = 0; i < N; i+=2) {
a = A[i]; // load of even element
b = A[i+1]; // load of odd element
... // operations on a, b, c, d
A[i] = c; // store of even element
A[i+1] = d; // store of odd element
}
The loads of even and odd elements are identified as an interleave load group, which will be transfered into vectorized IRs like:
%wide.vec = load <8 x i32>, <8 x i32>* %ptr
%vec.even = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%vec.odd = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
The stores of even and odd elements are identified as an interleave store group, which will be transfered into vectorized IRs like:
%interleaved.vec = shufflevector <4 x i32> %vec.even, %vec.odd, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
store <8 x i32> %interleaved.vec, <8 x i32>* %ptr
This optimization is currently disabled by defaut. To try it by adding '-enable-interleaved-mem-accesses=true'.
llvm-svn: 239291
There were several SelectInst combines that always returned an existing
instruction instead of modifying an old one or creating a new one.
These are prime candidates for moving to InstSimplify.
llvm-svn: 239229
port it to the new pass manager.
All this does is extract the inner "location" class used by AA into its
own full fledged type. This seems *much* cleaner as MemoryDependence and
soon MemorySSA also use this heavily, and it doesn't make much sense
being inside the AA infrastructure.
This will also make it much easier to break apart the AA infrastructure
into something that stands on its own rather than using the analysis
group design.
There are a few places where this makes APIs not make sense -- they were
taking an AliasAnalysis pointer just to build locations. I'll try to
clean those up in follow-up commits.
Differential Revision: http://reviews.llvm.org/D10228
llvm-svn: 239003
Unreachable values may use themselves in strange ways due to their
dominance property. Attempting to translate through them can lead to
infinite recursion, crashing LLVM. Instead, claim that we weren't able
to translate the value.
This fixes PR23096.
llvm-svn: 238702
If the type isn't trivially moveable emplace can skip a potentially
expensive move. It also saves a couple of characters.
Call sites were found with the ASTMatcher + some semi-automated cleanup.
memberCallExpr(
argumentCountIs(1), callee(methodDecl(hasName("push_back"))),
on(hasType(recordDecl(has(namedDecl(hasName("emplace_back")))))),
hasArgument(0, bindTemporaryExpr(
hasType(recordDecl(hasNonTrivialDestructor())),
has(constructExpr()))),
unless(isInTemplateInstantiation()))
No functional change intended.
llvm-svn: 238602
Summary:
In continuation to an earlier commit to DependenceAnalysis.cpp by jingyue (r222100), the type for all subscripts in a coupled group need to be the same since constraints from one subscript may be propagated to another during testing. During testing, new SCEVs may be created and the operands for these need to be the same.
This patch extends unifySubscriptType() to work on lists of subscript pairs, ensuring a common extended type for all of them.
Test Plan:
Added a test case to NonCanonicalizedSubscript.ll which causes dependence analysis to crash without this fix.
All regression tests pass.
Reviewers: spop, sebpop, jingyue
Reviewed By: jingyue
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9698
llvm-svn: 238573
The patch evaluates the expansion cost of exitValue in indVarSimplify pass, and only does the rewriting when the expansion cost is low or loop can be deleted with the rewriting. It provides an option "-replexitval=" to control the default aggressiveness of the exitvalue rewriting. It also fixes some missing cases in SCEVExpander::isHighCostExpansionHelper to enhance the evaluation of SCEV expansion cost.
Differential Revision: http://reviews.llvm.org/D9800
llvm-svn: 238507
BranchProbabilityInfo was leaking 3MB of memory when running 'opt -O2 verify-uselistorder.lto.bc'. This was due to the Weights member not being cleared once the pass is no longer needed.
This adds the releaseMemory override to clear that field. The other fields are cleared at the end of runOnFunction so can stay there.
llvm-svn: 238462
model the dense vector instruction bonuses.
Previously, this code really didn't effectively compute the density of
inlined vector instructions and apply the intended inliner bonus. It
would try to compute it repeatedly while analyzing the function and
didn't handle the case where future vector instructions would tip the
scales back towards the bonus.
Instead, speculatively apply all possible bonuses to the threshold
initially. Once we *know* that a certain bonus can not be applied,
subtract it. This should delay early bailout enough to get much more
consistent results without actually causing us to analyze huge swaths of
code. I expect some (hopefully mild) compile time hit here, and some
swings in performance, but this was definitely the intended behavior of
these bonuses.
This also dramatically simplifies the computation of the bonuses to not
interact with each other in confusing ways. The previous code didn't do
a good job of this and the values for bonuses may be surprising but are
at least now clearly written in the code.
Finally, fix code to be in line with comments and use zero as the
bailout condition.
Patch by Easwaran Raman, with some comment tweaks by me to try and
further clarify what is going on with this code.
http://reviews.llvm.org/D8267
llvm-svn: 238276
This change does a few things:
- Move some InstCombine transforms to InstSimplify
- Run SimplifyCall from within InstCombine::visitCallInst
- Teach InstSimplify to fold [us]mul_with_overflow(X, undef) to 0.
llvm-svn: 237995
Make sure if we're truncating a constant that would then be sign extended
that the sign extension of the truncated constant is the same as the
original constant.
> Canonicalize min/max expressions correctly.
>
> This patch introduces a canonical form for min/max idioms where one operand
> is extended or truncated. This often happens when the other operand is a
> constant. For example:
>
> %1 = icmp slt i32 %a, i32 0
> %2 = sext i32 %a to i64
> %3 = select i1 %1, i64 %2, i64 0
>
> Would now be canonicalized into:
>
> %1 = icmp slt i32 %a, i32 0
> %2 = select i1 %1, i32 %a, i32 0
> %3 = sext i32 %2 to i64
>
> This builds upon a patch posted by David Majenemer
> (https://www.marc.info/?l=llvm-commits&m=143008038714141&w=2). That pass
> passively stopped instcombine from ruining canonical patterns. This
> patch additionally actively makes instcombine canonicalize too.
>
> Canonicalization of expressions involving a change in type from int->fp
> or fp->int are not yet implemented.
llvm-svn: 237821
Now that Intrinsic::ID is a typed enum, we can forward declare it and so return it from this method.
This updates all users which were either using an unsigned to store it, or had a now unnecessary cast.
llvm-svn: 237810
Summary:
Introduce dereferenceable, dereferenceable_or_null metadata for loads
with the same semantic as corresponding attributes.
This patch depends on http://reviews.llvm.org/D9253
Patch by Artur Pilipenko!
Reviewers: hfinkel, sanjoy, reames
Reviewed By: sanjoy, reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9365
llvm-svn: 237720
Summary:
Allow hoisting of loads from values marked with dereferenceable_or_null
attribute. For values marked with the attribute perform
context-sensitive analysis to determine whether it's known-non-null or
not.
Patch by Artur Pilipenko!
Reviewers: hfinkel, sanjoy, reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9253
llvm-svn: 237593
Summary:
This allows other passes (such as SLSR) to compute the SCEV expression for an
imaginary GEP.
Test Plan: no regression
Reviewers: atrick, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9786
llvm-svn: 237589
When dependence analysis encounters a non-constant distance between
memory accesses it aborts the analysis and falls back to run-time checks
only. In this case we weren't resetting the array of dependences.
llvm-svn: 237574
"Store to invariant address..." is moved as the last line. This is not
the prime result of the analysis. Plus it simplifies some of the tests.
llvm-svn: 237573
This teaches the min/max idiom detector in ValueTracking to see through
casts such as SExt/ZExt/Trunc. SCEV can already do this, so we're bringing
non-SCEV analyses up to the same level.
The returned LHS/RHS will not match the type of the original SelectInst
any more, so a CastOp is returned too to inform the caller how to
convert to the SelectInst's type.
No in-tree users yet; this will be used by InstCombine in a followup.
llvm-svn: 237452
collectUpperBound hits an assertion when the back edge count is wider then the desired type.
If that happens, truncate the backedge count.
Patch by Philip Pfaffe!
llvm-svn: 237439
Summary:
Extract method haveNoCommonBitsSet so that we don't have to duplicate this logic in
InstCombine and SeparateConstOffsetFromGEP.
This patch also makes SeparateConstOffsetFromGEP more precise by passing
DominatorTree to computeKnownBits.
Test Plan: value-tracking-domtree.ll that tests ValueTracking indeed leverages dominating conditions
Reviewers: broune, meheff, majnemer
Reviewed By: majnemer
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D9734
llvm-svn: 237407
Function 'ConstantFoldScalarCall' (in ConstantFolding.cpp) works under the
wrong assumption that a call to 'convert.from.fp16' returns a value of
type 'float'.
However, intrinsic 'convert.from.fp16' can be overloaded; for example, we
can call 'convert.from.fp16.f64' to convert from half to double; etc.
Before this patch, the following example would have triggered an assertion
failure in opt (with -constprop):
```
define double @foo() {
entry:
%0 = call double @llvm.convert.from.fp16.f64(i16 0)
ret double %0
}
```
This patch fixes the problem in ConstantFolding.cpp. When folding a call to
convert.from.fp16, we perform a different kind of conversion based on the call
return type.
Added test 'Transform/ConstProp/convert-from-fp16.ll'.
Differential Revision: http://reviews.llvm.org/D9771
llvm-svn: 237377
This version doesn't need begin/end but can instead just take a type which has begin/end methods.
Use this to replace an eligible foreach loop in LoopInfo found by David Blaikie in r237224.
Reviewed by David Blaikie.
llvm-svn: 237301
We already had a method to iterate over all the incoming values of a PHI. This just changes all eligible code to use it.
Ineligible code included anything which cared about the index, or was also trying to get the i'th incoming BB.
llvm-svn: 237169
Summary:
There are several unhandled edge cases in BasicAA's GetLinearExpression
method. This changes fixes outstanding issues, including zext / sext of
a constant with the sign bit set, and the refusal to decompose zexts or
sexts of wrapping arithmetic.
Test Plan: Unit tests added in //q.ext.ll//.
Patch by Nick White.
Reviewers: hfinkel, sanjoy
Reviewed By: hfinkel, sanjoy
Subscribers: sanjoy, llvm-commits, hfinkel
Differential Revision: http://reviews.llvm.org/D6682
llvm-svn: 236894