This patch makes CFLAA answer some ModRef queries. Because we don't
distinguish between reading/writing when making StratifiedSets, we're
unable to offer any of the readonly-related answers.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21858
llvm-svn: 274197
This is breaking an optimizaton remark test in clang. I've identified a couple fixes for that, but want to understand it better before I commit to anything.
llvm-svn: 274102
If a operation for a recurrence is an addition with no signed wrap and both input sign bits are 0, then the result sign bit must also be 0. Similar for the negative case.
I found this deficiency while playing around with a loop in the x86 backend that contained a signed division that could be optimized into an unsigned division if we could prove both inputs were positive. One of them being the loop induction variable. With this patch we can perform the conversion for this case. One of the test cases here is a contrived variation of the loop I was looking at.
Differential revision: http://reviews.llvm.org/D21493
llvm-svn: 274098
This patch enhances dot graph viewer to show hot regions
with hot bbs/edges displayed in red. The ratio of the bb
freq to the max freq of the function needs to be no less
than the value specified by view-hot-freq-percent option.
The default value is 10 (i.e. 10%).
llvm-svn: 273996
MBFI supports profile count dumping and function
name based filtering. Add these two feature to
BFI as well. The filtering option is shared between
BFI and MBFI: -view-bfi-func-name=..
llvm-svn: 273992
BFI and MBFI's dot traits class share most of the
code and all future enhancement. This patch extracts
common implementation into base class BFIDOTGraphTraitsBase.
This patch also enables BFI graph to show branch probability
on edges as MBFI does before.
llvm-svn: 273990
the new pass manager.
This adds operator<< overloads for the various bits of the
LazyCallGraph, dump methods for use from the debugger, and debug logging
using them to the CGSCC pass manager.
Having this was essential for debugging the call graph update patch, and
I've extracted what I could from that patch here to minimize the delta.
llvm-svn: 273961
Apparently, MSVC complains if there's an implicit conversion from
`unsigned` to `unsigned long long`, if the `unsigned` is the result of
a bit shift.
llvm-svn: 273955
It did not handle correctly cases without GEP.
The following loop wasn't vectorized:
for (int i=0; i<len; i++)
*to++ = *from++;
I use getPtrStride() to find Stride for memory access and return 0 is the Stride is not 1 or -1.
Re-commit rL273257 - revision: http://reviews.llvm.org/D20789
llvm-svn: 273864
This intrinsic safely loads a function pointer from a virtual table pointer
using type metadata. This intrinsic is used to implement control flow integrity
in conjunction with virtual call optimization. The virtual call optimization
pass will optimize away llvm.type.checked.load intrinsics associated with
devirtualized calls, thereby removing the type check in cases where it is
not needed to enforce the control flow integrity constraint.
This patch also introduces the capability to copy type metadata between
global variables, and teaches the virtual call optimization pass to do so.
Differential Revision: http://reviews.llvm.org/D21121
llvm-svn: 273756
The bitset metadata currently used in LLVM has a few problems:
1. It has the wrong name. The name "bitset" refers to an implementation
detail of one use of the metadata (i.e. its original use case, CFI).
This makes it harder to understand, as the name makes no sense in the
context of virtual call optimization.
2. It is represented using a global named metadata node, rather than
being directly associated with a global. This makes it harder to
manipulate the metadata when rebuilding global variables, summarise it
as part of ThinLTO and drop unused metadata when associated globals are
dropped. For this reason, CFI does not currently work correctly when
both CFI and vcall opt are enabled, as vcall opt needs to rebuild vtable
globals, and fails to associate metadata with the rebuilt globals. As I
understand it, the same problem could also affect ASan, which rebuilds
globals with a red zone.
This patch solves both of those problems in the following way:
1. Rename the metadata to "type metadata". This new name reflects how
the metadata is currently being used (i.e. to represent type information
for CFI and vtable opt). The new name is reflected in the name for the
associated intrinsic (llvm.type.test) and pass (LowerTypeTests).
2. Attach metadata directly to the globals that it pertains to, rather
than using the "llvm.bitsets" global metadata node as we are doing now.
This is done using the newly introduced capability to attach
metadata to global variables (r271348 and r271358).
See also: http://lists.llvm.org/pipermail/llvm-dev/2016-June/100462.html
Differential Revision: http://reviews.llvm.org/D21053
llvm-svn: 273729
This patch also has a refactor that kills StratifiedAttr, and leaves us
with StratifiedAttrs, because having both was mildly redundant.
This patch makes us correctly handle stratified attributes when doing
interprocedural analysis. It also adds another attribute, AttrCaller,
which acts like AttrUnknown. We can filter out AttrCaller values when
during interprocedural analysis, since the caller should have
information about what arguments it's passing to its callee.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21645
llvm-svn: 273636
Summary:
This instcombine rule folds away trunc operations that have value available from a prior load or store.
This kind of code can be generated as a result of GVN widening the load or from source code as well.
Reviewers: reames, majnemer, sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D21246
llvm-svn: 273608
Previously, we just unified any arguments that seemed to be related to
each other. With this patch, we now respect dereference levels, etc.
which should make us substantially more accurate. Proper handling of
StratifiedAttrs will be done in a later patch.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21536
llvm-svn: 273596
This was noted in http://reviews.llvm.org/D21610 . The previous code
predated the use of APInt ( http://reviews.llvm.org/rL47654 ), so it
had to account for the fixed width of uint64_t.
Now that we're using the variable width APInt, we can remove some
complexity.
llvm-svn: 273584
When simplifying a load we need to make sure that the type of the
simplified value matches the type of the instruction we're processing.
In theory, we can handle casts here as we deal with constant data, but
since it's not implemented at the moment, we at least need to bail out.
This fixes PR28262.
llvm-svn: 273562
This is similar to the computeKnownBits improvement in rL268479.
There's probably more we can do for vector logic instructions, but
this should let us see non-splat constant masking ops that can
become vector selects instead of and/andn/or sequences.
Differential Revision: http://reviews.llvm.org/D21610
llvm-svn: 273459
It did not handle correctly cases without GEP.
The following loop wasn't vectorized:
for (int i=0; i<len; i++)
*to++ = *from++;
I use getPtrStride() to find Stride for memory access and return 0 is the Stride is not 1 or -1.
Differential revision: http://reviews.llvm.org/D20789
llvm-svn: 273257
This patch makes us perform interprocedural analysis on functions that
don't have internal linkage. It also removes a test that should've been
deleted in an earlier commit (since other tests now cover everything
that the newly-removed test covers).
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21513
llvm-svn: 273229
This patch adds function summaries, so that we don't need to recompute
various properties about function parameters/return values at each
callsite of a function. It also adds many interprocedural tests for
CFLAA.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21475#inline-182390
llvm-svn: 273219
By moving this transform to InstSimplify from InstCombine, we sidestep the problem/question
raised by PR27869:
https://llvm.org/bugs/show_bug.cgi?id=27869
...where InstCombine turns an icmp+zext into a shift causing us to miss the fold.
Credit to David Majnemer for a draft patch of the changes to InstructionSimplify.cpp.
Differential Revision: http://reviews.llvm.org/D21512
llvm-svn: 273200
On the surface, this might not look like it does anything... but
actually it brings in the declaration "extern template class
AnalysisManager<Loop>;", which suppresses the instantiation of the
constructor, which avoids the funny interaction between "extern
template" and -fvisibility-inlines-hidden.
llvm-svn: 273133
Access it through -passes=print-lcg-dot
Let me know any suggestions for changing the rendering; I'm not
particularly attached to what is implemented here.
llvm-svn: 273082
The way we elide max expressions when computing trip counts is incorrect
-- it breaks cases like this:
```
static int wrapping_add(int a, int b) {
return (int)((unsigned)a + (unsigned)b);
}
void test() {
volatile int end_buf = 2147483548; // INT_MIN - 100
int end = end_buf;
unsigned counter = 0;
for (int start = wrapping_add(end, 200); start < end; start++)
counter++;
print(counter);
}
```
Note: the `NoWrap` variable that was being tested has little to do with
the values flowing into the max expression; it is a property of the
induction variable.
test/Transforms/LoopUnroll/nsw-tripcount.ll was added to solely test
functionality I'm reverting in this change, so I've deleted the test
fully.
llvm-svn: 273079
This is a functional change for LLE and LDist. The other clients (LV,
LVerLICM) already had this explicitly enabled.
The temporary boolean parameter to LAA is removed that allowed turning
off speculation of symbolic strides. This makes LAA's caching interface
LAA::getInfo only take the loop as the parameter. This makes the
interface more friendly to the new Pass Manager.
The flag -enable-mem-access-versioning is moved from LV to a LAA which
now allows turning off speculation globally.
llvm-svn: 273064
pass manager passes' `run` methods.
This removes a bunch of SFINAE goop from the pass manager and just
requires pass authors to accept `AnalysisManager<IRUnitT> &` as a dead
argument. This is a small price to pay for the simplicity of the system
as a whole, despite the noise that changing it causes at this stage.
This will also helpfull allow us to make the signature of the run
methods much more flexible for different kinds af passes to support
things like intelligently updating the pass's progression over IR units.
While this touches many, many, files, the changes are really boring.
Mostly made with the help of my trusty perl one liners.
Thanks to Sean and Hal for bouncing ideas for this with me in IRC.
llvm-svn: 272978
This is still NFCI, so the list of clients that allow symbolic stride
speculation does not change (yes: LV and LoopVersioningLICM, no: LLE,
LDist). However since the symbolic strides are now managed by LAA
rather than passed by client a new bool parameter is used to enable
symbolic stride speculation.
The existing test Transforms/LoopVectorize/version-mem-access.ll checks
that stride speculation is performed for LV.
The previously added test Transforms/LoopLoadElim/symbolic-stride.ll
ensures that no speculation is performed for LLE.
The next patch will change the functionality and turn on symbolic stride
speculation in all of LAA's clients and remove the bool parameter.
llvm-svn: 272970
We should update results of the BranchProbabilityInfo after removing block in JumpThreading. Otherwise
we will get dangling pointer inside BranchProbabilityInfo cache.
Differential Revision: http://reviews.llvm.org/D20957
llvm-svn: 272891
This patch makes CFLAA ignore non-pointer values, since we can now
sanely do that with the escaping/unknown attributes. Additionally,
StratifiedAttrs make more sense to sit on nodes than edges (since
they're properties of values, and ultimately end up on the nodes of
StratifiedSets). So, this patch puts said attributes on nodes.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21387
llvm-svn: 272833
We would fail to validate the type of the tan function which would cause
downstream users of isValidProtoForLibFunc to assert.
This fixes PR28143.
llvm-svn: 272802
Use Optional<T> to denote the absence of a solution, not
SCEVCouldNotCompute. This makes the usage of SolveQuadraticEquation
somewhat simpler.
llvm-svn: 272752
If a local_unnamed_addr attribute is attached to a global, the address
is known to be insignificant within the module. It is distinct from the
existing unnamed_addr attribute in that it only describes a local property
of the module rather than a global property of the symbol.
This attribute is intended to be used by the code generator and LTO to allow
the linker to decide whether the global needs to be in the symbol table. It is
possible to exclude a global from the symbol table if three things are true:
- This attribute is present on every instance of the global (which means that
the normal rule that the global must have a unique address can be broken without
being observable by the program by performing comparisons against the global's
address)
- The global has linkonce_odr linkage (which means that each linkage unit must have
its own copy of the global if it requires one, and the copy in each linkage unit
must be the same)
- It is a constant or a function (which means that the program cannot observe that
the unique-address rule has been broken by writing to the global)
Although this attribute could in principle be computed from the module
contents, LTO clients (i.e. linkers) will normally need to be able to compute
this property as part of symbol resolution, and it would be inefficient to
materialize every module just to compute it.
See:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160509/356401.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160516/356738.html
for earlier discussion.
Part of the fix for PR27553.
Differential Revision: http://reviews.llvm.org/D20348
llvm-svn: 272709
This change teaches llvm::isGuaranteedToTransferExecutionToSuccessor
that calls to @llvm.assume always terminate. Most other relevant
intrinsics should be covered by the "CS.onlyReadsMemory() ||
CS.onlyAccessesArgMemory()" bit but we were missing @llvm.assumes
because we state that it clobbers memory.
Added an LICM test case, but this change is not specific to LICM.
llvm-svn: 272703
This patch also includes some refactoring.
Prior to this patch, we tagged all CFLAA attributes as unknown. This is
suboptimal, since it meant that any Value used as an argument would be
considered to alias any other Value that existed.
Now that we have the machinery to tag sets below the set for an
arbitrary value with attributes, it's okay to be less conservative with
arguments. (Specifically, we still tag the set under an argument with
unknown).
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21262
llvm-svn: 272690
This patch refactors CFLAA's graph building code. This makes keeping
track of common state (TargetLibraryInfo, ...) easier.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21261
llvm-svn: 272688
Summary:
The SimplifyLibCalls part of InstCombine generates calls to those otherwise.
I wonder if at some point we shouldn't just call disableAllFunctions() and
then enable functions on a whitelist basis...
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96495
Reviewers: arsenm, tstellarAMD
Subscribers: llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D21282
llvm-svn: 272664
This is a bit gnarly since LVI is maintaining its own cache.
I think this port could be somewhat cleaner, but I'd rather not spend
too much time on it while we still have the old pass hanging around and
limiting how much we can clean things up.
Once the old pass is gone it will be easier (less time spent) to clean
it up anyway.
This is the last dependency needed for porting JumpThreading which I'll
do in a follow-up commit (there's no printer pass for LVI or anything to
test it, so porting a pass that depends on it seems best).
I've been mostly following:
r269370 / D18834 which ported Dependence Analysis
r268601 / D19839 which ported BPI
llvm-svn: 272593
Summary:
AAResults::callCapturesBefore would previously ignore operand
bundles. It was possible for a later instruction to miss its memory
dependency on a call site that would only access the pointer through a
bundle.
Patch by Oscar Blumberg!
Reviewers: sanjoy
Differential Revision: http://reviews.llvm.org/D21286
llvm-svn: 272580
Summary:
Make isGuaranteedToExecute use the
isGuaranteedToTransferExecutionToSuccessor helper, and make that helper
a bit more accurate.
There's a potential performance impact here from assuming that arbitrary
calls might not return. This probably has little impact on loads and
stores to a pointer because most things alias analysis can reason about
are dereferenceable anyway. The other impacts, like less aggressive
hoisting of sdiv by a variable and less aggressive hoisting around
volatile memory operations, are unlikely to matter for real code.
This also impacts SCEV, which uses the same helper. It's a minor
improvement there because we can tell that, for example, memcpy always
returns normally. Strictly speaking, it's also introducing
a bug, but it's not any worse than everywhere else we assume readonly
functions terminate.
Fixes http://llvm.org/PR27857.
Reviewers: hfinkel, reames, chandlerc, sanjoy
Subscribers: broune, llvm-commits
Differential Revision: http://reviews.llvm.org/D21167
llvm-svn: 272489
Add an option to enable the analysis of MachineFunction register
usage to extract the list of clobbered registers.
When enabled, the CodeGen order is changed to be bottom up on the Call
Graph.
The analysis is split in two parts, RegUsageInfoCollector is the
MachineFunction Pass that runs post-RA and collect the list of
clobbered registers to produce a register mask.
An immutable pass, RegisterUsageInfo, stores the RegMask produced by
RegUsageInfoCollector, and keep them available. A future tranformation
pass will use this information to update every call-sites after
instruction selection.
Patch by Vivek Pandya <vivekvpandya@gmail.com>
Differential Revision: http://reviews.llvm.org/D20769
llvm-svn: 272403
Prior to this patch, we used argument/global stratified attributes in
order to note that a value could have come from either dereferencing a
global/arg, or from the assignment from a global/arg.
Now, AttrUnknown is placed on sets when we see a dereference, instead of
the global/arg attributes. This allows us to be more aggressive in the
future when we see global/arg attributes without AttrUnknown.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21110
llvm-svn: 272335
Instead of directly using MaxFunctionCount and function entry count to determine callee hotness, use the isHotFunction/isColdFunction methods provided by ProfileSummaryInfo.
Differential revision: http://reviews.llvm.org/D21045
llvm-svn: 272321
We can safely rely on a NoWrap add recurrence causing UB down the road
only if we know the loop does not have a exit expressed in a way that is
opaque to ScalarEvolution (e.g. by a function call that conditionally
calls exit(0)).
I believe with this change PR28012 is fixed.
Note: I had to change some llvm-lit tests in LoopReroll, since it looks
like they were depending on this incorrect behavior.
llvm-svn: 272237
This is NFC as far as externally visible behavior is concerned, but will
keep us from spinning in the worklist traversal algorithm unnecessarily.
llvm-svn: 272182
Absence of may-unwind calls is not enough to guarantee that a
UB-generating use of an add-rec poison in the loop latch will actually
cause UB. We also need to guard against calls that terminate the thread
or infinite loop themselves.
This partially addresses PR28012.
llvm-svn: 272181
The worklist algorithm introduced in rL271151 didn't check to see if the
direct users of the post-inc add recurrence propagates poison. This
change fixes the problem and makes the code structure more obvious.
Note for release managers: correctness wise, this bug wasn't a
regression introduced by rL271151 -- the behavior of SCEV around
post-inc add recurrences was strictly improved (in terms of correctness)
in rL271151.
llvm-svn: 272179
As suggested by clang-tidy's performance-unnecessary-copy-initialization.
This can easily hit lifetime issues, so I audited every change and ran the
tests under asan, which came back clean.
llvm-svn: 272126
This patch does a few things:
- Unifies AttrAll and AttrUnknown (since they were used for more or less
the same purpose anyway).
- Introduces AttrEscaped, an attribute that notes that a value escapes
our analysis for a given set, but not that an unknown value flows into
said set.
- Removes functions that take bit indices, since we also had functions
that took bitsets, and the use of both (with similar names) was
unclear and bug-prone.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21000
llvm-svn: 272040
In some cases, when simplifying with SCEV, we might consider pointer values as
just usual integer values. Thus, we might get a different type from what we
had originally in the map of simplified values, and hence we need to check
types before operating on the values.
This fixes PR28015.
llvm-svn: 271931
Now that `Value::getPointerDereferenceableBytes` looks beyond just
attributes, the name `isDereferenceableFromAttribute` is misleading.
Just inline the function, since it is small and only used once.
llvm-svn: 271456
... and merge into `Value::getPointerDereferenceableBytes`. This was
suggested by Artur Pilipenko in D20764 -- since we no longer allow loads
of unsized types, there is no need anymore to have this special logic.
llvm-svn: 271455
Summary:
Make sure that the SCEVExpander Builder insert point and any
saved/restored insert points are kept consistent (i.e. their Instruction
and BasicBlock match) when moving instructions in SCEVExpander.
This fixes an issue triggered by
http://reviews.llvm.org/D18001 [LSR] Create fewer redundant instructions.
Test case will be added in reapply commit of above change:
http://reviews.llvm.org/D18480 Reapply [LSR] Create fewer redundant instructions.
Reviewers: sanjoy
Subscribers: mzolotukhin, sanjoy, qcolombet, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D20703
llvm-svn: 271424
This patch extends CFLAA to recognize allocation functions such as
malloc, free, etc, so we can treat them more aggressively.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D20776
llvm-svn: 271421
Patch by Taewook Oh
Summary: Patch for Bug 27478. Make BasicAliasAnalysis claims NoAlias if two GEPs index different fields of the same structure.
Reviewers: hfinkel, dberlin
Subscribers: dberlin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D20665
llvm-svn: 271415
Summary:
Change some of the internal interfaces in Loads.cpp to keep track of the
number of bytes we're trying to prove dereferenceable using an explicit
`Size` parameter.
Before this, the `Size` parameter was implicitly inferred from the
pointee type of the pointer whose dereferenceability we were trying to
prove, causing us to be conservative around bitcasts. This was
unfortunate since bitcast instructions are no-ops and should never
break optimizations. With an explicit `Size` parameter, we're more
precise (as shown in the test cases), and the code is simpler.
We should eventually move towards a `DerefQuery` struct that groups
together a base pointer, an offset, a size and an alignment; but this
patch is a first step.
Reviewers: apilipenko, dblaikie, hfinkel, reames
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D20764
llvm-svn: 271406
Code like the following is considered broken, and doesn't need to be
supported by our AA magicks:
void getFoo(int *P) {
int *PAlias = (int *)((char *)NULL + (uintptr_t)P);
}
This patch makes CFLAA drop support for code like this.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D20775
llvm-svn: 271322
This adds support to the backed to actually support SjLj EH as an exception
model. This is *NOT* the default model, and requires explicitly opting into it
from the frontend. GCC supports this model and for MinGW can still be enabled
via the `--using-sjlj-exceptions` options.
Addresses PR27749!
llvm-svn: 271244
Consolidate documentation by removing comments from the .cpp file where
the comments in the .cpp file were copy-pasted from the header.
llvm-svn: 271157
Summary:
This change teaches SCEV to see reduce `(extractvalue
0 (op.with.overflow X Y))` into `op X Y` (with a no-wrap tag if
possible).
This was first checked in at r265912 but reverted in r265950 because it
exposed some issues around how SCEV handled post-inc add recurrences.
Those issues have now been fixed.
Reviewers: atrick, regehr
Subscribers: mcrosier, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18684
llvm-svn: 271152
Fixes PR27315.
The post-inc version of an add recurrence needs to "follow the same
rules" as a normal add or subtract expression. Otherwise we miscompile
programs like
```
int main() {
int a = 0;
unsigned a_u = 0;
volatile long last_value;
do {
a_u += 3;
last_value = (long) ((int) a_u);
if (will_add_overflow(a, 3)) {
// Leave, and don't actually do the increment, so no UB.
printf("last_value = %ld\n", last_value);
exit(0);
}
a += 3;
} while (a != 46);
return 0;
}
```
This patch changes SCEV to put no-wrap flags on post-inc add recurrences
only when the poison from a potential overflow will go ahead to cause
undefined behavior.
To avoid regressing performance too much, I've assumed infinite loops
without side effects is undefined behavior to prove poison<->UB
equivalence in more cases. This isn't ideal, but is not new to LLVM as
a whole, and far better than the situation I'm trying to fix.
llvm-svn: 271151
r270777 improved the precision of alloca vs. inbounbds GEP alias queries: if
we have (a) an inbounds GEP and (b) a pointer based on an alloca, and the
beginning of the object the GEP points to would have a negative offset with
respect to the alloca, then the GEP can not alias pointer (b).
This makes the same logic fire when (b) is based on a GlobalVariable instead
of an alloca.
Differential Revision: http://reviews.llvm.org/D20652
llvm-svn: 270893
The memory location that corresponds to a volatile operation is very
special. They are observed by the machine in ways which we cannot
reason about.
Differential Revision: http://reviews.llvm.org/D20555
llvm-svn: 270879
It turns out that too many passes are relying on alias analysis results
for control dependencies. Until we fix that by introducing a more accurate
modelling of control dependencies, special case assume in MemorySSA instead.
Also introduce tests to ensure we don't regress the FunctionAttrs or LICM
passes.
Differential Revision: http://reviews.llvm.org/D20658
llvm-svn: 270823
If a we have (a) a GEP and (b) a pointer based on an alloca, and the
beginning of the object the GEP points would have a negative offset with
repsect to the alloca, then the GEP can not alias pointer (b).
For example, consider code like:
struct { int f0, int f1, ...} foo;
...
foo alloca;
foo *random = bar(alloca);
int *f0 = &alloca.f0
int *f1 = &random->f1;
Which is lowered, approximately, to:
%alloca = alloca %struct.foo
%random = call %struct.foo* @random(%struct.foo* %alloca)
%f0 = getelementptr inbounds %struct, %struct.foo* %alloca, i32 0, i32 0
%f1 = getelementptr inbounds %struct, %struct.foo* %random, i32 0, i32 1
Assume %f1 and %f0 alias. Then %f1 would point into the object allocated
by %alloca. Since the %f1 GEP is inbounds, that means %random must also
point into the same object. But since %f0 points to the beginning of %alloca,
the highest %f1 can be is (%alloca + 3). This means %random can not be higher
than (%alloca - 1), and so is not inbounds, a contradiction.
Differential Revision: http://reviews.llvm.org/D20495
llvm-svn: 270777
Getting accurate locations for loops is important, because those locations are
used by the frontend to generate optimization remarks. Currently, optimization
remarks for loops often appear on the wrong line, often the first line of the
loop body instead of the loop itself. This is confusing because that line might
itself be another loop, or might be somewhere else completely if the body was
inlined function call. This happens because of the way we find the loop's
starting location. First, we look for a preheader, and if we find one, and its
terminator has a debug location, then we use that. Otherwise, we look for a
location on an instruction in the loop header.
The fallback heuristic is not bad, but will almost always find the beginning of
the body, and not the loop statement itself. The preheader location search
often fails because there's often not a preheader, and even when there is a
preheader, depending on how it was formed, it sometimes carries the location of
some preceeding code.
I don't see any good theoretical way to fix this problem. On the other hand,
this seems like a straightforward solution: Put the debug location in the
loop's llvm.loop metadata. A companion Clang patch will cause Clang to insert
llvm.loop metadata with appropriate locations when generating debugging
information. With these changes, our loop remarks have much more accurate
locations.
Differential Revision: http://reviews.llvm.org/D19738
llvm-svn: 270771
There was a typo in r267758. It caused invalid accesses when
given something like "void @free(...)", as NumParams == 0, and
we then try to look at the 0th parameter.
Turns out, most of these were untested; add both attribute
and missing-prototype checks for all libc libfuncs.
Differential Revision: http://reviews.llvm.org/D20543
llvm-svn: 270750
Summary:
**Description**
This makes `WidenIV::widenIVUse` (IndVarSimplify.cpp) fail to widen narrow IV uses in some cases. The latter affects IndVarSimplify which may not eliminate narrow IV's when there actually exists such a possibility, thereby producing ineffective code.
When `WidenIV::widenIVUse` gets a NarrowUse such as `{(-2 + %inc.lcssa),+,1}<nsw><%for.body3>`, it first tries to get a wide recurrence for it via the `getWideRecurrence` call.
`getWideRecurrence` returns recurrence like this: `{(sext i32 (-2 + %inc.lcssa) to i64),+,1}<nsw><%for.body3>`.
Then a wide use operation is generated by `cloneIVUser`. The generated wide use is evaluated to `{(-2 + (sext i32 %inc.lcssa to i64))<nsw>,+,1}<nsw><%for.body3>`, which is different from the `getWideRecurrence` result. `cloneIVUser` sees the difference and returns nullptr.
This patch also fixes the broken LLVM tests by adding missing <nsw> entries introduced by the correction.
**Minimal reproducer:**
```
int foo(int a, int b, int c);
int baz();
void bar()
{
int arr[20];
int i = 0;
for (i = 0; i < 4; ++i)
arr[i] = baz();
for (; i < 20; ++i)
arr[i] = foo(arr[i - 4], arr[i - 3], arr[i - 2]);
}
```
**Clang command line:**
```
clang++ -mllvm -debug -S -emit-llvm -O3 --target=aarch64-linux-elf test.cpp -o test.ir
```
**Expected result:**
The ` -mllvm -debug` log shows that all the IV's for the second `for` loop have been eliminated.
Reviewers: sanjoy
Subscribers: atrick, asl, aemerson, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D20058
llvm-svn: 270695
Similar in spirit to D20497 :
If all elements of a constant vector are known non-zero, then we can say that the
whole vector is known non-zero.
It seems like we could extend this to FP scalar/vector too, but isKnownNonZero()
says it only works for integers and pointers for now.
Differential Revision: http://reviews.llvm.org/D20544
llvm-svn: 270562
We could try harder to handle non-splat vector constants too,
but that seems much rarer to me.
Note that the div test isn't resolved because there's a check
for isIntegerTy() guarding that transform.
Differential Revision: http://reviews.llvm.org/D20497
llvm-svn: 270369
When it has a DataLayout, DecomposeGEPExpression() should return the same object
as GetUnderlyingObject(). Per the FIXME, it currently always has a DL, so the
runtime check is redundant and can become an assert.
llvm-svn: 270268
Before r257832, the threshold used by SimpleInliner was explicitly specified or generated from opt levels and passed to the base class Inliner's constructor. There, it was first overridden by explicitly specified -inline-threshold. The refactoring in r257832 did not preserve this behavior for all opt levels. This change brings back the original behavior.
Differential Revision: http://reviews.llvm.org/D20452
llvm-svn: 270153
This patch changes the order in which we attempt to prove the independence of
strided accesses. We previously did this after we knew the dependence distance
was positive. With this change, we check for independence before handling the
negative distance case. The patch prevents LAA from reporting forward
dependences for independent strided accesses.
This change was requested in the review of D19984.
llvm-svn: 270072
... for AddRec's in loops for which SCEV is unable to compute a max
tripcount. This is the NUW variant of r269211 and fixes PR27691.
(Note: PR27691 is not a correct or stability bug, it was created to
track a pending task).
llvm-svn: 269790
This patch renames the option enabling the store-to-load forwarding conflict
detection optimization. This change was requested in the review of D20241.
llvm-svn: 269668
Also s/Cycles/Iters/ in NumCyclesForStoreLoadThroughMemory to make it
clear that this is not about clock cycles but loop cycles/iterations.
llvm-svn: 269667
Fix "Logic error" warnings of the type "Called C++ object pointer is
null" reported by Clang Static Analyzer on the following files:
lib/Analysis/ScalarEvolution.cpp,
lib/Analysis/LoopInfo.cpp.
Patch by Apelete Seketeli!
llvm-svn: 269424
Summary:
...loop after the last iteration.
This is really hard to do correctly. The core problem is that we need to
model liveness through the induction PHIs from iteration to iteration in
order to get the correct results, and we need to correctly de-duplicate
the common subgraphs of instructions feeding some subset of the
induction PHIs. All of this can be driven either from a side effect at
some iteration or from the loop values used after the loop finishes.
This patch implements this by storing the forward-propagating analysis
of each instruction in a cache to recall whether it was free and whether
it has become live and thus counted toward the total unroll cost. Then,
at each sink for a value in the loop, we recursively walk back through
every value that feeds the sink, including looping back through the
iterations as needed, until we have marked the entire input graph as
live. Because we cache this, we never visit instructions more than twice
-- once when we analyze them and put them into the cache, and once when
we count their cost towards the unrolled loop. Also, because the cache
is only two bits and because we are dealing with relatively small
iteration counts, we can store all of this very densely in memory to
avoid this from becoming an excessively slow analysis.
The code here is still pretty gross. I would appreciate suggestions
about better ways to factor or split this up, I've stared too long at
the algorithmic side to really have a good sense of what the design
should probably look at.
Also, it might seem like we should do all of this bottom-up, but I think
that is a red herring. Specifically, the simplification power is *much*
greater working top-down. We can forward propagate very effectively,
even across strange and interesting recurrances around the backedge.
Because we use data to propagate, this doesn't cause a state space
explosion. Doing this level of constant folding, etc, would be very
expensive to do bottom-up because it wouldn't be until the last moment
that you could collapse everything. The current solution is essentially
a top-down simplification with a bottom-up cost accounting which seems
to get the best of both worlds. It makes the simplification incremental
and powerful while leaving everything dead until we *know* it is needed.
Finally, a core property of this approach is its *monotonicity*. At all
times, the current UnrolledCost is a conservatively low estimate. This
ensures that we will never early-exit from the analysis due to exceeding
a threshold when if we had continued, the cost would have gone back
below the threshold. These kinds of bugs can cause incredibly hard to
track down random changes to behavior.
We could use a techinque similar (but much simpler) within the inliner
as well to avoid considering speculated code in the inline cost.
Reviewers: chandlerc
Subscribers: sanjoy, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D11758
llvm-svn: 269388
Summary:
Currently we consider such instructions as simplified, which is incorrect,
because if their user isn't simplified, we can't actually simplify them too.
This biases our estimates of profitability: for instance the analyzer expects
much more gains from unrolling memcpy loops than there actually are.
Reviewers: hfinkel, chandlerc
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D17365
llvm-svn: 269387
Ported DA to the new PM by splitting the former DependenceAnalysis Pass
into a DependenceInfo result type and DependenceAnalysisWrapperPass type
and adding a new PM-style DependenceAnalysis analysis pass returning the
DependenceInfo.
Patch by Philip Pfaffe, most of the review by Justin.
Differential Revision: http://reviews.llvm.org/D18834
llvm-svn: 269370
SCEVExpander::replaceCongruentIVs assumes the backedge value of an
SCEV-analysable PHI to always be an instruction, when this is not
necessarily true. For now address this by bailing out of the
optimization if the backedge value of the PHI is a non-Instruction.
llvm-svn: 269213
`SCEVExpander::replaceCongruentIVs` bypasses `hoistIVInc` if both the
original and the isomorphic increments are PHI nodes. Doing this can
break SSA if the isomorphic increment is not dominated by the original
increment. Get rid of the bypass, and let `hoistIVInc` do the right
thing.
Fixes PR27232 (compile time crash/hang).
llvm-svn: 269212
... for AddRec's in loops for which SCEV is unable to compute a max
tripcount. This is not a problem for "normal" loops[0] that don't have
guards or assumes, but helps in cases where we have guards or assumes in
the loop that can be used to constrain incoming values over the backedge.
This partially fixes PR27691 (we still don't handle the NUW case).
[0]: for "normal" loops, in the cases where we'd be able to prove
no-wrap via isKnownPredicate, we'd also be able to compute a max
tripcount.
llvm-svn: 269211
Equivalent GEP indices with different types are treated as different
indices altogether, leading to an incorrect AA result. Fix the issue
by comparing indices based on their values.
Thanks to Mikael Holmén for reporting the issue!
Differential Revision: http://reviews.llvm.org/D19935
llvm-svn: 269197
Extract a part of isDereferenceableAndAlignedPointer functionality to Value:
Reviewed By: hfinkel, sanjoy
Differential Revision: http://reviews.llvm.org/D17611
llvm-svn: 269190
Do simplifications common to all shift instructions based on the amount shifted:
1. If the shift amount is known larger than the bitwidth, the result is undefined.
2. If the valid bits of the shift amount are all known to be 0, it's a shift by zero, so the shift operand is the result.
Note that we could generalize the shift-by-zero transform into a shift-by-constant if all of the valid bits in the shift
amount are known, but that would have to be done in InstCombine rather than here because it would mean we need to create
a new shift instruction.
Differential Revision: http://reviews.llvm.org/D19874
llvm-svn: 269114
The plan is to eventually make this logic simpler, however I expect it to
be a little tricky for the foreseeable future (at least until we're rid of
pointee types), so move it here so that it can be reused to build a summary
index for devirtualization.
Differential Revision: http://reviews.llvm.org/D20005
llvm-svn: 269081
This removes a redundant stride versioning step (we already
do it in getPtrStride, so it has no effect) and uses PSE to
get the SCEV expressions for the source and destination
(this might have changed when getPtrStride was called).
I discovered this through code inspection, and couldn't
produce a regression test for it.
llvm-svn: 269052
Summary:
The idea is very close to what we do for assume intrinsics: we mark the
guard intrinsics as writing to arbitrary memory to maintain control
dependence, but under the covers we teach AA that they do not mod any
particular memory location.
Reviewers: chandlerc, hfinkel, gbiv, reames
Subscribers: george.burgess.iv, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19575
llvm-svn: 269007
We can use calls to @llvm.experimental.guard to prove predicates,
relying on the fact that in all locations domianted by a call to
@llvm.experimental.guard the predicate it is guarding is known to be
true.
llvm-svn: 268997
When we encounter unsafe memory dependencies, loop distribution could
help.
Even though, the diagnostics is in LAA, it's only currently emitted in
the vectorizer.
llvm-svn: 268987
When deciding if a vector calculation can be done in a smaller bitwidth, use sign bit information from ValueTracking to add more information and allow more truncations.
llvm-svn: 268921
A number of libcalls don't exist in any particular lib but are, instead,
defined in math.h as inline functions (even in C mode!). Don't rely on
their existence when lowering @llvm.{cos,sin,floor,..}.f32, promote them
instead.
N.B. We had logic to handle FREM but were missing out on a number of
others. This change generalizes the FREM handling.
llvm-svn: 268875
This test was crashing, and currently it breaks bootstrapping clang with debuginfo
Differential Revision: http://reviews.llvm.org/D20008
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 268715
This message used to be correct, when all we cared about was whether the
dependence was safe (i.e. NoDep) or unsafe. With the current more
precise characterization, this is a forward dep.
llvm-svn: 268695
We assumed that ConstantVectors would be rather uninteresting from the
perspective of analysis. However, this is not the case due to a quirk
of how LLVM handles vectors of i1. Vectors of i1 are not
ConstantDataVectors like vectors of i8, i16, i32 or i64 because i1's
SizeInBits differs from it's StoreSizeInBytes. This leads to it being
categorized as a ConstantVector instead of a ConstantDataVector.
Instead, treat ConstantVector more uniformly.
This fixes PR27591.
llvm-svn: 268479
A loop pass that didn't preserve this entire set of passes wouldn't
play well with other loop passes, since these are generally a basic
requirement to do any interesting transformations to a loop.
Adds a helper to get the set of analyses a loop pass should preserve,
and checks that any loop pass we run satisfies the requirement.
llvm-svn: 268444
In the "LoopDispositions:" section:
- Instead of printing out a list, print out a "dictionary" to make it
obvious by inspection which disposition is for which loop. This is
just a cosmetic change.
- Print dispositions for parent _and_ sibling loops. I will use this
to write a test case.
llvm-svn: 268405
Summary
When a non-escaping pointer is compared to a global value, the
comparison can be folded even if the corresponding malloc/allocation
call cannot be elided.
We need to make sure the global value is not null, since comparisons to
null cannot be folded.
In future, we should also handle cases when the the comparison
instruction dominates the pointer escape.
Reviewers: sanjoy
Subscribers s.egerton, llvm-commits
Differential Revision: http://reviews.llvm.org/D19549
llvm-svn: 268390
We were overly cautious in our analysis of loops which have invokes
which unwind to EH pads. The loop unroll transform is safe because it
only clones blocks in the loop body, it does not try to split critical
edges involving EH pads. Instead, move the necessary safety check to
LoopUnswitch.
N.B. The safety check for loop unswitch is covered by an existing test
which fails without it.
llvm-svn: 268357
that it computes. Currently this is used for testing and precision
tuning, but it might be used by optimizations later.
Differential Revision: http://reviews.llvm.org/D19179
llvm-svn: 268291
As shown in the diff, we used to add to CFLAA's cache by doing
`Cache[Fn] = buildSetsFrom(Fn)`. `buildSetsFrom(Fn)` may cause `Cache`
to reallocate its underlying storage, if this happens and `Cache[Fn]`
was evaluated prior to `buildSetsFrom(Fn)`, then we'll store the result
to a bad address.
Patch by Jia Chen.
llvm-svn: 268269
There are currently some bugs in tree around SCEV caching an incorrect
loop disposition. Printing out loop dispositions will let us write
whitebox tests as those are fixed.
The dispositions are printed as a list in "inside out" order,
i.e. innermost loop first.
llvm-svn: 268177
matchSelectPattern attempts to see through casts which mask min/max
patterns from being more obvious. Under certain circumstances, it would
misidentify a sequence of instructions as a min/max because it assumed
that folding casts would preserve the result. This is not the case for
floating point <-> integer casts.
This fixes PR27575.
llvm-svn: 268086