We already had a method to iterate over all the incoming values of a PHI. This just changes all eligible code to use it.
Ineligible code included anything which cared about the index, or was also trying to get the i'th incoming BB.
llvm-svn: 237169
Summary:
There are several unhandled edge cases in BasicAA's GetLinearExpression
method. This changes fixes outstanding issues, including zext / sext of
a constant with the sign bit set, and the refusal to decompose zexts or
sexts of wrapping arithmetic.
Test Plan: Unit tests added in //q.ext.ll//.
Patch by Nick White.
Reviewers: hfinkel, sanjoy
Reviewed By: hfinkel, sanjoy
Subscribers: sanjoy, llvm-commits, hfinkel
Differential Revision: http://reviews.llvm.org/D6682
llvm-svn: 236894
Summary:
This addresses PR 22718. When branch weights are too large, they were
being clamped to the range [1, MaxWeightForBB]. But this clamping is
only applied to edges that go outside the range, so it distorts the
relative branch probabilities.
This patch changes the weight calculation to scale every branch so the
relative probabilities are preserved. The scaling is done differently
now. First, all the branch weights are added up, and if the sum exceeds
32 bits, it computes an integer scale to bring all the weights within
the range.
The patch fixes an existing test that had slightly wrong branch
probabilities due to the previous clamping. It now gets branch weights
scaled accordingly.
Reviewers: dexonsmith
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9442
llvm-svn: 236750
Created an abstraction for log2, llvm::Log2 in Support/MathExtras.h
Hid Android problems inside of it
Differential Revision: http://reviews.llvm.org/D9467
llvm-svn: 236680
Summary:
When computing branch weights in BPI, we used to disallow branches with
weight 0. This is a minor nuisance, because a branch with weight 0 is
different to "don't have information". In the context of
instrumentation, it may mean "never executed", in the context of
sampling, it means "never or seldom executed".
In allowing 0 weight branches, I ran into issues with the switch
expansion code in selection DAG. It is currently hardwired to not handle
branches with weight 0. To maintain the current behaviour, I changed it
to use 1 when it finds 0, but perhaps the algorithm needs changes to
tolerate branches with weight zero.
Reviewers: hansw
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9533
llvm-svn: 236617
The patch disabled unrolling in loop vectorization pass when VF==1 on x86 architecture,
by setting MaxInterleaveFactor to 1. Unrolling in loop vectorization pass may introduce
the cost of overflow check, memory boundary check and extra prologue/epilogue code when
regular unroller will unroll the loop another time. Disable it when VF==1 remove the
unnecessary cost on x86. The same can be done for other platforms after verifying
interleaving/memory bound checking to be not perf critical on those platforms.
Differential Revision: http://reviews.llvm.org/D9515
llvm-svn: 236613
It got this in some cases (if one of them was an identified object), but not in all cases.
This caused stores to undef to block load-forwarding in some cases, etc.
Added test to Transforms/GVN to verify optimization occurs as expected.
llvm-svn: 236511
32-bit x86 MSVC-style exceptions are functionaly similar to 64-bit, but
they take no arguments. Instead, they implicitly use the value of EBP
passed in by the caller as a pointer to the parent's frame. In LLVM, we
can represent this as llvm.frameaddress(1), and feed that into all of
our calls to llvm.framerecover.
The next steps are:
- Add an alloca to the fs:00 linked list of handlers
- Add something like llvm.sjlj.lsda or generalize it to store in the
alloca
- Move state number calculation to WinEHPrepare, arrange for
FunctionLoweringInfo to call it
- Use the state numbers to insert explicit loads and stores in the IR
llvm-svn: 236172
Finish off PR23080 by renaming the debug info IR constructs from `MD*`
to `DI*`. The last of the `DIDescriptor` classes were deleted in
r235356, and the last of the related typedefs removed in r235413, so
this has all baked for about a week.
Note: If you have out-of-tree code (like a frontend), I recommend that
you get everything compiling and tests passing with the *previous*
commit before updating to this one. It'll be easier to keep track of
what code is using the `DIDescriptor` hierarchy and what you've already
updated, and I think you're extremely unlikely to insert bugs. YMMV of
course.
Back to *this* commit: I did this using the rename-md-di-nodes.sh
upgrade script I've attached to PR23080 (both code and testcases) and
filtered through clang-format-diff.py. I edited the tests for
test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns
were off-by-three. It should work on your out-of-tree testcases (and
code, if you've followed the advice in the previous paragraph).
Some of the tests are in badly named files now (e.g.,
test/Assembler/invalid-mdcompositetype-missing-tag.ll should be
'dicompositetype'); I'll come back and move the files in a follow-up
commit.
llvm-svn: 236120
Specifically, if a pointer accesses different underlying objects in each
iteration, don't look through the phi node defining the pointer.
The motivating case is the underlyling-objects-2.ll testcase. Consider
the loop nest:
int **A;
for (i)
for (j)
A[i][j] = A[i-1][j] * B[j]
This loop is transformed by Load-PRE to stash away A[i] for the next
iteration of the outer loop:
Curr = A[0]; // Prev_0
for (i: 1..N) {
Prev = Curr; // Prev = PHI (Prev_0, Curr)
Curr = A[i];
for (j: 0..N)
Curr[j] = Prev[j] * B[j]
}
Since A[i] and A[i-1] are likely to be independent pointers,
getUnderlyingObjects should not assume that Curr and Prev share the same
underlying object in the inner loop.
If it did we would try to dependence-analyze Curr and Prev and the
analysis of the corresponding SCEVs would fail with non-constant
distance.
To fix this, the getUnderlyingObjects API is extended with an optional
LoopInfo parameter. This is effectively what controls whether we want
the above behavior or the original. Currently, I only changed to use
this approach for LoopAccessAnalysis.
The other testcase is to guard the opposite case where we do want to
look through the loop PHI. If we step through an array by incrementing
a pointer, the underlying object is the incoming value of the phi as the
loop is entered.
Fixes rdar://problem/19566729
llvm-svn: 235634
Move isDereferenceablePointer function to Analysis. This function recursively tracks dereferencability over a chain of values like other functions in ValueTracking.
This refactoring is motivated by further changes to support dereferenceable_or_null attribute (http://reviews.llvm.org/D8650). isDereferenceablePointer will be extended to perform context-sensitive analysis and IR is not a good place to have such functionality.
Patch by: Artur Pilipenko <apilipenko@azulsystems.com>
Differential Revision: reviews.llvm.org/D9075
llvm-svn: 235611
An assert was triggered when attempting to create a new SCEV
with operands of different types in the visitAddRecExpr. In this
test case, the operand types of the numerator and denominator
are different. The SCEV division code should generate a
conservative answer when this happens.
Differential Revision: http://reviews.llvm.org/D9021
llvm-svn: 235511
Summary:
This lets us use range based for loops.
Reviewers: chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9169
llvm-svn: 235416
Summary:
MemorySSA uses this algorithm as well, and this enables us to reuse the code in both places.
There are no actual algorithm or datastructure changes in here, just code movement.
Reviewers: qcolombet, chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9118
llvm-svn: 235406
n/1 generates a quotient equal to n and a remainder of 0.
If this case is not recognized, then the SCEV divide() function
can return a remainder that is greater than or equal to the
denominator, which means the delinearized subscripts for the
test case will be incorrect.
Differential Revision: http://reviews.llvm.org/D9003
llvm-svn: 235311
Continuing PR23080, gut `DIType` and its various subclasses, leaving
behind thin wrappers around the pointer types in the new debug info
hierarchy.
llvm-svn: 235064
This commit makes LLVM not estimate branch probabilities when doing a
single bit bitmask tests.
The code that originally made me discover this is:
if ((a & 0x1) == 0x1) {
..
}
In this case we don't actually have any branch probability information
and should not assume to have any. LLVM transforms this into:
%and = and i32 %a, 1
%tobool = icmp eq i32 %and, 0
So, in this case, the result of a bitwise and is compared against 0,
but nevertheless, we should not assume to have probability
information.
CodeGen/ARM/2013-10-11-select-stalls.ll started failing because the
changed probabilities changed the results of
ARMBaseInstrInfo::isProfitableToIfCvt() and led to an Ifcvt of the
diamond in the test. AFAICT, the test was never meant to test this and
thus changing the test input slightly to not change the probabilities
seems like the best way to preserve the meaning of the test.
llvm-svn: 234979
Inlining such intrinsics is very difficult, since you need to
simultaneously transform many calls to llvm.framerecover and potentially
duplicate the functions containing them. Normally this intrinsic isn't
added until EH preparation, which is part of the backend pass pipeline
after inlining. However, if it were to get fed through the inliner,
this change will ensure that it doesn't break the code.
llvm-svn: 234937
With commit r219944, InstCombine can now turn a sqrtl into a llvm.fabs.f64.
The call graph edge originally representing the call to sqrtl becomes invalid.
This patch modifies CGPassManager::RefreshCallGraph() to remove the invalid
call graph edge, which can triggers an assert in
CallGraphNode::addCalledFunction().
Phabricator Review: http://reviews.llvm.org/D7705
Patch by Lawrence Hu <lawrence@codeaurora.org>.
llvm-svn: 234902
if ((a & 0x1) == 0x1) {
..
}
In this case we don't actually have any branch probability information and
should not assume to have any. LLVM transforms this into:
%and = and i32 %a, 1
%tobool = icmp eq i32 %and, 0
So, in this case, the result of a bitwise and is compared against 0,
but nevertheless, we should not assume to have probability
information.
llvm-svn: 234898
Gut the `DIDescriptor` wrappers around `MDLocalScope` subclasses. Note
that `DILexicalBlock` wraps `MDLexicalBlockBase`, not `MDLexicalBlock`.
llvm-svn: 234850
Summary:
Teach `isHighCostExpansion` to consider divisions by power-of-two
constants as cheap and add a test case. This change is needed for a new
user of `isHighCostExpansion` that will be added in a subsequent change.
Depends on D8995.
Reviewers: atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8993
llvm-svn: 234845
Summary:
Move isHighCostExpansion from IndVarSimplify to SCEVExpander. This
exposed function will be used in a subsequent change.
Reviewers: bogner, atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8995
llvm-svn: 234844
Gut all the non-pointer API from the variable wrappers, except an
implicit conversion from `DIGlobalVariable` to `DIDescriptor`. Note
that if you're updating out-of-tree code, `DIVariable` wraps
`MDLocalVariable` (`MDVariable` is a common base class shared with
`MDGlobalVariable`).
llvm-svn: 234840
Fix oversight in -analyze output. PtrRtCheck contains the pointers that
need to be checked against each other and not whether memchecks are
necessary.
For instance in the testcase PtrRtCheck has four elements but all
no-alias so no checking is necessary.
llvm-svn: 234833
The patch is generated using clang-tidy misc-use-override check.
This command was used:
tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py \
-checks='-*,misc-use-override' -header-filter='llvm|clang' \
-j=32 -fix -format
http://reviews.llvm.org/D8925
llvm-svn: 234679
CallSite roughly behaves as a common base CallInst and InvokeInst. Bring
the behavior closer to that model by making upcasts explicit. Downcasts
remain implicit and work as before.
Following dyn_cast as a mental model checking whether a Value *V isa
CallSite now looks like this:
if (auto CS = CallSite(V)) // think dyn_cast
instead of:
if (CallSite CS = V)
This is an extra token but I think it is slightly clearer. Making the
ctor explicit has the advantage of not accidentally creating nullptr
CallSites, e.g. when you pass a Value * to a function taking a CallSite
argument.
llvm-svn: 234601
Summary:
Some optimizations such as jump threading and loop unswitching can negatively
affect performance when applied to divergent branches. The divergence analysis
added in this patch conservatively estimates which branches in a GPU program
can diverge. This information can then help LLVM to run certain optimizations
selectively.
Test Plan: test/Analysis/DivergenceAnalysis/NVPTX/diverge.ll
Reviewers: resistor, hfinkel, eliben, meheff, jholewinski
Subscribers: broune, bjarke.roune, madhur13490, tstellarAMD, dberlin, echristo, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D8576
llvm-svn: 234567
(Re-apply r234361 with a fix and a testcase for PR23157)
Both run-time pointer checking and the dependence analysis are capable
of dealing with uniform addresses. I.e. it's really just an orthogonal
property of the loop that the analysis computes.
Run-time pointer checking will only try to reason about SCEVAddRec
pointers or else gives up. If the uniform pointer turns out the be a
SCEVAddRec in an outer loop, the run-time checks generated will be
correct (start and end bounds would be equal).
In case of the dependence analysis, we work again with SCEVs. When
compared against a loop-dependent address of the same underlying object,
the difference of the two SCEVs won't be constant. This will result in
returning an Unknown dependence for the pair.
When compared against another uniform access, the difference would be
constant and we should return the right type of dependence
(forward/backward/etc).
The changes also adds support to query this property of the loop and
modify the vectorizer to use this.
Patch by Ashutosh Nema!
llvm-svn: 234424
Both run-time pointer checking and the dependence analysis are capable
of dealing with uniform addresses. I.e. it's really just an orthogonal
property of the loop that the analysis computes.
Run-time pointer checking will only try to reason about SCEVAddRec
pointers or else gives up. If the uniform pointer turns out the be a
SCEVAddRec in an outer loop, the run-time checks generated will be
correct (start and end bounds would be equal).
In case of the dependence analysis, we work again with SCEVs. When
compared against a loop-dependent address of the same underlying object,
the difference of the two SCEVs won't be constant. This will result in
returning an Unknown dependence for the pair.
When compared against another uniform access, the difference would be
constant and we should return the right type of dependence
(forward/backward/etc).
The changes also adds support to query this property of the loop and
modify the vectorizer to use this.
Patch by Ashutosh Nema!
llvm-svn: 234361
The plan here is to push the API changes out from the common components
(like Constant::getGetElementPtr and IRBuilder::CreateGEP related
functions) and just update callers to either pass the type if it's
obvious, or pass null.
Do this with LoadInst as well and anything else that comes up, then to
start porting specific uses to not pass null anymore - this may require
some refactoring in each case.
llvm-svn: 234042
Require the pointee type to be passed explicitly and assert that it is
correct. For now it's possible to pass nullptr here (and I've done so in
a few places in this patch) but eventually that will be disallowed once
all clients have been updated or removed. It'll be a long road to get
all the way there... but if you have the cahnce to update your callers
to pass the type explicitly without depending on a pointer's element
type, that would be a good thing to do soon and a necessary thing to do
eventually.
llvm-svn: 233938
Summary:
This change teaches ScalarEvolution::isLoopBackedgeGuardedByCond to look
at edges within the loop body that dominate the latch. We don't do an
exhaustive search for all possible edges, but only a quick walk up the
dom tree.
This re-lands r233447. r233447 was reverted because it caused massive
compile-time regressions. This change has a fix for the same issue.
llvm-svn: 233829
Summary:
This is part 1 of fixes to address the problems described in
https://llvm.org/bugs/show_bug.cgi?id=22719.
The restriction to limit loop scales to 4,096 does not really prevent
overflows anymore, as the underlying algorithm has changed and does
not seem to suffer from this problem.
Additionally, artificially restricting loop scales to such a low number
skews frequency information, making loops of equal hotness appear to
have very different hotness properties.
The only loops that are artificially restricted to a scale of 4096 are
infinite loops (those loops with an exit mass of 0). This prevents
infinite loops from skewing the frequencies of other regions in the CFG.
At the end of propagation, frequencies are scaled to values that take no
more than 64 bits to represent. When the range of frequencies to be
represented fits within 61 bits, it pushes up the scaling factor to a
minimum of 8 to better distinguish small frequency values. Otherwise,
small frequency values are all saturated down at 1.
Tested on x86_64.
Reviewers: dexonsmith
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8718
llvm-svn: 233826
Generate tables in the .xdata section representing what actions to take
when an exception is thrown. This currently fills in state for
cleanups, catch handlers are still unfinished.
llvm-svn: 233636
This pushes the use of PointerType::getElementType up into several
callers - I'll essentially just have to keep pushing that up the stack
until I can eliminate every call to it...
llvm-svn: 233604
Summary:
This change teaches ScalarEvolution::isLoopBackedgeGuardedByCond to look
at edges within the loop body that dominate the latch. We don't do an
exhaustive search for all possible edges, but only a quick walk up the
dom tree.
Reviewers: atrick, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8627
llvm-svn: 233447
Summary:
With the introduction of MarkPendingLoopPredicates in r157092, I don't
think the bailout is needed anymore.
Reviewers: atrick, nicholas
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8624
llvm-svn: 233296
Summary:
`ComputeNumSignBits` returns incorrect results for `srem` instructions.
This change fixes the issue and adds a test case.
Reviewers: nadav, nicholas, atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8600
llvm-svn: 233225
A load from an invariant location is assumed to not alias any otherwise potentially aliasing stores. Our implementation only applied this rule to store instructions themselves whereas they it should apply for any memory accessing instruction. This results in both FRE and PRE becoming more effective at eliminating invariant loads.
Note that as a follow on change I will likely move this into AliasAnalysis itself. That's where the TBAA constant flag is handled and the semantics are essentially the same. I'd like to separate the semantic change from the refactoring and thus have extended the hack that's already in MemoryDependenceAnalysis for this change.
Differential Revision: http://reviews.llvm.org/D8591
llvm-svn: 233140
The changes to InstCombine (& SCEV) do seem a bit silly - it doesn't make
anything obviously better to have the caller access the pointers element
type (the thing I'm trying to remove) than the GEP itself, but it's a
helpful migration step. This will allow me to more obviously lock down
GEP (& Load, etc) API usage, then fix all the code that accesses pointer
element types except the places that need to be removed (most of the
InstCombines) anyway - at which point I'll need to just remove all that
code because it won't be meaningful anymore (there will be no pointer
types, so no bitcasts to combine)
SCEV looks like it'll need some restructuring - we'll have to do a bit
more work for GEP canonicalization, since it'll depend on how it's used
if we can even manage to canonicalize it to a non-ugly GEP. I guess we
can do some fun stuff like voting (do 2 out of 3 load from the GEP with
a certain type that gives a pretty GEP? Does every typed use of the GEP
use either a specific type or a generic type (i8*, etc)?)
llvm-svn: 233131
Simplify boolean expressions using `true` and `false` with `clang-tidy`
Patch by Richard Thomson.
Reviewed By: nlewycky
Differential Revision: http://reviews.llvm.org/D8528
llvm-svn: 233091
Currently this is only used to tweak the backend's memcpy inlining
heuristics, testing that isn't very helpful. A real test case will
follow in the next commit, where this behavior would cause a real
miscompilation.
llvm-svn: 232895
r216771 introduced a change to MemoryDependenceAnalysis that allowed it
to reason about acquire/release operations. However, this change does
not ensure that the acquire/release operations pair. Unfortunately,
this leads to miscompiles as we won't see an acquire load as properly
memory effecting. This largely reverts r216771.
This fixes PR22708.
llvm-svn: 232889
When estimating SROA savings, we want to see if an address is derived
off an alloca in the caller. For store instructions, operand 1 is the
address operand, but the current code uses operand 0. Use
getPointerOperand for loads and stores to fix this.
Patch by Easwaran Raman.
http://reviews.llvm.org/D8425
llvm-svn: 232827
Summary:
This change teaches isImpliedCond to infer things like "X sgt 0" => "X -
1 sgt -1". The `ConstantRange` class has the logic to do the heavy
lifting, this change simply gets ScalarEvolution to exploit that when
reasonable.
Depends on D8345
Reviewers: atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8346
llvm-svn: 232576
Summary:
This change splits `makeICmpRegion` into `makeAllowedICmpRegion` and
`makeSatisfyingICmpRegion` with slightly different contracts. The first
one is useful for determining what values some expression //may// take,
given that a certain `icmp` evaluates to true. The second one is useful
for determining what values are guaranteed to //satisfy// a given
`icmp`.
Reviewers: nlewycky
Reviewed By: nlewycky
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8345
llvm-svn: 232575
Also, add several entries to vectorizable functions table, and
corresponding tests. The table isn't complete, it'll be populated later.
Review: http://reviews.llvm.org/D8131
llvm-svn: 232531
Given that the stated purpose of `CheckFailed()` is to provide a nice
spot for a breakpoint, it'd be nice not to have to use a regex to break
on it. Recover the ability to simply use `b CheckFailed` by
specializing the message-only version, and by changing the variadic
version to call into the message-only version.
llvm-svn: 232268
Summary:
ScalarEvolutionExpander assumes that the header block of a loop is a
legal place to have a use for a phi node. This is true only for phis
that are either in the header or dominate the header block, but it is
not true for phi nodes that are strictly internal to the loop body.
This change teaches ScalarEvolutionExpander to place uses of PHI nodes
in the basic block the PHI nodes belong to. This is always legal, and
`hoistIVInc` ensures that the said position dominates `IsomorphicInc`.
Reviewers: atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8311
llvm-svn: 232189
There's a missed optimization opportunity where we could look at the full chain of computation and take the intersection of the flags instead of only looking one instruction deep.
llvm-svn: 232134
Instead, run both EH preparation passes, and have them both ignore
functions with unrecognized EH personalities. Pass delegation involved
some hacky code for creating an AnalysisResolver that we don't need now.
llvm-svn: 231995
This patch adds limited support in ValueTracking for inferring known bits of a value from conditional expressions which must be true to reach the instruction we're trying to optimize. At this time, the feature is off by default. Once landed, I'm hoping for feedback from others on both profitability and compile time impact.
Forms of conditional value propagation have been tried in LLVM before and have failed due to compile time problems. In an attempt to side step that, this patch only considers conditions where the edge leaving the branch dominates the context instruction. It does not attempt full dataflow. Even with that restriction, it handles many interesting cases:
* Early exits from functions
* Early exits from loops (for context instructions in the loop and after the check)
* Conditions which control entry into loops, including multi-version loops (such as those produced during vectorization, IRCE, loop unswitch, etc..)
Possible applications include optimizing using information provided by constructs such as: preconditions, assumptions, null checks, & range checks.
This patch implements two approaches to the problem that need further benchmarking. Approach 1 is to directly walk the dominator tree looking for interesting conditions. Approach 2 is to inspect other uses of the value being queried for interesting comparisons. From initial benchmarking, it appears that Approach 2 is faster than Approach 1, but this needs to be further validated.
Differential Revision: http://reviews.llvm.org/D7708
llvm-svn: 231879
The debug message was pretty confusing here. It only reported the
situation with memchecks without the result of the dependence analysis.
Now it prints whether the loop is safe from the POV of the dependence
analysis and if yes, whether we need memchecks.
llvm-svn: 231854
This is the final patch that actually introduces the new parameter of
partition mapping to RuntimePointerCheck::needsChecking.
Another API (LAI::getInstructionsForAccess) is also exposed that helps
to map pointers to instructions because ultimately we partition
instructions.
The WIP version of the Loop Distribution pass in D6930 has been adapted
to use all this. See for example, how
InstrPartitionContainer::computePartitionSetForPointers sets up the
partitions using the above API and then calls to LAI::addRuntimeCheck
with the pointer partitions.
llvm-svn: 231818
Now the analysis won't "fail" if the memchecks exceed the threshold. It
is the transform pass' responsibility to perform the check.
This allows the transform pass to further analyze/eliminate the
memchecks. E.g. in Loop distribution we only need to check pointers
that end up in different partitions.
Note that there is a slight change of functionality here. The logic in
analyzeLoop is that if dependence checking fails due to non-constant
distance between the pointers, another attempt is made to prove safety
of the dependences purely using run-time checks.
Before this patch we could fail the loop due to exceeding the memcheck
threshold after the first step, now we only check the threshold in the
client after the full analysis. There is no measurable compile-time
effect but I wanted to record this here.
llvm-svn: 231817
The check for the number of memchecks will be moved to the client of
this analysis. Besides allowing for transform-specific thresholds, this
also lets Loop Distribution post-process the memchecks; Loop
Distribution only needs memchecks between pointers of different
partitions.
The motivation for this first patch is to untangle the CanDoRT check
from the NumComparison check before moving the NumComparison part.
CanDoRT means that we couldn't determine the bounds for the pointer.
Note that NumComparison is set independent of this flag.
llvm-svn: 231816
The dependences are now expose through the new getInterestingDependences
API so we can use that with -analyze too and fix the FIXME.
This lets us remove the test that relied on -debug to check the
dependences.
llvm-svn: 231807
Gather an array of interesting dependences rather than just failing
after the first unsafe one and regarding the loop unsafe. Loop
Distribution needs to be able to collect all dependences in order to
isolate the dependence cycles into their own partition.
Since the dependence checking algorithm is quadratic in terms of
accesses sharing the same underlying pointer, I am applying a cut-off
threshold (MaxInterestingDependence). Exceeding that, the logic reverts
back to the original approach deeming the loop unsafe upon encountering
the first unsafe dependence.
The main idea of the patch is to split isDepedent from directly
answering the question whether the dep is safe for vectorization to
return a dependence type which then gets mapped to old boolean result
using Dependence::isSafeForVectorization.
Tested that this was compile-time neutral on SpecINT2006 LTO bitcode
inputs. No assembly change on the testsuite including external.
llvm-svn: 231806
LoopDistribution needs to query various results of the dependence
analysis. This series will expose some more APIs and state of the
dependence checker.
This patch is a simple one to just expose the DepChecker instance. The
set is compile-time neutral measured with LTO bitcode files of
SpecINT2006. Also there is no assembly change on the testsuite.
llvm-svn: 231805
This crash occurs due to memory corruption when trying to update dependency
direction based on Constraints.
This crash was observed during lnt regression of Polybench benchmark test case dynprog.
Review: http://reviews.llvm.org/D8059
llvm-svn: 231788
This crash in Dependency analysis is because we assume here that in case of UsefulGEP
both source and destination have the same number of operands which may not be true.
This incorrect assumption results in crash while populating Pairs. Fix the same.
This crash was observed during lnt regression for code such as-
struct s{
int A[10][10];
int C[10][10][10];
} S;
void dep_constraint_crash_test(int k,int N) {
for( int i=0;i<N;i++)
for( int j=0;j<N;j++)
S.A[0][0] = S.C[0][0][k];
}
Review: http://reviews.llvm.org/D8162
llvm-svn: 231784
CFLAA didn't know how to properly handle ConstantExprs; it would silently
ignore them. This was a problem if the ConstantExpr is, say, a GEP of a global,
because CFLAA wouldn't realize that there's a global there. :)
llvm-svn: 231743
We now treat pointers given to ptrtoint and pointers retrieved from
inttoptr as similar to arguments or globals (can alias anything, etc.)
This solves some of the problems we were having with giving incorrect
results.
llvm-svn: 231741
Summary:
Now that the DataLayout is a mandatory part of the module, let's start
cleaning the codebase. This patch is a first attempt at doing that.
This patch is not exactly NFC as for instance some places were passing
a nullptr instead of the DataLayout, possibly just because there was a
default value on the DataLayout argument to many functions in the API.
Even though it is not purely NFC, there is no change in the
validation.
I turned as many pointer to DataLayout to references, this helped
figuring out all the places where a nullptr could come up.
I had initially a local version of this patch broken into over 30
independant, commits but some later commit were cleaning the API and
touching part of the code modified in the previous commits, so it
seemed cleaner without the intermediate state.
Test Plan:
Reviewers: echristo
Subscribers: llvm-commits
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231740
Summary:
This removes some duplicated code, and also helps optimization: e.g. in
the test case added, `%idx ULT 128` in `@x` is not currently optimized
to `true` by `-indvars` but will be, after this change.
The only functional change in ths commit is that for add recurrences,
ScalarEvolution::getRange will be more aggressive -- computing the
unsigned (resp. signed) range for a SCEVAddRecExpr will now look at the
NSW (resp. NUW) bits and check for signed (resp. unsigned) overflow.
This can be a strict improvement in some cases (such as the attached
test case), and should be no worse in other cases.
Reviewers: atrick, nlewycky
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8142
llvm-svn: 231709
Summary:
Unused in this commit, but will be used in a subsequent change (D8142)
by a FileCheck test.
Reviewers: atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8143
llvm-svn: 231708
Summary:
See the two test cases.
; Can fold fcmp with undef on one side by choosing NaN for the undef
; Can fold fcmp with undef on both side
; fcmp u_pred undef, undef -> true
; fcmp o_pred undef, undef -> false
; because whatever you choose for the first undef
; you can choose NaN for the other undef
Reviewers: hfinkel, chandlerc, majnemer
Reviewed By: majnemer
Subscribers: majnemer, llvm-commits
Differential Revision: http://reviews.llvm.org/D7617
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231626
-debug-pass is not specified, as the string is only used when dumping pass
information. There is a big cost of determining the name in
ReginBase<Tr>:getNameStr() if the region's entry or exit block doesn't have a
name. This is the case for the Release build, as names are not preserved by the
front-end.
RegionPass is mainly used by Polly, resulting in long compile time for one file
of a customer application with the Release build (1m24s) vs Release+Asserts
build (10s) when Polly is used. With this change, the compile time with the
Release build went down to 8s.
Patch by Sanjin Sijaric <ssijaric@codeaurora.org>!
Phabricator: http://reviews.llvm.org/D8076
llvm-svn: 231485
Summary:
Teach SCEV to prove no overflow for an add recurrence by proving
something about the range of another add recurrence a loop-invariant
distance away from it.
Reviewers: atrick, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7980
llvm-svn: 231305
Summary:
DataLayout keeps the string used for its creation.
As a side effect it is no longer needed in the Module.
This is "almost" NFC, the string is no longer
canonicalized, you can't rely on two "equals" DataLayout
having the same string returned by getStringRepresentation().
Get rid of DataLayoutPass: the DataLayout is in the Module
The DataLayout is "per-module", let's enforce this by not
duplicating it more than necessary.
One more step toward non-optionality of the DataLayout in the
module.
Make DataLayout Non-Optional in the Module
Module->getDataLayout() will never returns nullptr anymore.
Reviewers: echristo
Subscribers: resistor, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D7992
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231270
Summary:
This makes it more obvious that the enum definition and the
"StandardName" array is in sync. Mechanically refactored w/ a
python script.
Test Plan: still compiles
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7845
llvm-svn: 231172
Summary:
This does not conceptually belongs here. Instead provide a shortcut
getModule() that provides access to the DataLayout.
Reviewers: chandlerc, echristo
Reviewed By: echristo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8027
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231147
Accidentally committed a few more of these cleanup changes than
intended. Still breaking these out & tidying them up.
This reverts commit r231135.
llvm-svn: 231136
There doesn't seem to be any need to assert that iterator assignment is
between iterators over the same node - if you want to reuse an iterator
variable to iterate another node, that's perfectly acceptable. Just
don't mix comparisons between iterators into disjoint sequences, as
usual.
llvm-svn: 231135
Move the specialized metadata nodes for the new debug info hierarchy
into place, finishing off PR22464. I've done bootstraps (and all that)
and I'm confident this commit is NFC as far as DWARF output is
concerned. Let me know if I'm wrong :).
The code changes are fairly mechanical:
- Bumped the "Debug Info Version".
- `DIBuilder` now creates the appropriate subclass of `MDNode`.
- Subclasses of DIDescriptor now expect to hold their "MD"
counterparts (e.g., `DIBasicType` expects `MDBasicType`).
- Deleted a ton of dead code in `AsmWriter.cpp` and `DebugInfo.cpp`
for printing comments.
- Big update to LangRef to describe the nodes in the new hierarchy.
Feel free to make it better.
Testcase changes are enormous. There's an accompanying clang commit on
its way.
If you have out-of-tree debug info testcases, I just broke your build.
- `upgrade-specialized-nodes.sh` is attached to PR22564. I used it to
update all the IR testcases.
- Unfortunately I failed to find way to script the updates to CHECK
lines, so I updated all of these by hand. This was fairly painful,
since the old CHECKs are difficult to reason about. That's one of
the benefits of the new hierarchy.
This work isn't quite finished, BTW. The `DIDescriptor` subclasses are
almost empty wrappers, but not quite: they still have loose casting
checks (see the `RETURN_FROM_RAW()` macro). Once they're completely
gutted, I'll rename the "MD" classes to "DI" and kill the wrappers. I
also expect to make a few schema changes now that it's easier to reason
about everything.
llvm-svn: 231082
This work is currently being rethought along different lines and
if this work is needed it can be resurrected out of svn. Remove it
for now as no current work in ongoing on it and it's unused. Verified
with the authors before removal.
llvm-svn: 230780
It is not sound to mark the increment operation as `nuw` or `nsw`
based on a proof off of the add recurrence if the increment operation
we emit happens to be a `sub` instruction.
I could not come up with a test case for this -- the cases where
SCEVExpander decides to emit a `sub` instruction is quite small, and I
cannot think of a way I'd be able to get SCEV to prove that the
increment does not overflow in those cases.
Differential Revision: http://reviews.llvm.org/D7899
llvm-svn: 230673
accesses are via different types
Noticed this while generalizing the code for loop distribution.
I confirmed with Arnold that this was indeed a bug and managed to create
a testcase.
llvm-svn: 230647
Also remove the somewhat misleading initializers from
VectorizationFactor and VectorizationInterleave. They will get
initialized with the default ctor since no cl::init is provided.
llvm-svn: 230608
(The change was landed in r230280 and caused the regression PR22674.
This version contains a fix and a test-case for PR22674).
When emitting the increment operation, SCEVExpander marks the
operation as nuw or nsw based on the flags on the preincrement SCEV.
This is incorrect because, for instance, it is possible that {-6,+,1}
is <nuw> while {-6,+,1}+1 = {-5,+,1} is not.
This change teaches SCEV to mark the increment as nuw/nsw only if it
can explicitly prove that the increment operation won't overflow.
Apart from the attached test case, another (more realistic)
manifestation of the bug can be seen in
Transforms/IndVarSimplify/pr20680.ll.
Differential Revision: http://reviews.llvm.org/D7778
llvm-svn: 230533
With a diabolically crafted test case, we could recurse
through this code and return true instead of false.
The larger engineering crime is the use of magic numbers.
Added FIXME comments for those.
llvm-svn: 230515
The bug was a result of getPreStartForExtend interpreting nsw/nuw
flags on an add recurrence more strongly than is legal. {S,+,X}<nsw>
implies S+X is nsw only if the backedge of the loop is taken at least
once.
NOTE: I had accidentally committed an unrelated change with the commit
message of this change in r230275 (r230275 was reverted in r230279).
This is the correct change for this commit message.
Differential Revision: http://reviews.llvm.org/D7808
llvm-svn: 230291
When emitting the increment operation, SCEVExpander marks the
operation as nuw or nsw based on the flags on the preincrement SCEV.
This is incorrect because, for instance, it is possible that {-6,+,1}
is <nuw> while {-6,+,1}+1 = {-5,+,1} is not.
This change teaches SCEV to mark the increment as nuw/nsw only if it
can explicitly prove that the increment operation won't overflow.
Apart from the attached test case, another (more realistic) manifestation
of the bug can be seen in Transforms/IndVarSimplify/pr20680.ll.
NOTE: this change was landed with an incorrect commit message in
rL230275 and was reverted for that reason in rL230279. This commit
message is the correct one.
Differential Revision: http://reviews.llvm.org/D7778
llvm-svn: 230280
230275 got committed with an incorrect commit message due to a mixup
on my side. Will re-land in a few moments with the correct commit
message.
llvm-svn: 230279
The bug was a result of getPreStartForExtend interpreting nsw/nuw
flags on an add recurrence more strongly than is legal. {S,+,X}<nsw>
implies S+X is nsw only if the backedge of the loop is taken at least
once.
Differential Revision: http://reviews.llvm.org/D7808
llvm-svn: 230275
This patch adds the isProfitableToHoist API. For AArch64, we want to prevent a
fmul from being hoisted in cases where it is more profitable to form a
fmsub/fmadd.
Phabricator Review: http://reviews.llvm.org/D7299
Patch by Lawrence Hu <lawrence@codeaurora.org>
llvm-svn: 230241
The LoopInfo in combination with depth_first is used to enumerate the
loops.
Right now -analyze is not yet complete. It only prints the result of
the analysis, the report and the run-time checks. Printing the unsafe
depedences will require a bit more reshuffling which I'd like to do in a
follow-on to this patchset. Unsafe dependences are currently checked
via -debug-only=loop-accesses in the new test.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229898
The only difference between these two is that VectorizerReport adds a
vectorizer-specific prefix to its messages. When LAA is used in the
vectorizer context the prefix is added when we promote the
LoopAccessReport into a VectorizerReport via one of the constructors.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229897
When I split out LoopAccessReport from this, I need to create some temps
so constness becomes necessary.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229896
This allows the analysis to be attempted with any loop. This feature
will be used with -analysis. (LV only requests the analysis on loops
that have already satisfied these tests.)
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229895
Also add pass name as an argument to VectorizationReport::emitAnalysis.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229894
This is a function pass that runs the analysis on demand. The analysis
can be initiated by querying the loop access info via LAA::getInfo. It
either returns the cached info or runs the analysis.
Symbolic stride information continues to reside outside of this analysis
pass. We may move it inside later but it's not a priority for me right
now. The idea is that Loop Distribution won't support run-time stride
checking at least initially.
This means that when querying the analysis, symbolic stride information
can be provided optionally. Whether stride information is used can
invalidate the cache entry and rerun the analysis. Note that if the
loop does not have any symbolic stride, the entry should be preserved
across Loop Distribution and LV.
Since currently the only user of the pass is LV, I just check that the
symbolic stride information didn't change when using a cached result.
On the LV side, LoopVectorizationLegality requests the info object
corresponding to the loop from the analysis pass. A large chunk of the
diff is due to LAI becoming a pointer from a reference.
A test will be added as part of the -analyze patch.
Also tested that with AVX, we generate identical assembly output for the
testsuite (including the external testsuite) before and after.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229893
LAA will be an on-demand analysis pass, so we need to cache the result
of the analysis. canVectorizeMemory is renamed to analyzeLoop which
computes the result. canVectorizeMemory becomes the query function for
the cached result.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229892
The transformation passes will query this and then emit them as part of
their own report. The currently only user LV is modified to do just
that.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229891
As LAA is becoming a pass, we can no longer pass the params to its
constructor. This changes the command line flags to have external
storage. These can now be accessed both from LV and LAA.
VectorizerParams is moved out of LoopAccessInfo in order to shorten the
code to access it.
This commits also has the fix (D7731) to the break dependence cycle
between the analysis and vector libraries.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229890
This reverts commit r229651.
I'd like to ultimately revert r229650 but this reformat stands in the
way. I'll reformat the affected files once the the loop-access pass is
fully committed.
llvm-svn: 229889
r229622: "[LoopAccesses] Make VectorizerParams global"
r229623: "[LoopAccesses] Stash the report from the analysis rather than emitting it"
r229624: "[LoopAccesses] Cache the result of canVectorizeMemory"
r229626: "[LoopAccesses] Create the analysis pass"
r229628: "[LoopAccesses] Change debug messages from LV to LAA"
r229630: "[LoopAccesses] Add canAnalyzeLoop"
r229631: "[LoopAccesses] Add missing const to APIs in VectorizationReport"
r229632: "[LoopAccesses] Split out LoopAccessReport from VectorizerReport"
r229633: "[LoopAccesses] Add -analyze support"
r229634: "[LoopAccesses] Change LAA:getInfo to return a constant reference"
r229638: "Analysis: fix buildbots"
llvm-svn: 229650
This should fix the compilation failure on the MSVC buildbots which find a
std::make_unique and llvm::make_unique via ADL, resulting in ambiguity.
llvm-svn: 229638
The LoopInfo in combination with depth_first is used to enumerate the
loops.
Right now -analyze is not yet complete. It only prints the result of
the analysis, the report and the run-time checks. Printing the unsafe
depedences will require a bit more reshuffling which I'd like to do in a
follow-on to this patchset. Unsafe dependences are currently checked
via -debug-only=loop-accesses in the new test.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229633
The only difference between these two is that VectorizerReport adds a
vectorizer-specific prefix to its messages. When LAA is used in the
vectorizer context the prefix is added when we promote the
LoopAccessReport into a VectorizerReport via one of the constructors.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229632
When I split out LoopAccessReport from this, I need to create some temps
so constness becomes necessary.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229631
This allows the analysis to be attempted with any loop. This feature
will be used with -analysis. (LV only requests the analysis on loops
that have already satisfied these tests.)
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229630
Will be used by the new RuntimePointerCheck::print.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229629
Also add pass name as an argument to VectorizationReport::emitAnalysis.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229628
This is a function pass that runs the analysis on demand. The analysis
can be initiated by querying the loop access info via LAA::getInfo. It
either returns the cached info or runs the analysis.
Symbolic stride information continues to reside outside of this analysis
pass. We may move it inside later but it's not a priority for me right
now. The idea is that Loop Distribution won't support run-time stride
checking at least initially.
This means that when querying the analysis, symbolic stride information
can be provided optionally. Whether stride information is used can
invalidate the cache entry and rerun the analysis. Note that if the
loop does not have any symbolic stride, the entry should be preserved
across Loop Distribution and LV.
Since currently the only user of the pass is LV, I just check that the
symbolic stride information didn't change when using a cached result.
On the LV side, LoopVectorizationLegality requests the info object
corresponding to the loop from the analysis pass. A large chunk of the
diff is due to LAI becoming a pointer from a reference.
A test will be added as part of the -analyze patch.
Also tested that with AVX, we generate identical assembly output for the
testsuite (including the external testsuite) before and after.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229626
blockNeedsPredication is in LoopAccess in order to share it with the
vectorizer. It's a utility needed by LoopAccess not strictly provided
by it but it's a good place to share it. This makes the function static
so that it no longer required to create an LoopAccessInfo instance in
order to access it from LV.
This was actually causing problems because it would have required
creating LAI much earlier that LV::canVectorizeMemory().
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229625
LAA will be an on-demand analysis pass, so we need to cache the result
of the analysis. canVectorizeMemory is renamed to analyzeLoop which
computes the result. canVectorizeMemory becomes the query function for
the cached result.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229624
The transformation passes will query this and then emit them as part of
their own report. The currently only user LV is modified to do just
that.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229623
As LAA is becoming a pass, we can no longer pass the params to its
constructor. This changes the command line flags to have external
storage. These can now be accessed both from LV and LAA.
VectorizerParams is moved out of LoopAccessInfo in order to shorten the
code to access it.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229622
LoopAccessAnalysis will be used as the name of the pass.
This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.
llvm-svn: 229621
extensions.
This change also removes `DEBUG(dbgs() << "SCEV: untested prestart
overflow check\n");` because that case has a unit test now.
Differential Revision: http://reviews.llvm.org/D7645
llvm-svn: 229600
I could not come up with a test case for this one; but I don't think
`getPreStartForSignExtend` can assume `AR` is `nsw` -- there is one
place in scalar evolution that calls `getSignExtendAddRecStart(AR,
...)` without proving that `AR` is `nsw`
(line 1564)
OperandExtendedAdd =
getAddExpr(WideStart,
getMulExpr(WideMaxBECount,
getZeroExtendExpr(Step, WideTy)));
if (SAdd == OperandExtendedAdd) {
// If AR wraps around then
//
// abs(Step) * MaxBECount > unsigned-max(AR->getType())
// => SAdd != OperandExtendedAdd
//
// Thus (AR is not NW => SAdd != OperandExtendedAdd) <=>
// (SAdd == OperandExtendedAdd => AR is NW)
const_cast<SCEVAddRecExpr *>(AR)->setNoWrapFlags(SCEV::FlagNW);
// Return the expression with the addrec on the outside.
return getAddRecExpr(getSignExtendAddRecStart(AR, Ty, this),
getZeroExtendExpr(Step, Ty),
L, AR->getNoWrapFlags());
}
Differential Revision: http://reviews.llvm.org/D7640
llvm-svn: 229594
This change is a logical suspect in 22587 and 22590. Given it's of minimal importanance and I can't get clang to build on my home machine, I'm reverting so that I can deal with this next week.
llvm-svn: 229322
Canonicalize access to function attributes to use the simpler API.
getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
=> getFnAttribute(Kind)
getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
=> hasFnAttribute(Kind)
llvm-svn: 229192
Two minor tweaks I noticed when reading through the code:
- No need to recompute begin() on every iteration. We're not modifying the instructions in this loop.
- We can ignore PHINodes and Dbg intrinsics. The current code does this anyways, but it will spend slightly more time doing so and will count towards the limit of instructions in the block. It seems really silly to give up due the presence of PHIs...
Differential Revision: http://reviews.llvm.org/D7624
llvm-svn: 229175
LLVM's include tree and the use of using declarations to hide the
'legacy' namespace for the old pass manager.
This undoes the primary modules-hostile change I made to keep
out-of-tree targets building. I sent an email inquiring about whether
this would be reasonable to do at this phase and people seemed fine with
it, so making it a reality. This should allow us to start bootstrapping
with modules to a certain extent along with making it easier to mix and
match headers in general.
The updates to any code for users of LLVM are very mechanical. Switch
from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h".
Qualify the types which now produce compile errors with "legacy::". The
most common ones are "PassManager", "PassManagerBase", and
"FunctionPassManager".
llvm-svn: 229094
Summary:
Instances of the AssumptionCache are per function, so we can't re-use
the same AssumptionCache instance when recursing in the CallAnalyzer to
analyze a different function. Instead we have to pass the
AssumptionCacheTracker to the CallAnalyzer so it can get the right
AssumptionCache on demand.
Reviewers: hfinkel
Subscribers: llvm-commits, hans
Differential Revision: http://reviews.llvm.org/D7533
llvm-svn: 228957
We would crash if we couldn't locate a Function that either Location's
Value belonged to. Now we just print out a debug message and return
conservatively.
llvm-svn: 228901
This allows IDEs to recognize the entire set of header files for
each of the core LLVM projects.
Differential Revision: http://reviews.llvm.org/D7526
Reviewed By: Chris Bieneman
llvm-svn: 228798
If the landingpad of the invoke is using a personality function that
catches asynch exceptions, then it can catch a trap.
Also add some landingpads to invalid LLVM IR test cases that lack them.
Over-the-shoulder reviewed by David Majnemer.
llvm-svn: 228782
Without a valid data layout, deferenceable(N) doesn't get parsed or
propagated. Since this is the key item we are testing, add a dependency
on the pass.
Differential Revision: http://reviews.llvm.org/D7508
llvm-svn: 228611
Make use of the newly introduced inst_range to clean up two loops. Clean
up a third one while at it.
Differential Revision: http://reviews.llvm.org/D7455
llvm-svn: 228596
When creating a scev for sext({X,+,Y}), scev checks if the expression
is equivalent to {sext X,+,zext Y}. If it can prove that, it also
tags the original {X,+,Y} as <nsw>, which is not correct.
In the test case I run `-scalar-evolution` twice because the bug
manifests only once SCEV has run through and seen the `sext`
expressions (and then does a in-place mutation on {X,+,Y}).
Differential Revision: http://reviews.llvm.org/D7495
llvm-svn: 228586
For the attached test case different types are used in the ICmpInst
and SelectInst that represent the min/max expressions. However, if the
ICmpInst type is smaller a comparison with the sign/zero extended
operands would have yielded the same result. This situation might
arise after the instruction combination pass was applied.
Differential Revision: http://reviews.llvm.org/D7338
llvm-svn: 228572
add recurrences don't overflow.
This change makes the optimization more restrictive. It still assumes
that an overflowing `add nsw` is undefined behavior; and this change
will need revisiting once we have a consistent semantics for poison
values.
Differential Revision: http://reviews.llvm.org/D7331
llvm-svn: 228552
Summary:
The alias.scope metadata represents sets of things an instruction might
alias with. When generically combining the metadata from two
instructions the result must be the union of the original sets, because
the new instruction might alias with anything any of the original
instructions aliased with.
Reviewers: hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7490
llvm-svn: 228525
different fields.
We can show that two GEPs off of the same (possibly multidimensional)
array of structs, into different fields, can't alias. Quoting:
For two GEPOperators GEP1 and GEP2, if we find that:
- both GEPs begin indexing from the exact same pointer;
- the last indices in both GEPs are constants, indexing into a struct;
- said indices are different, hence,the pointed-to fields are different;
- and both GEPs only index through arrays prior to that;
this lets us determine that the struct that GEP1 indexes into and the
struct that GEP2 indexes into must either precisely overlap or be
completely disjoint. Because they cannot partially overlap, indexing
into different non-overlapping fields of the struct will never alias.
The other BasicAA::aliasGEP rules worked in some cases, but not all
(for example, the i32x3 struct in the testcase).
We can add this simple ad-hoc rule to complement them.
rdar://19717375
Differential Revision: http://reviews.llvm.org/D7453
llvm-svn: 228498
It is a variation of SimplifyBinOp, but it takes into account
FastMathFlags.
It is needed in inliner and loop-unroller to accurately predict the
transformation's outcome (previously we dropped the flags and were too
conservative in some cases).
Example:
float foo(float *a, float b) {
float r;
if (a[1] * b)
r = /* a lot of expensive computations */;
else
r = 1;
return r;
}
float boo(float *a) {
return foo(a, 0.0);
}
Without this patch, we don't inline 'foo' into 'boo'.
llvm-svn: 228432
This will allow it to be shared with the new Loop Distribution pass.
getFirstInst is currently duplicated across LoopVectorize.cpp and
LoopAccessAnalysis.cpp. This is a short-term work-around until we figure out
a better solution.
NFC. (The code moved is adjusted a bit for the name of the Loop member and
that PtrRtCheck is now a reference rather than a pointer.)
llvm-svn: 228418
Since testing the function indirectly is tricky, introduce a direct
print-memderefs pass, in the same spirit as print-memdeps, which prints
dereferenceability information matched by FileCheck.
Differential Revision: http://reviews.llvm.org/D7075
llvm-svn: 228369
Summary: When evaluating floating point instructions in the inliner, ask the TTI whether it is an expensive operation. By default, it's not an expensive operation. This keeps the default behavior the same as before. The ARM TTI has been updated to return back TCC_Expensive for targets which don't have hardware floating point.
Reviewers: chandlerc, echristo
Reviewed By: echristo
Subscribers: t.p.northover, aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D6936
llvm-svn: 228263
Other than moving code and adding the boilerplate for the new files, the code
being moved is unchanged.
There are a few global functions that are shared with the rest of the
LoopVectorizer. I moved these to the new module as well (emitLoopAnalysis,
stripIntegerCast, replaceSymbolicStrideSCEV) along with the Report class used
by emitLoopAnalysis. There is probably room for further improvement in this
area.
I kept DEBUG_TYPE "loop-vectorize" because it's used as the PassName with
emitOptimizationRemarkAnalysis. This will obviously have to change.
NFC. This is part of the patchset that splits out the memory dependence logic
from LoopVectorizationLegality into a new class LoopAccessAnalysis.
LoopAccessAnalysis will be used by the new Loop Distribution pass.
llvm-svn: 227756
terms of the new pass manager's TargetIRAnalysis.
Yep, this is one of the nicer bits of the new pass manager's design.
Passes can in many cases operate in a vacuum and so we can just nest
things when convenient. This is particularly convenient here as I can
now consolidate all of the TargetMachine logic on this analysis.
The most important change here is that this pushes the function we need
TTI for all the way into the TargetMachine, and re-creates the TTI
object for each function rather than re-using it for each function.
We're now prepared to teach the targets to produce function-specific TTI
objects with specific subtargets cached, etc.
One piece of feedback I'd love here is whether its worth renaming any of
this stuff. None of the names really seem that awesome to me at this
point, but TargetTransformInfoWrapperPass is particularly ... odd.
TargetIRAnalysisWrapper might make more sense. I would want to do that
rename separately anyways, but let me know what you think.
llvm-svn: 227731
getTTI method used to get an actual TTI object.
No functionality changed. This just threads the argument and ensures
code like the inliner can correctly look up the callee's TTI rather than
using a fixed one.
The next change will use this to implement per-function subtarget usage
by TTI. The changes after that should eliminate the need for FTTI as that
will have become the default.
llvm-svn: 227730
produce it.
This adds a function to the TargetMachine that produces this analysis
via a callback for each function. This in turn faves the way to produce
a *different* TTI per-function with the correct subtarget cached.
I've also done the necessary wiring in the opt tool to thread the target
machine down and make it available to the pass registry so that we can
construct this analysis from a target machine when available.
llvm-svn: 227721
base which it adds a single analysis pass to, to instead return the type
erased TargetTransformInfo object constructed for that TargetMachine.
This removes all of the pass variants for TTI. There is now a single TTI
*pass* in the Analysis layer. All of the Analysis <-> Target
communication is through the TTI's type erased interface itself. While
the diff is large here, it is nothing more that code motion to make
types available in a header file for use in a different source file
within each target.
I've tried to keep all the doxygen comments and file boilerplate in line
with this move, but let me know if I missed anything.
With this in place, the next step to making TTI work with the new pass
manager is to introduce a really simple new-style analysis that produces
a TTI object via a callback into this routine on the target machine.
Once we have that, we'll have the building blocks necessary to accept
a function argument as well.
llvm-svn: 227685
type erased interface and a single analysis pass rather than an
extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting into the right form.
There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.
The other reason of course is that this is *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.
Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.
The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]
Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:
1) Improving the TargetMachine interface by having it directly return
a TTI object. Because we have a non-pass object with value semantics
and an internal type erasure mechanism, we can narrow the interface
of the TargetMachine to *just* do what we need: build and return
a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
This will include splitting off a minimal form of it which is
sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
target machine for each function. This may actually be done as part
of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
just a bit messy and exacerbating the complexity of implementing
the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
This patch folds fcmp in some cases of interest in Julia. The patch adds a function CannotBeOrderedLessThanZero that returns true if a value is provably not less than zero. I.e. the function returns true if the value is provably -0, +0, positive, or a NaN. The patch extends InstructionSimplify.cpp to fold instances of fcmp where:
- the predicate is olt or uge
- the first operand is provably not less than zero
- the second operand is zero
The motivation for handling these cases optimizing away domain checks for sqrt in Julia for common idioms such as sqrt(x*x+y*y)..
http://reviews.llvm.org/D6972
llvm-svn: 227298
Summary:
Also add enum types for __C_specific_handler and _CxxFrameHandler3 for
which we know a few things.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7214
llvm-svn: 227284
According to my reading of the LangRef, volatiles are only ordered with respect to other volatiles. It is entirely legal and profitable to forward unrelated loads over the volatile load. This patch implements this for GVN by refining the transition rules MemoryDependenceAnalysis uses when encountering a volatile.
The added test cases show where the extra flexibility is profitable for local dependence optimizations. I have a related change (227110) which will extend this to non-local dependence (i.e. PRE), but that's essentially orthogonal to the semantic change in this patch. I have tested the two together and can confirm that PRE works over a volatile load with both changes. I will be submitting a PRE w/volatiles test case seperately in the near future.
Differential Revision: http://reviews.llvm.org/D6901
llvm-svn: 227112
This change is mostly motivated by exposing information about the original query instruction to the actual scanning work in getPointerDependencyFrom when used by GVN PRE. In a follow up change, I will use this to be more precise with regards to the semantics of volatile instructions encountered in the scan of a basic block.
Worth noting, is that this change (despite appearing quite simple) is not semantically preserving. By providing more information to the helper routine, we allow some optimizations to kick in that weren't previously able to (when called from this code path.) In particular, we see that treatment of !invariant.load becomes more precise. In theory, we might see a difference with an ordered/atomic instruction as well, but I'm having a hard time actually finding a test case which shows that.
Test wise, I've included new tests for !invariant.load which illustrate this difference. I've also included some updated TBAA tests which highlight that this change isn't needed for that optimization to kick in - it's handled inside alias analysis itself.
Eventually, it would be nice to factor the !invariant.load handling inside alias analysis as well.
Differential Revision: http://reviews.llvm.org/D6895
llvm-svn: 227110
manager to support the actual uses of it. =]
When I ported instcombine to the new pass manager I discover that it
didn't work because TLI wasn't available in the right places. This is
a somewhat surprising and/or subtle aspect of the new pass manager
design that came up before but I think is useful to be reminded of:
While the new pass manager *allows* a function pass to query a module
analysis, it requires that the module analysis is already run and cached
prior to the function pass manager starting up, possibly with
a 'require<foo>' style utility in the pass pipeline. This is an
intentional hurdle because using a module analysis from a function pass
*requires* that the module analysis is run prior to entering the
function pass manager. Otherwise the other functions in the module could
be in who-knows-what state, etc.
A somewhat surprising consequence of this design decision (at least to
me) is that you have to design a function pass that leverages
a module analysis to do so as an optional feature. Even if that means
your function pass does no work in the absence of the module analysis,
you have to handle that possibility and remain conservatively correct.
This is a natural consequence of things being able to invalidate the
module analysis and us being unable to re-run it. And it's a generally
good thing because it lets us reorder passes arbitrarily without
breaking correctness, etc.
This ends up causing problems in one case. What if we have a module
analysis that is *definitionally* impossible to invalidate. In the
places this might come up, the analysis is usually also definitionally
trivial to run even while other transformation passes run on the module,
regardless of the state of anything. And so, it follows that it is
natural to have a hard requirement on such analyses from a function
pass.
It turns out, that TargetLibraryInfo is just such an analysis, and
InstCombine has a hard requirement on it.
The approach I've taken here is to produce an analysis that models this
flexibility by making it both a module and a function analysis. This
exposes the fact that it is in fact safe to compute at any point. We can
even make it a valid CGSCC analysis at some point if that is useful.
However, we don't want to have a copy of the actual target library info
state for each function! This state is specific to the triple. The
somewhat direct and blunt approach here is to turn TLI into a pimpl,
with the state and mutators in the implementation class and the query
routines primarily in the wrapper. Then the analysis can lazily
construct and cache the implementations, keyed on the triple, and
on-demand produce wrappers of them for each function.
One minor annoyance is that we will end up with a wrapper for each
function in the module. While this is a bit wasteful (one pointer per
function) it seems tolerable. And it has the advantage of ensuring that
we pay the absolute minimum synchronization cost to access this
information should we end up with a nice parallel function pass manager
in the future. We could look into trying to mark when analysis results
are especially cheap to recompute and more eagerly GC-ing the cached
results, or we could look at supporting a variant of analyses whose
results are specifically *not* cached and expected to just be used and
discarded by the consumer. Either way, these seem like incremental
enhancements that should happen when we start profiling the memory and
CPU usage of the new pass manager and not before.
The other minor annoyance is that if we end up using the TLI in both
a module pass and a function pass, those will be produced by two
separate analyses, and thus will point to separate copies of the
implementation state. While a minor issue, I dislike this and would like
to find a way to cleanly allow a single analysis instance to be used
across multiple IR unit managers. But I don't have a good solution to
this today, and I don't want to hold up all of the work waiting to come
up with one. This too seems like a reasonable thing to incrementally
improve later.
llvm-svn: 226981
I had already factored this analysis specifically to enable doing this,
but hadn't actually committed the necessary wiring to get at this from
the new pass manager. This also nicely shows how the separate cache
object can be directly managed by the new pass manager.
This analysis didn't have any direct tests and so I've added a printer
pass and a boring test case. I chose to print the i1 value which is
being assumed rather than the call to llvm.assume as that seems much
more useful for testing... but suggestions on an even better printing
strategy welcome. My main goal was to make sure things actually work. =]
llvm-svn: 226868
Specifically, gc.result benefits from this greatly. Instead of:
gc.result.int.*
gc.result.float.*
gc.result.ptr.*
...
We now have a gc.result.* that can specialize to literally any type.
Differential Revision: http://reviews.llvm.org/D7020
llvm-svn: 226857
ScalarEvolution currently lowers a subtraction recurrence to an add
recurrence with the same no-wrap flags as the subtraction. This is
incorrect because `sub nsw X, Y` is not the same as `add nsw X, -Y`
and `sub nuw X, Y` is not the same as `add nuw X, -Y`. This patch
fixes the issue, and adds two test cases demonstrating the bug.
Differential Revision: http://reviews.llvm.org/D7081
llvm-svn: 226755
pass and a LoopPrinterPass with the expected associated wiring.
I've added a RUN line to the only test case (!!!) we have that actually
prints loops. Everything seems to be working.
This is somewhat exciting as this is the first analysis using another
analysis to go in for the new pass manager. =D I also believe it is the
last analysis necessary for porting instcombine, but of course I may yet
discover more.
llvm-svn: 226560
cleaner to derive from the generic base.
Thise removes a ton of boiler plate code and somewhat strange and
pointless indirections. It also remove a bunch of the previously needed
friend declarations. To fully remove these, I also lifted the verify
logic into the generic LoopInfoBase, which seems good anyways -- it is
generic and useful logic even for the machine side.
llvm-svn: 226385
unused variables in a no-asserts build.
I've fixed this by putting the entire loop behind an #ifndef as it
contains nothing other than asserts.
llvm-svn: 226377
a LoopInfoWrapperPass to wire the object up to the legacy pass manager.
This switches all the clients of LoopInfo over and paves the way to port
LoopInfo to the new pass manager. No functionality change is intended
with this iteration.
llvm-svn: 226373
TargetLibraryAnalysis pass.
There are actually no direct tests of this already in the tree. I've
added the most basic test that the pass manager bits themselves work,
and the TLI object produced will be tested by an upcoming patches as
they port passes which rely on TLI.
This is starting to point out the awkwardness of the invalidate API --
it seems poorly fitting on the *result* object. I suspect I will change
it to live on the analysis instead, but that's not for this change, and
I'd rather have a few more passes ported in order to have more
experience with how this plays out.
I believe there is only one more analysis required in order to start
porting instcombine. =]
llvm-svn: 226160
The pass is really just a means of accessing a cached instance of the
TargetLibraryInfo object, and this way we can re-use that object for the
new pass manager as its result.
Lots of delta, but nothing interesting happening here. This is the
common pattern that is developing to allow analyses to live in both the
old and new pass manager -- a wrapper pass in the old pass manager
emulates the separation intrinsic to the new pass manager between the
result and pass for analyses.
llvm-svn: 226157
While the term "Target" is in the name, it doesn't really have to do
with the LLVM Target library -- this isn't an abstraction which LLVM
targets generally need to implement or extend. It has much more to do
with modeling the various runtime libraries on different OSes and with
different runtime environments. The "target" in this sense is the more
general sense of a target of cross compilation.
This is in preparation for porting this analysis to the new pass
manager.
No functionality changed, and updates inbound for Clang and Polly.
llvm-svn: 226078
it's defined in the current module. Clang generates this situation for the
C++14 sized deallocation functions, because it generates a weak definition in
case one isn't provided by the C++ runtime library.
llvm-svn: 226069
utils/sort_includes.py.
I clearly haven't done this in a while, so more changed than usual. This
even uncovered a missing include from the InstrProf library that I've
added. No functionality changed here, just mechanical cleanup of the
include order.
llvm-svn: 225974
a print method.
This was formulated on a bad idea, but sadly I didn't uncover how bad
this was until I got further down the path. I had hoped that we could
provide a low boilerplate way of printing analyses, but it just doesn't
seem like this really fits the needs of the analyses. Not all analyses
really want to do printing, and those that do don't all use the same
interface. Instead, with the new pass manager let's just take advantage
of the fact that creating an explicit printer pass like the LCG has is
pretty low boilerplate already and rely on that for testing.
llvm-svn: 225861
I'm adding generic analysis printing utility pass support which will
require such a method (or a specialization) so this will let the
existing printing logic satisfy that.
llvm-svn: 225854
Even before I sunk the debug flag into the opt tool this had been made
obsolete by factoring the pass and analysis managers into a single set
of templates that all used the core flag. No functionality changed here.
llvm-svn: 225842
the generic functionality of the pass managers themselves.
In the new infrastructure, the pass "manager" isn't actually interesting
at all. It just pipelines a single chunk of IR through N passes. We
don't need to know anything about the IR or the passes to do this really
and we can replace the 3 implementations of the exact same functionality
with a single generic PassManager template, complementing the single
generic AnalysisManager template.
I've left typedefs in place to give convenient names to the various
obvious instantiations of the template.
With this, I think I've nuked almost all of the redundant logic in the
managers, and I think the overall design is actually simpler for having
single templates that clearly indicate there is no special logic here.
The logging is made somewhat more annoying by this change, but I don't
think the difference is worth having heavy-weight traits to help log
things.
llvm-svn: 225783
The functions {pred,succ,use,user}_{begin,end} exist, but many users
have to check *_begin() with *_end() by hand to determine if the
BasicBlock or User is empty. Fix this with a standard *_empty(),
demonstrating a few usecases.
llvm-svn: 225760
template.
This consolidates three copies of nearly the same core logic. It adds
"complexity" to the ModuleAnalysisManager in that it makes it possible
to share a ModuleAnalysisManager across multiple modules... But it does
so by deleting *all of the code*, so I'm OK with that. This will
naturally make fixing bugs in this code much simpler, etc.
The only down side here is that we have to use 'typename' and 'this->'
in various places, and the implementation is lifted into the header.
I'll take that for the code size reduction.
The convenient names are still typedef-ed and used throughout so that
users can largely ignore this aspect of the implementation.
The follow-up change to this will do the exact same refactoring for the
PassManagers. =D
It turns out that the interesting different code is almost entirely in
the adaptors. At the end, that should be essentially all that is left.
llvm-svn: 225757
so has clang-format. Notably, this fixes a bunch of formatting in the
CGSCC pass manager side of things that has been improved in clang-format
recently.
llvm-svn: 225743