Inliner PGO support.
This adds the following to the new PM based inliner in PGO mode:
* Use block frequency analysis to derive the callsite's profile count and
use that to adjust the thresholds of hot and cold callsites.
* Incrementally update the BFI of the caller after a callee gets inlined
into it. This incremental update is only within an invocation of the run
method - BFI is not preserved across calls to run.
* Update the function entry count of the callee after inlining it into a
caller.
* I've tuned the thresholds for the hot and cold callsites using a hacked
up version of the old inliner that explicitly computes BFI on a set of
internal benchmarks and spec. Once the new PM based pipeline stabilizes
(IIRC Chandler mentioned there are known issues) I'll benchmark this
again and adjust the thresholds if required.
Differential revision: https://reviews.llvm.org/D28331
llvm-svn: 292666
Unfortunately, recognizing these in value tracking may cause us to hit
a hack in InstCombiner::visitICmpInst() more often:
http://lists.llvm.org/pipermail/llvm-dev/2017-January/109340.html
...but besides being the obviously Right Thing To Do, there's a clear
codegen win from identifying these patterns for several targets.
llvm-svn: 292655
To import a type identifier we read the summary and create external
references to the symbols defined when exporting.
Differential Revision: https://reviews.llvm.org/D28546
llvm-svn: 292654
Summary:
This rewrites store expression/leader handling. We no longer use the
value operand as the leader, instead, we store it separately. We also
now store the stored value as part of the expression, and compare it
when comparing stores for equality. This enables us to get rid of a
bunch of our previous hacks and machinations, as the existing
machinery takes care of everything *except* updating the stored value
on classes. The only time we have to update it is if the storecount
goes to 0, and when we do, we destroy it.
Since we no longer use the value operand as the leader, during
elimination, we have to use the value operand. Doing this also fixes a
bunch of store forwarding cases we were missing.
Any value operand we use is guaranteed to either be updated by previous
eliminations, or minimized by future ones.
(i.e., the fact that we don't use the most dominating value operand when
it's not a constant does not affect anything.)
Sadly, this change also exposes that we didn't pay attention to the
output of the pr31594.ll test, as it also very clearly exposes the
same store leader bug we are fixing here.
(I added pr31682.ll anyway, but maybe we think that's too large to be useful)
On the plus side, propagate-ir-flags.ll now passes due to the
corrected store forwarding.
This change was 3 stage'd on darwin and linux, with the full test-suite.
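A minimal sketch of the kind of store forwarding this enables
(hypothetical IR, not one of the tests mentioned above):
  define i32 @forward(i32* %p, i32 %v) {
    ; the stored value is now part of the store expression, so the load
    ; below is value-numbered to %v and eliminated:
    store i32 %v, i32* %p
    %x = load i32, i32* %p
    ret i32 %x
  }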
Reviewers:
davide
Subscribers:
llvm-commits
llvm-svn: 292648
This is the third attempt to recommit r292526.
The original summary:
Currently, a GEP is considered free only if its indices are all constant.
TTI::getGEPCost() can give a more accurate, target-specific analysis. TTI
is already used for the cost of many other instructions.
llvm-svn: 292633
This is the second attempt to recommit r292526.
The original summary:
Currently, a GEP is considered free only if its indices are all constant.
TTI::getGEPCost() can give a more accurate, target-specific analysis. TTI
is already used for the cost of many other instructions.
llvm-svn: 292616
Simplify a packss/packus truncation based on the elements of the mask that are actually demanded.
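For illustration, a hypothetical example (names and lane choices are
mine, not from the patch): if only the low half of a 128-bit pack result
is demanded, the second source operand is not demanded at all and can be
simplified away (e.g. replaced with undef):
  declare <16 x i8> @llvm.x86.sse2.packsswb.128(<8 x i16>, <8 x i16>)

  define <8 x i8> @low_half_only(<8 x i16> %a0, <8 x i16> %a1) {
    %pack = call <16 x i8> @llvm.x86.sse2.packsswb.128(<8 x i16> %a0, <8 x i16> %a1)
    ; only lanes 0-7, which are packed from %a0, are demanded here:
    %res = shufflevector <16 x i8> %pack, <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
    ret <8 x i8> %res
  }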
Differential Revision: https://reviews.llvm.org/D28777
llvm-svn: 292591
Like several other loop passes (the vectorizer, etc) this pass doesn't
really fit the model of a loop pass. The critical distinction is that it
isn't intended to be pipelined together with other loop passes. I plan
to add some documentation to the loop pass manager to make this more
clear on that side.
LoopSink is also different because it doesn't really need a lot of the
infrastructure of our loop passes. For example, if there aren't loop
invariant instructions causing a preheader to exist, there is no need to
form a preheader. It also doesn't need LCSSA because this pass is
only involved in sinking invariant instructions from a preheader into
the loop, not reasoning about live-outs.
This allows some nice simplifications to the pass in the new PM where we
can directly walk the loops once without restructuring them.
Differential Revision: https://reviews.llvm.org/D28921
llvm-svn: 292589
Part of the assert has been left active for further debugging.
The other part has been turned into a stat for tracking for the
moment.
llvm-svn: 292583
This recommits r292526, which was reverted in r292529, after fixing the test case.
The original summary:
Currently, a GEP is considered free only if its indices are all constant.
TTI::getGEPCost() can give a more accurate, target-specific analysis. TTI
is already used for the cost of many other instructions.
llvm-svn: 292570
Summary:
Fence instructions are currently marked as `ModRef` for all memory locations.
We can improve this for constant memory locations (such as constant globals),
since fence instructions cannot modify these locations.
This helps us to forward constant loads across fences (added test case in GVN).
There were no changes in behaviour for similar test cases in early-cse and licm.
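For instance (a hypothetical reduction, not the exact test case added):
  @g = constant i32 42

  define i32 @f() {
    %v1 = load i32, i32* @g
    fence seq_cst
    ; the fence cannot modify the constant global, so this load can be
    ; forwarded from %v1:
    %v2 = load i32, i32* @g
    %sum = add i32 %v1, %v2
    ret i32 %sum
  }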
Reviewers: dberlin, sanjoy, reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28914
llvm-svn: 292546
This can prove that:
extern int f;
int g() {
  int x = 0;
  for (int i = 0; i < 365; ++i) {
    x /= f;
  }
  return x;
}
always returns zero, since x starts at 0 and 0 / f stays 0 on every
iteration. Thanks to Sanjoy for confirming this transformation actually
made sense (bugs are mine).
llvm-svn: 292531
Currently, a GEP is considered free only if its indices are all constant.
TTI::getGEPCost() can give a more accurate, target-specific analysis. TTI
is already used for the cost of many other instructions.
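A sketch of the kind of GEP this affects (hypothetical example): with a
variable index, the indices-all-constant check never considers the GEP
free, while on many targets it folds into the load's addressing mode and
TTI can say so:
  define i32 @sum_elt(i32* %base, i64 %i) {
    ; variable index, so the old heuristic says "not free":
    %p = getelementptr inbounds i32, i32* %base, i64 %i
    %v = load i32, i32* %p
    ret i32 %v
  }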
Differential Revision: https://reviews.llvm.org/D28693
llvm-svn: 292526
Summary:
For non-alloca pointers, we check whether the pointer comes from a
malloc-like call and is not captured. In that case, we can promote the
pointer, as the caller will have no way to access it even if there is
unwinding in the middle of the loop.
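A minimal sketch of the situation (hypothetical IR; the other conditions
for promotion are assumed to hold):
  declare noalias i8* @malloc(i64)
  declare void @may_unwind()

  define void @f(i32 %n) {
  entry:
    %m = call i8* @malloc(i64 4)
    %p = bitcast i8* %m to i32*
    br label %loop
  loop:
    %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
    ; %p is never captured, so even if @may_unwind throws mid-loop the
    ; caller cannot observe this memory; the store can be promoted:
    call void @may_unwind()
    store i32 %i, i32* %p
    %i.next = add i32 %i, 1
    %c = icmp slt i32 %i.next, %n
    br i1 %c, label %loop, label %exit
  exit:
    ret void
  }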
Reviewers: hfinkel, sanjoy, reames, eli.friedman
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28834
llvm-svn: 292510
Type identifiers are exported by:
- Adding coarse-grained information about how to test the type
identifier to the summary.
- Creating symbols in the object file (aliases and absolute symbols)
containing fine-grained information about the type identifier.
Differential Revision: https://reviews.llvm.org/D28424
llvm-svn: 292462
This changes the vectorizer to explicitly use the loopsimplify and lcssa utils,
instead of "requiring" the transformations as if they were analyses.
This is not NFC, since it changes the LCSSA behavior - we no longer run LCSSA
for all loops, but rather only for the loops we expect to modify.
Differential Revision: https://reviews.llvm.org/D28868
llvm-svn: 292456
We currently check whether a reduction has a single outside user. We don't
really need to require that - we just need to make sure a single value is
used externally. The number of external users of that value shouldn't actually
matter.
Differential Revision: https://reviews.llvm.org/D28830
llvm-svn: 292424
claims to test.
LoopSimplify was unifying the multiple exits in this test case, making
it never even test the multiple exit handling of LoopDeletion. Doh.
Now it works (thanks to a great idea from mkuper) and will fail if we
ever change something to make it stop working.
llvm-svn: 292331
Summary: Partial unrolling should have a separate threshold from full unrolling.
Reviewers: efriedma, mzolotukhin
Reviewed By: efriedma, mzolotukhin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28831
llvm-svn: 292293
Summary: Add a test case for LICM when promoting locals that may be read after the throw within the loop.
Reviewers: eli.friedman, hfinkel, sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28822
llvm-svn: 292261
If a memory instruction will be vectorized, but its pointer operand is
non-consecutive-like, the instruction is a gather or scatter operation. Its
pointer operand will be non-uniform. This should fix PR31671.
Reference: https://llvm.org/bugs/show_bug.cgi?id=31671
Differential Revision: https://reviews.llvm.org/D28819
llvm-svn: 292254
running LCSSA over them prior to running the loop pipeline.
This also teaches the loop PM to verify that LCSSA form is preserved
throughout the pipeline's run across the loop nest.
Most of the test updates just leverage this new functionality. One has to be
relaxed with the new PM as IVUsers is less powerful when it sees LCSSA input.
Differential Revision: https://reviews.llvm.org/D28743
llvm-svn: 292241
Also, add the corresponding match to the AssumptionCache's 'Affected Values' list.
Differential Revision: https://reviews.llvm.org/D28485
llvm-svn: 292239
Add the missing fabs(fpext) optimization that worked with the call, and
also fix it creating a second fpext when there were multiple uses.
llvm-svn: 292172
Simplify a pshufb shuffle mask based on the elements of the mask that are actually demanded.
Differential Revision: https://reviews.llvm.org/D28745
llvm-svn: 292101
First, I've moved a test of IVUsers from the LSR tree to a dedicated
IVUsers test directory. I've also simplified its RUN line now that the
new pass manager's loop PM is providing analyses on their own.
No functionality changed, but it makes subsequent changes cleaner.
llvm-svn: 292060
cover domtree and alias analysis. These are the pretty clear analyses
that we would always want to survive this pass.
To make these survive, we also need to preserve the assumption cache.
Added a test that verifies the important bits of this preservation.
llvm-svn: 292037
Allows LLVM to optimize sequences like the following:
  %add = add nuw i32 %x, 1
  %cmp = icmp ugt i32 %add, %y
Into:
  %cmp = icmp uge i32 %x, %y
Previously, only signed comparisons were being handled.
Decrements could also be handled, but 'sub nuw %x, 1' is currently canonicalized to
'add %x, -1' in InstCombineAddSub, losing the nuw flag. Removing that canonicalization
seems like it might have far-reaching ramifications so I kept this simple for now.
Patch by Matti Niemenmaa!
Differential Revision: https://reviews.llvm.org/D24700
llvm-svn: 291975
Summary:
This is a testcase where phi node cycling happens, and because we do
not order the leaders by domination or anything similar, the leader
keeps changing.
Using std::set for the members is too expensive, and we actually don't
need them sorted all the time, only at leader changes.
We could keep both a set and a vector, and keep them mostly sorted and
resort as necessary, or use a set and a fibheap, but all of this seems
premature.
After running some statistics, we are able to avoid the vast majority
of sorting by keeping a "next leader" field. Most congruence classes only have
leader changes once or twice during GVN.
Reviewers: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28594
llvm-svn: 291968
Summary:
Memory Dependence Analysis was limited to returning only local
dependencies for invariant.group handling. Now it returns NonLocal when
it finds a non-local dependency, and the caller can then retrieve it by
asking getNonLocalPointerDependency.
Thanks to this we are able to devirtualize loops!
  void indirect(A &a, int n) {
    for (int i = 0 ; i < n; i++)
      a.foo();
  }
  void test(int n) {
    A a;
    indirect(a, n);
  }
After inlining, a.foo() will be changed to a direct call, even if foo and
A::A() are external (but only if the vtable definition is available).
Reviewers: nlewycky, dberlin, chandlerc, rsmith
Subscribers: mehdi_amini, davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D28137
llvm-svn: 291762
This test seems to have largely been relying on asserts being tripped.
It had a very specific and somewhat uninteresting grep of the output,
but it never really did anything to cause SCEV to be preserved across
loop simplify, certainly not explicitly. And a later addition to it
actually added CHECK lines despite the test never running FileCheck.
Now we actually print SCEV before and after loop simplify to make sure
it is *changing* and being *updated*, which seems much more likely to be
the point of the test.
llvm-svn: 291740
This means that we can use a shorter instruction sequence in the case where
the size is a power of two and on the boundary between two representations.
Differential Revision: https://reviews.llvm.org/D28421
llvm-svn: 291706
classes, and updating checking to allow for equivalence through
reachability.
(Sadly, the checking here is not perfect, and can't be made perfect,
so we'll have to disable it after we are satisfied with correctness.
Right now it is just "very unlikely" to happen.)
llvm-svn: 291698
The removed assert seems bogus - it's perfectly legal for the roots of the
vectorized subtrees to be equal even if the original scalar values aren't,
if the original scalars happen to be equivalent.
This fixes PR31599.
Differential Revision: https://reviews.llvm.org/D28539
llvm-svn: 291692
Summary:
Revert LowerTypeTests: Split the pass in two: a resolution phase and a lowering phase.
This change separates how type identifiers are resolved from how intrinsic
calls are lowered. All information required to lower an intrinsic call
is stored in a new TypeIdLowering data structure. The idea is that this
data structure can either be initialized using the module itself during
regular LTO, or using the module summary in ThinLTO backends.
Original URL: https://reviews.llvm.org/D28341
Reviewers: pcc
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D28532
llvm-svn: 291684
updated instructions:
pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd,
addpd and subpd.
Also added a special optimization case which replaces pmulld with a
pmullw/pmulhw/pshuf sequence when the real operand bitwidth is <= 16.
Differential Revision: https://reviews.llvm.org/D28104
llvm-svn: 291657
These are interesting again because the user may not be aware that this
is a common reason preventing LICM.
A const is removed from an instruction pointer declaration in order to
pass it to ORE.
Differential Revision: https://reviews.llvm.org/D27940
llvm-svn: 291649
This patch reverts r291588: [PGO] Turn off comdat renaming in IR PGO by default,
as we are seeing some hash mismatches in our internal tests.
llvm-svn: 291621
Bail out instead of asserting when we encounter this situation,
which can actually happen.
The reason the test uses the new PM is that the "bad" phi, incidentally, gets
cleaned up by LoopSimplify. But LICM can create this kind of phi and preserve
loop simplify form, so the cleanup has no chance to run.
This fixes PR31190.
We may want to solve this in a less conservative manner, since this phi is
actually uniform within the inner loop (or we may want LICM to output a cleaner
promotion to begin with).
Differential Revision: https://reviews.llvm.org/D28490
llvm-svn: 291589
Summary:
In IR PGO we append the function hash to comdat functions to avoid a
potential hash mismatch. This turns out not to be legal in some cases: if
the comdat function is address-taken and used in a comparison, renaming
changes the semantics. This patch turns off comdat renaming by default.
To alleviate the hash mismatch issue, we now rename the profile variable
for comdat functions. The profile format allows multiple versions of a
profile with different hash values to co-exist. The inlined copy will
always have the correct profile counter. The out-of-line copy might not
have the correct count, but we will not get the bogus mismatch warning.
Reviewers: davidxl
Subscribers: llvm-commits, xur
Differential Revision: https://reviews.llvm.org/D28416
llvm-svn: 291588
In some cases StructurizeCfg updates the root node but the dominator info
remains unchanged, which causes a crash when expensive checks are enabled.
To cope with this problem, a new method was added to DominatorTreeBase
that allows adding new root nodes; it is called in StructurizeCfg to
keep the dominator tree in sync.
This change fixes PR27488.
Differential Revision: https://reviews.llvm.org/D28114
llvm-svn: 291530
This patch delays the fix-up step for external induction variable users until
after the dominator tree has been properly updated. This should fix PR30742.
The SCEVExpander in InductionDescriptor::transform can generate code in the
wrong location if the dominator tree is not up-to-date. We should work towards
keeping the dominator tree up-to-date throughout the transformation.
Reference: https://llvm.org/bugs/show_bug.cgi?id=30742
Differential Revision: https://reviews.llvm.org/D28168
llvm-svn: 291462
Summary:
By using stripPointerCasts we can get to the root
value and then walk down the bitcast graph
Reviewers: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28181
llvm-svn: 291405
fabs(x * x) is not generally safe to fold to x * x, since it is not safe
to assume the operand is positive if x is a NaN.
This is also less general than it could be, so this will be replaced
with a transformation on the intrinsic.
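The pattern in question, sketched as hypothetical IR:
  declare float @llvm.fabs.f32(float)

  define float @fabs_sq(float %x) {
    ; x * x is non-negative for any non-NaN x, but if %x is a NaN the
    ; product can be a NaN with its sign bit set, which fabs must still
    ; clear - so the fold needs no-NaNs information:
    %mul = fmul float %x, %x
    %fabs = call float @llvm.fabs.f32(float %mul)
    ret float %fabs
  }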
llvm-svn: 291359
Summary: LLVM's non-standard notion of phi nodes means we can't both try to substitute for undef in phi nodes *and* use phi nodes as leaders all the time. This changes NewGVN to use the same semantics as SimplifyPHINode to decide which phi nodes are equivalent.
Reviewers: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28312
llvm-svn: 291308
This fixes a bug where loop vectorization widens a load but with a lower
alignment. Hoisting the load without propagating the alignment would
allow inst-combine to later deduce a higher alignment than the pointer
actually has.
Differential Revision: https://reviews.llvm.org/D28408
llvm-svn: 291281
This change separates how type identifiers are resolved from how intrinsic
calls are lowered. All information required to lower an intrinsic call
is stored in a new TypeIdLowering data structure. The idea is that this
data structure can either be initialized using the module itself during
regular LTO, or using the module summary in ThinLTO backends.
Differential Revision: https://reviews.llvm.org/D28341
llvm-svn: 291205
Promotion is always legal when a store within the loop is guaranteed to execute.
However, this is not a necessary condition - for promotion to be memory model
semantics-preserving, it is enough to have a store that dominates every exit
block. This is because if the store dominates every exit block, the fact the
exit block was executed implies the original store was executed as well.
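A hedged sketch of a loop this enables (hypothetical IR; the remaining
promotion requirements are assumed to hold):
  declare void @may_throw()

  define void @f(i32* noalias %p, i32 %n) {
  entry:
    br label %loop
  loop:
    %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
    ; @may_throw can unwind, so the store below is not guaranteed to
    ; execute on every iteration; but it dominates the only exit block,
    ; so reaching %exit implies it executed at least once:
    call void @may_throw()
    %v = load i32, i32* %p
    %v.inc = add i32 %v, 1
    store i32 %v.inc, i32* %p
    %i.next = add i32 %i, 1
    %c = icmp slt i32 %i.next, %n
    br i1 %c, label %loop, label %exit
  exit:
    ret void
  }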
Differential Revision: https://reviews.llvm.org/D28147
llvm-svn: 291171
This code is target dependent and may not be right for all targets.
Pass the decision whether a given stride is complex or not to the target,
by sending stride information via SCEV to getAddressComputationCost
instead of the 'IsComplex' flag.
Specifically, on X86 targets we don't see any significant address
computation cost for strided accesses in general.
Differential Revision: https://reviews.llvm.org/D27518
llvm-svn: 291106
Set up basic YAML I/O support for module summaries, plumb the summary into
the pass and add a few command line flags to test YAML I/O support. Bitcode
support to come separately, as will the code in LowerTypeTests that actually
uses the summary. Also add a couple of tests that pass by virtue of the pass
doing nothing with the summary (which happens to be the correct thing to do
for those tests).
Differential Revision: https://reviews.llvm.org/D28041
llvm-svn: 291069
performing partial redundancy elimination (PRE). Not doing so can cause jumpy line
tables and confusing (though correct) source attributions.
Differential Revision: https://reviews.llvm.org/D27857
llvm-svn: 291037
We can perform the following:
(add (zext (add nuw X, C1)), C2) -> (zext (add nuw X, C1+C2))
This is only possible if C2 is negative and C2 is greater than or equal to negative C1.
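A worked instance with hypothetical constants C1 = 5 and C2 = -3 (so C2
is negative and C2 >= -C1, meaning the inner add stays nuw):
  define i64 @fold(i32 %x) {
    %a = add nuw i32 %x, 5
    %z = zext i32 %a to i64
    ; since %x + 5 does not wrap, neither does %x + 2, so this becomes
    ; zext(add nuw i32 %x, 2):
    %r = add i64 %z, -3
    ret i64 %r
  }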
llvm-svn: 290927
Summary:
Regardless of how the loop body weight is distributed, we should preserve
the total loop body weight, i.e. we should have the same weight reaching
the body of the loop or its duplicates in the peeled and unpeeled cases.
Reviewers: mkuper, davidxl, anemet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28179
llvm-svn: 290833
Summary:
gep 0, 0 is equivalent to bitcast. LLVM canonicalizes it
to getelementptr because that lets SROA handle it.
A simple case like
  void g(A &a) {
    z(a);
    if (glob)
      a.foo();
  }
  void testG() {
    A a;
    g(a);
  }
was not devirtualized with -fstrict-vtable-pointers because of the lack
of handling for gep 0 in Memory Dependence Analysis.
Reviewers: dberlin, nlewycky, chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28126
llvm-svn: 290763
This is similar to the allocfn case - if an alloca is not captured, then it's
necessarily thread-local.
Differential Revision: https://reviews.llvm.org/D28170
llvm-svn: 290738
Summary:
The current loop complete unroll algorithm checks if unrolling completely
will reduce the runtime by a certain percentage. If yes, it applies a
fixed boosting factor to the threshold (by discounting cost). The problem
with this approach is that the threshold changes abruptly. This patch
makes the boosting factor a function of the runtime reduction percentage,
capped by a fixed threshold. In this way, the threshold changes
continuously.
The patch also simplifies the code by removing one parameter from UP.
The patch only affects the code-gen of two speccpu2006 benchmarks:
445.gobmk binary size decreases 0.08%, no performance change.
464.h264ref binary size increases 0.24%, no performance change.
Reviewers: mzolotukhin, chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26989
llvm-svn: 290737
The test accidentally had trivially dead code. Also needed to adjust the
rounding mode to not be CUR_DIRECTION so the intrinsics don't get
converted to native operations before going through
SimplifyDemandedVectorElts.
llvm-svn: 290702
This is an orthogonal and separated layer instead of being embedded
inside the pass manager. While it adds a small amount of complexity, it
is fairly minimal and the composability and control seems worth the
cost.
The logic for this ends up being nicely isolated and targeted. It should
be easy to experiment with different iteration strategies wrapped around
the CGSCC bottom-up walk using this kind of facility.
The mechanism used to track devirtualization is the simplest one I came
up with. I think it handles most of the cases the existing iteration
machinery handles, but I haven't done a *very* in depth analysis. It
does however match the basic intended semantics, and we can tweak or
tune its exact behavior incrementally as necessary. One thing that we
may want to revisit is freshly building the value handle set on each
iteration. While I don't think this will be a significant cost (it is
strictly fewer value handles but more churn of value handles than the old
call graph), it is conceivable that we'll want a somewhat more clever
tracking mechanism. My hope is to layer that on as a follow up patch
with data supporting any implementation complexity it adds.
This code also provides for a basic count heuristic: if the number of
indirect calls decreases and the number of direct calls increases for
a given function in the SCC, we assume devirtualization is responsible.
This matches the heuristics currently used in the legacy pass manager.
Differential Revision: https://reviews.llvm.org/D23114
llvm-svn: 290665
analyses when we're about to break apart an SCC.
We can't wait until after breaking apart the SCC to invalidate things:
1) Which SCC do we then invalidate? All of them?
2) Even if we invalidate all of them, a newly created SCC may not have
a proxy that will convey the invalidation to functions!
Previously we only invalidated one of the SCCs and too late. This led to
stale analyses remaining in the cache. And because the caching strategy
actually works, they would get used and chaos would ensue.
Doing invalidation early is somewhat pessimizing though if we *know*
that the SCC structure won't change. So it turns out that the design to
make the mutation API force the caller to know the *kind* of mutation in
advance was indeed 100% correct and we didn't do enough of it. So this
change also splits two cases of switching a call edge to a ref edge into
two separate APIs so that callers can clearly test for this and take the
easy path without invalidating when appropriate. This is particularly
important in this case as we expect most inlines to be between functions
in separate SCCs and so the common case is that we don't have to so
aggressively invalidate analyses.
The LCG API change in turn needed some basic cleanups and better testing
in its unittest. No interesting functionality changed there other than
more coverage of the returned sequence of SCCs.
While this seems like an obvious improvement over the current state, I'd
like to revisit the core concept of invalidating within the CG-update
layer at all. I'm wondering if we would be better served forcing the
callers to handle the invalidation beforehand in the cases that they
can handle it. An interesting example is when we want to teach the
inliner to *update and preserve* analyses. But we can cross that bridge
when we get there.
With this patch, the new pass manager can build all of the LLVM test
suite at -O3 and everything passes. =D I haven't bootstrapped yet and
I'm sure there are still plenty of bugs, but this gives a nice baseline
so I'm going to increasingly focus on fleshing out the missing
functionality, especially the bits that are just turned off right now in
order to let us establish this baseline.
llvm-svn: 290664
when they are call edges at the leaf but may (transitively) be reached
via ref edges.
It turns out there is a simple rule: insert everything as a ref edge
which is a safe conservative default. Then we let the existing update
logic handle promoting some of those to call edges.
Note that it would be fairly cheap to make these call edges right away
if that is desirable by testing whether there is some existing call path
from the source to the target. It just seemed like slightly more
complexity in this code path that isn't strictly necessary. If anyone
feels strongly about handling this differently I'm happy to change it.
llvm-svn: 290649
This adds a combine that canonicalizes a chain of inserts which broadcasts
a value into a single insert + a splat shufflevector.
This fixes PR31286.
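For example (a hypothetical instance of the pattern), a chain like
  define <4 x float> @splat(float %x) {
    %v0 = insertelement <4 x float> undef, float %x, i32 0
    %v1 = insertelement <4 x float> %v0, float %x, i32 1
    %v2 = insertelement <4 x float> %v1, float %x, i32 2
    %v3 = insertelement <4 x float> %v2, float %x, i32 3
    ret <4 x float> %v3
  }
becomes a single insert plus a splat shufflevector:
  define <4 x float> @splat(float %x) {
    %v0 = insertelement <4 x float> undef, float %x, i32 0
    %s = shufflevector <4 x float> %v0, <4 x float> undef, <4 x i32> zeroinitializer
    ret <4 x float> %s
  }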
Differential Revision: https://reviews.llvm.org/D27992
llvm-svn: 290641
most of the inliner test cases.
The inliner involves a bunch of interesting code and tends to be where
most of the issues I've seen experimenting with the new PM lie. All of
these test cases pass, but I'd like to keep some more thorough coverage
here so doing a fairly blanket enabling.
There are a handful of interesting tests I've not enabled yet because
they're focused on the always inliner, or on functionality that doesn't
(yet) exist in the inliner.
llvm-svn: 290592
skipping indirectly recursive inline chains.
To do this, we implicitly build an inline stack for each callsite and
check prior to inlining that doing so would not form a cycle. This uses
the exact same technique and even shares some code with the legacy PM
inliner.
This solution remains deeply unsatisfying to me because it means we
cannot actually iterate the inliner externally. Doing so would not be
able to easily detect and avoid such cycles. Some day I would very much
like to have a solution that works without this internal state to detect
cycles, but this is not that day.
llvm-svn: 290590
Nothing really interesting here, but I had to improve the test to use
variables rather than hard coding value names as we happen to end up
with different value names in the new PM.
llvm-svn: 290589
We currently ignore the `allocsize` attribute on function calls with the
`nobuiltin` attribute when trying to lower `@llvm.objectsize`. We
shouldn't care about `nobuiltin` here: `allocsize` is explicitly added
by the user, not inferred based on a function's symbol.
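A sketch of the situation (hypothetical function names, not from the
patch):
  declare i8* @my_alloc(i64) allocsize(0)
  declare i64 @llvm.objectsize.i64.p0i8(i8*, i1)

  define i64 @f() {
    %p = call i8* @my_alloc(i64 32) nobuiltin
    ; allocsize is user-provided, so objectsize should still fold to 32
    ; despite the nobuiltin call site:
    %sz = call i64 @llvm.objectsize.i64.p0i8(i8* %p, i1 false)
    ret i64 %sz
  }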
llvm-svn: 290588
PMULDQ/PMULUDQ vXi64 instructions only use the even-numbered v2Xi32 input
elements, which SimplifyDemandedVectorElts should take advantage of.
This builds on r290554, which added support for 128 and 256-bit.
llvm-svn: 290582
This mostly involved converting from grep to FileCheck and tidying up
the IR used.
In one case (invoke_test-3.ll) the test had become completely pointless
as we use 'resume' rather than 'unwind' now, and even then it did not
occur at the end of the line.
llvm-svn: 290570
An earlier commit added support for unmasked scalar operations. At that time isel wouldn't generate an optimal sequence for masked operations, but that has now been fixed.
llvm-svn: 290566
inside of `InlineFunction`. Prior to this, call instructions are
specifically being rewritten and replaced within the inlined region,
invalidating some of the call sites.
Several of these regions are using the same technique to walk the
inlined region so this seems clearly safe up to this point.
I've also added a short circuit to the scan for call sites based on what
other code is doing.
With this, the most common crash I've found in the new inliner code is
fixed. I've turned it on for another test case that covers this
scenario.
I'll make my way through most of the other inliner test cases
just to get some easy coverage next.
llvm-svn: 290562
removing fully-dead comdats without removing dead entries in comdats
with live members.
This factors the core logic out of the current inliner's internals to
a reusable utility and leverages that in both places. The factored out
code should also be (minorly) more efficient in cases where we have very
few dead functions or dead comdats to consider.
I've added a test case to cover this behavior of the always inliner.
This is the last significant bug in the new PM's always inliner I've
found (so far).
llvm-svn: 290557
PMULDQ/PMULUDQ vXi64 instructions only use the even-numbered v2Xi32 input
elements, which SimplifyDemandedVectorElts should take advantage of.
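For illustration (hypothetical example): pmuludq only reads lanes 0 and 2
of each source, so values written only to the odd lanes are dead:
  declare <2 x i64> @llvm.x86.sse2.pmulu.dq(<4 x i32>, <4 x i32>)

  define <2 x i64> @even_only(<4 x i32> %a, <4 x i32> %b) {
    ; lane 1 is never read by pmuludq, so SimplifyDemandedVectorElts can
    ; drop this insert and use %a directly:
    %a1 = insertelement <4 x i32> %a, i32 7, i32 1
    %r = call <2 x i64> @llvm.x86.sse2.pmulu.dq(<4 x i32> %a1, <4 x i32> %b)
    ret <2 x i64> %r
  }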
Differential Revision: https://reviews.llvm.org/D28119
llvm-svn: 290554
The current GVN algorithm folds unconditional branches to, it claims,
expose more PRE opportunities. The folding, if really needed (which is
not certain, as it has not really been proved to improve analysis), can
be done by an earlier cleanup pass instead of GVN itself.
Ack'ed/SGTM'd by Daniel Berlin.
Differential Revision: https://reviews.llvm.org/D28117
llvm-svn: 290546
systematically and document in the test what all is going on.
This replaces the PR-named test that was the only coverage for GlobalDCE
and comdats previously. I wrote this because I wasn't certain how
comdat DCE was supposed to work and wanted to step through what
GlobalDCE did to fully understand it. After talking to folks and reading
the code and really staring at things it all makes sense but it seemed
good to help write down some of this in a more explicit and fully
covering test case.
For example, it seemed like a bug that GlobalDCE didn't consider comdat
participation of ifuncs. Specifically it seemed like an accident because
testing didn't really cover that case. But in fact, ifuncs specifically
cannot participate in a comdat despite having that API. The new test
case covers this and explicitly documents that DCE gets to fire here
even though there are comdats involved.
Also, we didn't have any positive tests for the challenging cases such
as usage cycles between comdat participants that might make them seem
alive except that there is no external edge into the cycle.
llvm-svn: 290537
Summary:
I only do this for unmasked cases for now because isel is failing to fold the mask. I'll try to fix that soon.
I'll do the same thing for packed add/sub/mul/div in a future patch.
Reviewers: delena, RKSimon, zvi, craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27879
llvm-svn: 290535
Summary:
This patch adds support for converting the masked vpermv intrinsics into shufflevector instructions if the indices are constants.
We also need to wrap a select instruction around the shuffle to take care of the masking part. InstCombine will take care of optimizing the select if the mask is constant so I didn't bother checking for that.
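A sketch of the resulting IR shape (hypothetical lane count and indices;
the original intrinsic call is elided):
  define <4 x float> @masked_perm(<4 x float> %v, <4 x float> %passthru, <4 x i1> %mask) {
    ; the constant indices become a shufflevector ...
    %shuf = shufflevector <4 x float> %v, <4 x float> undef, <4 x i32> <i32 2, i32 0, i32 1, i32 3>
    ; ... and the masking becomes a select, which InstCombine folds
    ; further if the mask is constant:
    %res = select <4 x i1> %mask, <4 x float> %shuf, <4 x float> %passthru
    ret <4 x float> %res
  }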
Reviewers: zvi, delena, spatel, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27825
llvm-svn: 290530
whether functions are removed, and fix the new PM's always inliner to
actually pass this test.
Without this, the new PM's always inliner leaves all the functions
kicking around which won't work out very well given the semantics of
always inline.
Doing this really highlights how frustrating the current alwaysinline
semantic contract is though -- why can we put it on *external*
functions, etc?
Also I've added a number of tricky and interesting test cases for
removing functions with the always inliner. There is one remaining case
not handled -- fully removing comdats -- and I've left a FIXME about
this.
llvm-svn: 290457
The pass creates some state which expects to be cleaned up by
a later instance of the same pass. opt-bisect happens to expose
this not ideal design because calling skipLoop() will result in
this state not being cleaned up at times and an assertion firing
in `doFinalization()`. Chandler tells me the new pass manager will
give us options to avoid these design traps, but until it's not ready,
we need a workaround for the current pass infrastructure. Fix provided
by Andy Kaylor, see the review for a complete discussion.
Differential Revision: https://reviews.llvm.org/D25848
llvm-svn: 290427
Use a dummy private function with inline asm calls instead of module
level asm blocks for CFI jumptables.
The main advantage is that now jumptable codegen can be affected by
the function attributes (like target_cpu on ARM). Module level asm
gets the default subtarget based on the target triple, which is often
not good enough.
This change also uses asm constraints/arguments to reference
jumptable targets and aliases directly. We no longer do asm name
mangling in an IR pass.
Differential Revision: https://reviews.llvm.org/D28012
llvm-svn: 290384
The code has been developed by Daniel Berlin over the years, and
the new implementation goal is that of addressing shortcomings of
the current GVN infrastructure, i.e. long compile time for large
testcases, lack of phi predication, no load/store value numbering
etc...
The current code just implements the "core" GVN algorithm, although
other pieces (load coercion, phi handling, predicate system) are
already implemented in a branch out of tree. Once the core is stable,
we'll start adding pieces on top of the base framework.
The tests currently living in test/Transforms/NewGVN are a copy
of the ones in GVN, with proper `XFAIL` (missing features in NewGVN).
A flag will be added in a future commit to enable NewGVN, so that
interested parties can exercise this code easily.
Differential Revision: https://reviews.llvm.org/D26224
llvm-svn: 290346
This patch renumbers the metadata nodes in debug info testcases after
https://reviews.llvm.org/D26769. This is a separate patch because it
causes so much churn. This was implemented with a python script that
pipes the testcases through llvm-as - | llvm-dis - and then goes
through the original and new output side-by side to insert all
comments at a close-enough location.
Differential Revision: https://reviews.llvm.org/D27765
llvm-svn: 290292
In r267672, where the loop distribution pragma was introduced, I tried
hard to keep the old behavior for opt: when opt is invoked
with -loop-distribute, it should distribute the loop (it's off by
default when run via the optimization pipeline).
As MichaelZ has discovered this has the unintended consequence of
breaking a very common developer work-flow to reproduce compilations
using opt: First you print the pass pipeline of clang
with -debug-pass=Arguments and then invoking opt with the returned
arguments.
clang -debug-pass will include -loop-distribute, but the pass is invoked
with default=off so nothing happens unless the loop carries the pragma,
whereas through opt (default=on) we would try to distribute all loops.
This changes opt's default to off as well to match clang. The tests are
modified to explicitly enable the transformation.
llvm-svn: 290235
We're currently doing nearly the same thing for @llvm.objectsize in
three different places: two of them are missing checks for overflow,
and one of them could subtly break if InstCombine gets much smarter
about removing alloc sites. Seems like a good idea to not do that.
llvm-svn: 290214
This doesn't implement *every* feature of the existing inliner, but
tries to implement the most important ones for building a functional
optimization pipeline and beginning to sort out bugs, regressions, and
other problems.
Notable, but intentional omissions:
- No alloca merging support. Why? Because it isn't clear we want to do
this at all. Active discussion and investigation is going on to remove
it, so for simplicity I omitted it.
- No support for trying to iterate on "internally" devirtualized calls.
Why? Because it adds what I suspect is inappropriate coupling for
little or no benefit. We will have an outer iteration system that
tracks devirtualization including that from function passes and
iterates already. We should improve that rather than approximate it
here.
- Optimization remarks. Why? Purely to make the patch smaller, no other
reason at all.
The last one I'll probably work on almost immediately. But I wanted to
skip it in the initial patch to try to focus the change as much as
possible as there is already a lot of code moving around and both of
these *could* be skipped without really disrupting the core logic.
A summary of the different things happening here:
1) Adding the usual new PM class and rigging.
2) Fixing minor underlying assumptions in the inline cost analysis or
inline logic that don't generally hold in the new PM world.
3) Adding the core pass logic which is in essence a loop over the calls
in the nodes in the call graph. This is a bit duplicated from the old
inliner, but only a handful of lines could realistically be shared.
(I tried at first, and it really didn't help anything.) All told,
this is only about 100 lines of code, and most of that is the
mechanics of wiring up analyses from the new PM world.
4) Updating the LazyCallGraph (in the new PM) based on the *newly
inlined* calls and references. This is very minimal because we cannot
form cycles.
5) When inlining removes the last use of a function, eagerly nuking the
body of the function so that any "one use remaining" inline cost
heuristics are immediately refined, and queuing these functions to be
completely deleted once inlining is complete and the call graph
updated to reflect that they have become dead.
6) After all the inlining for a particular function, updating the
LazyCallGraph and the CGSCC pass manager to reflect the
function-local simplifications that are done immediately and
internally by the inline utilities. These are the exact same
fundamental set of CG updates done by arbitrary function passes.
7) Adding a bunch of test cases to specifically target CGSCC and other
subtle aspects in the new PM world.
Many thanks to the careful review from Easwaran and Sanjoy and others!
Differential Revision: https://reviews.llvm.org/D24226
llvm-svn: 290161
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.
Currently, DIGlobalVariable holds a DIExpression. This is not the
best way to model this:
(1) The DIGlobalVariable should describe the source level variable,
not how to get to its location.
(2) It makes it unsafe/hard to update the expressions when we call
replaceExpression on the DIGlobalVariable.
(3) It makes it impossible to represent a global variable that is in
more than one location (e.g., a variable with multiple
DW_OP_LLVM_fragment-s). We also moved away from attaching the
DIExpression to DILocalVariable for the same reasons.
This reapplies r289902 with additional testcase upgrades and a change
to the Bitcode record for DIGlobalVariable, that makes upgrading the
old format unambiguous also for variables without DIExpressions.
<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769
llvm-svn: 290153