Also, add a scalar test to demonstrate one of the intermediate folds that
is necessary to accomplish the existing, multi-step test. And simplify
the vector tests to only check the final piece of that multi-step transform.
llvm-svn: 278995
minimal and boring form than the old pass manager's version.
This pass does the very minimal amount of work necessary to inline
functions declared as always-inline. It doesn't support a wide array of
things that the legacy pass manager did support, but is alse ... about
20 lines of code. So it has that going for it. Notably things this
doesn't support:
- Array alloca merging
- To support the above, bottom-up inlining with careful history
tracking and call graph updates
- DCE of the functions that become dead after this inlining.
- Inlining through call instructions with the always_inline attribute.
Instead, it focuses on inlining functions with that attribute.
The first I've omitted because I'm hoping to just turn it off for the
primary pass manager. If that doesn't pan out, I can add it here but it
will be reasonably expensive to do so.
The second should really be handled by running global-dce after the
inliner. I don't want to re-implement the non-trivial logic necessary to
do comdat-correct DCE of functions. This means the -O0 pipeline will
have to be at least 'always-inline,global-dce', but that seems
reasonable to me. If others are seriously worried about this I'd like to
hear about it and understand why. Again, this is all solveable by
factoring that logic into a utility and calling it here, but I'd like to
wait to do that until there is a clear reason why the existing
pass-based factoring won't work.
The final point is a serious one. I can fairly easily add support for
this, but it seems both costly and a confusing construct for the use
case of the always inliner running at -O0. This attribute can of course
still impact the normal inliner easily (although I find that
a questionable re-use of the same attribute). I've started a discussion
to sort out what semantics we want here and based on that can figure out
if it makes sense ta have this complexity at O0 or not.
One other advantage of this design is that it should be quite a bit
faster due to checking for whether the function is a viable candidate
for inlining exactly once per function instead of doing it for each call
site.
Anyways, hopefully a reasonable starting point for this pass.
Differential Revision: https://reviews.llvm.org/D23299
llvm-svn: 278896
It is pretty easy to get it down to O(nlogn + mlogm). This
implementation has the added benefit of automatically deduplicating
entries between the two sets.
llvm-svn: 278837
I have audited all the callers of concatenate and none require duplicate
entries to service concatenation.
These duplicates serve no purpose but to needlessly embiggen the IR.
N.B. Layering getMostGenericAliasScope on top of concatenate makes it
O(nlogn + mlogm) instead of O(n*m).
llvm-svn: 278836
Summary:
This patch adds simple coroutine splitting logic to CoroSplit pass.
Documentation and overview is here: http://llvm.org/docs/Coroutines.html.
Upstreaming sequence (rough plan)
1.Add documentation. (https://reviews.llvm.org/D22603)
2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
...
7. Split coroutine into subfunctions <= we are here
8. Coroutine Frame Building algorithm
9. Handle coroutine with unwinds
10+. The rest of the logic
Reviewers: majnemer
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23461
llvm-svn: 278830
This reverts commit r278660.
It causes downstream assertion failure in InstCombine on shuffle
instructions. Comes up in __mm_swizzle_epi32.
llvm-svn: 278672
The new version has several advantages:
1) IMSHO it's more readable and neater
2) It handles loads and stores properly
3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.
With this change we can now finally sink load-modify-store idioms such as:
if (a)
return *b += 3;
else
return *b += 4;
=>
%z = load i32, i32* %y
%.sink = select i1 %a, i32 5, i32 7
%b = add i32 %z, %.sink
store i32 %b, i32* %y
ret i32 %b
When this works for switches it'll be even more powerful.
llvm-svn: 278660
If a loop is not rotated (for example when optimizing for size), the latch is not the backedge. If we promote an expression to post-inc form, we not only increase register pressure and add a COPY for that IV expression but for all IVs!
Motivating testcase:
void f(float *a, float *b, float *c, int n) {
while (n-- > 0)
*c++ = *a++ + *b++;
}
It's imperative that the pointer increments be located in the latch block and not the header block; if not, we cannot use post-increment loads and stores and we have to keep both the post-inc and pre-inc values around until the end of the latch which bloats register usage.
llvm-svn: 278658
IRCE has the ability to further version pre-loops and post-loops that it
created, but this isn't useful at all. This change teaches IRCE to
leave behind some metadata in the loops it creates (by cloning the main
loop) so that these new loops are not re-processed by IRCE.
Today this bug is hidden by another bug -- IRCE does not update LoopInfo
properly so the loop pass manager does not re-invoke IRCE on the loops
it split out. However, once the latter is fixed the bug addressed in
this change causes IRCE to infinite-loop in some cases (e.g. it splits
out a pre-loop, a pre-pre-loop from that, a pre-pre-pre-loop from that
and so on).
llvm-svn: 278617
The (negative) test case is supposed to check that IRCE does not muck
with range checks it cannot handle, not that it does the right thing in
the absence of profiling information.
llvm-svn: 278612
Loops containing `indirectbr` may not be in simplified form, even after
running LoopSimplify. Reject then gracefully, instead of tripping an
assert.
llvm-svn: 278611
Summary:
Refactor the existing support into a LoopDataPrefetch implementation
class and a LoopDataPrefetchLegacyPass class that invokes it.
Add a new LoopDataPrefetchPass for the new pass manager that utilizes
the LoopDataPrefetch implementation class.
Reviewers: mehdi_amini
Subscribers: sanjoy, mzolotukhin, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D23483
llvm-svn: 278591
`IVVisitor::visitCast` used to have the invariant that if the
instruction it was passed was a sext or zext instruction, the result of
the instruction would be wider than the induction variable. This is no
longer true after rL275037, so this change teaches `IndVarSimplify` s
implementation of `IVVisitor::visitCast` to work with the relaxed
invariant.
A corresponding change to SimplifyIndVar to preserve the said invariant
after rL275037 would also work, but given how `IVVisitor::visitCast` is
spelled (no indication of said invariant), I figured the current fix is
cleaner.
Fixes PR28935.
llvm-svn: 278584
InnerLoopVectorizer shouldn't handle a loop with cycles inside the loop
body, even if that cycle isn't a natural loop.
Fixes PR28541.
Differential Revision: https://reviews.llvm.org/D22952
llvm-svn: 278573
They aren't static, and moving them to the entry block across something
else will only result in tears.
Root cause of http://crbug.com/636558.
llvm-svn: 278571
Summary: The refined propagation algorithm is more accurate and robust.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23224
llvm-svn: 278522
Rewrite Visited[Cond] = getValueFromConditionImpl(..., Visited) statement which can lead to a memory corruption since getValueFromConditionImpl changes Visited map and invalidates the iterators.
llvm-svn: 278514
Take range metadata into account for conditions like this:
%length = load i32, i32* %length_ptr, !range !{i32 0, i32 2147483647}
%cmp = icmp ult i32 %a, %length
This is a common pattern for range checks where the length of the array is dynamically loaded.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D23267
llvm-svn: 278496
Currently LVI can only gather value constraints from comparisons like:
* icmp <pred> Val, ...
* icmp ult (add Val, Offset), ...
In fact we can handle any predicate in latter comparisons.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D23357
llvm-svn: 278493
Summary:
1. Make coroutine representation more robust against optimization that may duplicate instruction by introducing coro.id intrinsics that returns a token that will get fed into coro.alloc and coro.begin. Due to coro.id returning a token, it won't get duplicated and can be used as reliable indicator of coroutine identify when a particular coroutine call gets inlined.
2. Move last three arguments of coro.begin into coro.id as they will be shared if coro.begin will get duplicated.
3. doc + test + code updated to support the new intrinsic.
Reviewers: mehdi_amini, majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D23412
llvm-svn: 278481
Summary:
This patch adds IsVariadicFunction bit to summary in order
to not import variadic functions. Inliner doesn't inline
variadic functions because it is hard to reason about it.
This one small fix improves Importer by about 16%
(going from 86% to 100% of imported functions that are
inlined anywhere)
on some spec benchmarks like 'int' and others.
Reviewers: eraman, mehdi_amini, tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D23339
llvm-svn: 278432
When legal, extending trip count in the loop control logic generates better code compared to truncating IV. This is because
(1) extending trip count is a loop invariant operation (see genLoopLimit where we prove trip count is loop invariant).
(2) Scalar Evolution seems to have problems understanding trunc when computing loop trip count. So removing them allows better analysis performed in Scalar Evolution. (In particular this fixes PR 28363 which is the motivation for this change).
I am not going to perform any performance test. Any degradation caused by this should be an indication of a bug elsewhere.
To prove legality, we rely on SCEV to prove zext(trunc(IV)) == IV (or similarly for sext). If this holds, we can prove equivalence of trunc(IV)==ExitCnt (1) and IV == zext(ExitCnt). Simply take zext of boths sides of (1) and apply the proven equivalence.
This commit contains changes in a newly added testcase which was not included in the previous commit (which was reverted later on).
https://reviews.llvm.org/D23075
llvm-svn: 278421
Summary:
This is an extension of the fix in r271424. That fix dealt with builder
insert points being moved by SCEV expansion, but only for the lifetime
of the expand call. This change modifies the interface so that LSR can
safely call expand multiple times at the same insert point and do the
right thing if one of the expansions decides to move the original insert
point.
This is a fix for PR28719.
Reviewers: sanjoy
Subscribers: llvm-commits, mcrosier, mzolotukhin
Differential Revision: https://reviews.llvm.org/D23342
llvm-svn: 278413
Summary:
This fixes PR 28933 by making sure GVNHoist does not try to recreate memory
accesses when it has not actually moved them.
Reviewers: sebpop
Subscribers: llvm-commits, george.burgess.iv
Differential Revision: https://reviews.llvm.org/D23411
llvm-svn: 278401
Summary:
Keep track of all methods for which we have devirtualized at least
one call and then print them sorted alphabetically. That allows to
avoid duplicates and also makes the order deterministic.
Add optimization names into the remarks, so that it's easier to
understand how has each method been devirtualized.
Fix a bug when wrong methods could have been reported for
tryVirtualConstProp.
Reviewers: kcc, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23297
llvm-svn: 278389
When legal, extending trip count in the loop control logic generates better code compared to truncating IV. This is because
(1) extending trip count is a loop invariant operation (see genLoopLimit where we prove trip count is loop invariant).
(2) Scalar Evolution seems to have problems understanding trunc when computing loop trip count. So removing them allows better analysis performed in Scalar Evolution. (In particular this fixes PR 28363 which is the motivation for this change).
I am not going to perform any performance test. Any degradation caused by this should be an indication of a bug elsewhere.
To prove legality, we rely on SCEV to prove zext(trunc(IV)) == IV (or similarly for sext). If this holds, we can prove equivalence of trunc(IV)==ExitCnt (1) and IV == zext(ExitCnt). Simply take zext of boths sides of (1) and apply the proven equivalence.
https://reviews.llvm.org/D23075
llvm-svn: 278334
Change --no-pgo-warn-missing to -pgo-warn-missing-function
and negate the default. /NFC
Add more test to make sure the warning is off by default
llvm-svn: 278314
Summary:
A particular coroutine usage pattern, where a coroutine is created, manipulated and
destroyed by the same calling function, is common for coroutines implementing
RAII idiom and is suitable for allocation elision optimization which avoid
dynamic allocation by storing the coroutine frame as a static `alloca` in its
caller.
coro.free and coro.alloc intrinsics are used to indicate which code needs to be suppressed
when dynamic allocation elision happens:
```
entry:
%elide = call i8* @llvm.coro.alloc()
%need.dyn.alloc = icmp ne i8* %elide, null
br i1 %need.dyn.alloc, label %coro.begin, label %dyn.alloc
dyn.alloc:
%alloc = call i8* @CustomAlloc(i32 4)
br label %coro.begin
coro.begin:
%phi = phi i8* [ %elide, %entry ], [ %alloc, %dyn.alloc ]
%hdl = call i8* @llvm.coro.begin(i8* %phi, i32 0, i8* null,
i8* bitcast ([2 x void (%f.frame*)*]* @f.resumers to i8*))
```
and
```
%mem = call i8* @llvm.coro.free(i8* %hdl)
%need.dyn.free = icmp ne i8* %mem, null
br i1 %need.dyn.free, label %dyn.free, label %if.end
dyn.free:
call void @CustomFree(i8* %mem)
br label %if.end
if.end:
...
```
If heap allocation elision is performed, we replace coro.alloc with a static alloca on the caller frame and coro.free with null constant.
Also, we need to make sure that if there are any tail calls referencing the coroutine frame, we need to remote tail call attribute, since now coroutine frame lives on the stack.
Documentation and overview is here: http://llvm.org/docs/Coroutines.html.
Upstreaming sequence (rough plan)
1.Add documentation. (https://reviews.llvm.org/D22603)
2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
3.Add empty coroutine passes. (https://reviews.llvm.org/D22847)
4.Add coroutine devirtualization + tests.
ab) Lower coro.resume and coro.destroy (https://reviews.llvm.org/D22998)
c) Do devirtualization (https://reviews.llvm.org/D23229)
5.Add CGSCC restart trigger + tests. (https://reviews.llvm.org/D23234)
6.Add coroutine heap elision + tests. <= we are here
7.Add the rest of the logic (split into more patches)
Reviewers: mehdi_amini, majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D23245
llvm-svn: 278242
Teach LVI how to gather information from conditions in the form of (cond1 && cond2). Our out-of-tree front-end emits range checks in this form.
Reviewed By: sanjoy
Differential Revision: http://reviews.llvm.org/D23200
llvm-svn: 278231
This is a resubmission of previously reverted r277592. It was hitting overly strong assertion in getConstantRange which was relaxed in r278217.
Use LVI to prove that adds do not wrap. The change is motivated by https://llvm.org/bugs/show_bug.cgi?id=28620 bug and it's the first step to fix that problem.
Reviewed By: sanjoy
Differential Revision: http://reviews.llvm.org/D23059
llvm-svn: 278220
Hal pointed out that the semantic of our intrinsic and the libc
call are slightly different. Add a comment while I'm here to
explain why we can't emit an intrinsic. Thanks Hal!
llvm-svn: 278200
Summary:
The inliner not being a function pass requires the work-around of
generating the OptimizationRemarkEmitter and in turn BFI on demand.
This will go away after the new PM is ready.
BFI is only computed inside ORE if the user has requested hotness
information for optimization diagnostitics (-pass-remark-with-hotness at
the 'opt' level). Thus there is no additional overhead without the
flag.
Reviewers: hfinkel, davidxl, eraman
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22694
llvm-svn: 278185
Summary:
This hopefully fixes PR28825. The problem now was that a value from the
original loop was used in a subloop, which became a sibling after separation.
While a subloop doesn't need an lcssa phi node, a sibling does, and that's
where we broke LCSSA. The most natural way to fix this now is to simply call
formLCSSA on the original loop: it'll do what we've been doing before plus
it'll cover situations described above.
I think we don't need to run formLCSSARecursively here, and we have an assert
to verify this (I've tried testing it on LLVM testsuite + SPECs). I'd be happy
to be corrected here though.
I also changed a run line in the test from '-lcssa -loop-unroll' to
'-lcssa -loop-simplify -indvars', because it exercises LCSSA
preservation to the same extent, but also makes less unrelated
transformation on the CFG, which makes it easier to verify.
Reviewers: chandlerc, sanjoy, silvas
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23288
llvm-svn: 278173
Summary:
We teach alias analysis that invariant.start is readonly.
This helps with GVN and memcopy optimizations that currently treat.
invariant.start as a clobber.
We need to treat this as readonly, so that DSE does not incorrectly
remove stores prior to the invariant.start
Reviewers: sanjoy, reames, majnemer, dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23214
llvm-svn: 278138
no prof data for func warning is turned off by default
due to its high verbosity and minimal usefulness.
Differential Revision: http://reviews.llvm.org/D23295
llvm-svn: 278127
Summary:
In the use optimizer, we need to keep of whether the lower bound still
dominates us or else we may decide a lower bound is still valid when it
is not due to intervening pushes/pops. Fixes PR28880 (and probably a
bunch of other things).
Reviewers: george.burgess.iv
Subscribers: MatzeB, llvm-commits, sebpop
Differential Revision: https://reviews.llvm.org/D23237
llvm-svn: 277978
Summary:
The correctness fix here is that when we CSE a load with another load,
we need to combine the metadata on the two loads. This matches the
behavior of other passes, like instcombine and GVN.
There's also a minor optimization improvement here: for load PRE, the
aliasing metadata on the inserted load should be the same as the
metadata on the original load. Not sure why the old code was throwing
it away.
Issue found by inspection.
Differential Revision: http://reviews.llvm.org/D21460
llvm-svn: 277977
Summary:
CoroSplit pass processes the coroutine twice. First, it lets it go through
complete IPO optimization pipeline as a single function. It forces restart
of the pipeline by inserting an indirect call to an empty function "coro.devirt.trigger"
which is devirtualized by CoroElide pass that triggers a restart of the pipeline by CGPassManager.
(In later patches, when CoroSplit pass sees the same coroutine the second time, it splits it up,
adds coroutine subfunctions to the SCC to be processed by IPO pipeline.)
Documentation and overview is here: http://llvm.org/docs/Coroutines.html.
Upstreaming sequence (rough plan)
1.Add documentation. (https://reviews.llvm.org/D22603)
2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
3.Add empty coroutine passes. (https://reviews.llvm.org/D22847)
4.Add coroutine devirtualization + tests.
ab) Lower coro.resume and coro.destroy (https://reviews.llvm.org/D22998)
c) Do devirtualization (https://reviews.llvm.org/D23229)
5.Add CGSCC restart trigger + tests. <= we are here
6.Add coroutine heap elision + tests.
7.Add the rest of the logic (split into more patches)
Reviewers: mehdi_amini, majnemer
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23234
llvm-svn: 277936
Summary:
This is the 4c patch of the coroutine series. CoroElide pass now checks if PostSplit coro.begin
is referenced by coro.subfn.addr intrinsics. If so replace coro.subfn.addrs with an appropriate coroutine
subfunction associated with that coro.begin.
Documentation and overview is here: http://llvm.org/docs/Coroutines.html.
Upstreaming sequence (rough plan)
1.Add documentation. (https://reviews.llvm.org/D22603)
2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
3.Add empty coroutine passes. (https://reviews.llvm.org/D22847)
4.Add coroutine devirtualization + tests.
ab) Lower coro.resume and coro.destroy (https://reviews.llvm.org/D22998)
c) Do devirtualization <= we are here
5.Add CGSCC restart trigger + tests.
6.Add coroutine heap elision + tests.
7.Add the rest of the logic (split into more patches)
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D23229
llvm-svn: 277908
Fixes PR28764. Right now there is no way to test this, but (as
mentioned on the PR) with Michael Zolotukhin's yet to be checked in
LoopSimplify verfier, 8 of the llvm-lit tests for IRCE crash.
llvm-svn: 277891
This fixes PR28825. The problem was that we only checked if a value from
a created inner loop is used in the outer loop, and fixed LCSSA for
them. But we missed to fixup LCSSA for values used in exits of the outer
loop.
llvm-svn: 277877
Summary: Hot callsites should have higher threshold than inline hints. This patch uses separate threshold parameter for hot callsites.
Reviewers: davidxl, eraman
Subscribers: Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D22368
llvm-svn: 277860
Summary:
Chrome on Linux uses WholeProgramDevirt for speed ups, and it's
important to detect regressions on both sides: the toolchain,
if fewer methods get devirtualized after an update, and Chrome,
if an innocently looking change caused many hot methods become
virtual again.
The need to track devirtualized methods is not Chrome-specific,
but it's probably the only user of the pass at this time.
Reviewers: kcc
Differential Revision: https://reviews.llvm.org/D23219
llvm-svn: 277856
Summary: We do not care about intrinsic calls when assigning discriminators.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23212
llvm-svn: 277843
Summary:
Having -O0 in opt allows testing that -O0 optimization
pipeline is built correctly.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23208
llvm-svn: 277829
This generated IR based on the order of evaluation, which is different
between GCC and Clang. With that in mind you get bootstrap miscompares
if you compare a Clang built with GCC-built Clang vs. Clang built with
Clang-built Clang. Diagnosing that made my head hurt.
This also reverts commit r277337, which "fixed" the test case.
llvm-svn: 277820
Summary:
Turn (select C, (sext A), B) into (sext (select C, A, B')) when A is i1 and
B is a compatible constant, also for zext instead of sext. This will then be
further folded into logical operations.
The transformation would be valid for non-i1 types as well, but other parts of
InstCombine prefer to have sext from non-i1 as an operand of select.
Motivated by the shader compiler frontend in Mesa for AMDGPU, which emits i32
for boolean operations. With this change, the boolean logic is fully
recovered.
Reviewers: majnemer, spatel, tstellarAMD
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22747
llvm-svn: 277801
The patch splits a complex && if condition into easier to read and understand
logic. That wrong early exit condition was letting some instructions with not
all operands available pass through when HoistingGeps was true.
Differential Revision: https://reviews.llvm.org/D23174
llvm-svn: 277785
Shifts with a uniform but non-constant count were considered very expensive to
vectorize, because the splat of the uniform count and the shift would tend to
appear in different blocks. That made the splat invisible to ISel, and we'd
scalarize the shift at codegen time.
Since r201655, CodeGenPrepare sinks those splats to be next to their use, and we
are able to select the appropriate vector shifts. This updates the cost model to
to take this into account by making shifts by a uniform cheap again.
Differential Revision: https://reviews.llvm.org/D23049
llvm-svn: 277782
PR28848 had a very nice reduction of the underlying cause of the bug.
Our ValueMap had, in an entry for an Instruction, a ConstantInt.
This is not at all unexpected but should be handled properly.
llvm-svn: 277773
This is the forth patch in the coroutine series. CoroEaly pass now lowers coro.resume
and coro.destroy intrinsics by replacing them with an indirect call to an address
returned by coro.subfn.addr intrinsic. This is done so that CGPassManager recognizes
devirtualization when CoroElide replaces a call to coro.subfn.addr with an appropriate
function address.
Patch by Gor Nishanov!
Differential Revision: https://reviews.llvm.org/D22998
llvm-svn: 277765
I'm removing a misplaced pair of more specific folds from InstCombine in this patch as well,
so we know where those folds are happening in InstSimplify.
llvm-svn: 277738
Summary:
TargetBaseAlign is no longer required since LSV checks if target allows misaligned accesses.
A constant defining a base alignment is still needed for stack accesses where alignment can be adjusted.
Previous patch (D22936) was reverted because tests were failing. This patch also fixes the cause of those failures:
- x86 failing tests either did not have the right target, or the right alignment.
- NVPTX failing tests did not have the right alignment.
- AMDGPU failing test (merge-stores) should allow vectorization with the given alignment but the target info
considers <3xi32> a non-standard type and gives up early. This patch removes the condition and only checks
for a maximum size allowed and relies on the next condition checking for %4 for correctness.
This should be revisited to include 3xi32 as a MVT type (on arsenm's non-immediate todo list).
Note that checking the sizeInBits for a MVT is undefined (leads to an assertion failure),
so we need to create an EVT, hence the interface change in allowsMisaligned to include the Context.
Reviewers: arsenm, jlebar, tstellarAMD
Subscribers: jholewinski, arsenm, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D23068
llvm-svn: 277735
With this patch we compute the MemorySSA once and update it in the code generator.
Differential Revision: https://reviews.llvm.org/D22966
llvm-svn: 277649
Some of these tests need to be cleaned up further to make it obvious
what they're testing, but as a first step remove all instances of
"grep".
llvm-svn: 277648
This reverts commit r277611 and the followup r277614.
Bootstrap builds and chromium builds are crashing during inlining after
this change.
llvm-svn: 277642
This is a follow-up to r277637. It teaches MemorySSA that invariant
loads (and loads of provably constant memory) are always liveOnEntry.
llvm-svn: 277640
This patch makes MemorySSA recognize atomic/volatile loads, and makes
MSSA treat said loads specially. This allows us to be a bit more
aggressive in some cases.
Administrative note: Revision was LGTM'ed by reames in person.
Additionally, this doesn't include the `invariant.load` recognition in
the differential revision, because I feel it's better to commit that
separately. Will commit soon.
Differential Revision: https://reviews.llvm.org/D16875
llvm-svn: 277637
Summary:
InstCombine unfolds expressions of the form `zext(or(icmp, icmp))` to `or(zext(icmp), zext(icmp))` such that in a later iteration of InstCombine the exposed `zext(icmp)` instructions can be optimized. We now combine this unfolding and the subsequent `zext(icmp)` optimization to be performed together. Since the unfolding doesn't happen separately anymore, we also again enable the folding of `logic(cast(icmp), cast(icmp))` expressions to `cast(logic(icmp, icmp))` which had been disabled due to its interference with the unfolding transformation.
Tested via `make check` and `lnt`.
Background
==========
For a better understanding on how it came to this change we subsequently summarize its history. In commit r275989 we've already tried to enable the folding of `logic(cast(icmp), cast(icmp))` to `cast(logic(icmp, icmp))` which had to be reverted in r276106 because it could lead to an endless loop in InstCombine (also see http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160718/374347.html). The root of this problem is that in `visitZExt()` in InstCombineCasts.cpp there also exists a reverse of the above folding transformation, that unfolds `zext(or(icmp, icmp))` to `or(zext(icmp), zext(icmp))` in order to expose `zext(icmp)` operations which would then possibly be eliminated by subsequent iterations of InstCombine. However, before these `zext(icmp)` would be eliminated the folding from r275989 could kick in and cause InstCombine to endlessly switch back and forth between the folding and the unfolding transformation. This is the reason why we now combine the `zext`-unfolding and the elimination of the exposed `zext(icmp)` to happen at one go because this enables us to still allow the cast-folding in `logic(cast(icmp), cast(icmp))` without entering an endless loop again.
Details on the submitted changes
================================
- In `visitZExt()` we combine the unfolding and optimization of `zext` instructions.
- In `transformZExtICmp()` we have to use `Builder->CreateIntCast()` instead of `CastInst::CreateIntegerCast()` to make sure that the new `CastInst` is inserted in a `BasicBlock`. The new calls to `transformZExtICmp()` that we introduce in `visitZExt()` would otherwise cause according assertions to be triggered (in our case this happend, for example, with lnt for the MultiSource/Applications/sqlite3 and SingleSource/Regression/C++/EH/recursive-throw tests). The subsequent usage of `replaceInstUsesWith()` is necessary to ensure that the new `CastInst` replaces the `ZExtInst` accordingly.
- In InstCombineAndOrXor.cpp we again allow the folding of casts on `icmp` instructions.
- The instruction order in the optimized IR for the zext-or-icmp.ll test case is different with the introduced changes.
- The test cases in zext.ll have been adopted from the reverted commits r275989 and r276105.
Reviewers: grosser, majnemer, spatel
Subscribers: eli.friedman, majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D22864
Contributed-by: Matthias Reisinger <d412vv1n@gmail.com>
llvm-svn: 277635
This removes the restriction for the icmp constant, but as noted by the FIXME comments,
we still need to change individual checks for binop operand constants.
llvm-svn: 277629
This is a fix for PR28697.
An MDNode can indirectly refer to a GlobalValue, through a
ConstantAsMetadata. When the GlobalValue is deleted, the MDNode operand
is reset to `nullptr`. If the node is uniqued, this can lead to a
hard-to-detect cache invalidation in a Metadata map that's shared across
an LLVMContext.
Consider:
1. A map from Metadata* to `T` called RemappedMDs.
2. A node that references a global variable, `!{i1* @GV}`.
3. Insert `!{i1* @GV} -> SomeT` in the map.
4. Delete `@GV`, leaving behind `!{null} -> SomeT`.
Looking up the generic and uninteresting `!{null}` gives you `SomeT`,
which is likely related to `@GV`. Worse, `SomeT`'s lifetime may be tied
to the deleted `@GV`.
This occurs in practice in the shared ValueMap used since r266579 in the
IRMover. Other code that handles more than one Module (with different
lifetimes) in the same LLVMContext could hit it too.
The fix here is a partial revert of r225223: in the rare case that an
MDNode operand is a ConstantAsMetadata (i.e., wrapping a node from the
Value hierarchy), drop uniquing if it gets replaced with `nullptr`.
This changes step #4 above to leave behind `distinct !{null} -> SomeT`,
which can't be confused with the generic `!{null}`.
In theory, this can cause some churn in the LLVMContext's MDNode
uniquing map when Values are being deleted. However:
- The number of GlobalValues referenced from uniqued MDNodes is
expected to be quite small. E.g., the debug info metadata schema
only references GlobalValues from distinct nodes.
- Other Constants have the lifetime of the LLVMContext, whose teardown
is careful to drop references before deleting the constants.
As a result, I don't expect a compile time regression from this change.
llvm-svn: 277625
We were able to figure out that the result of a call is some constant.
While propagating that fact, we added the constant to the value map.
This is problematic because it results in us losing the call site when
processing the value map.
This fixes PR28802.
llvm-svn: 277611
reason about and less error prone.
The core idea is to fully parse the text without trying to identify
passes or structure. This is done with a single state machine. There
were various bugs in the logic around this previously that were repeated
and scattered across the code. Having a single routine makes it much
easier to fix and get correct. For example, this routine doesn't suffer
from PR28577.
Then the actual pass construction is handled using *much* easier to read
code and simple loops, with particular pass manager construction sunk to
live with other pass construction. This is especially nice as the pass
managers *are* in fact passes.
Finally, the "implicit" pass manager synthesis is done much more simply
by forming "pre-parsed" structures rather than having to duplicate tons
of logic.
One of the bugs fixed by this was evident in the tests where we accepted
a pipeline that wasn't really well formed. Another bug is PR28577 for
which I have added a test case.
The code is less efficient than the previous code but I'm really hoping
that's not a priority. ;]
Thanks to Sean for the review!
Differential Revision: https://reviews.llvm.org/D22724
llvm-svn: 277561
Summary:
Sometimes, bitsets could get really large (>300k entries) and
we might want to drop a check, as it would have a too much cost.
Adding a flag to control how much penalty are we willing to pay
for bitsets.
Reviewers: kcc
Differential Revision: https://reviews.llvm.org/D23088
llvm-svn: 277556
As agreed in post-commit review of r265388, I'm switching the flag to
its original value until the 90% runtime performance regression on
SingleSource/Benchmarks/Stanford/Bubblesort is addressed.
llvm-svn: 277524
Update comment for isOutOfScope and add a testcase for uniform value being used
out of scope.
Differential Revision: https://reviews.llvm.org/D23073
llvm-svn: 277515
This patch enables the vectorizer to generate both scalar and vector versions
of an integer induction variable for a given loop. Previously, we only
generated a scalar induction variable if we knew all its users were going to be
scalar. Otherwise, we generated a vector induction variable. In the case of a
loop with both scalar and vector users of the induction variable, we would
generate the vector induction variable and extract scalar values from it for
the scalar users. With this patch, we now generate both versions of the
induction variable when there are both scalar and vector users and select which
version to use based on whether the user is scalar or vector.
Differential Revision: https://reviews.llvm.org/D22869
llvm-svn: 277474
This patch refactors the logic in collectLoopUniforms and
collectValuesToIgnore, untangling the concepts of "uniform" and "scalar". It
adds isScalarAfterVectorization along side isUniformAfterVectorization to
distinguish the two. Known scalar values include those that are uniform,
getelementptr instructions that won't be vectorized, and induction variables
and induction variable update instructions whose users are all known to be
scalar.
This patch includes the following functional changes:
- In collectLoopUniforms, we mark uniform the pointer operands of interleaved
accesses. Although non-consecutive, these pointers are treated like
consecutive pointers during vectorization.
- In collectValuesToIgnore, we insert a value into VecValuesToIgnore if it
isScalarAfterVectorization rather than isUniformAfterVectorization. This
differs from the previous functionaly in that we now add getelementptr
instructions that will not be vectorized into VecValuesToIgnore.
This patch also removes the ValuesNotWidened set used for induction variable
scalarization since, after the above changes, it is now equivalent to
isScalarAfterVectorization.
Differential Revision: https://reviews.llvm.org/D22867
llvm-svn: 277460
Added ability to estimate the entry count of the extracted function and
the branch probabilities of the exit branches.
Patch by River Riddle!
Differential Revision: https://reviews.llvm.org/D22744
llvm-svn: 277411
Summary: This patch implements CFI for WebAssembly. It modifies the
LowerTypeTest pass to pre-assign table indexes to functions that are
called indirectly, and lowers type checks to test against the
appropriate table indexes. It also modifies the WebAssembly backend to
support a special ".indidx" assembly directive that propagates the table
index assignments out to the linker.
Patch by Dominic Chen
Differential Revision: https://reviews.llvm.org/D21768
llvm-svn: 277398
Using RAUW was wrong here; if we have a switch transform such as:
18 -> 6 then
6 -> 0
If we use RAUW, while performing the second transform the *transformed* 6
from the first will be also replaced, so we end up with:
18 -> 0
6 -> 0
Found by clang stage2 bootstrap; testcase added.
llvm-svn: 277332
It looks like the two independent parts of the rotate operation (a lshr and shl) are being reordered on some bots. Add CHECK-DAGs to account for this.
llvm-svn: 277329
If a switch is sparse and all the cases (once sorted) are in arithmetic progression, we can extract the common factor out of the switch and create a dense switch. For example:
switch (i) {
case 5: ...
case 9: ...
case 13: ...
case 17: ...
}
can become:
if ( (i - 5) % 4 ) goto default;
switch ((i - 5) / 4) {
case 0: ...
case 1: ...
case 2: ...
case 3: ...
}
or even better:
switch ( ROTR(i - 5, 2) {
case 0: ...
case 1: ...
case 2: ...
case 3: ...
}
The division and remainder operations could be costly so we only do this if the factor is a power of two, and emit a right-rotate instead of a divide/remainder sequence. Dense switches can be lowered significantly better than sparse switches and can even be transformed into lookup tables.
llvm-svn: 277325
When extracting a set of blocks make sure to inherit all of the target
dependent attributes to make sure that the function will be valid for
lowering. One example is the "target-features" attribute for x86, if the
extracted region has functionality that relies on a specific feature it
will fail to be lowered.
This also allows for extracted functions to be valid for inlining, at
least back into the parent function, as the target attributes are tested
when inlining for compatibility.
Patch by River Riddle!
Differential Revision: https://reviews.llvm.org/D22713
llvm-svn: 277315
LoopUnroll is a loop pass, so the analysis of OptimizationRemarkEmitter
is added to the common function analysis passes that loop passes
depend on.
The BFI and indirectly BPI used in this pass is computed lazily so no
overhead should be observed unless -pass-remarks-with-hotness is used.
This is how the patch affects the O3 pipeline:
Dominator Tree Construction
Natural Loop Information
Canonicalize natural loops
Loop-Closed SSA Form Pass
Basic Alias Analysis (stateless AA impl)
Function Alias Analysis Results
Scalar Evolution Analysis
+ Lazy Branch Probability Analysis
+ Lazy Block Frequency Analysis
+ Optimization Remark Emitter
Loop Pass Manager
Rotate Loops
Loop Invariant Code Motion
Unswitch loops
Simplify the CFG
Dominator Tree Construction
Basic Alias Analysis (stateless AA impl)
Function Alias Analysis Results
Combine redundant instructions
Natural Loop Information
Canonicalize natural loops
Loop-Closed SSA Form Pass
Scalar Evolution Analysis
+ Lazy Branch Probability Analysis
+ Lazy Block Frequency Analysis
+ Optimization Remark Emitter
Loop Pass Manager
Induction Variable Simplification
Recognize loop idioms
Delete dead loops
Unroll loops
...
llvm-svn: 277203
Patch by Sunita Marathe
Third try, now following fixes to MSan to handle mempcy in such a way that this commit won't break the MSan buildbots. (Thanks, Evegenii!)
llvm-svn: 277189
Some instructions may have their uses replaced with a symbolic constant.
However, the instruction may still have side effects which percludes it
from being removed from the function. EarlyCSE treated such an
instruction as if it were removed, resulting in PR28763.
llvm-svn: 277114
An undef vector element can be treated as if it had any value. Folding
such a vector element to 0 in a bitcast can open up further folding
opportunities.
llvm-svn: 277104
ConstantExpr::getWithOperands does much of the hard work that
ConstantFoldInstOperandsImpl tries to do but more completely.
This lets us fold ExtractValue/InsertValue expressions.
llvm-svn: 277100
A ConstantVector can have ConstantExpr operands and vice versa.
However, the folder had no ability to fold ConstantVectors which, in
some cases, was an optimization barrier.
Instead, rephrase the folder in terms of Constants instead of
ConstantExprs and teach callers how to deal with failure.
llvm-svn: 277099
Summary:
copypasta doc of ImportedFunctionsInliningStatistics class
\brief Calculate and dump ThinLTO specific inliner stats.
The main statistics are:
(1) Number of inlined imported functions,
(2) Number of imported functions inlined into importing module (indirect),
(3) Number of non imported functions inlined into importing module
(indirect).
The difference between first and the second is that first stat counts
all performed inlines on imported functions, but the second one only the
functions that have been eventually inlined to a function in the importing
module (by a chain of inlines). Because llvm uses bottom-up inliner, it is
possible to e.g. import function `A`, `B` and then inline `B` to `A`,
and after this `A` might be too big to be inlined into some other function
that calls it. It calculates this statistic by building graph, where
the nodes are functions, and edges are performed inlines and then by marking
the edges starting from not imported function.
If `Verbose` is set to true, then it also dumps statistics
per each inlined function, sorted by the greatest inlines count like
- number of performed inlines
- number of performed inlines to importing module
Reviewers: eraman, tejohnson, mehdi_amini
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D22491
llvm-svn: 277089
Summary:
The motivation is the same as in D22141: In order to add the hotness
attribute to optimization remarks we need BFI to be available in all
passes that emit optimization remarks. BFI depends on BPI so unless we
make this lazy as well we would still compute BPI unconditionally.
The solution is to use the new LazyBPI pass in LazyBFI and only compute
BPI when computation of BFI is requested by the client.
I extended the laziness test using a LoopDistribute test to also cover
BPI.
Reviewers: hfinkel, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22835
llvm-svn: 277083
Summary:
Asan stack-use-after-scope check should poison alloca even if there is
no access between start and end.
This is possible for code like this:
for (int i = 0; i < 3; i++) {
int x;
p = &x;
}
"Loop Invariant Code Motion" will move "p = &x;" out of the loop, making
start/end range empty.
PR27453
Reviewers: eugenis
Differential Revision: https://reviews.llvm.org/D22842
llvm-svn: 277072
Summary:
Asan stack-use-after-scope check should poison alloca even if there is
no access between start and end.
This is possible for code like this:
for (int i = 0; i < 3; i++) {
int x;
p = &x;
}
"Loop Invariant Code Motion" will move "p = &x;" out of the loop, making
start/end range empty.
PR27453
Reviewers: eugenis
Differential Revision: https://reviews.llvm.org/D22842
llvm-svn: 277068
This adds boilerplate code for all coroutine passes,
the passes are no-ops for now.
Also, a small test has been added to verify that passes execute in
the expected order or not at all if coroutine support is disabled.
Patch by Gor Nishanov!
Differential Revision: https://reviews.llvm.org/D22847
llvm-svn: 277033
When folding an expression, we run ConstantFoldConstantExpression on
each operand of that expression.
However, ConstantFoldConstantExpression can fail and retur nullptr.
Previously, we would bail on further refining the expression.
Instead, use the original operand and see if we can refine a later
operand.
llvm-svn: 276959
Summary:
When we ask the builder to create a bitcast on a constant, we get back a
constant, not an instruction.
Reviewers: asbirlea
Subscribers: jholewinski, mzolotukhin, llvm-commits, arsenm
Differential Revision: https://reviews.llvm.org/D22878
llvm-svn: 276922
When loading or storing in a field of a struct like "a.b.c", GVN is able to
detect the equivalent expressions, and GVN-hoist would fail in the code
generation. This is because the GEPs are not hoisted as scalar operations to
avoid moving the GEPs too far from their ld/st instruction when the ld/st is not
movable. So we end up having to generate code for the GEP of a ld/st when we
move the ld/st. In the case of a GEP referring to another GEP as in "a.b.c" we
need to code generate all the GEPs necessary to make all the operands available
at the new location for the ld/st. With this patch we recursively walk through
the GEP operands checking whether all operands are available, and in the case of
a GEP operand, it recursively makes all its operands available. Code generation
happens from the inner GEPs out until reaching the GEP that appears as an
operand of the ld/st.
Differential Revision: https://reviews.llvm.org/D22599
llvm-svn: 276841
Instead of DFS numbering basic blocks we now DFS number instructions that avoids
the costly operation of which instruction comes first in a basic block.
Patch mostly written by Daniel Berlin.
Differential Revision: https://reviews.llvm.org/D22777
llvm-svn: 276714
Pre-instrumentation inline (pre-inliner) greatly improves the IR
instrumentation code performance, among other benefits. One issue of the
pre-inliner is it can introduce CFG-mismatch for COMDAT functions. This
is due to the fact that the same COMDAT function may have different early
inline decisions across different modules -- that means different copies
of COMDAT functions will have different CFG checksum.
In this patch, we propose a partially renaming the COMDAT group and its
member function/variable so we have different profile counter for each
version. We will post-fix the COMDAT function and the group name with its
FunctionHash.
Differential Revision: http://reviews.llvm.org/D22600
llvm-svn: 276673
The public InlineFunction utility assumes that the passed in
InlineFunctionInfo has a valid AssumptionCacheTracker.
Patch by River Riddle!
Differential Revision: https://reviews.llvm.org/D22706
llvm-svn: 276609
If we two loads of two different alignments, we must use the minimum of
the two alignments when hoisting. Same deal for stores.
For allocas, use the maximum of the two allocas.
llvm-svn: 276601
Allowed loop vectorization with secondary FP IVs. Like this:
float *A;
float x = init;
for (int i=0; i < N; ++i) {
A[i] = x;
x -= fp_inc;
}
The auto-vectorization is possible when the induction binary operator is "fast" or the function has "unsafe" attribute.
Differential Revision: https://reviews.llvm.org/D21330
llvm-svn: 276554
When vectorizing a tree rooted at a store bundle, we currently try to sort the
stores before building the tree, so that the stores can be vectorized. For other
trees, the order of the root bundle - which determines the order of all other
bundles - is arbitrary. That is bad, since if a leaf bundle of consecutive loads
happens to appear in the wrong order, we will not vectorize it.
This is partially mitigated when the root is a binary operator, by trying to
build a "reversed" tree when that's considered profitable. This patch extends the
workaround we have for binops to trees rooted in a horizontal reduction.
This fixes PR28474.
Differential Revision: https://reviews.llvm.org/D22554
llvm-svn: 276477
Recommiting r275571 after fixing crash reported in PR28270.
Now we erase elements of IOL in deleteDeadInstruction().
Original Summary:
This change use the overlap interval map built from partial overwrite tracking to perform shortening MemIntrinsics.
Add test cases which was missing opportunities before.
llvm-svn: 276452
Summary:
The llvm.invariant.start and llvm.invariant.end intrinsics currently
support specifying invariant memory objects only in the default address
space.
With this change, these intrinsics are overloaded for any adddress space
for memory objects
and we can use these llvm invariant intrinsics in non-default address
spaces.
Example: llvm.invariant.start.p1i8(i64 4, i8 addrspace(1)* %ptr)
This overloaded intrinsic is needed for representing final or invariant
memory in managed languages.
Reviewers: apilipenko, reames
Subscribers: llvm-commits
llvm-svn: 276447
Just because we can constant fold the result of an instruction does not
imply that we can delete the instruction. It may have side effects.
This fixes PR28655.
llvm-svn: 276389
If `-irce-skip-profitability-checks` is passed in, IRCE will kick in in
all cases where it is legal for it to kick in. This flag is intended to
help diagnose and analyse performance issues.
llvm-svn: 276372
Do not clone stored values unless they are GEPs that are special cased to avoid
hoisting them without hoisting their associated ld/st.
Differential revision: https://reviews.llvm.org/D22652
llvm-svn: 276358
rL245171 exposed a hole in InstSimplify that manifested in a strange way in PR28466:
https://llvm.org/bugs/show_bug.cgi?id=28466
It's possible to use trunc + icmp sgt/slt in place of an and + icmp eq/ne, so we need to
recognize that pattern to eliminate selects that are choosing between some value and some
bitmasked version of that value.
Note that there is significant room for improvement (refactoring) and enhancement (more
patterns, possibly in InstCombine rather than here).
Differential Revision: https://reviews.llvm.org/D22537
llvm-svn: 276341
This patch moves the update instruction for vectorized integer induction phi
nodes to the end of the latch block. This ensures consistent placement of all
induction updates across all the kinds of int inductions we create (scalar,
splat vector, or vector phi).
Differential Revision: https://reviews.llvm.org/D22416
llvm-svn: 276339
Summary:
The llvm.invariant.start and llvm.invariant.end intrinsics currently
support specifying invariant memory objects only in the default address space.
With this change, these intrinsics are overloaded for any adddress space for memory objects
and we can use these llvm invariant intrinsics in non-default address spaces.
Example: llvm.invariant.start.p1i8(i64 4, i8 addrspace(1)* %ptr)
This overloaded intrinsic is needed for representing final or invariant memory in managed languages.
Reviewers: tstellarAMD, reames, apilipenko
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22519
llvm-svn: 276316