In theory this could be generalized to move anything where
we prove the operands are available, but that would require
rewriting PRE. As NewGVN will hopefully come soon, and we're
trying to rewrite PRE in terms of NewGVN+MemorySSA, it's probably
not worth spending too much time on it. Fix provided by
Daniel Berlin!
llvm-svn: 284311
X86. The pass optimizes as a unit the entire wide load + shuffles pattern
produced by interleaved vectorization. This initial patch optimizes one pattern
(64-bit elements interleaved by a factor of 4). Future patches will generalize
to additional patterns.
Patch by Farhana Aleen
Differential revision: http://reviews.llvm.org/D24681
llvm-svn: 284260
This test was apparently checking for 2 independent folds, but we have
plenty of tests for those individual folds already. We are lacking
vector tests, however, because we don't have the shift folds for vectors.
llvm-svn: 284243
Prefer add/zext because they are better supported in terms of value-tracking.
Note that the backend should be prepared for this IR canonicalization
(including vector types) after:
https://reviews.llvm.org/rL284015
Differential Revision: https://reviews.llvm.org/D25135
llvm-svn: 284241
This patch modifies the cost calculation of predicated instructions (div and
rem) to avoid the accumulation of rounding errors due to multiple truncating
integer divisions. The calculation for predicated stores will be addressed in a
follow-on patch since we currently don't scale the cost of predicated stores by
block probability.
Differential Revision: https://reviews.llvm.org/D25333
llvm-svn: 284123
This is with an extra change to avoid calling MemoryLocation::get() on a call instruction.
Differential Revision: https://reviews.llvm.org/D25542
llvm-svn: 284098
This CL didn't actually address the test case in PR30499, and clang
still crashes.
Also revert dependent change "Memory-SSA cleanup of clobbers interface, NFC"
Reverts r283965 and r283967.
llvm-svn: 284093
Reappy r284044 after revert in r284051. Krzysztof fixed the error in r284049.
The original summary:
This patch tries to fully unroll loops having break statement like this
for (int i = 0; i < 8; i++) {
if (a[i] == value) {
found = true;
break;
}
}
GCC can fully unroll such loops, but currently LLVM cannot because LLVM only
supports loops having exact constant trip counts.
The upper bound of the trip count can be obtained from calling
ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the
refactoring work in SCEV to prevent duplicating code.
The feature of using the upper bound is enabled under the same circumstance
when runtime unrolling is enabled since both are used to unroll loops without
knowing the exact constant trip count.
llvm-svn: 284053
This patch tries to fully unroll loops having break statement like this
for (int i = 0; i < 8; i++) {
if (a[i] == value) {
found = true;
break;
}
}
GCC can fully unroll such loops, but currently LLVM cannot because LLVM only
supports loops having exact constant trip counts.
The upper bound of the trip count can be obtained from calling
ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the
refactoring work in SCEV to prevent duplicating code.
The feature of using the upper bound is enabled under the same circumstance
when runtime unrolling is enabled since both are used to unroll loops without
knowing the exact constant trip count.
Differential Revision: https://reviews.llvm.org/D24790
llvm-svn: 284044
Branch folder removes implicit defs if they are the only non-branching
instructions in a block, and the branches do not use the defined registers.
The problem is that in some cases these implicit defs are required for
the liveness information to be correct.
Differential Revision: https://reviews.llvm.org/D25478
llvm-svn: 284036
Summary:
Constant bundle operands may need to retain their constant-ness for
correctness. I'll admit that this is slightly odd, but it looks like
SimplifyCFG already does this for things like @llvm.frameaddress and
@llvm.stackmap, so I suppose adding one more case is not a big deal.
It is possible to add a mechanism to denote bundle operands that need to
remain constants, but that's probably too complicated for the time
being.
Reviewers: jmolloy
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D25502
llvm-svn: 284028
Since this change is known to cause performance degradations in some cases it's commited under a temporary flag which is turned off by default.
Patch by Li Huang
Differential Revision: https://reviews.llvm.org/D18777
llvm-svn: 284022
An arithmetic shift can be safely changed to a logical shift if the first
operand is known positive. This allows ComputeKnownBits (and similar analysis)
to determine the sign bit of the shifted value in some cases. In turn, this
allows InstCombine to canonicalize a signed comparison (a > 0) into an equality
check (a != 0).
PR30577
Differential Revision: https://reviews.llvm.org/D25119
llvm-svn: 284013
As discussed by Andrea on PR30486, we have an unsafe cast to an Instruction type in the select combine which doesn't take into account that it could be a ConstantExpr instead.
Differential Revision: https://reviews.llvm.org/D25466
llvm-svn: 284000
This is a refreshed version of a patch that was reverted: it fixes
the problems reported in both PR30216 and PR30499, and
contains all the test-cases from both bugs.
To hoist stores past loads, we used to search for potential
conflicting loads on the hoisting path by following a MemorySSA
def-def link from the store to be hoisted to the previous
defining memory access, and from there we followed the def-use
chains to all the uses that occur on the hoisting path. The
problem is that the def-def link may point to a store that does
not alias with the store to be hoisted, and so the loads that are
walked may not alias with the store to be hoisted, and even as in
the testcase of PR30216, the loads that may alias with the store
to be hoisted are not visited.
The current patch visits all loads on the path from the store to
be hoisted to the hoisting position and uses the alias analysis
to ask whether the store may alias the load. I was not able to
use the MemorySSA functionality to ask for whether load and
store are clobbered: I'm not sure which function to call, so I
used a call to AA->isNoAlias().
Store past store is still working as before using a MemorySSA
query: I added an extra test to pr30216.ll to make sure store
past store does not regress.
Tested on x86_64-linux with check and a test-suite run.
Differential Revision: https://reviews.llvm.org/D25476
llvm-svn: 283965
When combining an integer load with !range metadata that does not include 0 to a pointer load, make sure emit !nonnull metadata on the newly-created pointer load. This prevents the !nonnull metadata from being dropped during a ptrtoint/inttoptr pair.
This fixes PR30597.
Patch by Ariel Ben-Yehuda!
Differential Revision: https://reviews.llvm.org/D25215
llvm-svn: 283836
Fixed copy+paste vector alignment to correct for per-element scalar loads
Increased to 512-bit data sizes in preparation of avx512 tests
llvm-svn: 283748
Value names may be prefixed with a binary '1' to indicate that the
backend should not modify the symbols due to any platform naming
convention.
This should not show up in the YAML opt record file because it breaks
the YAML parser.
llvm-svn: 283656
Summary:
If heap allocation of a coroutine is elided, we need to make sure that we will update an address stored in the coroutine frame from f.destroy to f.cleanup.
Before this change, CoroSplit synthesized these stores after coro.begin:
```
store void (%f.Frame*)* @f.resume, void (%f.Frame*)** %resume.addr
store void (%f.Frame*)* @f.destroy, void (%f.Frame*)** %destroy.addr
```
In those cases where we did heap elision, but were not able to devirtualize all indirect calls, destroy call will attempt to "free" the coroutine frame stored on the stack. Oops.
Now we use select to put an appropriate coroutine subfunction in the destroy slot. As bellow:
```
store void (%f.Frame*)* @f.resume, void (%f.Frame*)** %resume.addr
%0 = select i1 %need.alloc, void (%f.Frame*)* @f.destroy, void (%f.Frame*)* @f.cleanup
store void (%f.Frame*)* %0, void (%f.Frame*)** %destroy.addr
```
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D25377
llvm-svn: 283625
Summary: Add tests for cases where we have zero coverage in RS4GC.
Reviewers: sanjoy, reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25341
llvm-svn: 283591
If we're going to canonicalize IR towards select of constants, try harder to create those.
Also, don't lose the metadata.
This is actually 4 related transforms in one patch:
// select X, (sext X), C --> select X, -1, C
// select X, (zext X), C --> select X, 1, C
// select X, C, (sext X) --> select X, C, 0
// select X, C, (zext X) --> select X, C, 0
Differential Revision: https://reviews.llvm.org/D25126
llvm-svn: 283575
Previously, we marked the branch conditions of latch blocks uniform after
vectorization if they were instructions contained in the loop. However, if a
condition instruction has users other than the branch, it may not remain
uniform. This patch ensures the conditions we mark uniform are only used by the
branch. This should fix PR30627.
Reference: https://llvm.org/bugs/show_bug.cgi?id=30627
llvm-svn: 283563
Summary:
While walking defs of pointer operands we were assuming that the pointer
size would remain constant. This is not true, because addresspacecast
instructions may cast the pointer to an address space with a different
pointer width.
This partial reverts r282612, which was a more conservative solution
to this problem.
Reviewers: reames, sanjoy, apilipenko
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D24772
llvm-svn: 283557
unrolling.
The next code is not vectorized by the SLPVectorizer:
```
int test(unsigned int *p) {
int sum = 0;
for (int i = 0; i < 8; i++)
sum += p[i];
return sum;
}
```
During optimization this loop is fully unrolled and SLPVectorizer is
unable to vectorize it. Patch tries to fix this problem.
Differential Revision: https://reviews.llvm.org/D24796
llvm-svn: 283535
With the ROPI and RWPI relocation models we can't always have pointers
to global data or functions in constant data, so don't try to convert switches
into lookup tables if any value in the lookup table would require a relocation.
We can still safely emit lookup tables of other values, such as simple
constants.
Differential Revision: https://reviews.llvm.org/D24462
llvm-svn: 283530
GetCaseResults assumed that a terminator with one successor was an
unconditional branch. This is not necessarily the case, it could be a
cleanupret.
Strengthen the check by querying whether or not the terminator is
exceptional.
llvm-svn: 283517
Vectorizer tests in the target-independent directory should not have a target
triple. If a test really needs to query a specific backend, it belongs in the
right target subdirectory (which "REQUIRES" the right backend). Otherwise, it
should not specify a triple.
llvm-svn: 283512
Add a weak alias to the renamed Comdat function in IR level instrumentation,
using it's original name. This ensures the same behavior w/ and w/o IR
instrumentation, even for non standard conforming code.
Differential Revision: http://reviews.llvm.org/D25339
llvm-svn: 283490
This adds a new function to DebugInfo.cpp that takes an llvm::Module
as input and removes all debug info metadata that is not directly
needed for line tables, thus effectively stripping all type and
variable information from the module.
The primary motivation for this feature was the bitcode work flow
(cf. http://lists.llvm.org/pipermail/llvm-dev/2016-June/100643.html
for more background). This is not wired up yet, but will be in
subsequent patches. For testing, the new functionality is exposed to
opt with a -strip-nonlinetable-debuginfo option.
The secondary use-case (and one that works right now!) is as a
reduction pass in bugpoint. I added two new bugpoint options
(-disable-strip-debuginfo and -disable-strip-debug-types) to control
the new features. By default it will first attempt to remove all debug
information, then only the type info, and then proceed to hack at any
remaining MDNodes.
llvm-svn: 283473
The purpose of the YAML diagnostic output file is to collect information on
optimizations performed, or not performed, for later processing by tools that
help users (and compiler developers) understand how code was optimized. As
such, the diagnostics that appear in the file should not be coupled to what a
user might want to see summarized for them as the compiler runs, and in fact,
because the user likely does not know what optimization diagnostics their tools
might want to use, the user cannot provide a useful filter regardless. As such,
we shouldn't filter the diagnostics going to the output file.
Differential Revision: https://reviews.llvm.org/D25224
llvm-svn: 283236
Splitting the edge is nontrivial because of the landing pad, and we would
currently assert trying to do it.
Differential Revision: https://reviews.llvm.org/D24680
llvm-svn: 283129
This should fix:
https://llvm.org/bugs/show_bug.cgi?id=30433
There are a couple of open questions about the codegen:
1. Should we let scalar ops be scalars and avoid vector constant loads/splats?
2. Should we have a pass to combine constants such as the inverted pair that we have here?
Differential Revision: https://reviews.llvm.org/D25165
llvm-svn: 283119
Summary:
In the case below, %Result.i19 is defined between coro.save and coro.suspend and used after coro.suspend. We need to correctly place such a value into the coroutine frame.
```
%save = call token @llvm.coro.save(i8* null)
%Result.i19 = getelementptr inbounds %"struct.lean_future<int>::Awaiter", %"struct.lean_future<int>::Awaiter"* %ref.tmp7, i64 0, i32 0
%suspend = call i8 @llvm.coro.suspend(token %save, i1 false)
switch i8 %suspend, label %exit [
i8 0, label %await.ready
i8 1, label %exit
]
await.ready:
%val = load i32, i32* %Result.i19
```
Reviewers: majnemer
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D24418
llvm-svn: 282902
Summary:
Without the fix, if there was a function inlined into the coroutine with debug information, CloneFunctionInto(NewF, &F, VMap, /*ModuleLevelChanges=*/true, Returns); would duplicate all of the debug information including the DICompileUnit.
We know use VMap to indicate that debug metadata for a File, Unit and FunctionType should not be duplicated when we creating clones that will become f.resume, f.destroy and f.cleanup.
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24417
llvm-svn: 282899
Summary: Not all coro.subfn.addr intrinsics can be eliminated in CoroElide through devirtualization. Those that remain need to be lowered in CoroCleanup.
Reviewers: majnemer
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D24412
llvm-svn: 282897
Summary: Debug info should *not* affect optimization decisions. This patch updates loop unroller cost model to make it not affected by debug info.
Reviewers: davidxl, mzolotukhin
Subscribers: haicheng, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D25098
llvm-svn: 282894
When building the steps for scalar induction variables, we previously attempted
to determine if all the scalar users of the induction variable were uniform. If
they were, we would only emit the step corresponding to vector lane zero. This
optimization was too aggressive. We generally don't know the entire set of
induction variable users that will be scalar. We have
isScalarAfterVectorization, but this is only a conservative estimate of the
instructions that will be scalarized. Thus, an induction variable may have
scalar users that aren't already known to be scalar. To avoid emitting unused
steps, we can only check that the induction variable is uniform. This should
fix PR30542.
Reference: https://llvm.org/bugs/show_bug.cgi?id=30542
llvm-svn: 282863
Summary:
We don't want to decay hot callsites to import chains of hot
callsites. The same mechanism is used in LIPO.
Reviewers: tejohnson, eraman, mehdi_amini
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D24976
llvm-svn: 282833
Summary:
Not tunned up heuristic, but with this small heuristic there is about
+0.10% improvement on SPEC 2006
Reviewers: tejohnson, mehdi_amini, eraman
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24940
llvm-svn: 282733
Summary:
The patch fixes regression caused by two earlier patches D18777 and D18867.
Reviewers: reames, sanjoy
Differential Revision: http://reviews.llvm.org/D24280
From: Li Huang
llvm-svn: 282650
Also, remove unnecessary function attributes, parameters, and comments.
It looks like at least some of these tests are not minimal though...
llvm-svn: 282620
Pointers in different addrspaces can have different sizes, so it's not valid to look through addrspace cast calculating base and offset for a value.
This is similar to D13008.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D24729
llvm-svn: 282612
There is really no reason for these to be separate.
The vectorizer started this pretty bad tradition that the text of the
missed remarks is pretty meaningless, i.e. vectorization failed. There,
you have to query analysis to get the full picture.
I think we should just explain the reason for missing the optimization
in the missed remark when possible. Analysis remarks should provide
information that the pass gathers regardless whether the optimization is
passing or not.
llvm-svn: 282542
(Re-committed after moving the template specialization under the yaml
namespace. GCC was complaining about this.)
This allows various presentation of this data using an external tool.
This was first recommended here[1].
As an example, consider this module:
1 int foo();
2 int bar();
3
4 int baz() {
5 return foo() + bar();
6 }
The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):
remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 10 }
Function: baz
Hotness: 30
Args:
- Callee: foo
- String: will not be inlined into
- Caller: baz
...
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 18 }
Function: baz
Hotness: 30
Args:
- Callee: bar
- String: will not be inlined into
- Caller: baz
...
This is a summary of the high-level decisions:
* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:
ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
DEBUG_TYPE, "NotInlined", &I)
<< NV("Callee", Callee) << " will not be inlined into "
<< NV("Caller", CS.getCaller()) << setIsVerbose());
NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.
Subsequent patches will update ORE users to use the new streaming API.
* I am using YAML I/O for writing the YAML file. YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types. Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.
On the other hand, I did experiment that the class hierarchy starting at
DiagnosticInfoOptimizationBase can be mapped back from YAML generated
here (see D24479).
* The YAML stream is stored in the LLVM context.
* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".
* As before hotness is computed in the analysis pass instead of
DiganosticInfo. This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.
[1] https://reviews.llvm.org/D19678#419445
Differential Revision: https://reviews.llvm.org/D24587
llvm-svn: 282539
This allows various presentation of this data using an external tool.
This was first recommended here[1].
As an example, consider this module:
1 int foo();
2 int bar();
3
4 int baz() {
5 return foo() + bar();
6 }
The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):
remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 10 }
Function: baz
Hotness: 30
Args:
- Callee: foo
- String: will not be inlined into
- Caller: baz
...
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 18 }
Function: baz
Hotness: 30
Args:
- Callee: bar
- String: will not be inlined into
- Caller: baz
...
This is a summary of the high-level decisions:
* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:
ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
DEBUG_TYPE, "NotInlined", &I)
<< NV("Callee", Callee) << " will not be inlined into "
<< NV("Caller", CS.getCaller()) << setIsVerbose());
NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.
Subsequent patches will update ORE users to use the new streaming API.
* I am using YAML I/O for writing the YAML file. YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types. Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.
On the other hand, I did experiment that the class hierarchy starting at
DiagnosticInfoOptimizationBase can be mapped back from YAML generated
here (see D24479).
* The YAML stream is stored in the LLVM context.
* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".
* As before hotness is computed in the analysis pass instead of
DiganosticInfo. This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.
[1] https://reviews.llvm.org/D19678#419445
Differential Revision: https://reviews.llvm.org/D24587
llvm-svn: 282499
Summary:
We don't currently need this facility for CFI. Disabling individual hot methods proved
to be a better strategy in Chrome.
Also, the design of the feature is suboptimal, as pointed out by Peter Collingbourne.
Reviewers: pcc
Subscribers: kcc
Differential Revision: https://reviews.llvm.org/D24948
llvm-svn: 282461
Summary:
This patch improves thinlto importer
by importing 3x larger functions that are called from hot block.
I compared performance with the trunk on spec, and there
were about 2% on povray and 3.33% on milc. These results seems
to be consistant and match the results Teresa got with her simple
heuristic. Some benchmarks got slower but I think they are just
noisy (mcf, xalancbmki, omnetpp)- running the benchmarks again with
more iterations to confirm. Geomean of all benchmarks including the noisy ones
were about +0.02%.
I see much better improvement on google branch with Easwaran patch
for pgo callsite inlining (the inliner actually inline those big functions)
Over all I see +0.5% improvement, and I get +8.65% on povray.
So I guess we will see much bigger change when Easwaran patch will land
(it depends on new pass manager), but it is still worth putting this to trunk
before it.
Implementation details changes:
- Removed CallsiteCount.
- ProfileCount got replaced by Hotness
- hot-import-multiplier is set to 3.0 for now,
didn't have time to tune it up, but I see that we get most of the interesting
functions with 3, so there is no much performance difference with higher, and
binary size doesn't grow as much as with 10.0.
Reviewers: eraman, mehdi_amini, tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24638
llvm-svn: 282437
This patch ensures that we actually scalarize instructions marked scalar after
vectorization. Previously, such instructions may have been vectorized instead.
Differential Revision: https://reviews.llvm.org/D23889
llvm-svn: 282418
Summary:
If coroutine has no suspend points, remove heap allocation and turn a coroutine into a normal function.
Also, if a pattern is detected that coroutine resumes or destroys itself prior to coro.suspend call, turn the suspend point into a simple jump to resume or cleanup label. This pattern occurs when coroutines are used to propagate errors in functions that return expected<T>.
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24408
llvm-svn: 282414
The index of the new insertelement instruction was evaluated in the
wrong way, it was considered as the index of the inserted value instead
of index of the position, where the value should be inserted.
llvm-svn: 282401
This patch fixes PR30366.
Function foldUDivShl() worked under the assumption that one of the values
in input to the function was always an instance of llvm::Instruction.
However, function visitUDivOperand() (the only user of foldUDivShl) was
clearly violating that precondition; internally, visitUDivOperand() uses pattern
matches to check the operands of a udiv. Pattern matchers for binary operators
know how to handle both Instruction and ConstantExpr values.
This patch fixes the problem in foldUDivShl(). Now we use pattern matchers
instead of explicit casts to Instruction. The reduced test case from PR30366
has been added to test file InstCombine/udiv-simplify.ll.
Differential Revision: https://reviews.llvm.org/D24565
llvm-svn: 282398
If inserting more than one constant into a vector:
define <4 x float> @foo(<4 x float> %x) {
%ins1 = insertelement <4 x float> %x, float 1.0, i32 1
%ins2 = insertelement <4 x float> %ins1, float 2.0, i32 2
ret <4 x float> %ins2
}
InstCombine could reduce that to a shufflevector:
define <4 x float> @goo(<4 x float> %x) {
%shuf = shufflevector <4 x float> %x, <4 x float> <float undef, float 1.0, float 2.0, float undef>, <4 x i32><i32 0, i32 5, i32 6, i32 3>
ret <4 x float> %shuf
}
Also, InstCombine tries to convert shuffle instruction to single insertelement, if one of the vectors is a constant vector and only a single element from this constant should be used in shuffle, i.e.
shufflevector <4 x float> %v, <4 x float> <float undef, float 1.0, float
undef, float undef>, <4 x i32> <i32 0, i32 5, i32 undef, i32 undef> ->
insertelement <4 x float> %v, float 1.0, 1
Differential Revision: https://reviews.llvm.org/D24182
llvm-svn: 282237
We already have the udiv variant of this transform, so I think this is ok for
InstCombine too even though there is an increase in IR instructions. As the
tests and TODO comments show, the transform can lead to follow-on combines.
This should fix: https://llvm.org/bugs/show_bug.cgi?id=28672
Differential Revision: https://reviews.llvm.org/D24527
llvm-svn: 282209
and also the dependent r282175 "GVN-hoist: do not dereference null pointers"
It's causing compiler crashes building Harfbuzz (PR30499).
llvm-svn: 282199
To hoist stores past loads, we used to search for potential
conflicting loads on the hoisting path by following a MemorySSA
def-def link from the store to be hoisted to the previous
defining memory access, and from there we followed the def-use
chains to all the uses that occur on the hoisting path. The
problem is that the def-def link may point to a store that does
not alias with the store to be hoisted, and so the loads that are
walked may not alias with the store to be hoisted, and even as in
the testcase of PR30216, the loads that may alias with the store
to be hoisted are not visited.
The current patch visits all loads on the path from the store to
be hoisted to the hoisting position and uses the alias analysis
to ask whether the store may alias the load. I was not able to
use the MemorySSA functionality to ask for whether load and
store are clobbered: I'm not sure which function to call, so I
used a call to AA->isNoAlias().
Store past store is still working as before using a MemorySSA
query: I added an extra test to pr30216.ll to make sure store
past store does not regress.
Differential Revision: https://reviews.llvm.org/D24517
llvm-svn: 282168
Without this patch, GVN-hoist would think that a branch instruction is a scalar instruction
and would try to value number it. The patch filters out all such kind of irrelevant instructions.
A bit frustrating is that there is no easy way to discard all those very infrequent instructions,
a bit like isa<TerminatorInst> that stands for a large family of instructions. I'm thinking that
checking for those very infrequent other instructions would cost us more in compilation time
than just letting those instructions getting numbered, so I'm still thinking that a simpler check:
if (isa<TerminatorInst>(I))
return false;
is better than listing all the other less frequent instructions.
Differential Revision: https://reviews.llvm.org/D23929
llvm-svn: 282160
Currently, we give up on loop interchange if we encounter a flow dependency
anywhere in the loop list. Worse yet, we don't even track output dependencies.
This patch updates the dependency matrix computation to track flow and output
dependencies in the same way we track anti dependencies.
This improves an internal workload by 2.2x.
Note the loop interchange pass is off by default and it can be enabled with
'-mllvm -enable-loopinterchange'
Differential Revision: https://reviews.llvm.org/D24564
llvm-svn: 282101
If we identify an instruction as uniform after vectorization, we know that we
should only use the value corresponding to the first vector lane of each unroll
iteration. However, when scalarizing such instructions, we still produce values
for the other vector lanes. This patch prevents us from generating the unused
scalars.
Differential Revision: https://reviews.llvm.org/D24275
llvm-svn: 282087
Summary: Now that we have more precise debug info, we should change back to use maximum to get basic block weight.
Reviewers: dnovillo
Subscribers: andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D24788
llvm-svn: 282084
SROA doesn't preserve the llvm.mem.parallel_loop_access metadata when it
transforms loads/stores. This patch fixes a couple occurences of this
issue.
(Partially addresses PR28981).
Differential Revision: https://reviews.llvm.org/D23549
llvm-svn: 281960
Summary: Callsites in the same basic block should share the same hotness. This patch checks for the hottest callsite in the same basic block, and use the hotness for all callsites in that basic block for early inline decisions. It also fixes the test to add "-S" so theat the "CHECK-NOT" is actually checking the content.
Reviewers: dnovillo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24734
llvm-svn: 281927
Summary: It does not make sense to set equal weights for all unkown branches as we have static branch prediction available.
Reviewers: dnovillo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24732
llvm-svn: 281912
Summary: The call target count profile is directly derived from LBR branch->target data. This is more reliable than instruction frequency profiles that could be moved across basic block boundaries. This patches uses call target count profile to annotate call instructions.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24410
llvm-svn: 281911
When phi nodes are created in the -mem2reg phase, the @llvm.dbg.declare
entries are converted to @llvm.dbg.value entries at the place where the
store instructions existed. However no entry is created to describe
the resulting value of the phi node.
The effect of this is especially noticeable in for loops which have a
constant for the intial value; the loop control variable's location
would be described as the intial constant value in the loop body once
the -mem2reg optimization phase was run.
This change adds the creation of the @llvm.dbg.value entries to describe
variables whose location is the result of a phi node created in -mem2reg.
Also when the phi node is finally lowered to a machine instruction it
is important that the lowered "load" instruction is placed before the
associated DEBUG_VALUE entry describing the value loaded.
Differential Revision: https://reviews.llvm.org/D23715
llvm-svn: 281895
We were updating metadata but not IR flags. Because we pick an arbitrary instruction to be the CSE candidate, it comes down to luck (50% or less chance) if this results in broken codegen or not, which is why PR30373 which is actually not the fault of the commit it was bisected down to.
Fixes PR30373.
llvm-svn: 281889
Summary: Previously we reline on inst-combine to remove inlinable invoke instructions. This causes trouble because a few extra optimizations are schedule early that could introduce too much CFG change (e.g. simplifycfg removes too much control flow). This patch handles invoke instruction in-place during sample profile annotation, so that we do not rely on instcombine to remove those invoke instructions.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24409
llvm-svn: 281870
Summary:
This fixes an issue when files are compiled with -flto=thin
at default -O0. We need to rename anonymous globals before attempting
to write the module summary because all values need names for
the summary. This was happening at -O1 and above, but not before
the early exit when constructing the pipeline for -O0.
Also add an internal -prepare-for-thinlto option to enable this
to be tested via opt.
Fixes PR30419.
Reviewers: mehdi_amini
Subscribers: probinson, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D24701
llvm-svn: 281840
This is a fix for PR30318.
Clang may generate IR where an alloca is already live when entering a
BB with lifetime.start. In this case, conservatively extend the
alloca lifetime all the way back to the block entry.
llvm-svn: 281784
computeKnownBits() already works for integer vectors, so allow vector types when calling that from InstCombine.
I don't think the change to use m_APInt in computeKnownBits is strictly necessary because we do check for
ConstantVector later, but it's more efficient to handle the splat case without needing to loop on vector elements.
This should work with InstSimplify, but doesn't yet, so I made that a FIXME comment on the test for PR24942:
https://llvm.org/bugs/show_bug.cgi?id=24942
Differential Revision: https://reviews.llvm.org/D24677
llvm-svn: 281777
A follow-up patch will rename this pass and the source file accordingly,
but I figured the non-NFC change will be easier to spot in isolation.
Differential Revision: https://reviews.llvm.org/D24641
llvm-svn: 281744
These 2 helper functions were already using APInt internally, so just
change the API and caller to allow folds for splats. The scalar
regression tests look quite thorough, so I just added a couple of
tests to prove that vectors are handled too.
These folds should be grouped with the other cmp+shift folds though.
That can be an NFC follow-up.
llvm-svn: 281663
GlobalOpt is already dead-code-eliminating global definitions. With
this change it also takes care of declarations.
Hopefully this should make it now a strict superset of GlobalDCE.
This is important for LTO/ThinLTO as we don't want the linker to see
"undefined reference" when it processes the input files: it could
prevent proper internalization (or even load an extra file from a
static archive, changing the behavior of the program!).
llvm-svn: 281653
The patch is to partially fix PR10584. Correlated Value Propagation queries LVI
to check non-null for pointer params of each callsite. If we know the def of
param is an alloca instruction, we know it is non-null and can return early from
LVI. Similarly, CVP queries LVI to check whether pointer for each mem access is
constant. If the def of the pointer is an alloca instruction, we know it is not
a constant pointer. These shortcuts can reduce the cost of CVP significantly.
Differential Revision: https://reviews.llvm.org/D18066
llvm-svn: 281586
This patch moves the processing of pointer induction variables in
collectLoopUniforms from the consecutive pointer phase of the analysis to the
phi node phase. Previously, if a pointer induction variable was used by both a
scalarized non-memory instruction as well as a vectorized memory instruction,
we would incorrectly identify the pointer as uniform. Pointer induction
variables should be treated the same as other phi nodes. That is, they are
uniform if all users of the induction variable and induction variable update
are uniform.
Differential Revision: https://reviews.llvm.org/D24511
llvm-svn: 281485
ObjC library call with call return.
ARC contraction tries to replace uses of an argument passed to an
objective-c library call with the call return value. For example, in the
following IR, it replaces uses of argument %9 and uses of the values
discovered traversing the chain upwards (%7 and %8) with the call return
%10, if they are dominated by the call to @objc_autoreleaseReturnValue.
This transformation enables code-gen to tail-call the call to
@objc_autoreleaseReturnValue, which is necessary to enable auto release
return value optimization.
%7 = tail call i8* @objc_loadWeakRetained(i8** %6)
%8 = bitcast i8* %7 to %0*
%9 = bitcast %0* %8 to i8*
%10 = tail call i8* @objc_autoreleaseReturnValue(i8* %9)
ret %0* %8
Since r276727, llvm started removing redundant bitcasts and as a result
started feeding the following IR to ARC contraction:
%7 = tail call i8* @objc_loadWeakRetained(i8** %6)
%8 = bitcast i8* %7 to %0*
%9 = tail call i8* @objc_autoreleaseReturnValue(i8* %7)
ret %0* %8
ARC contraction no longer does the optimization described above since it
only traverses the chain upwards and fails to recognize that the
function return can be replaced by the call return. This commit changes
ARC contraction to traverse the chain downwards too and replace uses of
bitcasts with the call return.
rdar://problem/28011339
Differential Revision: https://reviews.llvm.org/D24523
llvm-svn: 281419
The constant folder didn't know how to always fold bitcasts of constant integer
vectors. In particular, it was unable to handle the case where a constant vector
had some undef elements, and the resulting (i.e. bitcasted) vector type had more
elements than the original vector type.
Example:
%cast = bitcast <2 x i64><i64 undef, i64 2> to <4 x i32>
On a little endian target, %cast could have been folded to:
<4 x i32><i32 undef, i32 undef, i32 2, i32 0>
This patch improves the folding logic by teaching how to correctly propagate
undef elements in the folded vector.
Differential Revision: https://reviews.llvm.org/D24301
llvm-svn: 281343
InstSimplify doesn't always know how to fold a bitcast of a constant vector.
In particular, the logic in InstSimplify doesn't know how to handle the case
where the constant vector in input contains some undef elements, and the
number of elements is smaller than the number of elements of the bitcast
vector type.
llvm-svn: 281332
Teach SimplifyLibcalls that in can treat functions annotated with
apcs, aapcs or aapcs_vfp like normal C functions if they only take
and return integer or pointer values, and the target is not iOS.
Differential Revision: https://reviews.llvm.org/D24453
llvm-svn: 281322
This patch reverses the edge from DIGlobalVariable to GlobalVariable.
This will allow us to more easily preserve debug info metadata when
manipulating global variables.
Fixes PR30362. A program for upgrading test cases is attached to that
bug.
Differential Revision: http://reviews.llvm.org/D20147
llvm-svn: 281284
Trying to infer the 'returned' attribute if an argument is already
'returned' can lead to verification failure: inference might determine
that a different argument is passed through which would result in two
different arguments marked as 'returned'.
This fixes PR30350.
llvm-svn: 281221