When using bit tests for hole checks, we call AddPredecessorToBlock to give the
phi node a value from the bit test block. This would break if we've
previously called removePredecessor on the default destination because the
switch is fully covered.
Test case by Mark Lacey.
llvm-svn: 235771
(reverted in r235533)
Original commit message:
"Calls to llvm::Value::mutateType are becoming extra-sensitive now that
instructions have extra type information that will not be derived from
operands or result type (alloca, gep, load, call/invoke, etc... ). The
special-handling for mutateType will get more complicated as this work
continues - it might be worth making mutateType virtual & pushing the
complexity down into the classes that need special handling. But with
only two significant uses of mutateType (vectorization and linking) this
seems OK for now.
Totally open to ideas/suggestions/improvements, of course.
With this, and a bunch of exceptions, we can roundtrip an indirect call
site through bitcode and IR. (a direct call site is actually trickier...
I haven't figured out how to deal with the IR deserializer's lazy
construction of Function/GlobalVariable decl's based on the type of the
entity which means looking through the "pointer to T" type referring to
the global)"
The remapping done in ValueMapper for LTO was insufficient as the types
weren't correctly mapped (though I was using the post-mapped operands,
some of those operands might not have been mapped yet so the type
wouldn't be post-mapped yet). Instead use the pre-mapped type and
explicitly map all the types.
llvm-svn: 235651
Move isDereferenceablePointer function to Analysis. This function recursively tracks dereferencability over a chain of values like other functions in ValueTracking.
This refactoring is motivated by further changes to support dereferenceable_or_null attribute (http://reviews.llvm.org/D8650). isDereferenceablePointer will be extended to perform context-sensitive analysis and IR is not a good place to have such functionality.
Patch by: Artur Pilipenko <apilipenko@azulsystems.com>
Differential Revision: reviews.llvm.org/D9075
llvm-svn: 235611
This patch refactors the definition of common utility function "isInductionPHI" to LoopUtils.cpp.
This fixes compilation error when configured with -DBUILD_SHARED_LIBS=ON
llvm-svn: 235577
Only clear out the NSW/NUW flags if we are optimizing 'add'/'sub' while
taking advantage that the sign bit is not set. We do this optimization
to further shrink the mask but shrinking the mask isn't NSW/NUW
preserving in this case.
llvm-svn: 235558
An nsw/nuw operation relies on the values feeding into it to not
overflow if 'poison' is not to be produced. This means that
optimizations which make modifications to the bottom of a chain (like
SimplifyDemandedBits) must strip out nsw/nuw if they cannot ensure that
they will be preserved.
This fixes PR23309.
llvm-svn: 235544
This reverts commit r235458.
It looks like this might be breaking something LTO-ish. Looking into it
& will recommit with a fix/test case/etc once I've got more to go on.
llvm-svn: 235533
Calls to llvm::Value::mutateType are becoming extra-sensitive now that
instructions have extra type information that will not be derived from
operands or result type (alloca, gep, load, call/invoke, etc... ). The
special-handling for mutateType will get more complicated as this work
continues - it might be worth making mutateType virtual & pushing the
complexity down into the classes that need special handling. But with
only two significant uses of mutateType (vectorization and linking) this
seems OK for now.
Totally open to ideas/suggestions/improvements, of course.
With this, and a bunch of exceptions, we can roundtrip an indirect call
site through bitcode and IR. (a direct call site is actually trickier...
I haven't figured out how to deal with the IR deserializer's lazy
construction of Function/GlobalVariable decl's based on the type of the
entity which means looking through the "pointer to T" type referring to
the global)
llvm-svn: 235458
https://llvm.org/bugs/show_bug.cgi?id=23163.
Gep merging sometimes behaves like a reverse CSE/LICM optimization,
which has negative impact on performance. In this patch we restrict
gep merging to happen only when the indexes to be merged are both consts,
which ensures such merge is always beneficial.
The patch makes gep merging only happen in very restrictive cases.
It is possible that some analysis/optimization passes rely on the merged
geps to get better result, and we havn't notice them yet. We will be ready
to further improve it once we see the cases.
Differential Revision: http://reviews.llvm.org/D8911
llvm-svn: 235455
https://llvm.org/bugs/show_bug.cgi?id=23163.
Gep merging sometimes behaves like a reverse CSE/LICM optimizations,
which has negative impact on performance. In this patch we restrict
gep merging to happen only when the indexes to be merged are both consts,
which ensures such merge is always beneficial.
The patch makes gep merging only happen in very restrictive cases.
It is possible that some analysis/optimization passes rely on the merged
geps to get better result, and we havn't notice them yet. We will be ready
to further improve it once we see the cases.
Differential Revision: http://reviews.llvm.org/D9007
llvm-svn: 235451
MemIntrinsic::getDest() looks through pointer casts, and using it
directly when building the new GEP+memset results in stuff like:
%0 = getelementptr i64* %p, i32 16
%1 = bitcast i64* %0 to i8*
call ..memset(i8* %1, ...)
instead of the correct:
%0 = bitcast i64* %p to i8*
%1 = getelementptr i8* %0, i32 16
call ..memset(i8* %1, ...)
Instead, use getRawDest, which just gives you the i8* value.
While there, use the memcpy's dest, as it's live anyway.
In most cases, when the optimization triggers, the memset and memcpy
sizes are the same, so the built memset is 0-sized and eliminated.
The problem occurs when they're different.
Fixes a regression caused by r235232: PR23300.
llvm-svn: 235419
Summary:
This lets us use range based for loops.
Reviewers: chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9169
llvm-svn: 235416
Summary:
After we rewrite a candidate, the instructions used by the old form may
become unused. This patch cleans up these unused instructions so that we
needn't run DCE after SLSR.
Test Plan: removed -dce in all the SLSR tests
Reviewers: broune, meheff
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9101
llvm-svn: 235410
Summary: so that we needn't run DCE after this pass.
Test Plan: removed -dce from the commandline in split-gep.ll and split-gep-and-gvn.ll
Reviewers: meheff
Subscribers: llvm-commits, HaoLiu, hfinkel, jholewinski
Differential Revision: http://reviews.llvm.org/D9096
llvm-svn: 235409
Summary:
MemorySSA uses this algorithm as well, and this enables us to reuse the code in both places.
There are no actual algorithm or datastructure changes in here, just code movement.
Reviewers: qcolombet, chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9118
llvm-svn: 235406
Remove early returns for when `getVariable()` is null, and just assert
that it never happens. The Verifier already confirms that there's a
valid variable on these intrinsics, so we should assume the debug info
isn't broken. I also updated a check for a `!dbg` attachment, which the
Verifier similarly guarantees.
llvm-svn: 235400
This commit fixes the code which adds lifetime markers in InlineFunction to skip
zero-sized allocas instead of asserting on them.
rdar://problem/20531155
llvm-svn: 235312
This patch refactors reduction identification code out of LoopVectorizer and
exposes them as common utilities.
No functional change.
Review: http://reviews.llvm.org/D9046
llvm-svn: 235284
Harden r235258 to support any integer bitwidth. The quick glance at
the reference made me think only i32 and i64 were valid types, but
they're not special, so any overload is legal.
Thanks to David Majnemer for noticing!
llvm-svn: 235261
Followup to r235232, which caused PR23278.
We can't assume the memset and memcpy sizes have the same type, as
nothing in the language reference prevents that.
Instead, zext both to i64 if they disagree.
While there, robustify tests by using i8 %c rather than i8 0 for the
memset character.
llvm-svn: 235258
Stop using `DIDescriptor` and its subclasses in the `DebugInfoFinder`
API, as well as the rest of the API hanging around in `DebugInfo.h`.
llvm-svn: 235240
A common idiom in some code is to do the following:
memset(dst, 0, dst_size);
memcpy(dst, src, src_size);
Some of the memset is redundant; instead, we can do:
memcpy(dst, src, src_size);
memset(dst + src_size, 0,
dst_size <= src_size ? 0 : dst_size - src_size);
Original patch by: Joel Jones
Differential Revision: http://reviews.llvm.org/D498
llvm-svn: 235232
Summary:
An alternative is to use a worklist approach. However, that approach
would break the traversing order so that we couldn't lookup SeenExprs
efficiently. I don't see a clear winner here, so I picked the easier approach.
Along with two minor improvements:
1. preserves ScalarEvolution by forgetting instructions replaced
2. removes dead code locally avoiding the need of running DCE afterwards
Test Plan: add to slsr-add.ll a test that requires multiple iterations
Reviewers: broune, dberlin, atrick, meheff
Reviewed By: atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9058
llvm-svn: 235151
Summary:
This fixes a left-over efficiency issue in D8950.
As Andrew and Daniel suggested, we can store the candidates in a stack
and pop the top element when it does not dominate the current
instruction. This reduces the worst-case time complexity to O(n).
Test Plan: a new test in nary-add.ll that exercises this optimization.
Reviewers: broune, dberlin, meheff, atrick
Reviewed By: atrick
Subscribers: llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D9055
llvm-svn: 235129
This is very similar to D8486 / r232852 (vperm2). If we treat insertps intrinsics
as shufflevectors, we can optimize them better.
I've left all but the full zero case of the zero mask variants out of this patch.
I don't think those can be converted into a single shuffle in all cases, but I'd
be happy to be proven wrong as I was for vperm2f128.
Either way, we'd need to support whatever sequence we come up with for those cases
in the backend before converting them here.
Differential Revision: http://reviews.llvm.org/D8833
llvm-svn: 235124
Remove 'inlinedAt:' from MDLocalVariable. Besides saving some memory
(variables with it seem to be single largest `Metadata` contributer to
memory usage right now in -g -flto builds), this stops optimization and
backend passes from having to change local variables.
The 'inlinedAt:' field was used by the backend in two ways:
1. To tell the backend whether and into what a variable was inlined.
2. To create a unique id for each inlined variable.
Instead, rely on the 'inlinedAt:' field of the intrinsic's `!dbg`
attachment, and change the DWARF backend to use a typedef called
`InlinedVariable` which is `std::pair<MDLocalVariable*, MDLocation*>`.
This `DebugLoc` is already passed reliably through the backend (as
verified by r234021).
This commit removes the check from r234021, but I added a new check
(that will survive) in r235048, and changed the `DIBuilder` API in
r235041 to require a `!dbg` attachment whose 'scope:` is in the same
`MDSubprogram` as the variable's.
If this breaks your out-of-tree testcases, perhaps the script I used
(mdlocalvariable-drop-inlinedat.sh) will help; I'll attach it to PR22778
in a moment.
llvm-svn: 235050
Change `DIBuilder::insertDeclare()` and `insertDbgValueIntrinsic()` to
take an `MDLocation*`/`DebugLoc` parameter which it attaches to the
created intrinsic. Assert at creation time that the `scope:` field's
subprogram matches the variable's. There's a matching `clang` commit to
use the API.
The context for this is PR22778, which is removing the `inlinedAt:`
field from `MDLocalVariable`, instead deferring to the `!dbg` location
attached to the debug info intrinsic. The best way to ensure we always
have a `!dbg` attachment is to require one at creation time. I'll be
adding verifier checks next, but this API change is the best way to
shake out frontend bugs.
Note: I added an `llvm_unreachable()` in `bindings/go` and passed in
`nullptr` for the `DebugLoc`. The `llgo` folks will eventually need to
pass a valid `DebugLoc` here.
llvm-svn: 235041
Summary:
With this patch, SLSR may rewrite
S1: X = B + i * S
S2: Y = B + i' * S
to
S2: Y = X + (i' - i) * S
A secondary improvement: if (i' - i) is a power of 2, emit Y as X + (S << log(i' - i)). (S << log(i' -i)) is in a canonical form and thus more likely GVN'ed than (i' - i) * S.
Test Plan: slsr-add.ll
Reviewers: hfinkel, sanjoy, meheff, broune, eliben
Reviewed By: eliben
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8983
llvm-svn: 235019
Summary:
This transformation reassociates a n-ary add so that the add can partially reuse
existing instructions. For example, this pass can simplify
void foo(int a, int b) {
bar(a + b);
bar((a + 2) + b);
}
to
void foo(int a, int b) {
int t = a + b;
bar(t);
bar(t + 2);
}
saving one add instruction.
Fixes PR22357 (https://llvm.org/bugs/show_bug.cgi?id=22357).
Test Plan: nary-add.ll
Reviewers: broune, dberlin, hfinkel, meheff, sanjoy, atrick
Reviewed By: sanjoy, atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8950
llvm-svn: 234855
Change `DICompileUnit::replaceSubprograms()` and
`DICompileUnit::replaceGlobalVariables()` to match the `MDCompileUnit`
equivalents that they're wrapping.
llvm-svn: 234852
Gut the `DIDescriptor` wrappers around `MDLocalScope` subclasses. Note
that `DILexicalBlock` wraps `MDLexicalBlockBase`, not `MDLexicalBlock`.
llvm-svn: 234850
Summary:
Runtime unrolling of loops needs to emit an expression to compute the
loop's runtime trip-count. Avoid runtime unrolling if this computation
will be expensive.
Depends on D8993.
Reviewers: atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8994
llvm-svn: 234846