Summary:
A reprise of D25849.
This crash was found through fuzzing some time ago and was documented in PR28879.
No check for load size has been added due to the following tests:
- Transforms/GVN/invariant.group.ll
- Transforms/GVN/pr10820.ll
These tests expect load sizes that are not a multiple of eight.
Thanks to @davide for the original patch.
Reviewers: nlopes, davide, RKSimon, reames, efriedma
Reviewed By: efriedma
Subscribers: davide, llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D48330
llvm-svn: 335294
This is the simplest case from PR37806:
https://bugs.llvm.org/show_bug.cgi?id=37806
If we have a common variable operand used in a pair of binops with vector constants
that are vector selected together, then we can constant shuffle the constant vectors
to eliminate the shuffle instruction.
This has some tricky parts that are hopefully addressed in the tests and their
respective comments:
1. If the shuffle mask contains an undef element, then that lane of the result is
undef:
http://llvm.org/docs/LangRef.html#shufflevector-instruction
Therefore, we can replace the constant in that lane with an undef value except
for div/rem. With div/rem, an undef in the divisor would cause the whole op to
be undef. So I'm using the same hack as in D47686 - replace the undefs with '1'.
2. Intersect the wrapping and FMF of the original binops for the new binop. There
should be no extra poison or fast-math potential in the new binop that wasn't
possible in the original code.
3. Disregard other uses. Given that we're eliminating uses (shortening the
dependency chain), I think that's always the right IR canonicalization. But
I purposely chose the udiv test to demonstrate the scenario where both
intermediate values have other uses because that seems likely worse for
codegen with an expensive math op. This seems like a very rare possibility to
me, so I don't think it requires a backend patch first.
Differential Revision: https://reviews.llvm.org/D48401
llvm-svn: 335283
This reverts commit r335206.
As discussed here: https://reviews.llvm.org/rL333740, a fix will come
tomorrow. In the meanwhile, revert this to fix some bots.
llvm-svn: 335272
The previous code worked with vectors, but it failed when the
vector constants contained undef elements.
The matchers handle those cases.
llvm-svn: 335262
Summary:
Use the expanded features of the TableGen generic tables to avoid manually
adding the combinatorially exploded set of intrinsics. The
getAMDGPUImageDimIntrinsic lookup function is early-out,
i.e. non-AMDGPU intrinsics will never look at the underlying table.
Use a generic approach for getting the new intrinsic overload to keep the
code simple, and make the image dmask handling more generic:
- handle non-sampler image loads
- handle the case where the set of demanded elements is not a prefix
There is some overlap between this code and an optimization that happens
in the backend during code generation. They currently complement each other:
- only the codegen optimization can generate vec3 loads
- only the InstCombine optimization can handle D16
The InstCombine optimization also likely covers more cases since the
codegen optimization is fairly ad-hoc. Ideally, we'll remove the optimization
in codegen once the infrastructure for vec3 is in place (which will probably
take a long time).
Modify the test cases to use dimension-aware intrinsics. This makes it
easier to see that the test coverage for the new intrinsics is equivalent,
and the old style intrinsics will be removed in a follow-up commit anyway.
Change-Id: I4b91ea661413d13004956fe4ef7d13d41b8ce3ad
Reviewers: arsenm, rampitec, majnemer
Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D48165
llvm-svn: 335230
This enables da-delinearize in Dependence Analysis for delinearizing array
accesses into multiple dimensions. This can help to increase the power of
Dependence analysis on multi-dimensional arrays and prevent having to fall
back to the slower and less accurate MIV tests. It adds static checks on the
bounds of the arrays to ensure that one dimension doesn't overflow into
another, and brings our code in line with our tests.
Differential Revision: https://reviews.llvm.org/D45872
llvm-svn: 335217
These were being over cautious for costs for one/two op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like VSHUFPS does.
llvm-svn: 335216
r335150 should resolve the issues with the clang-with-thin-lto-ubuntu
and clang-with-lto-ubuntu builders.
Original message:
This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.
As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.
Reviewers: davide, mssimpso, dberlin, efriedma
Reviewed By: davide, dberlin
llvm-svn: 335206
conditions feeding a chain of `and`s or `or`s for a branch.
Much like with full non-trivial unswitching, we rely on the pass manager
to handle iterating until all of the profitable unswitches have been
done. This is to allow other more profitable unswitches to fire on any
of the cloned, simpler versions of the loop if viable.
Threading the partial unswiching through the non-trivial unswitching
logic motivated some minor refactorings. If those are too disruptive to
make it reasonable to review this patch, I can separate them out, but
it'll be somewhat timeconsuming so I wanted to send it for initial
review as-is. Feel free to tell me whether it warrants pulling apart.
I've tried to re-use (and factor out) logic form the partial trivial
unswitching, but not as much could be shared as I had haped. Still, this
wasn't as bad as I naively expected.
Some basic testing is added, but I probably need more. Suggestions for
things you'd like to see tested more than welcome. One thing I'd like to
do is add some testing that when we schedule this with loop-instsimplify
it effectively cleans up the cruft created.
Last but not least, this uncovered a bug that has been in loop cloning
the entire time for non-trivial unswitching. Specifically, we didn't
correctly add the outer-most cloned loop to the list of cloned loops.
This meant that LCSSA wouldn't be updated for it hypothetically, and
more significantly that we would never visit it in the loop pass
manager. I noticed this while checking loop-instsimplify by hand. I'll
try to separate this bugfix out into its own patch with a more focused
test. But it is just one line, so shouldn't significantly confuse the
review here.
After this patch, the only missing "feature" in this unswitch I'm aware
of us non-trivial unswitching of switches. I'll try implementing *full*
non-trivial unswitching of switches (which is at least a sound thing to
implement), but *partial* non-trivial unswitching of switches is
something I don't see any sound and principled way to implement. I also
have no interesting test cases for the latter, so I'm not really
worried. The rest of the things that need to be ported are bug-fixes and
more narrow / targeted support for specific issues.
Differential Revision: https://reviews.llvm.org/D47522
llvm-svn: 335203
Summary:
Two utils methods have essentially the same functionality. This is an attempt to merge them into one.
1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred
2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor
Prior to the patch:
1. MergeBasicBlockIntoOnlyPred
Updates either DomTree or DeferredDominance
Moves all instructions from Pred to BB, deletes Pred
Asserts BB has single predecessor
If address was taken, replace the block address with constant 1 (?)
2. MergeBlockIntoPredecessor
Updates DomTree, LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if doesn't have a single predecessor
Returns if BB's address was taken
After the patch:
Method 2. MergeBlockIntoPredecessor is attempting to become the new default:
Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults
Moves all instruction from BB to Pred, deletes BB
Returns if doesn't have a single predecessor
Returns if BB's address was taken
Uses of MergeBasicBlockIntoOnlyPred that need to be replaced:
1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp
Updated in this patch. No challenges.
2. lib/CodeGen/CodeGenPrepare.cpp
Updated in this patch.
i. eliminateFallThrough is straightforward, but I added using a temporary array to avoid the iterator invalidation.
ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks
Some interesting aspects:
- Since Pred is not deleted (BB is), the entry block does not need updating.
- The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added assert to make obvious that BB=SinglePred.
- isMergingEmptyBlockProfitable assumes BB is the one to be deleted.
- eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead.
- adding some test owner as subscribers for the interesting tests modified:
test/CodeGen/X86/avx-cmp.ll
test/CodeGen/AMDGPU/nested-loop-conditions.ll
test/CodeGen/AMDGPU/si-annotate-cf.ll
test/CodeGen/X86/hoist-spill.ll
test/CodeGen/X86/2006-11-17-IllegalMove.ll
3. lib/Transforms/Scalar/JumpThreading.cpp
Not covered in this patch. It is the only use case using the DeferredDominance.
I would defer to Brian Rzycki to make this replacement.
Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar
Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits
Differential Revision: https://reviews.llvm.org/D48202
llvm-svn: 335183
The idea of partial unswitching is to take a *part* of a branch's
condition that is loop invariant and just unswitching that part. This
primarily makes sense with i1 conditions of branches as opposed to
switches. When dealing with i1 conditions, we can easily extract loop
invariant inputs to a a branch and unswitch them to test them entirely
outside the loop.
As part of this, we now create much more significant cruft in the loop
body, so this relies on adding cleanup passes to the loop pipeline and
revisiting unswitched loops to do that cleanup before continuing to
process them.
This already appears to be more powerful at unswitching than the old
loop unswitch pass, and so I'd appreciate pretty careful review in case
I'm just missing some correctness checks. The `LIV-loop-condition` test
case is not unswitched by the old unswitch pass, but is with this pass.
Thanks to Sanjoy and Fedor for the review!
Differential Revision: https://reviews.llvm.org/D46706
llvm-svn: 335156
Using OrderedInstructions::dominates as comparator for instructions in
BBs without dominance relation can cause a non-deterministic order
between such instructions. That in turn can cause us to materialize
copies in a non-deterministic order. While this does not effect
correctness, it causes some minor non-determinism in the final generated
code, because values have slightly different labels.
Without this patch, running -print-predicateinfo on a reasonably large
module produces slightly different output on each run.
This patch uses the dominator trees DFSInNum to order instruction from
different BBs, which should enforce a deterministic ordering and
guarantee that dominated instructions come after the instructions that
dominate them.
Reviewers: dberlin, efriedma, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D48230
llvm-svn: 335150
D47985 saw the old SK_Alternate 'alternating' shuffle mask replaced with the SK_Select mask which accepts either input operand for each lane, equivalent to a vector select with a constant condition operand.
This patch updates SLPVectorizer to make full use of this SK_Select shuffle pattern by removing the 'isOdd()' limitation.
The AArch64 regression will be fixed by D48172.
Differential Revision: https://reviews.llvm.org/D48174
llvm-svn: 335130
For both operands are unsigned, the following optimizations are valid, and missing:
1. X > Y && X != 0 --> X > Y
2. X > Y || X != 0 --> X != 0
3. X <= Y || X != 0 --> true
4. X <= Y || X == 0 --> X <= Y
5. X > Y && X == 0 --> false
unsigned foo(unsigned x, unsigned y) { return x > y && x != 0; }
should fold to x > y, but I found we haven't done it right now.
besides, unsigned foo(unsigned x, unsigned y) { return x < y && y != 0; }
Has been folded to x < y, so there may be a bug.
Patch by: Li Jia He!
Differential Revision: https://reviews.llvm.org/D47922
llvm-svn: 335129
These are the baseline tests for the functional change in D47922.
Patch by Li Jia He!
Differential Revision: https://reviews.llvm.org/D48000
llvm-svn: 335128
This patch replaces calls to X86-specific intrinsics with floor-ceil semantics
with calls to target-independent @llvm.floor.* and @llvm.ceil.* intrinsics. This
doesn't affect the resulting machine code, as those intrinsics are lowered to
the same instructions, but exposes these specific rounding cases to generic
optimizations.
Differential Revision: https://reviews.llvm.org/D48067
llvm-svn: 335039
LoopSimplifyCFG, being a loop pass, needs to preserve scalar
evolution. This invalidates SE for the loops altered during
block merging.
Differential Revision: https://reviews.llvm.org/D48258
llvm-svn: 335036
This patch adds logic to deal with the following constructions:
%iv = phi i64 ...
%trunc = trunc i64 %iv to i32
%cmp = icmp <pred> i32 %trunc, %invariant
Replacing it with
%iv = phi i64 ...
%cmp = icmp <pred> i64 %iv, sext/zext(%invariant)
In case if it is legal. Specifically, if `%iv` has signed comparison users, it is
required that `sext(trunc(%iv)) == %iv`, and if it has unsigned comparison
uses then we require `zext(trunc(%iv)) == %iv`. The current implementation
bails if `%trunc` has other uses than `icmp`, but in theory we can handle more
cases here (e.g. if the user of trunc is bitcast).
Differential Revision: https://reviews.llvm.org/D47928
Reviewed By: reames
llvm-svn: 335020
This reverts r334428. It incorrectly marks some multiplications as nuw. Tim
Shen is working on a proper fix.
Original commit message:
[SCEV] Add nuw/nsw to mul ops in StrengthenNoWrapFlags where safe.
Summary:
Previously we would add them for adds, but not multiplies.
llvm-svn: 335016
This reverts commit f976cf4cca0794267f28b54e468007fd476d37d9.
I am reverting this because it causes break in a few bots and its going
to take me sometime to look at this.
llvm-svn: 334993
Summary:
Simplify blockaddress usage before giving up in MergeBlockIntoPredecessor
This is a missing small optimization in MergeBlockIntoPredecessor.
This helps with one simplifycfg test which expects this case to be handled.
Reviewers: davide, spatel, brzycki, asbirlea
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D48284
llvm-svn: 334992
redundant-vf2-cost.ll is X86 specific. Moved from
test/Transforms/LoopVectorize/redundant-vf2-cost.ll to
test/Transforms/LoopVectorize/X86/redundant-vf2-cost.ll
llvm-svn: 334854
Summary:
When iterating users of a multiply in processUMulZExtIdiom, the
call to setOperand in the truncation case may replace the use
being visited; make sure the iterator has been advanced before
doing that replacement.
Reviewers: majnemer, davide
Reviewed By: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D48192
llvm-svn: 334844
This is a minor fix for LV cost model, where the cost for VF=2 was
computed twice when the vectorization of the loop was forced without
specifying a VF.
Reviewers: xusx595, hsaito, fhahn, mkuper
Reviewed By: hsaito, xusx595
Differential Revision: https://reviews.llvm.org/D48048
llvm-svn: 334840
This is r334704 (which was reverted in r334732) with a fix for
types like x86_fp80. We need to use getTypeAllocSizeInBits and
not getTypeStoreSizeInBits to avoid dropping debug info for
such types.
Original commit msg:
> Summary:
> Do not convert a DbgDeclare to DbgValue if the store
> instruction only refer to a fragment of the variable
> described by the DbgDeclare.
>
> Problem was seen when for example having an alloca for an
> array or struct, and there were stores to individual elements.
> In the past we inserted a DbgValue intrinsics for each store,
> just as if the store wrote the whole variable.
>
> When handling store instructions we insert a DbgValue that
> indicates that the variable is "undefined", as we do not know
> which part of the variable that is updated by the store.
>
> When ConvertDebugDeclareToDebugValue is used with a load/phi
> instruction we assert that the referenced value is large enough
> to cover the whole variable. Afaict this should be true for all
> scenarios where those methods are used on trunk. If the assert
> blows in the future I guess we could simply skip to insert a
> dbg.value instruction.
>
> In the future I think we should examine which part of the variable
> that is accessed, and add a DbgValue instrinsic with an appropriate
> DW_OP_LLVM_fragment expression.
>
> Reviewers: dblaikie, aprantl, rnk
>
> Reviewed By: aprantl
>
> Subscribers: JDevlieghere, llvm-commits
>
> Tags: #debug-info
>
> Differential Revision: https://reviews.llvm.org/D48024
llvm-svn: 334830
Summary:
We already do it for splat constants, but not just values.
Also, undef cases are mostly non-functional.
The original commit was reverted because
it broke tests for amdgpu backend, which i didn't check.
Now, the backed was updated to recognize these new
patterns, so we are good.
https://bugs.llvm.org/show_bug.cgi?id=37603https://rise4fun.com/Alive/cplX
Reviewers: spatel, craig.topper, mareko, bogner, rampitec, nhaehnle, arsenm
Reviewed By: spatel, rampitec, nhaehnle
Subscribers: wdng, nhaehnle, llvm-commits
Differential Revision: https://reviews.llvm.org/D47980
llvm-svn: 334818
IEEE 754 defines the expected result on overflow. As far as I know,
hardware implementations (of f16), and compiler-rt (__floatuntisf)
correctly return +-Inf on overflow. And I can't think of any useful
transform that would take advantage of overflow being undefined here.
Differential Revision: https://reviews.llvm.org/D47807
llvm-svn: 334777
This patches teaches EarlyCSE to figure out that if `and i1 %x, %y` is true then both
`%x` and `%y` are true in the taken branch, and if `or i1 %x, %y` is false then both
`%x` and `%y` are false in non-taken branch. Fix for PR37635.
Differential Revision: https://reviews.llvm.org/D47574
Reviewed By: reames
llvm-svn: 334707
Summary:
Do not convert a DbgDeclare to DbgValue if the store
instruction only refer to a fragment of the variable
described by the DbgDeclare.
Problem was seen when for example having an alloca for an
array or struct, and there were stores to individual elements.
In the past we inserted a DbgValue intrinsics for each store,
just as if the store wrote the whole variable.
When handling store instructions we insert a DbgValue that
indicates that the variable is "undefined", as we do not know
which part of the variable that is updated by the store.
When ConvertDebugDeclareToDebugValue is used with a load/phi
instruction we assert that the referenced value is large enough
to cover the whole variable. Afaict this should be true for all
scenarios where those methods are used on trunk. If the assert
blows in the future I guess we could simply skip to insert a
dbg.value instruction.
In the future I think we should examine which part of the variable
that is accessed, and add a DbgValue instrinsic with an appropriate
DW_OP_LLVM_fragment expression.
Reviewers: dblaikie, aprantl, rnk
Reviewed By: aprantl
Subscribers: JDevlieghere, llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D48024
llvm-svn: 334704
IndVarSimplify sometimes makes transforms basing on users that are trivially dead. In particular,
if DCE wasn't run before it, there may be a dead `sext/zext` in loop that will trigger widening
transforms, however it makes no sense to do it.
This patch teaches IndVarsSimplify ignore the mist trivial cases of that.
Differential Revision: https://reviews.llvm.org/D47974
Reviewed By: sanjoy
llvm-svn: 334567
Name table occupies a big chunk of size in current binary format sample profile.
In order to reduce its size, the patch changes the sample writer/reader to
save/restore MD5Hash of names in the name table. Sample annotation phase will
also use MD5Hash of name to query samples accordingly.
Experiment shows compact binary format can reduce the size of sample profile by
2/3 compared with binary format generally.
Differential Revision: https://reviews.llvm.org/D47955
llvm-svn: 334447
Summary:
Previously we would add them for adds, but not multiplies.
Reviewers: sanjoy
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D48038
llvm-svn: 334428