This is the same change as D60420 but for signed sub rather than
signed add: Range information is intersected into the known bits
result, allows to detect more no/always overflow conditions.
Differential Revision: https://reviews.llvm.org/D60469
llvm-svn: 358020
This is D59386 for the signed add case. The computeConstantRange()
result is now intersected into the existing known bits information,
allowing to detect additional no-overflow/always-overflow conditions
(though the latter isn't used yet).
This (finally...) covers the motivating case from D59071.
Differential Revision: https://reviews.llvm.org/D60420
llvm-svn: 358014
Similar to:
rL358005
Forego folding arbitrary vector constants to fix a possible miscompile bug.
We can enhance the transform if we do want to handle the more complicated
vector case.
llvm-svn: 358013
// 0 - (X sdiv C) -> (X sdiv -C) provided the negation doesn't overflow.
This fold has been around for many years and nobody noticed the potential
vector miscompile from overflow until recently...
So it seems unlikely that there's much demand for a vector sdiv optimization
on arbitrary vector constants, so just limit the matching to splat constants
to avoid the possible bug.
Differential Revision: https://reviews.llvm.org/D60426
llvm-svn: 358005
A more general canonicalization between fdiv and fmul would not
handle this case because that would have to be limited by uses
to prevent 2 values from becoming 3 values:
(x/y) * (x/y) --> (x*x) / (y*y)
(But we probably should still have that limited -- but more general --
canonicalization independently of this change.)
llvm-svn: 357943
Fixes bug 40992: https://bugs.llvm.org/show_bug.cgi?id=40992
There is potential for miscompiled code emitted from JumpThreading when
analyzing a block with one or more indirectbr or callbr predecessors. The
ProcessThreadableEdges() function incorrectly folds conditional branches
into an unconditional branch.
This patch prevents incorrect branch folding without fully pessimizing
other potential threading opportunities through the same basic block.
This IR shape was manually fed in via opt and is unclear if clang and the
full pass pipeline will ever emit similar code shapes.
Thanks to Matthias Liedtke for the bug report and simplified IR example.
Differential Revision: https://reviews.llvm.org/D60284
llvm-svn: 357930
First step towards removing the MOVMSK intrinsics completely - this patch expands MOVMSK to the pattern:
e.g. PMOVMSKB(v16i8 x):
%cmp = icmp slt <16 x i8> %x, zeroinitializer
%int = bitcast <16 x i8> %cmp to i16
%res = zext i16 %int to i32
Which is correctly handled by ISel and FastIsel (give or take an annoying movzx move....): https://godbolt.org/z/rkrSFW
Differential Revision: https://reviews.llvm.org/D60256
llvm-svn: 357909
Add support for min/max flavor selects in computeConstantRange(),
which allows us to fold comparisons of a min/max against a constant
in InstSimplify. This fixes an infinite InstCombine loop, with the
test case taken from D59378.
Relative to the previous iteration, this contains some adjustments for
AMDGPU med3 tests: The AMDGPU target runs InstSimplify prior to codegen,
which ends up constant folding some existing med3 tests after this
change. To preserve these tests a hidden -amdgpu-scalar-ir-passes option
is added, which allows disabling scalar IR passes (that use InstSimplify)
for testing purposes.
Differential Revision: https://reviews.llvm.org/D59506
llvm-svn: 357870
This revision causes tests to fail under ASAN. Since the cause of the failures
is not clear (could be ASAN, could be a Clang bug, could be a bug in this
revision), the safest course of action seems to be to revert while investigating.
llvm-svn: 357667
Summary: Currently ProfileSummaryBuilder doesn't count into callsite samples when computing total samples. Considering that ProfileSummaryInfo is used to checked the hotness of not only body samples but also callsite samples (from SampleProfileLoader), I think the callsite sample counts should be considered when computing total samples.
Reviewers: eraman, danielcdh, wmi
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59835
llvm-svn: 357627
The test should really be checking for the property directly in the
code object headers, but there are problems with this. I don't see
this directly represented in the text form, and for the binary
emission this is depending on a function level subtarget feature to
emit a global flag.
llvm-svn: 357558
Set the correct debug location on instructions which load arguments in
preparation for a call to an arg-promoted function.
This prevents location cascade from misattributing the line/scope of one
of these loads to the location of the instruction preceding the call.
Differential Revision: https://reviews.llvm.org/D60113
llvm-svn: 357500
Bug: https://bugs.llvm.org/show_bug.cgi?id=41180
In the bug test case the debug location was missing for the cmp instruction in
the "middle block" BB. This patch fixes the bug by copying the debug location
from the cmp of the scalar loop's terminator branch, if it exists.
The patch also fixes the debug location on the subsequent branch instruction.
It was previously using the location of the of the original loop's pre-header
block terminator. Both of these instructions will now map to the source line of
the conditional branch in the original loop.
A regression test has been added that covers these issues.
Patch by Orlando Cazalet-Hyams!
Differential Revision: https://reviews.llvm.org/D59944
llvm-svn: 357499
The code was failing to actually check for the presence of the call to widenable_condition. The whole point of specifying the widenable_condition intrinsic was allowing widening transforms. A normal branch is not widenable. A normal branch leading to a deopt is not widenable (in general).
I added a test case via LoopPredication, but GuardWidening has an analogous bug. Those are the only two passes actually using this utility just yet. Noticed while working on LoopPredication for non-widenable branches; POC in D60111.
llvm-svn: 357493
Summary:
When inserting an `unreachable` after a noreturn call, we must ensure
that it's not a musttail call to avoid breaking the IR invariants for
musttail calls.
Reviewers: fedor.sergeev, majnemer
Reviewed By: majnemer
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60080
llvm-svn: 357485
Summary: It is possible that multiple indirect call targets have been promoted for a single callsite from the profiled binary. Current implementation repeats promotion for all these targets as far as the callsite itself is hot (the callsite is assumed to be hot if any one of these targets was "hot" during the profiling). However, even when one of the ICPed target is hot other targets may not, and we should not repeat promotion for "cold" targets.
Reviewers: danielcdh, wmi
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59940
llvm-svn: 357484
Summary:
When inserting an `unreachable` after a noreturn call, we must ensure
that it's not a musttail call to avoid breaking the IR invariants for
musttail calls.
Reviewers: fedor.sergeev, majnemer
Reviewed By: majnemer
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60079
llvm-svn: 357483
The code was previously checking that candidates for sinking had exactly
one use or were a store instruction (which can't have uses). This meant
we could sink call instructions only if they had a use.
That limitation seemed a bit arbitrary, so this patch changes it to
"instruction has zero or one use" which seems more natural and removes
the need to special-case stores.
Differential revision: https://reviews.llvm.org/D59936
llvm-svn: 357452
We'd been optimizing the case where the predicate was obviously true, do the same for the false case. Mostly just for completeness sake, but also may improve compile time in loops which will exit through the guard. Such loops are presumed rare in fastpath code, but may be present down untaken paths, so optimizing for them is still useful.
llvm-svn: 357408
LoopPredication was replacing the original condition, but leaving the instructions to compute the old conditions around. This would get cleaned up by other passes of course, but we might as well do it eagerly. That also makes the test output less confusing.
llvm-svn: 357406
I'm about to make some changes to the pass which cause widespread - but uninteresting - test diffs. Prepare the tests for easy updating.
llvm-svn: 357404
As highlighted by tests, if one of the operands is loop variant, but guaranteed to have the same value on all iterations, we have a missed oppurtunity.
llvm-svn: 357403
Summary:
This fixes PR41270.
The recursive function evaluateInDifferentElementOrder expects to be called
on a vector Value, so when we call it on a vector GEP's arguments, we must
first check that the argument is indeed a vector.
Reviewers: reames, spatel
Reviewed By: spatel
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60058
llvm-svn: 357389
This reverts commit 75216a6dbcfe5fb55039ef06a07e419fa875f4a5.
I'll recommit with a better commit message with reference to the
phabricator review.
llvm-svn: 357387
This fixes PR41270.
The recursive function evaluateInDifferentElementOrder expects to be called
on a vector Value, so when we call it on a vector GEP's arguments, we must
first check that the argument is indeed a vector.
llvm-svn: 357385
If we have a commutable vector binop with inverted select-shuffles,
we don't care about the order of the operands in each vector lane:
LHS = shuffle V1, V2, <0, 5, 6, 3>
RHS = shuffle V2, V1, <0, 5, 6, 3>
LHS + RHS --> <V1[0]+V2[0], V2[1]+V1[1], V2[2]+V1[2], V1[3]+V2[3]> --> V1 + V2
PR41304:
https://bugs.llvm.org/show_bug.cgi?id=41304
...is currently titled as an SLP enhancement, but at least for the
given example, we can reduce that in instcombine because we are just
eliminating shuffles.
As noted in the TODO, this could be generalized, but I haven't thought
through those patterns completely, so this is limited to what appears
to be always safe.
Differential Revision: https://reviews.llvm.org/D60048
llvm-svn: 357382
In PR41304:
https://bugs.llvm.org/show_bug.cgi?id=41304
...we have a case where we want to fold a binop of select-shuffle (blended) values.
Rather than try to match commuted variants of the pattern, we can canonicalize the
shuffles and check for mask equality with commuted operands.
We don't produce arbitrary shuffle masks in instcombine, but select-shuffles are a
special case that the backend is required to handle because we already canonicalize
vector select to this shuffle form.
So there should be no codegen difference from this change. It's possible that this
improves CSE in IR though.
Differential Revision: https://reviews.llvm.org/D60016
llvm-svn: 357366
Since this can be set with s_setreg*, it should not be a subtarget
property. Set a default based on the calling convention, and Introduce
a new amdgpu-dx10-clamp attribute to override this if desired.
Also introduce a new amdgpu-ieee attribute to match.
The values need to match to allow inlining. I think it is OK for the
caller's dx10-clamp attribute to override the callee, but there
doesn't appear to be the infrastructure to do this currently without
definining the attribute in the generic Attributes.td.
Eventually the calling convention lowering will need to insert a mode
switch somewhere for these.
llvm-svn: 357302
For the cases where the icmp/fcmp predicate is commutative, use reorderInputsAccordingToOpcode to collect and commute the operands.
This requires a helper to recognise commutativity in both general Instruction and CmpInstr types - the CmpInst::isCommutative doesn't overload the Instruction::isCommutative method for reasons I'm not clear on (maybe because its based on predicate not opcode?!?).
Differential Revision: https://reviews.llvm.org/D59992
llvm-svn: 357266
We should be able to match elements with the swapped predicate as well - as long as we commute the source operands.
Differential Revision: https://reviews.llvm.org/D59956
llvm-svn: 357243
For the attached test case, unchecked addition of immediate starts and
ends overflows, as they can be arbitrary i64 constants.
Proof: https://rise4fun.com/Alive/Plqc
Reviewers: qcolombet, gilr, efriedma
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D59218
llvm-svn: 357217
Even if the interleaving transform would otherwise be legal, we shouldn't
introduce an interleaved load that is wider than the original load: it might
have undefined behavior.
It might be possible to perform some sort of mask-narrowing transform in
some cases (using a narrower interleaved load, then extending the
results using shufflevectors). But I haven't tried to implement that,
at least for now.
Fixes https://bugs.llvm.org/show_bug.cgi?id=41245 .
Differential Revision: https://reviews.llvm.org/D59954
llvm-svn: 357212
Summary:
This adds a BranchFusion feature to replace the usage of the MacroFusion
for AMD CPUs.
See D59688 for context.
Reviewers: andreadb, lebedev.ri
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59872
llvm-svn: 357171
With this change, the VPlan native path is triggered with the directive:
#pragma clang loop vectorize(enable)
There is no need to specify the vectorize_width(N) clause.
Patch by Francesco Petrogalli <francesco.petrogalli@arm.com>
Differential Revision: https://reviews.llvm.org/D57598
llvm-svn: 357156
The issue here is that we actually allow CGSCC passes to mutate IR (and
therefore invalidate analyses) outside of the current SCC. At a minimum,
we need to support mutating parent and ancestor SCCs to support the
ArgumentPromotion pass which rewrites all calls to a function.
However, the analysis invalidation infrastructure is heavily based
around not needing to invalidate the same IR-unit at multiple levels.
With Loop passes for example, they don't invalidate other Loops. So we
need to customize how we handle CGSCC invalidation. Doing this without
gratuitously re-running analyses is even harder. I've avoided most of
these by using an out-of-band preserved set to accumulate the cross-SCC
invalidation, but it still isn't perfect in the case of re-visiting the
same SCC repeatedly *but* it coming off the worklist. Unclear how
important this use case really is, but I wanted to call it out.
Another wrinkle is that in order for this to successfully propagate to
function analyses, we have to make sure we have a proxy from the SCC to
the Function level. That requires pre-creating the necessary proxy.
The motivating test case now works cleanly and is added for
ArgumentPromotion.
Thanks for the review from Philip and Wei!
Differential Revision: https://reviews.llvm.org/D59869
llvm-svn: 357137
Start using the uadd.sat and usub.sat intrinsics for the existing
canonicalizations. These intrinsics should optimize better than
expanded IR, have better handling in the X86 backend and should
be no worse than expanded IR in other backends, as far as we know.
rL357012 already introduced use of uadd.sat for the add+umin pattern.
Differential Revision: https://reviews.llvm.org/D58872
llvm-svn: 357103
Add baseline tests for canonicalization of
ssubo X, C -> saddo X, -C.
Patch by Dan Robertson.
Differential Revision: https://reviews.llvm.org/D59653
llvm-svn: 357013
This is the last step towards solving the examples shown in:
https://bugs.llvm.org/show_bug.cgi?id=14613
With this change, x86 should end up with psubus instructions
when those are available.
All known codegen issues with expanding the saturating intrinsics
were resolved with:
D59006 / rL356855
We also have some early evidence in D58872 that using the intrinsics
will lead to better perf. If some target regresses from this, custom
lowering of the intrinsics (as in the above for x86) may be needed.
llvm-svn: 357012
As discussed on D59738, this generalizes reorderInputsAccordingToOpcode to handle multiple + non-commutative instructions so we can get rid of reorderAltShuffleOperands and make use of the extra canonicalizations that reorderInputsAccordingToOpcode brings.
Differential Revision: https://reviews.llvm.org/D59784
llvm-svn: 356939
Remove attempts to commute non-Instructions to the LHS - the codegen changes appear to rely on chance more than anything else and also have a tendency to fight existing instcombine canonicalization which moves constants to the RHS of commutable binary ops.
This is prep work towards:
(a) reusing reorderInputsAccordingToOpcode for alt-shuffles and removing the similar reorderAltShuffleOperands
(b) improving reordering to optimized cases with commutable and non-commutable instructions to still find splat/consecutive ops.
Differential Revision: https://reviews.llvm.org/D59738
llvm-svn: 356913
Summary:
Between building the pair map and querying it there are a few places that
erase and create Values. It's rare but the address of these newly created
Values is occasionally the same as a just-erased Value that we already
have in the pair map. These coincidences should be accounted for to avoid
non-determinism.
Thanks to Roman Tereshin for the test case.
Reviewers: rtereshin, bogner
Reviewed By: rtereshin
Subscribers: mgrang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59401
llvm-svn: 356803
Just as as llvm IR supports explicitly specifying numeric value ids
for instructions, and emits them by default in textual output, now do
the same for blocks.
This is a slightly incompatible change in the textual IR format.
Previously, llvm would parse numeric labels as string names. E.g.
define void @f() {
br label %"55"
55:
ret void
}
defined a label *named* "55", even without needing to be quoted, while
the reference required quoting. Now, if you intend a block label which
looks like a value number to be a name, you must quote it in the
definition too (e.g. `"55":`).
Previously, llvm would print nameless blocks only as a comment, and
would omit it if there was no predecessor. This could cause confusion
for readers of the IR, just as unnamed instructions did prior to the
addition of "%5 = " syntax, back in 2008 (PR2480).
Now, it will always print a label for an unnamed block, with the
exception of the entry block. (IMO it may be better to print it for
the entry-block as well. However, that requires updating many more
tests.)
Thus, the following is supported, and is the canonical printing:
define i32 @f(i32, i32) {
%3 = add i32 %0, %1
br label %4
4:
ret i32 %3
}
New test cases covering this behavior are added, and other tests
updated as required.
Differential Revision: https://reviews.llvm.org/D58548
llvm-svn: 356789
Summary:
In C++, the behavior of casting a double value that is beyond the range
of a single precision floating-point to a float value is undefined. This
change replaces such a cast with APFloat::convert to convert the value,
which is consistent with how we convert a double value to a half value.
Reviewers: sanjoy
Subscribers: lebedev.ri, sanjoy, jlebar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59500
llvm-svn: 356781
This helps to avoid the situation where RA spots that only 3 of the
v4f32 result of a load are used, and immediately reallocates the 4th
register for something else, requiring a stall waiting for the load.
Differential Revision: https://reviews.llvm.org/D58906
Change-Id: I947661edfd5715f62361a02b100f14aeeada29aa
llvm-svn: 356768
are annotated with notail.
r356705 annotated calls to objc_retainAutoreleasedReturnValue with
notail on x86-64. This commit teaches ARC optimizer to check the notail
marker on the call before turning it into a tail call.
rdar://problem/38675807
llvm-svn: 356707
If they have other users we'll just end up increasing the instruction count.
We might be able to weaken this to only one of them having a single use if we can prove that the and will be removed.
Fixes PR41164.
Differential Revision: https://reviews.llvm.org/D59630
llvm-svn: 356690
This adds support for scalarizing these intrinsics as well the X86TargetTransformInfo support to avoid scalarizing them in the cases X86 can handle.
I've omitted handling special cases for constant masks for this first pass. Though CodeGenPrepare can constant fold the branch conditions and remove some of the control flow anyway.
Fixes PR40994 and is covers most of PR3666. Might want to implement constant masks to close that.
Differential Revision: https://reviews.llvm.org/D59180
llvm-svn: 356687
This is D59450, but for signed sub. This case is not NFC, because
the overflow logic in ConstantRange is more powerful than the existing
check. This resolves the TODO in the function.
I've added two tests to show that this indeed catches more cases than
the previous logic, but the main correctness test coverage here is in
the existing ConstantRange unit tests.
Differential Revision: https://reviews.llvm.org/D59617
llvm-svn: 356685
This is probably a bigger limitation than necessary, but since we don't have any evidence yet
that this transform led to real-world perf improvements rather than regressions, I'm making a
quick, blunt fix.
In the motivating x86 example from:
https://bugs.llvm.org/show_bug.cgi?id=41129
...and shown in the regression test, we want to avoid an extra instruction in the dominating
block because that could be costly.
The x86 LSR test diff is reversing the changes from D57789. There's no evidence that 1 version
is any better than the other yet.
Differential Revision: https://reviews.llvm.org/D59602
llvm-svn: 356665
If we know we're not storing a lane, we don't need to compute the lane. This could be improved by using the undef element result to further prune the mask, but I want to separate that into its own change since it's relatively likely to expose other problems.
Differential Revision: https://reviews.llvm.org/D57247
llvm-svn: 356590
Summary:
Before this patch, if any Use existed in the loop, with a defining
access in the loop, we conservatively decide to not move the store.
What this approach was missing, is that ordered loads are not Uses, they're Defs
in MemorySSA. So, even when the clobbering walker does not find that
volatile load to interfere, we still cannot hoist a store past a
volatile load.
Resolves PR41140.
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59564
llvm-svn: 356588
This is a small followup to D59511. The code that was moved into
computeConstantRange() there is a bit overly conversative: If the
abs is not nsw, it does not compute any range. However, abs without
nsw still has a well-defined contiguous unsigned range from 0 to
SIGNED_MIN. This is a lot less useful than the usual 0 to SIGNED_MAX
range, but if we're already here we might as well specify it...
Differential Revision: https://reviews.llvm.org/D59563
llvm-svn: 356586