The common phi value transform replaces constants with values that
have the same value as the constant on a given edge. However, LVI
generally only provides information that is correct up to poison,
so this can end up replacing a well-defined value with poison.
D69442 addressed an instance of this problem by clearing poison
flags on the generating instruction, which was sufficient at the
time. rGa917fb89dc28 made LVI's edge value analysis slightly more
powerful, and clearing poison flags is no longer sufficient.
This patch changes the transform to instead explicitly guard against
a poison value instead. This should be satisfied for most cases due
to a prior branch on poison.
Fixes https://bugs.llvm.org/show_bug.cgi?id=50399.
Differential Revision: https://reviews.llvm.org/D102966
Also check the case where one operand isn't constant, which isn't
handled right now, because the SPF code requires both operands
to be ranges.
Move the tests to directly check ranges rather than go through an
and, to make it more obvious that this has no relation to bitmasks.
Even if one of the operands is overdefined, we may still produce
a non-overdefined result, e.g. due to a min/max operation. This
matches our handling elsewhere, e.g. for binary operators.
The slot poisoning comment refers to a much older LVI cache
implementation.
Instead of handling a number of special cases for selects, handle
this generally when inferring ranges from conditions. We already
infer ranges from `x + C pred C2` to `x`, so doing the same for
`x pred C2` to `x + C` is straightforward.
These tests didn't test the pattern they were supposed to, because
%a instead of %add was used in the select, which turned this into
a normal min/max).
Noticed this when commenting out the clamp handling code did not
result in any test failures...
LVI previously handled "if (L && R)" conditions, but not
"if (L || R)" conditions. The latter case can still produce
useful information if L and R both constrain the same variable.
This adds support for handling the "if (L || R)" case as well.
The only difference is that we take the union instead of the
intersection of the lattice values.
Following the discussion in D93065, this adds m_LogicalAnd() and
m_LogicalOr() matchers, that match A && B and A || B logical
operations, either as bitwise operations or select expressions.
As an example usage, LVI is adapted to use these matchers for its
condition reasoning.
The plan here is to switch other parts of LLVM that reason about
and/or of conditions to also support the select forms, and then
merge D93065 (or a variant thereof) to disable the poison-unsafe
select to and/or transform.
Differential Revision: https://reviews.llvm.org/D93827
InstCombine canonicalizes X>C && X<C' style comparisons into
(X+C1)<C2. This type of expression is recognized by some analyses
like LVI, but currently not when used inside assumptions, because
AssumptionCache does not track affected values for it.
CVP currently handles switches by checking an equality predicate
on all edges from predecessor blocks. Of course, this can only
work if the value being switched over is defined in a different block.
Replace this implementation with a call to getPredicateAt(), which
also does the predecessor edge predicate check (if not defined in
the same block), but can also do quite a bit more: It can reason
about phi-nodes by checking edge predicates for incoming values,
it can reason about assumes, and it can reason about block values.
As such, this makes the implementation both simpler and more
powerful. The compile-time impact on CTMark is in the noise.
These cover cases handled by getPredicateAt(), but not by the
current implementation:
* Assumes based on context instruction.
* Value from phi node in same block (using per-pred reasoning).
* Value from non-phi node in same block (using block-val reasoning).
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D88609
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D88609
For a call site which had both constant deopt operands and nonnull arguments, we were missing the opportunity to recognize the later by bailing early.
This is somewhat of a speculative fix. Months ago, I'd had a private report of performance and compile time regressions from the deopt operand folding. I never received a test case. However, the only possibility I see was that after that change CVP missed the nonnull fold, and we end up with a pass ordering/missed simplification issue. So, since it's a real issue, fix it and hope.
Add a flag to getPredicateAt() that allows making use of the block
value. This allows us to take into account range information from
the current block, rather than only information that is threaded
over edges, making the icmp simplification in CVP a lot more
powerful.
I'm not changing getPredicateAt() to use the block value
unconditionally to avoid any impact on the JumpThreading pass,
which is somewhat picky about LVI query order.
Most test changes here are just icmps that now get dropped (while
previously only a result used in a return was replaced). The three
tests in icmp.ll show some representative improvements. Some of
the folds this enables have been covered by IPSCCP in the meantime,
but LVI can reason about some cases which are hard to support in
IPSCCP, such as in test_br_cmp_with_offset.
The compile-time time cost of doing this is fairly minimal, with
a ~0.05% CTMark regression for ReleaseThinLTO:
https://llvm-compile-time-tracker.com/compare.php?from=709d03f8af4da4204849a70f01798e7cebba2e32&to=6236fd503761f43c99f4537121e057a01056f185&stat=instructions
This is because the block values will typically already be queried
and cached by other CVP optimizations anyway.
Differential Revision: https://reviews.llvm.org/D69686
D69686 will be able to determine that the icmp is always false.
As this is not the purpose of the test, use a different modulus
that doesn't trivialize the condition.
This fold was the only place not passing the context instruction.
The tests worked around that fact by introducing a basic block split,
which is now no longer necessary.
This is a continuation of 8d487668d0,
the logic is pretty much identical for SRem:
Name: pos pos
Pre: C0 >= 0 && C1 >= 0
%r = srem i8 C0, C1
=>
%r = urem i8 C0, C1
Name: pos neg
Pre: C0 >= 0 && C1 <= 0
%r = srem i8 C0, C1
=>
%r = urem i8 C0, -C1
Name: neg pos
Pre: C0 <= 0 && C1 >= 0
%r = srem i8 C0, C1
=>
%t0 = urem i8 -C0, C1
%r = sub i8 0, %t0
Name: neg neg
Pre: C0 <= 0 && C1 <= 0
%r = srem i8 C0, C1
=>
%t0 = urem i8 -C0, -C1
%r = sub i8 0, %t0
https://rise4fun.com/Alive/Vd6
Now, this new logic does not result in any new catches
as of vanilla llvm test-suite + RawSpeed.
but it should be virtually compile-time free,
and it may be important to be consistent in their handling,
because if we had a pair of sdiv-srem, and only converted one of them,
-divrempairs will no longer see them as a pair,
and thus not "merge" them.
InstCombine likes to canonicalize comparisons of the form
X == C || X == C+1 into (X & -2) == C'. Make sure LVI can still
recover the value range from this. Can of course also be useful
for proper mask comparisons.
For the sake of clarity, the implementation goes through KnownBits
to compute the range.
This got reverted because a dependency was reverted. It has since
been reapplied, so reapply this as well.
-----
Related to D69686. As noted there, LVI currently behaves differently
for integer and pointer values: For integers, the block value is always
valid inside the basic block, while for pointers it is only valid at
the end of the basic block. I believe the integer behavior is the
correct one, and CVP relies on it via its getConstantRange() uses.
The reason for the special pointer behavior is that LVI checks whether
a pointer is dereferenced in a given basic block and marks it as
non-null in that case. Of course, this information is valid only after
the dereferencing instruction, or in conservative approximation,
at the end of the block.
This patch changes the treatment of dereferencability: Instead of
including it inside the block value, we instead treat it as something
similar to an assume (it essentially is a non-nullness assume) and
incorporate this information in intersectAssumeOrGuardBlockValueConstantRange()
if the context instruction is the terminator of the basic block.
This happens either when determining an edge-value internally in LVI,
or when a terminator was explicitly passed to getValueAt(). The latter
case makes this more powerful than the previous implementation as
a side-effect, and this does actually seem benefitial in practice.
Of course, we do not want to recompute dereferencability on each
intersectAssume call, so we need a new cache for this. The
dereferencability analysis requires walking the entire basic block
and computing underlying objects of all memory operands. This was
previously done separately for each queried pointer value. In the
new implementation (both because this makes the caching simpler,
and because it is faster), I instead only walk the full BB once and
cache all the dereferenced pointers. So the traversal is now performed
only once per BB, instead of once per queried pointer value.
I think the overall model now makes more sense than before, and there
will be no more pitfalls due to differing integer/pointer behavior.
Differential Revision: https://reviews.llvm.org/D69914
Pass the abs poison flag to the underlying ConstantRange
implementation, allowing CVP to simplify based on it.
Importantly, this recognizes that abs with poison flag is actually
non-negative...
Yes, if operands are non-positive this comes at the extra cost
of two extra negations. But a. division is already just
ridiculously costly, two more subtractions can't hurt much :)
and b. we have better/more analyzes/folds for an unsigned division,
we could end up narrowing it's bitwidth, converting it to lshr, etc.
This is essentially a take two on 0fdcca07ad,
which didn't fix the potential regression i was seeing,
because ValueTracking's computeKnownBits() doesn't make use
of dominating conditions in it's analysis.
While i could teach it that, this seems like the more general fix.
This big hammer actually does catch said potential regression.
Over vanilla test-suite + RawSpeed + darktable
(10M IR instrs, 1M IR BB, 1M X86 ASM instrs), this fires/converts 5 more
(+2%) SDiv's, the total instruction count at the end of middle-end pipeline
is only +6, so out of +10 extra negations, ~half are folded away,
and asm instr count is only +1, so practically speaking all extra
negations are folded away and are therefore free.
Sadly, all these new UDiv's remained, none folded away.
But there are two less basic blocks.
https://rise4fun.com/Alive/VS6
Name: v0
Pre: C0 >= 0 && C1 >= 0
%r = sdiv i8 C0, C1
=>
%r = udiv i8 C0, C1
Name: v1
Pre: C0 <= 0 && C1 >= 0
%r = sdiv i8 C0, C1
=>
%t0 = udiv i8 -C0, C1
%r = sub i8 0, %t0
Name: v2
Pre: C0 >= 0 && C1 <= 0
%r = sdiv i8 C0, C1
=>
%t0 = udiv i8 C0, -C1
%r = sub i8 0, %t0
Name: v3
Pre: C0 <= 0 && C1 <= 0
%r = sdiv i8 C0, C1
=>
%r = udiv i8 -C0, -C1
Currently that fold requires both operands to be non-negative,
but the only real requirement for the fold is that we must know
the domains of the operands.
InstCombine may convert conditions like (x < C) && (y < C) into
(x | y) < C (for some C). This patch teaches LVI to recognize that
in this case, it can infer either x < C or y < C along the edge.
This fixes the issue reported at
https://github.com/rust-lang/rust/issues/73827.
Differential Revision: https://reviews.llvm.org/D82715