CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D88609
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D88609
For a call site which had both constant deopt operands and nonnull arguments, we were missing the opportunity to recognize the later by bailing early.
This is somewhat of a speculative fix. Months ago, I'd had a private report of performance and compile time regressions from the deopt operand folding. I never received a test case. However, the only possibility I see was that after that change CVP missed the nonnull fold, and we end up with a pass ordering/missed simplification issue. So, since it's a real issue, fix it and hope.
Add a flag to getPredicateAt() that allows making use of the block
value. This allows us to take into account range information from
the current block, rather than only information that is threaded
over edges, making the icmp simplification in CVP a lot more
powerful.
I'm not changing getPredicateAt() to use the block value
unconditionally to avoid any impact on the JumpThreading pass,
which is somewhat picky about LVI query order.
Most test changes here are just icmps that now get dropped (while
previously only a result used in a return was replaced). The three
tests in icmp.ll show some representative improvements. Some of
the folds this enables have been covered by IPSCCP in the meantime,
but LVI can reason about some cases which are hard to support in
IPSCCP, such as in test_br_cmp_with_offset.
The compile-time time cost of doing this is fairly minimal, with
a ~0.05% CTMark regression for ReleaseThinLTO:
https://llvm-compile-time-tracker.com/compare.php?from=709d03f8af4da4204849a70f01798e7cebba2e32&to=6236fd503761f43c99f4537121e057a01056f185&stat=instructions
This is because the block values will typically already be queried
and cached by other CVP optimizations anyway.
Differential Revision: https://reviews.llvm.org/D69686
D69686 will be able to determine that the icmp is always false.
As this is not the purpose of the test, use a different modulus
that doesn't trivialize the condition.
This fold was the only place not passing the context instruction.
The tests worked around that fact by introducing a basic block split,
which is now no longer necessary.
This is a continuation of 8d487668d0,
the logic is pretty much identical for SRem:
Name: pos pos
Pre: C0 >= 0 && C1 >= 0
%r = srem i8 C0, C1
=>
%r = urem i8 C0, C1
Name: pos neg
Pre: C0 >= 0 && C1 <= 0
%r = srem i8 C0, C1
=>
%r = urem i8 C0, -C1
Name: neg pos
Pre: C0 <= 0 && C1 >= 0
%r = srem i8 C0, C1
=>
%t0 = urem i8 -C0, C1
%r = sub i8 0, %t0
Name: neg neg
Pre: C0 <= 0 && C1 <= 0
%r = srem i8 C0, C1
=>
%t0 = urem i8 -C0, -C1
%r = sub i8 0, %t0
https://rise4fun.com/Alive/Vd6
Now, this new logic does not result in any new catches
as of vanilla llvm test-suite + RawSpeed.
but it should be virtually compile-time free,
and it may be important to be consistent in their handling,
because if we had a pair of sdiv-srem, and only converted one of them,
-divrempairs will no longer see them as a pair,
and thus not "merge" them.
InstCombine likes to canonicalize comparisons of the form
X == C || X == C+1 into (X & -2) == C'. Make sure LVI can still
recover the value range from this. Can of course also be useful
for proper mask comparisons.
For the sake of clarity, the implementation goes through KnownBits
to compute the range.
This got reverted because a dependency was reverted. It has since
been reapplied, so reapply this as well.
-----
Related to D69686. As noted there, LVI currently behaves differently
for integer and pointer values: For integers, the block value is always
valid inside the basic block, while for pointers it is only valid at
the end of the basic block. I believe the integer behavior is the
correct one, and CVP relies on it via its getConstantRange() uses.
The reason for the special pointer behavior is that LVI checks whether
a pointer is dereferenced in a given basic block and marks it as
non-null in that case. Of course, this information is valid only after
the dereferencing instruction, or in conservative approximation,
at the end of the block.
This patch changes the treatment of dereferencability: Instead of
including it inside the block value, we instead treat it as something
similar to an assume (it essentially is a non-nullness assume) and
incorporate this information in intersectAssumeOrGuardBlockValueConstantRange()
if the context instruction is the terminator of the basic block.
This happens either when determining an edge-value internally in LVI,
or when a terminator was explicitly passed to getValueAt(). The latter
case makes this more powerful than the previous implementation as
a side-effect, and this does actually seem benefitial in practice.
Of course, we do not want to recompute dereferencability on each
intersectAssume call, so we need a new cache for this. The
dereferencability analysis requires walking the entire basic block
and computing underlying objects of all memory operands. This was
previously done separately for each queried pointer value. In the
new implementation (both because this makes the caching simpler,
and because it is faster), I instead only walk the full BB once and
cache all the dereferenced pointers. So the traversal is now performed
only once per BB, instead of once per queried pointer value.
I think the overall model now makes more sense than before, and there
will be no more pitfalls due to differing integer/pointer behavior.
Differential Revision: https://reviews.llvm.org/D69914
Pass the abs poison flag to the underlying ConstantRange
implementation, allowing CVP to simplify based on it.
Importantly, this recognizes that abs with poison flag is actually
non-negative...
Yes, if operands are non-positive this comes at the extra cost
of two extra negations. But a. division is already just
ridiculously costly, two more subtractions can't hurt much :)
and b. we have better/more analyzes/folds for an unsigned division,
we could end up narrowing it's bitwidth, converting it to lshr, etc.
This is essentially a take two on 0fdcca07ad,
which didn't fix the potential regression i was seeing,
because ValueTracking's computeKnownBits() doesn't make use
of dominating conditions in it's analysis.
While i could teach it that, this seems like the more general fix.
This big hammer actually does catch said potential regression.
Over vanilla test-suite + RawSpeed + darktable
(10M IR instrs, 1M IR BB, 1M X86 ASM instrs), this fires/converts 5 more
(+2%) SDiv's, the total instruction count at the end of middle-end pipeline
is only +6, so out of +10 extra negations, ~half are folded away,
and asm instr count is only +1, so practically speaking all extra
negations are folded away and are therefore free.
Sadly, all these new UDiv's remained, none folded away.
But there are two less basic blocks.
https://rise4fun.com/Alive/VS6
Name: v0
Pre: C0 >= 0 && C1 >= 0
%r = sdiv i8 C0, C1
=>
%r = udiv i8 C0, C1
Name: v1
Pre: C0 <= 0 && C1 >= 0
%r = sdiv i8 C0, C1
=>
%t0 = udiv i8 -C0, C1
%r = sub i8 0, %t0
Name: v2
Pre: C0 >= 0 && C1 <= 0
%r = sdiv i8 C0, C1
=>
%t0 = udiv i8 C0, -C1
%r = sub i8 0, %t0
Name: v3
Pre: C0 <= 0 && C1 <= 0
%r = sdiv i8 C0, C1
=>
%r = udiv i8 -C0, -C1
Currently that fold requires both operands to be non-negative,
but the only real requirement for the fold is that we must know
the domains of the operands.
InstCombine may convert conditions like (x < C) && (y < C) into
(x | y) < C (for some C). This patch teaches LVI to recognize that
in this case, it can infer either x < C or y < C along the edge.
This fixes the issue reported at
https://github.com/rust-lang/rust/issues/73827.
Differential Revision: https://reviews.llvm.org/D82715
This is D77454, except for stores. All the infrastructure work was done
for loads, so the remaining changes necessary are relatively small.
Differential Revision: https://reviews.llvm.org/D79968
The "null-pointer-is-valid" attribute needs to be checked by many
pointer-related combines. To make the check more efficient, convert
it from a string into an enum attribute.
In the future, this attribute may be replaced with data layout
properties.
Differential Revision: https://reviews.llvm.org/D78862
For IR generated by a compiler, this is really simple: you just take the
datalayout from the beginning of the file, and apply it to all the IR
later in the file. For optimization testcases that don't care about the
datalayout, this is also really simple: we just use the default
datalayout.
The complexity here comes from the fact that some LLVM tools allow
overriding the datalayout: some tools have an explicit flag for this,
some tools will infer a datalayout based on the code generation target.
Supporting this properly required plumbing through a bunch of new
machinery: we want to allow overriding the datalayout after the
datalayout is parsed from the file, but before we use any information
from it. Therefore, IR/bitcode parsing now has a callback to allow tools
to compute the datalayout at the appropriate time.
Not sure if I covered all the LLVM tools that want to use the callback.
(clang? lli? Misc IR manipulation tools like llvm-link?). But this is at
least enough for all the LLVM regression tests, and IR without a
datalayout is not something frontends should generate.
This change had some sort of weird effects for certain CodeGen
regression tests: if the datalayout is overridden with a datalayout with
a different program or stack address space, we now parse IR based on the
overridden datalayout, instead of the one written in the file (or the
default one, if none is specified). This broke a few AVR tests, and one
AMDGPU test.
Outside the CodeGen tests I mentioned, the test changes are all just
fixing CHECK lines and moving around datalayout lines in weird places.
Differential Revision: https://reviews.llvm.org/D78403
Currently an unknown/undef value is marked as overdefined when merged
with an empty range. An empty range can occur in unreachable/dead code.
When merging the new unknown state (= no value known yet) with an empty
range, there still isn't any information about the value yet and we can
stay in unknown.
This gives a few nice improvements on the number of instructions removed
by IPSCCP:
Same hash: 170 (filtered out)
Remaining: 67
Metric: sccp.IPNumInstRemoved
Program base patch diff
test-suite...rks/FreeBench/mason/mason.test 3.00 6.00 100.0%
test-suite...nchmarks/McCat/18-imp/imp.test 3.00 5.00 66.7%
test-suite...C/CFP2000/179.art/179.art.test 2.00 3.00 50.0%
test-suite...ijndael/security-rijndael.test 2.00 3.00 50.0%
test-suite...ks/Prolangs-C/agrep/agrep.test 40.00 58.00 45.0%
test-suite...ce/Applications/Burg/burg.test 26.00 37.00 42.3%
test-suite...cCat/03-testtrie/testtrie.test 3.00 4.00 33.3%
test-suite...Source/Benchmarks/sim/sim.test 29.00 36.00 24.1%
test-suite.../Applications/spiff/spiff.test 9.00 11.00 22.2%
test-suite...s/FreeBench/neural/neural.test 5.00 6.00 20.0%
test-suite...pplications/treecc/treecc.test 66.00 79.00 19.7%
test-suite...langs-C/football/football.test 85.00 101.00 18.8%
test-suite...ce/Benchmarks/PAQ8p/paq8p.test 90.00 105.00 16.7%
test-suite...oxyApps-C++/miniFE/miniFE.test 37.00 43.00 16.2%
test-suite...rks/FreeBench/pifft/pifft.test 26.00 30.00 15.4%
test-suite...lications/sqlite3/sqlite3.test 481.00 548.00 13.9%
test-suite...marks/7zip/7zip-benchmark.test 4875.00 5522.00 13.3%
test-suite.../CINT2000/176.gcc/176.gcc.test 1117.00 1197.00 7.2%
test-suite...0.perlbench/400.perlbench.test 1618.00 1732.00 7.0%
Reviewers: efriedma, nikic, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78667
Currently ConstantRange::binaryAnd/binaryOr results are too pessimistic
for single element constant ranges.
If both operands are single element ranges, we can use APInt's AND and
OR implementations directly.
Note that some other binary operations on constant ranges can cover the
single element cases naturally, but for OR and AND this unfortunately is
not the case.
Reviewers: nikic, spatel, lebedev.ri
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D76446
This patch updates ValueLattice to distinguish between ranges that are
guaranteed to not include undef and ranges that may include undef.
A constant range guaranteed to not contain undef can be used to simplify
instructions to arbitrary values. A constant range that may contain
undef can only be used to simplify to a constant. If the value can be
undef, it might take a value outside the range. For example, consider
the snipped below
define i32 @f(i32 %a, i1 %c) {
br i1 %c, label %true, label %false
true:
%a.255 = and i32 %a, 255
br label %exit
false:
br label %exit
exit:
%p = phi i32 [ %a.255, %true ], [ undef, %false ]
%f.1 = icmp eq i32 %p, 300
call void @use(i1 %f.1)
%res = and i32 %p, 255
ret i32 %res
}
In the exit block, %p would be a constant range [0, 256) including undef as
%p could be undef. We can use the range information to replace %f.1 with
false because we remove the compare, effectively forcing the use of the
constant to be != 300. We cannot replace %res with %p however, because
if %a would be undef %cond may be true but the second use might not be
< 256.
Currently LazyValueInfo uses the new behavior just when simplifying AND
instructions and does not distinguish between constant ranges with and
without undef otherwise. I think we should address the remaining issues
in LVI incrementally.
Reviewers: efriedma, reames, aqjune, jdoerfert, sstefan1
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D76931
This patch adds a new singlecrfromundef lattice value, indicating a
single element constant range which was merge with undef at some point.
Merging it with another constant range results in overdefined, as we
won't be able to replace all users with a single value.
This patch uses a ConstantRange instead of a Constant*, because regular
integer constants are represented as single element constant ranges as
well and this allows the existing code working without additional
changes.
Reviewers: efriedma, nikic, reames, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D75845
This patch adds a new undef lattice state, which is used to represent
UndefValue constants or instructions producing undef.
The main difference to the unknown state is that merging undef values
with constants (or single element constant ranges) produces the
constant/constant range, assuming all uses of the merge result will be
replaced by the found constant.
Contrary, merging non-single element ranges with undef needs to go to
overdefined. Using unknown for UndefValues currently causes mis-compiles
in CVP/LVI (PR44949) and will become problematic once we use
ValueLatticeElement for SCCP.
Reviewers: efriedma, reames, davide, nikic
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D75120
CVP currently does not simplify cmps with instructions in the same
block, because LVI getPredicateAt() currently does not provide
much useful information for that case (D69686 would change that,
but is stuck.) However, if the instruction is a Phi node, then
LVI can compute the result of the predicate by threading it into
the predecessor blocks, which allows it simplify some conditions
that nothing else can handle. Relevant code:
6d6a4590c5/llvm/lib/Analysis/LazyValueInfo.cpp (L1904-L1927)
Differential Revision: https://reviews.llvm.org/D72169
This phi simplification transform was added with:
D45448
However as shown in PR43802:
https://bugs.llvm.org/show_bug.cgi?id=43802
...we must be careful not to propagate poison when we do the substitution.
There might be some more complicated analysis possible to retain the overflow flag,
but it should always be safe and easy to drop flags (we have similar behavior in
instcombine and other passes).
Differential Revision: https://reviews.llvm.org/D69442