Commit Graph

468 Commits

Author SHA1 Message Date
Thomas Preud'homme 3b52c04e82 Make FindAvailableLoadedValue TBAA aware
FindAvailableLoadedValue() relies on FindAvailablePtrLoadStore() to run
the alias analysis when searching for an equivalent value. However,
FindAvailablePtrLoadStore() calls the alias analysis framework with a
memory location for the load constructed from an address and a size,
which thus lacks TBAA metadata info. This commit modifies
FindAvailablePtrLoadStore() to accept an optional memory location as
parameter to allow FindAvailableLoadedValue() to create it based on the
load instruction, which would then have TBAA metadata info attached.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D99206
2021-03-24 17:20:26 +00:00
Hongtao Yu c75da238b4 [CSSPGO] Deduplicating dangling pseudo probes.
Same dangling probes are redundant since they all have the same semantic that is to rely on the counts inference tool to get reasonable count for the same original block. Therefore, there's no need to keep multiple copies of them. I've seen jump threading created tons of redundant dangling probes that slowed down the compiler dramatically. Other optimization passes can also result in redundant probes though without an observed impact so far.

This change removes block-wise redundant dangling probes specifically introduced by jump threading. To support removing redundant dangling probes caused by all other passes, a final function-wise deduplication is also added.

An 18% size win of the .pseudo_probe section was seen for SPEC2017. No performance difference was observed.

Differential Revision: https://reviews.llvm.org/D97482
2021-03-03 22:44:42 -08:00
Hongtao Yu 8985515822 [CSSPGO] Unblocking optimizations by dangling pseudo probes.
This change fixes a couple places where the pseudo probe intrinsic blocks optimizations because they are not naturally removable. To unblock those optimizations, the blocking pseudo probes are moved out of the original blocks and tagged dangling, instead of allowing pseudo probes to be literally removed. The reason is that when the original block is removed, we won't be able to sample it. Instead of assigning it a zero weight, moving all its pseudo probes into another block and marking them dangling should allow the counts inference a chance to assign them a more reasonable weight. We have not seen counts quality degradation from our experiments.

The optimizations being unblocked are:

	1. Removing conditional probes for if-converted branches. Conditional probes are tagged dangling when their homing branch arms are folded so that they will not be over-counted.
	2. Unblocking jump threading from removing empty blocks. Pseudo probe prevents jump threading from removing logically empty blocks that only has one unconditional jump instructions.
	3. Unblocking SimplifyCFG and MIR tail duplicate to thread empty blocks and blocks with redundant branch checks.

Since dangling probes are logically deleted, they should not consume any samples in LTO postLink. This can be achieved by setting their distribution factors to zero when dangled.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D97481
2021-03-03 22:44:42 -08:00
Juneyoung Lee 365f5e2475 [JumpThreading] Fix tryToUnfoldSelectInCurrBB to treat and/or and its select form equally
This is a minor fix to update tryToUnfoldSelectInCurrBB to ignore select
form of and/ors because the function does not look into binops as well
2021-03-02 18:35:18 +09:00
Juneyoung Lee 19c2e12947 [JumpThreading] Update computeValueKnownInPredecessors to recognize logical and/or patterns
This allows JumpThreading's computeValueKnownInPredecessors to
recognize select form of and/or patterns as well.
2021-02-24 00:06:10 +09:00
Nikita Popov 5e7e499b91 [JumpThreading] Clone noalias.scope.decl when threading blocks
When cloning instructions during jump threading, also clone and
adapt any declared scopes. This is primarily important when
threading loop exits, because we'll end up with two dominating
scope declarations in that case (at least after additional loop
rotation). This addresses a loose thread from
https://reviews.llvm.org/rG2556b413a7b8#975012.

Differential Revision: https://reviews.llvm.org/D97154
2021-02-22 18:35:30 +01:00
Kazu Hirata 33bf1cad75 [llvm] Use *Set::contains (NFC) 2021-01-07 20:29:34 -08:00
Arthur Eubanks e30fbbe9a5 [JumpThreading][NewPM] Skip when target has divergent CF
Matches the legacy pass.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D94028
2021-01-04 16:08:08 -08:00
alex-t 35ec3ff76d Disable Jump Threading for the targets with divergent control flow
Details: Jump Threading does not make sense for the targets with divergent CF
         since they do not use branch prediction for speculative execution.
         Also in the high level IR there is no enough information to conclude that the branch is divergent or uniform.
         This may cause errors in further CF lowering.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93302
2020-12-17 02:40:54 +03:00
Kazu Hirata e2fc11cf9f [JumpThreading] Call eraseBlock when folding a conditional branch
This patch teaches the jump threading pass to call BPI->eraseBlock
when it folds a conditional branch.

Without this patch, BranchProbabilityInfo could end up with stale edge
probabilities for the basic block containing the conditional branch --
one edge probability with less than 1.0 and the other for a removed
edge.

Differential Revision: https://reviews.llvm.org/D92608
2020-12-03 23:50:17 -08:00
Hongtao Yu f3c445697d [CSSPGO] IR intrinsic for pseudo-probe block instrumentation
This change introduces a new IR intrinsic named `llvm.pseudoprobe` for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.

A pseudo probe is used to collect the execution count of the block where the probe is instrumented. This requires a pseudo probe to be persisting. The LLVM PGO instrumentation also instruments in similar places by placing a counter in the form of atomic read/write operations or runtime helper calls. While these operations are very persisting or optimization-resilient, in theory we can borrow the atomic read/write implementation from PGO counters and cut it off at the end of compilation with all the atomics converted into binary data. This was our initial design and we’ve seen promising sample correlation quality with it. However, the atomics approach has a couple issues:

1. IR Optimizations are blocked unexpectedly. Those atomic instructions are not going to be physically present in the binary code, but since they are on the IR till very end of compilation, they can still prevent certain IR optimizations and result in lower code quality.
2. The counter atomics may not be fully cleaned up from the code stream eventually.
3. Extra work is needed for re-targeting.

We choose to implement pseudo probes based on a special LLVM intrinsic, which is expected to have most of the semantics that comes with an atomic operation but does not block desired optimizations as much as possible. More specifically the semantics associated with the new intrinsic enforces a pseudo probe to be virtually executed exactly the same number of times before and after an IR optimization. The intrinsic also comes with certain flags that are carefully chosen so that the places they are probing are not going to be messed up by the optimizer while most of the IR optimizations still work. The core flags given to the special intrinsic is `IntrInaccessibleMemOnly`, which means the intrinsic accesses memory and does have a side effect so that it is not removable, but is does not access memory locations that are accessible by any original instructions. This way the intrinsic does not alias with any original instruction and thus it does not block optimizations as much as an atomic operation does. We also assign a function GUID and a block index to an intrinsic so that they are uniquely identified and not merged in order to achieve good correlation quality.

Let's now look at an example. Given the following LLVM IR:

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
  %cmp = icmp eq i32 %x, 0
   br i1 %cmp, label %bb1, label %bb2
bb1:
   br label %bb3
bb2:
   br label %bb3
bb3:
   ret void
}
```

The instrumented IR will look like below. Note that each `llvm.pseudoprobe` intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID.

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
   %cmp = icmp eq i32 %x, 0
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
   br i1 %cmp, label %bb1, label %bb2
bb1:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
   br label %bb3
bb2:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
   br label %bb3
bb3:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
   ret void
}

```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D86490
2020-11-20 10:39:24 -08:00
Arthur Eubanks 9e3b4f4941 [JumpThreading] Make -print-lvi-after-jump-threading work with NPM 2020-11-17 23:15:20 -08:00
Yevgeny Rouban a57fe210ff [JumpThreading] Fix branch probabilities in DuplicateCondBranchOnPHIIntoPred()
When instructions are cloned from block BB to PredBB in the method
DuplicateCondBranchOnPHIIntoPred() number of successors of PredBB
changes from 1 to number of successors of BB. So we have to copy
branch probabilities from BB to PredBB.

Reviewed By: Kazu Hirata

Differential Revision: https://reviews.llvm.org/D90841
2020-11-17 14:40:50 +07:00
Kazu Hirata 147ccc848a [JumpThreading] Call eraseBlock when folding a conditional branch
This patch teaches the jump threading pass to call BPI->eraseBlock
when it folds a conditional branch.

Without this patch, BranchProbabilityInfo could end up with stale edge
probabilities for the basic block containing the conditional branch --
one edge probability with less than 1.0 and the other for a removed
edge.

This patch is one of the steps before we can safely re-apply D91017.

Differential Revision: https://reviews.llvm.org/D91511
2020-11-15 22:29:30 -08:00
Kazu Hirata c95fff5be7 [JumpThreading] Fix function names (NFC) 2020-11-07 19:35:03 -08:00
Arthur Eubanks 5c31b8b94f Revert "Use uint64_t for branch weights instead of uint32_t"
This reverts commit 10f2a0d662.

More uint64_t overflows.
2020-10-31 00:25:32 -07:00
Arthur Eubanks 10f2a0d662 Use uint64_t for branch weights instead of uint32_t
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609
2020-10-30 10:03:46 -07:00
Kazu Hirata b2f05fae80 [JumpThreading] Remove extraneous calls to setEdgeProbability
This patch removes extraneous calls to setEdgeProbability introduced
in c91487769d.

The follow-up patch, a7b662d0f4, has
since fixed BranchProbabilityInfo::eraseBlock, so we don't need to
worry about getting stale values from getEdgeProbability.

Also, since getEdgeProbability(BB, BB->getSingleSuccessor()) returns
edge probability 1/1 by default for BB with exactly one successor
edge, we don't need to explicitly call setEdgeProbability.

This patch introduces almost no functional change, but we do end up
reducing debug messages from setEdgeProbability.

Differential Revision: https://reviews.llvm.org/D90284
2020-10-27 21:12:54 -07:00
Kazu Hirata c91487769d [JumpThreading] Set edge probabilities when creating basic blocks
This patch teaches the jump threading pass to set edge probabilities
whenever the pass creates new basic blocks.

Without this patch, the compiler sometimes produces non-deterministic
results.  The non-determinism comes from the jump threading pass using
stale edge probabilities in BranchProbabilityInfo.  Specifically, when
the jump threading pass creates a new basic block, we don't initialize
its outgoing edge probability.

Edge probabilities are maintained in:

  DenseMap<Edge, BranchProbability> Probs;

in class BranchProbabilityInfo, where Edge is an ordered pair of
BasicBlock * and a successor index declared as:

  using Edge = std::pair<const BasicBlock *, unsigned>;

Probs maps edges to their corresponding probabilities.

Now, we rarely remove entries from this map, so if we happen to
allocate a new basic block at the same address as a previously deleted
basic block with an edge probability assigned, the newly created basic
block appears to have an edge probability, albeit a stale one.

This patch fixes the problem by explicitly setting edge probabilities
whenever the jump threading pass creates new basic blocks.

Differential Revision: https://reviews.llvm.org/D90106
2020-10-27 16:07:27 -07:00
Nico Weber 2a4e704c92 Revert "Use uint64_t for branch weights instead of uint32_t"
This reverts commit e5766f25c6.
Makes clang assert when building Chromium, see https://crbug.com/1142813
for a repro.
2020-10-27 09:26:21 -04:00
Arthur Eubanks e5766f25c6 Use uint64_t for branch weights instead of uint32_t
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609
2020-10-26 20:24:04 -07:00
Juneyoung Lee 9b3c2a72e4 [ValueTracking] Use assume's noundef operand bundle
This patch updates `isGuaranteedNotToBeUndefOrPoison` to use `llvm.assume`'s `noundef` operand bundle.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89219
2020-10-14 20:16:33 +09:00
Nikita Popov 9b959b59df [LVI] Require context instruction in external API (NFCI)
Require CxtI in getConstant() and getConstantRange() APIs.
Accordingly drop the BB parameter, as it is implied by
CxtI->getParent().

This makes sure we don't forget to pass the context instruction,
and makes the API contract clearer (also clean up the comments to
that effect -- the value holds at the context instruction, not
the end of the block).
2020-09-27 18:07:24 +02:00
David Stenberg bfcb824ba5 [JumpThreading] Fix an incorrect Modified status
This fixes PR47297.

When ProcessBlock() was able to constant fold the terminator's
condition, but not do any more transformations, the function would
return false, which would lead to the JumpThreading pass returning an
incorrect modified status. This patch makes so that ProcessBlock()
returns true in such cases. This will trigger an unnecessary invocation
of ProcessBlock() in such cases, but this should be rare to occur.

This was caught using the check introduced by D80916.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87392
2020-09-14 10:36:13 +02:00
Juneyoung Lee 39c1653b3d [JumpThreading] Conditionally freeze its condition when unfolding select
This patch fixes pr45956 (https://bugs.llvm.org/show_bug.cgi?id=45956 ).
To minimize its impact to the quality of generated code, I suggest enabling
this only for LTO as a start (it has two JumpThreading passes registered).
This patch contains a flag that makes JumpThreading enable it.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84940
2020-09-10 15:49:40 +09:00
Juneyoung Lee 9f717d7b94 [JumpThreading] Allow duplicating a basic block into preds when its branch condition is freeze(phi)
This is the last JumpThreading patch for getting the performance numbers shown at
https://reviews.llvm.org/D84940#2184653 .

This patch makes ProcessBlock call ProcessBranchOnPHI when the branch condition
is freeze(phi) as well (originally it calls the function when the condition is
phi only).

Since what ProcessBranchOnPHI does is to duplicate the basic block into
predecessors if profitable, it is still valid when the condition is freeze(phi)
too.

```
    p = phi [a, pred1] [b, pred2]
    p.fr = freeze p
    br p.fr, ...
=>
  pred1:
    p.fr = freeze a
    br p.fr, ...
  pred2:
    p.fr2 = freeze b
    br p.fr2, ...
```

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85029
2020-08-06 09:51:17 +09:00
Juneyoung Lee e0d99e9aaf [JumpThreading] Consider freeze as a zero-cost instruction
This is a simple patch that makes freeze as a zero-cost instruction, as bitcast already is.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85023
2020-08-05 14:42:36 +09:00
Juneyoung Lee e734e8286b [JumpThreading] Remove cast's constraint
As discussed in D84949, this removes the constraint to cast since it does not
cause compile time degradation.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D85188
2020-08-04 19:09:25 +09:00
Juneyoung Lee 6f97103b56 [JumpThreading] Don't limit the type of an operand
Compared to the optimized code with branch conditions never frozen,
limiting the type of freeze's operand causes generation of suboptimal code in
some cases.
I would like to suggest removing the constraint, as this patch does.
If the number of freeze instructions becomes significant, this can be revisited.

Differential Revision: https://reviews.llvm.org/D84949
2020-08-04 16:21:58 +09:00
Juneyoung Lee ad48367722 [JumpThreading] Let SimplifyPartiallyRedundantLoad look into freeze
This patch allows SimplifyPartiallyRedundantLoad work when
the branch condition was frozen.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84944
2020-07-31 15:28:24 +09:00
Juneyoung Lee 111a02decd [JumpThreading] Fold br(freeze(undef))
This patch makes JumpThreading fold br(freeze(undef)) if the freeze instruction
is only used by the branch.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84818
2020-07-30 09:38:50 +09:00
Juneyoung Lee 4c9af6d0e0 [JumpThreading] Add a basic support for freeze instruction
This patch adds a basic support for freeze instruction to JumpThreading
by making ComputeValueKnownInPredecessorsImpl look into its operand.

Reviewed By: efriedma, nikic

Differential Revision: https://reviews.llvm.org/D84598
2020-07-29 03:12:14 +09:00
Roman Lebedev 1da9834557
[JumpThreading] ProcessBranchOnXOR(): bailout if any pred ends in indirect branch (PR46857)
SplitBlockPredecessors() can not split blocks that have such terminators,
and in two other places we already ensure that we don't end up calling
SplitBlockPredecessors() on such blocks. Do so in one more place.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46857
2020-07-27 15:39:03 +03:00
Yevgeny Rouban 707836ed4e [JumpThreading] Handle zero !prof branch_weights
Avoid division by zero in updatePredecessorProfileMetadata().

Reviewers: yamauchi
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81499
2020-06-12 11:55:15 +07:00
Kazu Hirata f355c7fc2f [JumpThreading] Simplify FindMostPopularDest (NFC)
Summary:
This patch simplifies FindMostPopularDest without changing the
functionality.

Given a list of jump threading destinations, the function finds the
most popular destination.  To ensure determinism when there are
multiple destinations with the highest popularity, the function picks
the first one in the successor list with the highest popularity.

Without this patch:

- The function populates DestPopularity -- a histogram mapping
  destinations to their respective occurrence counts.

- Then we iterate over DestPopularity, looking for the highest
  popularity while building a vector of destinations with the highest
  popularity.

- Finally, we iterate the successor list, looking for the destination
  with the highest popularity.

With this patch:

- We implement DestPopularity with MapVector instead of DenseMap.  We
  populate the map with popularity 0 for all successors in the order
  they appear in the successor list.

- We build the histogram in the same way as before.

- We simply use std::max_element on DestPopularity to find the most
  popular destination.  The use of MapVector ensures determinism.

Reviewers: wmi, efriedma

Reviewed By: wmi

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81030
2020-06-02 18:43:31 -07:00
Kazu Hirata c4990a03c6 [JumpThreading] Use emplace_back instead of push_back (NFC)
Summary: This patch replaces push_back with emplace_back where appropriate.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80688
2020-05-27 22:31:23 -07:00
Yevgeny Rouban 8138487468 [BrachProbablityInfo] Set edge probabilities at once and fix calcMetadataWeights()
Hide the method that allows setting probability for particular edge
and introduce a public method that sets probabilities for all
outgoing edges at once.
Setting individual edge probability is error prone. More over it is
difficult to check that the total probability is 1.0 because there is
no easy way to know when the user finished setting all
the probabilities.

Related bug is fixed in BranchProbabilityInfo::calcMetadataWeights().
Changing unreachable branch probabilities to raw(1) and distributing
the rest (oldProbability - raw(1)) over the reachable branches could
introduce total probability inaccuracy bigger than 1/numOfBranches.

Reviewers: yamauchi, ebrevnov
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79396
2020-05-21 12:52:37 +07:00
Nikita Popov 5fae613a4f [LVI] Don't require DominatorTree in LVI (NFC)
After D76797 the dominator tree is no longer used in LVI, so we
can remove it as a pass dependency, and also get rid of the
dominator tree enabling/disabling logic in JumpThreading.

Apart from cleaning up the code, this also clarifies LVI
cache consistency, in that the LVI cache can no longer
depend on whether the DT was or wasn't enabled due to
pending DT updates at any given time.

Differential Revision: https://reviews.llvm.org/D76985
2020-05-19 20:21:46 +02:00
Reid Kleckner 1370757dd0 Revert "[BrachProbablityInfo] Set edge probabilities at once. NFC."
This reverts commit eef95f2746.

The new assertion about branch propability sums does not hold.
2020-05-13 08:23:09 -07:00
Yevgeny Rouban eef95f2746 [BrachProbablityInfo] Set edge probabilities at once. NFC.
Hide the method that allows setting probability for particular
edge and introduce a public method that sets probabilities for
all outgoing edges at once.
Setting individual edge probability is error prone. More over
it is difficult to check that the total probability is 1.0
because there is no easy way to know when the user finished
setting all the probabilities.

Reviewers: yamauchi, ebrevnov
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79396
2020-05-13 13:55:36 +07:00
Kazu Hirata 91eb442fde [JumpThreading] NFC: Simplify ComputeValueKnownInPredecessorsImpl
Summary:
ComputeValueKnownInPredecessorsImpl is the main folding mechanism in
JumpThreading.cpp.  To avoid potential infinite recursion while
chasing use-def chains, it uses:

  DenseSet<std::pair<Value *, BasicBlock *>> &RecursionSet

to keep track of Value-BB pairs that we've processed.

Now, when ComputeValueKnownInPredecessorsImpl recursively calls
itself, it always passes BB as is, so the second element is always BB.

This patch simplifes the function by dropping "BasicBlock *" from
RecursionSet.

Reviewers: wmi, efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77699
2020-04-07 18:37:36 -07:00
Eli Friedman 3f13ee8a00 [NFC] Modernize misc. uses of Align/MaybeAlign APIs.
Use the current getAlign() APIs where it makes sense, and use Align
instead of MaybeAlign when we know the value is non-zero.
2020-04-06 17:53:04 -07:00
Evgenii Stepanov f9471b0010 Fix MSan false positive due to select folding.
Summary:
Select folding in JumpThreading can create a conditional branch on a
code patch that did not have one in the original program. This is not a
valid transformation in sanitize_memory functions.

Note that JumpThreading does select folding in 3 different places. Two
of them seem safe - they apply to a select instruction in a BB that ends
with an unconditional branch to another BB, which (in turn) ends with a
conditional branch or a switch with the same condition.

Fixes PR45220.

Reviewers: glider, dvyukov, efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76332
2020-03-31 15:25:42 -07:00
Kazu Hirata e23d786526 [JumpThreading] Fix infinite loop (PR44611)
Summary:
This patch fixes https://bugs.llvm.org/show_bug.cgi?id=44611 by
preventing an infinite loop in the jump threading pass when
-jump-threading-across-loop-headers is on.  Specifically, without this
patch, jump threading through two basic blocks would trigger on the
same area of the CFG over and over, resulting in an infinite loop.

Consider testcase PR44611-across-header-hang.ll in this patch.  The
first opportunity to thread through two basic blocks is:

  from bb_body2 through bb_header and bb_body1 to bb_body2.

The pass duplicates bb_header and bb_body1 as, say, bb_header.thread1
and bb_body1.thread1.  Since bb_header contains a successor edge back
to itself, bb_header.thread1 also contains a successor edge to
bb_header, immediately giving rise to the next jump threading
opportunity:

  from bb_header.thread1 through bb_header and bb_body1 to bb_body2.

After that, we repeatedly thread an incoming edge into bb_header
through bb_header and bb_body1 to bb_body2.  In other words, we keep
peeling one iteration from bb_header's self loop.

The patch fixes the problem by preventing the pass from duplicating a
basic block containing a self loop.

Reviewers: wmi, junparser, efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76390
2020-03-19 12:49:36 -07:00
Fangrui Song 13a97305ba [JumpThreading] Skip unconditional PredBB when threading jumps through two basic blocks
Fixes https://bugs.llvm.org/show_bug.cgi?id=44922 (caused by 4698bf145d)

ThreadThroughTwoBasicBlocks assumes PredBBBranch is conditional. The following code can segfault.

  AddPHINodeEntriesForMappedBlock(PredBBBranch->getSuccessor(1), PredBB, NewBB,
                                  ValueMapping);

We can also allow unconditional PredBB, but the produced code is not
better.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D74747
2020-02-18 11:01:46 -08:00
stozer ffeb64db35 Reapply "[DebugInfo] Prevent explosion of debug intrinsics during jump threading"
This reverts commit 6ded69f294.
2020-02-12 12:39:54 +00:00
stozer 6ded69f294 Revert "[DebugInfo] Prevent explosion of debug intrinsics during jump threading"
This reverts commit fe6f6cd6b8.

Found test failure on several buildbots.
2020-02-12 11:48:00 +00:00
stozer fe6f6cd6b8 [DebugInfo] Prevent explosion of debug intrinsics during jump threading
This patch is a fix following the revert of 72ce759
(https://reviews.llvm.org/rG72ce759928e6dfee6a9efa310b966c19722352ba)
and fixes the failure that it caused.

The above patch failed on the Thread Sanitizer buildbot with an out of
memory error. After an investigation, the cause was identified as an
explosion in debug intrinsics while running the Jump Threading pass on
ModuleMap.ll. The above patched prevented debug intrinsics from being
dropped when their Basic Block was deleted due to being "empty". In this
case, one of the functions in ModuleMap.ll had (after many optimization
passes) a very large number of debug intrinsics representing a set of
repeatedly inlined variables. Previously the vast majority of these were
silently dropped during Jump Threading when their blocks were deleted,
but as of the above patch they survived for longer, causing a large
increase in the number of debug intrinsics. These intrinsics were then
repeatedly cloned by the Jump Threading pass as edges were threaded,
multiplying the intrinsic count further. The memory consumed by this
process spiralled out of control, crashing the buildbot that uses TSan
(which has an estimated 5-10x memory overhead compared to non-sanitized
builds).

This patch adds RemoveRedundantDbgInstrs to the Jump Threading pass, in
order to reduce the number of debug intrinsics down to a manageable
amount in cases where many intrinsics for the same variable end up
bunched together contiguously, as in this case.

Differential Revision: https://reviews.llvm.org/D73054
2020-02-12 11:22:54 +00:00
Kazu Hirata 4698bf145d Resubmit^2: [JumpThreading] Thread jumps through two basic blocks
This reverts commit 41784bed01.

Since the original revision ead815924e,
this revision fixes three issues:

- This revision fixes the Windows build.  My original patch improperly
  copied EH pads on Windows.  This patch disregards jump threading
  opportunities having to do with EH pads.

- This revision fixes jump threading to a wrong destination.
  Specifically, my original patch treated any Constant other than 0 as 1
  while evaluating the branch condition.  This bug led to treating
  constant expressions like:

    icmp ugt i8* null, inttoptr (i64 4 to i8*)

  to "true".  This patch fixes the bug by calling isOneValue.

- This revision fixes the cost calculation of two basic blocks being
  threaded through.  Note that getJumpThreadDuplicationCost returns
  "(unsigned)~0" for those basic blocks that cannot be duplicated.  If
  we sum of two return values from getJumpThreadDuplicationCost, we
  could have an unsigned overflow like:

    (unsigned)~0 + 5 = 4

  and mistakenly determine that it's safe and profitable to proceed
  with the jump threading opportunity.  The patch fixes the bug by
  checking each return value before summing them up.

[JumpThreading] Thread jumps through two basic blocks

Summary:
This patch teaches JumpThreading.cpp to thread through two basic
blocks like:

  bb3:
    %var = phi i32* [ null, %bb1 ], [ @a, %bb2 ]
    %tobool = icmp eq i32 %cond, 0
    br i1 %tobool, label %bb4, label ...

  bb4:
    %cmp = icmp eq i32* %var, null
    br i1 %cmp, label bb5, label bb6

by duplicating basic blocks like bb3 above.  Once we duplicate bb3 as
bb3.dup and redirect edge bb2->bb3 to bb2->bb3.dup, we have:

  bb3:
    %var = phi i32* [ @a, %bb2 ]
    %tobool = icmp eq i32 %cond, 0
    br i1 %tobool, label %bb4, label ...

  bb3.dup:
    %var = phi i32* [ null, %bb1 ]
    %tobool = icmp eq i32 %cond, 0
    br i1 %tobool, label %bb4, label ...

  bb4:
    %cmp = icmp eq i32* %var, null
    br i1 %cmp, label bb5, label bb6

Then the existing code in JumpThreading.cpp can thread edge
bb3.dup->bb4 through bb4 and eventually create bb3.dup->bb5.

Reviewers: wmi

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70247
2020-02-05 09:23:37 -08:00
Sam Parker 2663a25fad [JumpThreading] Half the duplicate threshold at Oz
Duplicating instructions can lead to code size increases but using
a threshold of 3 is good for reducing code size.

Differential Revision: https://reviews.llvm.org/D72916
2020-02-03 08:40:20 +00:00