llvm-project/llvm/test/Transforms
Adam Nemet c520822dbf [JumpThreading] Only write back branch-weight MDs for blocks that originally had PGO info
Currently the pass updates branch weights in the IR if the function has
any PGO info (entry frequency is set).  However we could still have
regions of the CFG that do not have branch weights collected (e.g. a
cold region).  In this case we'd use static estimates.  Since static
estimates for branches are determined independently, they are
inconsistent.  Updating them can "randomly" inflate block frequencies.

I've run into this in a completely cold loop of h264ref from
SPEC.  -Rpass-with-hotness showed the loop to be completely cold during
inlining (before JT) but completely hot during vectorization (after JT).

The new testcase demonstrates the problem.  We check array elements
against 1, 2 and 3 in a loop.  The check against 3 is the loop-exiting
check.  The block names should be self-explanatory.

In this example, jump threading incorrectly updates the weight of the
loop-exiting branch to 0, drastically inflating the frequency of the
loop (in the range of billions).

There is no run-time profile info for edges inside the loop, so branch
probabilities are estimated.  These are the resulting branch and block
frequencies for the loop body:

                check_1 (16)
            (8) /  |
            eq_1   | (8)
                \  |
                check_2 (16)
            (8) /  |
            eq_2   | (8)
                \  |
                check_3 (16)
            (1) /  |
       (loop exit) | (15)
                   |
              (back edge)
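
The numbers in the diagram follow from the estimated probabilities.  A
minimal sketch, assuming an entry frequency of 1, 50/50 estimates for the
checks against 1 and 2, and a 1/16 exit estimate for the check against 3
(values read off the diagram; the names are illustrative, not LLVM APIs):

```python
from fractions import Fraction

ENTRY_FREQ = Fraction(1)   # assumed: the loop is entered once
P_EXIT = Fraction(1, 16)   # estimated probability of leaving at check_3
P_EQ = Fraction(1, 2)      # estimated probability that a check matches


def header_frequency(entry_freq, p_exit):
    """Expected header executions for a loop that exits with probability p_exit."""
    return entry_freq / p_exit


check_1 = header_frequency(ENTRY_FREQ, P_EXIT)
eq_1 = check_1 * P_EQ
check_2 = check_1          # both paths out of check_1 rejoin at check_2
eq_2 = check_2 * P_EQ
check_3 = check_2          # both paths out of check_2 rejoin at check_3
loop_exit = check_3 * P_EXIT
back_edge = check_3 * (1 - P_EXIT)
```
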

First we thread eq_1 -> check_2 to check_3.  Frequencies are updated to
remove the frequency of eq_1 from check_2 and then from the false edge
leaving check_2.  Changed frequencies are highlighted with * *:

                check_1 (16)
            (8) /  |
           eq_1~   | (8)
           /       |
          /     check_2 (*8*)
         /  (8) /  |
         \  eq_2   | (*0*)
          \     \  |
           ` --- check_3 (16)
            (1) /  |
       (loop exit) | (15)
                   |
              (back edge)
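
The update in this first step can be sketched as a pair of saturating
subtractions.  This is only an illustration of the arithmetic described
above; the function names are hypothetical and do not mirror the pass's
actual code:

```python
def saturating_sub(freq, removed):
    """Subtract without going below zero."""
    return max(freq - removed, 0)


def thread_through(block_freq, edge_freq, threaded_freq):
    """Remove the threaded predecessor's frequency from the block it bypasses
    and from the outgoing edge it used to take."""
    return (saturating_sub(block_freq, threaded_freq),
            saturating_sub(edge_freq, threaded_freq))


# eq_1 (8) no longer passes through check_2 (16) or its false edge (8).
check_2, false_edge = thread_through(16, 8, 8)
```
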

Next we thread eq_1 -> check_3 and eq_2 -> check_3 to check_1 as new
back edges.  Frequencies are updated to remove the frequency of eq_1 and
eq_2 from check_3 and then from the false edge leaving check_3 (changed
frequencies are highlighted with * *):

                  check_1 (16)
              (8) /  |
             eq_1~   | (8)
             /       |
            /     check_2 (*8*)
           /  (8) /  |
          /-- eq_2~  | (*0*)
  (back edge)        |
                  check_3 (*0*)
            (*0*) /  |
         (loop exit) | (*0*)
                     |
                (back edge)

As a result, the loop exit edge ends up with 0 frequency, which in turn
gives the loop header the maximum frequency.
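
A rough sketch of why a vanishing exit weight inflates the loop.  It
assumes the header frequency is entry frequency divided by exit
probability, and that a zero weight is treated as a very small nonzero
probability; the value 1/2^31 is chosen purely for illustration and is not
taken from the pass:

```python
from fractions import Fraction


def saturating_sub(freq, removed):
    """Subtract without going below zero."""
    return max(freq - removed, 0)


# Second threading step: eq_1 and eq_2 (8 each) bypass check_3.
check_3, false_edge = 16, 15
for threaded in (8, 8):
    check_3 = saturating_sub(check_3, threaded)
    false_edge = saturating_sub(false_edge, threaded)


def header_frequency(entry_freq, p_exit):
    """Expected header executions for a loop that exits with probability p_exit."""
    return entry_freq / p_exit


before = header_frequency(Fraction(1), Fraction(1, 16))
# Hypothetical stand-in for "exit weight 0": a tiny but nonzero probability.
after = header_frequency(Fraction(1), Fraction(1, 2**31))
```

Under these assumptions the header frequency goes from 16 to 2^31, which
is consistent with the "range of billions" mentioned above.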

There are a few potential problems here:

1. The profile data seems odd.  There is a single profile sample of the
loop being entered.  On the other hand, there are no weights inside the
loop.

2. Based on static estimation we shouldn't set edges to "extreme"
values, i.e. extremely likely or unlikely.

3. We shouldn't create profile metadata that is calculated from static
estimation.  I am not sure what the policy is but it seems to make sense to
treat profile metadata as something that is known to originate from
profiling.  Estimated probabilities should only be reflected in BPI/BFI.

Any one of these would probably fix the immediate problem.  I went for 3
because I think it's a good policy to have and added a FIXME about 2.
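
The policy in 3 can be sketched as follows.  Blocks are modeled as plain
dictionaries and the key and function names are hypothetical; this is not
the pass's actual data structure or API:

```python
def write_back_weights(block, new_weights):
    """Update branch-weight metadata only if the block already carried it.

    Returns True if the metadata was rewritten.  Blocks without profile
    metadata are left alone, so estimated values stay in the analyses only.
    """
    if "branch_weights" not in block:
        return False
    block["branch_weights"] = list(new_weights)
    return True


profiled = {"name": "profiled_block", "branch_weights": [90, 10]}
estimated = {"name": "check_3"}  # inside the loop: no run-time profile info

wrote_profiled = write_back_weights(profiled, [80, 20])
wrote_estimated = write_back_weights(estimated, [0, 16])
```
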

Differential Revision: https://reviews.llvm.org/D24118

llvm-svn: 280713
2016-09-06 16:08:33 +00:00
ADCE [PR27284] Reverse the ownership between DICompileUnit and DISubprogram. 2016-04-15 15:57:41 +00:00
AddDiscriminators Do not assign new discriminator for all intrinsics. 2016-08-05 17:56:49 +00:00
AlignmentFromAssumptions [PM] Port AlignmentFromAssumptions to the new PM. 2016-06-15 06:18:01 +00:00
ArgumentPromotion Remove the ScalarReplAggregates pass 2016-06-15 00:19:09 +00:00
AtomicExpand Support expanding partial-word cmpxchg to full-word cmpxchg in AtomicExpandPass. 2016-06-17 18:11:48 +00:00
BBVectorize Revert -r278267 [ValueTracking] An improvement to IR ValueTracking on Non-negative Integers 2016-08-22 13:14:07 +00:00
BDCE [PM] Port BDCE to the new pass manager. 2016-05-25 01:57:04 +00:00
BranchFolding
CodeExtractor CodeExtractor : Add ability to preserve profile data. 2016-08-02 02:15:45 +00:00
CodeGenPrepare [CodeGenPrepare] Don't sink a cast past its user 2016-04-27 19:36:38 +00:00
ConstProp Don't remove side effecting instructions due to ConstantFoldInstruction 2016-07-22 04:54:44 +00:00
ConstantHoisting This implements a more optimal algorithm for selecting a base constant in 2016-07-14 07:44:20 +00:00
ConstantMerge [PM] Port ConstantMerge to the new pass manager. 2016-05-05 00:51:09 +00:00
Coroutines [Coroutines] Part12: Handle alloca address-taken 2016-09-05 23:45:45 +00:00
CorrelatedValuePropagation CVP. Turn marking adds as no wrap (introduced by r278107) off by default 2016-08-18 16:08:35 +00:00
CountingFunctionInserter Add a counter-function insertion pass 2016-09-01 09:42:39 +00:00
CrossDSOCFI [PM] Port CrossDSOCFI to the new pass manager. 2016-07-09 03:25:35 +00:00
DCE Mark guards on true as "trivially dead" 2016-04-29 22:23:16 +00:00
DeadArgElim [PM] Port DeadArgumentElimination to the new PM 2016-06-12 09:16:39 +00:00
DeadStoreElimination [DSE] Don't remove stores made live by a call which unwinds. 2016-08-12 01:09:53 +00:00
EarlyCSE [EarlyCSE] Optionally use MemorySSA. NFC. 2016-08-31 19:24:10 +00:00
EliminateAvailableExternally [PM] Port EliminateAvailableExternally pass to the new pass manager. 2016-05-05 02:37:32 +00:00
Float2Int [PM] Port float2int to the new pass manager 2016-06-24 23:32:02 +00:00
ForcedFunctionAttrs
FunctionAttrs Forgot to add a test for r276008. 2016-07-20 04:13:05 +00:00
FunctionImport Don't import variadic functions 2016-08-11 22:13:57 +00:00
GCOVProfiling llvm/test/Transforms/GCOVProfiling/three-element-mdnode.ll: Use %/T instead of %T, not to emit backslashes. 2016-09-02 01:33:00 +00:00
GVN IntrArgMemOnly is only defined (and current AA machinery only sanely supports) pointer arguments, and these intrinsics have vector of pointer arguments. Remove ArgMemOnly until we either have the machinery, define a new attribute, or something similar 2016-08-30 19:58:48 +00:00
GVNHoist GVN-hoist: fix hoistingFromAllPaths for loops (PR29034) 2016-08-25 11:55:47 +00:00
GlobalDCE [GlobalDCE, Misc] Don't remove functions referenced by ifuncs 2016-05-04 00:20:48 +00:00
GlobalMerge CodeGen: Make the global-merge pass independently testable, and add a test. 2016-05-19 04:38:56 +00:00
GlobalOpt Revert "Revert "Invariant start/end intrinsics overloaded for address space"" 2016-08-13 23:31:24 +00:00
GuardWidening [GuardWidening] Fix incorrect use of remove_if 2016-05-21 02:24:44 +00:00
IPConstantProp [PM] Port Interprocedural SCCP to the new pass manager. 2016-05-05 21:05:36 +00:00
IRCE [IRCE] Create llvm::Loop instances for cloned out loops 2016-08-14 01:04:46 +00:00
IndVarSimplify Revert -r278269 [IndVarSimplify] Eliminate zext of a signed IV when the IV is known to be non-negative 2016-08-22 13:12:07 +00:00
InferFunctionAttrs Recommitting r275284: add support to inline __builtin_mempcpy 2016-07-29 18:23:18 +00:00
Inline Fix inliner funclet unwind memoization 2016-09-04 01:23:20 +00:00
InstCombine fix FileCheck variables for test added with r280677 2016-09-05 23:49:32 +00:00
InstMerge [PM] Port MergedLoadStoreMotion to the new pass manager, take two. 2016-06-17 19:10:09 +00:00
InstSimplify [instsimplify] Fix incorrect folding of an ordered fcmp with a vector of all NaN. 2016-09-02 14:47:43 +00:00
Internalize [Internalize] Test that __stack_chk_{guard, fail} are not internalized. 2016-06-05 19:08:54 +00:00
JumpThreading [JumpThreading] Only write back branch-weight MDs for blocks that originally had PGO info 2016-09-06 16:08:33 +00:00
LCSSA Revert "Revert r275883 and r275891. They seem to cause PR28608." 2016-07-20 01:55:27 +00:00
LICM New pass manager for LICM. 2016-07-12 22:37:48 +00:00
LoadCombine [LoadCombine] Combine Loads formed from GEPS with negative indexes 2016-06-19 06:14:56 +00:00
LoadStoreVectorizer [LoadStoreVectorizer] Change VectorSet to Vector to match head and tail positions. Resolves PR29148. 2016-08-30 23:53:59 +00:00
LoopDataPrefetch [PM] Port LoopDataPrefetch AArch64 tests to new pass manager 2016-08-22 12:59:58 +00:00
LoopDeletion [PM] Port Dead Loop Deletion Pass to the new PM 2016-07-14 18:28:29 +00:00
LoopDistribute [BPI] Add new LazyBPI analysis 2016-07-28 23:31:12 +00:00
LoopIdiom Target independent codesize heuristics for Loop Idiom Recognition 2016-08-11 18:28:33 +00:00
LoopInterchange
LoopLoadElim [LLE] Don't hoist conditionally executed loads 2016-06-28 04:02:47 +00:00
LoopReroll [LoopReroll] Reroll loops with unordered atomic memory accesses 2016-07-19 00:23:54 +00:00
LoopRotate LPM: Drop require<loops> from these tests, it's redundant. NFC 2016-05-10 18:28:10 +00:00
LoopSimplify [LoopSimplify] Rebuild LCSSA for the inner loop after separating nested loops. 2016-08-09 22:44:56 +00:00
LoopSimplifyCFG LPM: Drop require<loops> from these tests, it's redundant. NFC 2016-05-10 18:28:10 +00:00
LoopStrengthReduce [LSR] Don't try and create post-inc expressions on non-rotated loops 2016-08-15 07:53:03 +00:00
LoopUnroll [LoopUnroll] Fix a PowerPC test broken by r277524. 2016-08-02 21:43:25 +00:00
LoopUnswitch [LoopUnswitch] Unswitch on conditions feeding into guards 2016-06-26 05:10:45 +00:00
LoopVectorize [LV] Ensure reverse interleaved group GEPs remain uniform 2016-09-02 16:19:22 +00:00
LoopVersioning [LoopVer] Update all existing PHIs in the exit block 2016-06-14 09:38:54 +00:00
LoopVersioningLICM [Loop Vectorizer] Fixed memory confilict checks. 2016-08-28 08:53:53 +00:00
LowerAtomic [PM] Port LowerAtomic to the new pass manager. 2016-05-13 22:52:35 +00:00
LowerExpectIntrinsic [Profile] handle select instruction in 'expect' lowering 2016-09-02 22:03:40 +00:00
LowerGuardIntrinsic [PM] Port LowerGuardIntrinsic to the new PM. 2016-07-28 22:08:41 +00:00
LowerInvoke [PM] Port LowerInvoke to the new pass manager 2016-08-12 17:28:27 +00:00
LowerSwitch
LowerTypeTests [WebAssembly] Fix CFI index to account for padding nullptr function 2016-08-08 23:56:01 +00:00
Mem2Reg [PM] Port Mem2Reg to the new pass manager. 2016-06-14 03:22:22 +00:00
MemCpyOpt [MemCpy] Add comments for r279769 2016-08-25 21:03:46 +00:00
MergeFunc Fix a crash in MergeFunctions related to ordering of weak/strong functions 2016-05-31 17:20:23 +00:00
MetaRenamer
NameAnonFunctions Add a pass to name anonymous/nameless function 2016-04-12 21:35:28 +00:00
NaryReassociate [PM] Port NaryReassociate to the new PM 2016-07-21 22:28:52 +00:00
ObjCARC [Verifier] Resume instructions can only be in functions w/ a personality 2016-08-01 18:06:34 +00:00
PGOProfile [ThinLTO] Indirect call promotion fixes for promoted local functions 2016-08-29 22:46:56 +00:00
PartiallyInlineLibCalls [PM] Port PartiallyInlineLibCalls to the new pass manager. 2016-05-25 23:38:53 +00:00
PhaseOrdering Mark that SpeculativeExecution preserves Globals Alias Analysis. 2016-05-03 08:33:26 +00:00
PlaceSafepoints [PlaceSafepoints] Don't call undef in test case; NFC 2016-06-25 01:40:54 +00:00
PreISelIntrinsicLowering [PM] Port PreISelIntrinsicLowering to the new PM 2016-06-24 20:13:42 +00:00
PruneEH
Reassociate [Reassociate] Add test for PR28367. 2016-08-18 13:22:37 +00:00
Reg2Mem
RewriteStatepointsForGC [statepoints][experimental] Add support for live-in semantics of values in deopt bundles 2016-08-31 15:12:17 +00:00
SCCP [SCCP] Don't delete side-effecting instructions 2016-08-24 18:10:21 +00:00
SLPVectorizer [SLP] Avoid signed integer overflow 2016-08-23 20:48:50 +00:00
SROA [SROA] Fix crash with lifetime intrinsic partially covering alloca. 2016-08-08 01:30:53 +00:00
SafeStack [safestack] Layout large allocas first to reduce fragmentation. 2016-08-02 23:21:30 +00:00
SampleProfile Fine tuning of sample profile propagation algorithm. 2016-08-12 16:22:12 +00:00
Scalarizer Scalarizer: Support scalarizing intrinsics 2016-07-25 20:02:54 +00:00
SeparateConstOffsetFromGEP [NVPTX] Enable the load-store vectorizer on nvptx. 2016-07-20 22:11:36 +00:00
SimplifyCFG [SimplifyCFG] Add test for sinking inline asm in if/else 2016-09-05 13:49:26 +00:00
Sink Add a testcase for r275581 2016-07-19 17:52:41 +00:00
SpeculativeExecution [PM] Port SpeculativeExecution to the new PM 2016-08-01 21:48:33 +00:00
StraightLineStrengthReduce [SLSR] Call getPointerSizeInBits with the correct address space. 2016-07-11 18:13:28 +00:00
StripDeadPrototypes
StripSymbols Fix strip-dead-debug-info test if path contains "bar". 2016-06-16 19:39:55 +00:00
StructurizeCFG StructurizeCFG: Fix inverting constantexpr conditions 2016-07-15 22:13:16 +00:00
TailCallElim [PM] Port TailCallElim 2016-07-06 23:48:41 +00:00
Util [MSSA] Fix PR28880 by fixing use optimizer's lower bound tracking behavior. 2016-08-08 04:44:53 +00:00
WholeProgramDevirt WholeProgramDevirt: generate more detailed and accurate remarks. 2016-08-11 19:09:02 +00:00