[X86CmovConversion] Make heuristic for optimized cmov depth more conservative (PR44539)

Fix/workaround for https://bugs.llvm.org/show_bug.cgi?id=44539. As discussed there, this pass makes some overly optimistic assumptions, as it does not have access to actual branch weights. This patch makes the computation of the depth of the optimized cmov more conservative, by assuming a distribution of 75/25 rather than 50/50 and placing the weights to get the more conservative result (larger depth). The fully conservative choice would be std::max(TrueOpDepth, FalseOpDepth), but that would break at least one existing test (which may or may not be an issue in practice). Differential Revision: https://reviews.llvm.org/D74155
2020-02-06 21:16:10 +01:00 · 2020-02-06 21:16:10 +01:00 · 5eb19bf4a2
parent 37f46650c3
commit 5eb19bf4a2
1 changed files with 7 additions and 6 deletions
--- a/llvm/lib/Target/X86/X86CmovConversion.cpp
+++ b/llvm/lib/Target/X86/X86CmovConversion.cpp
@ -364,12 +364,13 @@ bool X86CmovConverterPass::collectCmovCandidates(
 /// \param TrueOpDepth depth cost of CMOV true value operand.
 /// \param FalseOpDepth depth cost of CMOV false value operand.
 static unsigned getDepthOfOptCmov(unsigned TrueOpDepth, unsigned FalseOpDepth) {
-  //===--------------------------------------------------------------------===//
-  // With no info about branch weight, we assume 50% for each value operand.
-  // Thus, depth of optimized CMOV instruction is the rounded up average of
-  // its True-Operand-Value-Depth and False-Operand-Value-Depth.
-  //===--------------------------------------------------------------------===//
-  return (TrueOpDepth + FalseOpDepth + 1) / 2;
+  // The depth of the result after branch conversion is
+  // TrueOpDepth * TrueOpProbability + FalseOpDepth * FalseOpProbability.
+  // As we have no info about branch weight, we assume 75% for one and 25% for
+  // the other, and pick the result with the largest resulting depth.
+  return std::max(
+      divideCeil(TrueOpDepth * 3 + FalseOpDepth, 4),
+      divideCeil(FalseOpDepth * 3 + TrueOpDepth, 4));
 }

 bool X86CmovConverterPass::checkForProfitableCmovCandidates(