Don't leave unused divs/rems sitting around in BypassSlowDivision.

Summary:
This "pass" eagerly creates div and rem instructions even when only one
is needed -- it relies on a later pass (machine DCE?) to clean them up.

This is a problem not only for cleanliness (the pass runs during
CodeGenPrepare, so it should leave the IR in a better state), but also
for instruction selection.  If a div+rem pair is always present, isel
will always select a divrem instruction (if possible), even when a
single div or rem would do.

Specifically, in NVPTX, we want to compute rem from the output of div,
if available.  But if a div is not available, we want to leave the rem
alone.  This transformation is overeager if div is always available.
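
To make the NVPTX point concrete: when a quotient q = a / b is already
available, the remainder can be derived as a - q * b without issuing a
second division.  A minimal C++ sketch of that identity follows;
`remFromQuotient` is a hypothetical helper name used only for
illustration and is not part of LLVM or of this patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not LLVM API): derive the remainder of A / B from an
// already-computed quotient Q == A / B. C++ integer division truncates toward
// zero, matching the semantics of LLVM's sdiv/srem.
int64_t remFromQuotient(int64_t A, int64_t B, int64_t Q) {
  assert(B != 0 && "division by zero");
  // One multiply and one subtract replace a second (slow) division.
  return A - Q * B;
}
```

If no quotient is available, computing it solely to feed this formula
would add a division rather than save one, which is why a lone rem
should be left as it is.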

Because this code runs as part of CodeGenPrepare, it's nontrivial to
write a test for this change.  But this will effectively be tested by
a later patch which adds the aforementioned change to NVPTX isel.

Reviewers: tra

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26088

llvm-svn: 285460
Justin Lebar 2016-10-28 21:43:54 +00:00
parent 468bf73209
commit 0ede5fb1bb
2 changed files with 37 additions and 0 deletions


@@ -20,6 +20,7 @@
#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Transforms/Utils/Local.h"
using namespace llvm;
@@ -246,5 +247,12 @@ bool llvm::bypassSlowDivision(
    MadeChange |= reuseOrInsertFastDiv(I, BT, UseDivOp, UseSignedOp, DivCache);
  }

  // Above we eagerly create divs and rems, as pairs, so that we can efficiently
  // create divrem machine instructions.  Now erase any unused divs / rems so we
  // don't leave extra instructions sitting around.
  for (auto &KV : DivCache)
    for (Instruction *Phi : {KV.second.Quotient, KV.second.Remainder})
      RecursivelyDeleteTriviallyDeadInstructions(Phi);

  return MadeChange;
}
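
The cleanup loop above can be illustrated with a toy model that does not
depend on LLVM.  Everything in this sketch (`ToyInst`, `ToyDivRem`,
`survivingInsts`) is a hypothetical stand-in, not LLVM API: each cache
entry holds a quotient/remainder pair, and only the members that still
have uses survive.  The real RecursivelyDeleteTriviallyDeadInstructions
also removes operands that become dead as a result, which this sketch
does not model.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR instruction: a name plus a use count.
struct ToyInst {
  std::string Name;
  int NumUses;
};

// Hypothetical stand-in for one cache entry: the eagerly created pair.
struct ToyDivRem {
  ToyInst Quotient;
  ToyInst Remainder;
};

// Walk every cached pair and keep only the instructions that are still used,
// mirroring the shape of the loop in the patch.
std::vector<std::string> survivingInsts(const std::vector<ToyDivRem> &Cache) {
  std::vector<std::string> Live;
  for (const ToyDivRem &Entry : Cache)
    for (const ToyInst *I : {&Entry.Quotient, &Entry.Remainder}) {
      assert(I->NumUses >= 0 && "use counts cannot be negative");
      if (I->NumUses > 0)
        Live.push_back(I->Name);
    }
  return Live;
}
```

With one entry whose remainder is unused and another whose quotient is
unused, only the used half of each pair remains, which is the state the
patch leaves behind for isel.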


@@ -0,0 +1,29 @@
; RUN: opt -S -codegenprepare < %s | FileCheck %s
target datalayout = "e-i64:64-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"
; We only use the div instruction -- the rem should be DCE'ed.
; CHECK-LABEL: @div_only
define void @div_only(i64 %a, i64 %b, i64* %retptr) {
  ; CHECK: udiv i32
  ; CHECK-NOT: urem
  ; CHECK: sdiv i64
  ; CHECK-NOT: rem
  %d = sdiv i64 %a, %b
  store i64 %d, i64* %retptr
  ret void
}

; We only use the rem instruction -- the div should be DCE'ed.
; CHECK-LABEL: @rem_only
define void @rem_only(i64 %a, i64 %b, i64* %retptr) {
  ; CHECK-NOT: div
  ; CHECK: urem i32
  ; CHECK-NOT: div
  ; CHECK: rem i64
  ; CHECK-NOT: div
  %d = srem i64 %a, %b
  store i64 %d, i64* %retptr
  ret void
}