llvm-project/llvm/lib/Transforms/InstCombine
Sanjay Patel 7ac2db6a48 [InstCombine] improve folds for icmp gt/lt (shr X, C1), C2
We can always eliminate the shift in: icmp gt/lt (shr X, C1), C2 --> icmp gt/lt X, C'
This patch was supposed to just be an efficiency improvement because we were doing this 3-step process to fold:

IC: Visiting:   %c = icmp ugt i4 %s, 1
IC: ADD:   %s = lshr i4 %x, 1
IC: ADD:   %1 = udiv i4 %x, 2
IC: Old =   %c = icmp ugt i4 %1, 1
    New =   <badref> = icmp uge i4 %x, 4
IC: ADD:   %c = icmp uge i4 %x, 4
IC: ERASE   %2 = icmp ugt i4 %1, 1
IC: Visiting:   %c = icmp uge i4 %x, 4
IC: Old =   %c = icmp uge i4 %x, 4
    New =   <badref> = icmp ugt i4 %x, 3
IC: ADD:   %c = icmp ugt i4 %x, 3
IC: ERASE   %2 = icmp uge i4 %x, 4
IC: Visiting:   %c = icmp ugt i4 %x, 3
IC: DCE:   %1 = udiv i4 %x, 2
IC: ERASE   %1 = udiv i4 %x, 2
IC: DCE:   %s = lshr i4 %x, 1
IC: ERASE   %s = lshr i4 %x, 1
IC: Visiting:   ret i1 %c

When we could go directly to canonical icmp form:

IC: Visiting:   %c = icmp ugt i4 %s, 1
IC: Old =   %c = icmp ugt i4 %s, 1
    New =   <badref> = icmp ugt i4 %x, 3
IC: ADD:   %c = icmp ugt i4 %x, 3
IC: ERASE   %1 = icmp ugt i4 %s, 1
IC: ADD:   %s = lshr i4 %x, 1
IC: DCE:   %s = lshr i4 %x, 1
IC: ERASE   %s = lshr i4 %x, 1
IC: Visiting:   %c = icmp ugt i4 %x, 3

...but then I noticed that the folds were incomplete too:
https://godbolt.org/g/aB2hLE

Here are attempts to prove the logic with Alive:
https://rise4fun.com/Alive/92o

Name: lshr_ult
Pre: ((C2 << C1) u>> C1) == C2
%sh = lshr i8 %x, C1
%r = icmp ult i8 %sh, C2
  =>
%r = icmp ult i8 %x, (C2 << C1)

Name: ashr_slt
Pre: ((C2 << C1) >> C1) == C2
%sh = ashr i8 %x, C1
%r = icmp slt i8 %sh, C2
  =>
%r = icmp slt i8 %x, (C2 << C1)

Name: lshr_ugt
Pre: (((C2+1) << C1) u>> C1) == (C2+1)
%sh = lshr i8 %x, C1
%r = icmp ugt i8 %sh, C2
  =>
%r = icmp ugt i8 %x, ((C2+1) << C1) - 1

Name: ashr_sgt
Pre: (C2 != 127) && ((C2+1) << C1 != -128) && (((C2+1) << C1) >> C1) == (C2+1)
%sh = ashr i8 %x, C1
%r = icmp sgt i8 %sh, C2
  =>
%r = icmp sgt i8 %x, ((C2+1) << C1) - 1

Name: ashr_exact_sgt
Pre: ((C2 << C1) >> C1) == C2
%sh = ashr exact i8 %x, C1
%r = icmp sgt i8 %sh, C2
  =>
%r = icmp sgt i8 %x, (C2 << C1)

Name: ashr_exact_slt
Pre: ((C2 << C1) >> C1) == C2
%sh = ashr exact i8 %x, C1
%r = icmp slt i8 %sh, C2
  =>
%r = icmp slt i8 %x, (C2 << C1)

Name: lshr_exact_ugt
Pre: ((C2 << C1) u>> C1) == C2
%sh = lshr exact i8 %x, C1
%r = icmp ugt i8 %sh, C2
  =>
%r = icmp ugt i8 %x, (C2 << C1)

Name: lshr_exact_ult
Pre: ((C2 << C1) u>> C1) == C2
%sh = lshr exact i8 %x, C1
%r = icmp ult i8 %sh, C2
  =>
%r = icmp ult i8 %x, (C2 << C1)

We did something similar for 'shl' in D28406.

Differential Revision: https://reviews.llvm.org/D38514

llvm-svn: 315021
2017-10-05 21:11:49 +00:00
..
CMakeLists.txt [CMake] NFC. Updating CMake dependency specifications 2016-11-17 04:36:50 +00:00
InstCombineAddSub.cpp [InstCombine] Add select simplifications 2017-09-20 17:32:16 +00:00
InstCombineAndOrXor.cpp [ValueTracking, InstCombine] canonicalize fcmp ord/uno with non-NAN ops to null constants 2017-09-05 23:13:13 +00:00
InstCombineCalls.cpp [X86] Remove VPERM2F128/VPERM2I128 intrinsics and autoupgrade to native shuffles. 2017-09-16 07:36:14 +00:00
InstCombineCasts.cpp [InstCombine] Fix a vector splat handling bug in transformZExtICmp. 2017-10-05 07:59:11 +00:00
InstCombineCompares.cpp [InstCombine] improve folds for icmp gt/lt (shr X, C1), C2 2017-10-05 21:11:49 +00:00
InstCombineInternal.h Revert r314928 to investigate thinLTO bootstrap failure 2017-10-05 01:40:13 +00:00
InstCombineLoadStoreAlloca.cpp Update getMergedLocation to check the instruction type and merge properly. 2017-10-02 18:13:14 +00:00
InstCombineMulDivRem.cpp [InstCombine] Add select simplifications 2017-09-20 17:32:16 +00:00
InstCombinePHI.cpp Revert r314928 to investigate thinLTO bootstrap failure 2017-10-05 01:40:13 +00:00
InstCombineSelect.cpp [InstCombine] Move foldSelectICmpAnd helper function earlier in the file to enable reuse in a future patch. 2017-09-05 05:26:37 +00:00
InstCombineShifts.cpp [InstCombine] Added support for (X >>s C) << C --> X & (-1 << C) 2017-08-15 19:33:14 +00:00
InstCombineSimplifyDemanded.cpp [InstCombine] improve demanded vector elements analysis of insertelement 2017-08-31 15:57:17 +00:00
InstCombineVectorOps.cpp [InstCombine] remove extract-of-select vector transform (2nd try) 2017-09-25 20:30:53 +00:00
InstructionCombining.cpp [InstCombine] Gating select arithmetic optimization. 2017-09-27 17:16:51 +00:00
LLVMBuild.txt