llvm-project

Sanjay Patel 7ac2db6a48 [InstCombine] improve folds for icmp gt/lt (shr X, C1), C2

We can always eliminate the shift in:
  icmp gt/lt (shr X, C1), C2 --> icmp gt/lt X, C'

This patch was supposed to just be an efficiency improvement because we were doing this 3-step process to fold:

  IC: Visiting: %c = icmp ugt i4 %s, 1
  IC: ADD: %s = lshr i4 %x, 1
  IC: ADD: %1 = udiv i4 %x, 2
  IC: Old = %c = icmp ugt i4 %1, 1
      New = <badref> = icmp uge i4 %x, 4
  IC: ADD: %c = icmp uge i4 %x, 4
  IC: ERASE %2 = icmp ugt i4 %1, 1
  IC: Visiting: %c = icmp uge i4 %x, 4
  IC: Old = %c = icmp uge i4 %x, 4
      New = <badref> = icmp ugt i4 %x, 3
  IC: ADD: %c = icmp ugt i4 %x, 3
  IC: ERASE %2 = icmp uge i4 %x, 4
  IC: Visiting: %c = icmp ugt i4 %x, 3
  IC: DCE: %1 = udiv i4 %x, 2
  IC: ERASE %1 = udiv i4 %x, 2
  IC: DCE: %s = lshr i4 %x, 1
  IC: ERASE %s = lshr i4 %x, 1
  IC: Visiting: ret i1 %c

When we could go directly to canonical icmp form:

  IC: Visiting: %c = icmp ugt i4 %s, 1
  IC: Old = %c = icmp ugt i4 %s, 1
      New = <badref> = icmp ugt i4 %x, 3
  IC: ADD: %c = icmp ugt i4 %x, 3
  IC: ERASE %1 = icmp ugt i4 %s, 1
  IC: ADD: %s = lshr i4 %x, 1
  IC: DCE: %s = lshr i4 %x, 1
  IC: ERASE %s = lshr i4 %x, 1
  IC: Visiting: %c = icmp ugt i4 %x, 3

...but then I noticed that the folds were incomplete too:
https://godbolt.org/g/aB2hLE

Here are attempts to prove the logic with Alive:
https://rise4fun.com/Alive/92o

  Name: lshr_ult
  Pre: ((C2 << C1) u>> C1) == C2
  %sh = lshr i8 %x, C1
  %r = icmp ult i8 %sh, C2
    =>
  %r = icmp ult i8 %x, (C2 << C1)

  Name: ashr_slt
  Pre: ((C2 << C1) >> C1) == C2
  %sh = ashr i8 %x, C1
  %r = icmp slt i8 %sh, C2
    =>
  %r = icmp slt i8 %x, (C2 << C1)

  Name: lshr_ugt
  Pre: (((C2+1) << C1) u>> C1) == (C2+1)
  %sh = lshr i8 %x, C1
  %r = icmp ugt i8 %sh, C2
    =>
  %r = icmp ugt i8 %x, ((C2+1) << C1) - 1

  Name: ashr_sgt
  Pre: (C2 != 127) && ((C2+1) << C1 != -128) && (((C2+1) << C1) >> C1) == (C2+1)
  %sh = ashr i8 %x, C1
  %r = icmp sgt i8 %sh, C2
    =>
  %r = icmp sgt i8 %x, ((C2+1) << C1) - 1

  Name: ashr_exact_sgt
  Pre: ((C2 << C1) >> C1) == C2
  %sh = ashr exact i8 %x, C1
  %r = icmp sgt i8 %sh, C2
    =>
  %r = icmp sgt i8 %x, (C2 << C1)

  Name: ashr_exact_slt
  Pre: ((C2 << C1) >> C1) == C2
  %sh = ashr exact i8 %x, C1
  %r = icmp slt i8 %sh, C2
    =>
  %r = icmp slt i8 %x, (C2 << C1)

  Name: lshr_exact_ugt
  Pre: ((C2 << C1) u>> C1) == C2
  %sh = lshr exact i8 %x, C1
  %r = icmp ugt i8 %sh, C2
    =>
  %r = icmp ugt i8 %x, (C2 << C1)

  Name: lshr_exact_ult
  Pre: ((C2 << C1) u>> C1) == C2
  %sh = lshr exact i8 %x, C1
  %r = icmp ult i8 %sh, C2
    =>
  %r = icmp ult i8 %x, (C2 << C1)

We did something similar for 'shl' in D28406.

Differential Revision: https://reviews.llvm.org/D38514
llvm-svn: 315021

2017-10-05 21:11:49 +00:00
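
The four non-exact Alive rules in the commit message above can also be sanity-checked by brute force over every i8 value. The sketch below is not part of the patch or of LLVM; the helper names (signed, shl, lshr, ashr, holds, RULES) are made up for illustration, and the shift amount C1 is assumed to be in the range 0..7. The 'exact' variants are left out because they would need an extra condition on %x (no set bits shifted out).

```python
BITS = 8                      # the Alive rules above are written for i8
MASK = (1 << BITS) - 1


def signed(v):
    """Interpret the low 8 bits of v as a two's-complement integer."""
    v &= MASK
    return v - (1 << BITS) if v >> (BITS - 1) else v


def shl(v, c):
    return (v << c) & MASK


def lshr(v, c):
    return (v & MASK) >> c


def ashr(v, c):
    # Python's >> on a negative int behaves as an arithmetic shift.
    return (signed(v) >> c) & MASK


def holds(pre, old, new):
    """True if old and new agree for every x whenever pre(c1, c2) is true."""
    return all(
        old(x, c1, c2) == new(x, c1, c2)
        for c1 in range(BITS)
        for c2 in range(1 << BITS)
        if pre(c1, c2)
        for x in range(1 << BITS)
    )


# Each entry: (precondition, original compare, folded compare).
RULES = {
    "lshr_ult": (
        lambda c1, c2: lshr(shl(c2, c1), c1) == c2,
        lambda x, c1, c2: lshr(x, c1) < c2,
        lambda x, c1, c2: x < shl(c2, c1),
    ),
    "lshr_ugt": (
        lambda c1, c2: lshr(shl(c2 + 1, c1), c1) == ((c2 + 1) & MASK),
        lambda x, c1, c2: lshr(x, c1) > c2,
        lambda x, c1, c2: x > ((shl(c2 + 1, c1) - 1) & MASK),
    ),
    "ashr_slt": (
        lambda c1, c2: ashr(shl(c2, c1), c1) == c2,
        lambda x, c1, c2: signed(ashr(x, c1)) < signed(c2),
        lambda x, c1, c2: signed(x) < signed(shl(c2, c1)),
    ),
    "ashr_sgt": (
        lambda c1, c2: signed(c2) != 127
        and signed(shl(c2 + 1, c1)) != -128
        and ashr(shl(c2 + 1, c1), c1) == ((c2 + 1) & MASK),
        lambda x, c1, c2: signed(ashr(x, c1)) > signed(c2),
        lambda x, c1, c2: signed(x) > signed(shl(c2 + 1, c1) - 1),
    ),
}

if __name__ == "__main__":
    for name, rule in RULES.items():
        print(name, holds(*rule))
```

Dropping a precondition shows why it is there: with C1 = 1 and C2 = 128, (C2 << C1) wraps to 0, so the folded lshr_ult compare is always false while the original compare is always true.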
..
CMakeLists.txt	[CMake] NFC. Updating CMake dependency specifications	2016-11-17 04:36:50 +00:00
InstCombineAddSub.cpp	[InstCombine] Add select simplifications	2017-09-20 17:32:16 +00:00
InstCombineAndOrXor.cpp	[ValueTracking, InstCombine] canonicalize fcmp ord/uno with non-NAN ops to null constants	2017-09-05 23:13:13 +00:00
InstCombineCalls.cpp	[X86] Remove VPERM2F128/VPERM2I128 intrinsics and autoupgrade to native shuffles.	2017-09-16 07:36:14 +00:00
InstCombineCasts.cpp	[InstCombine] Fix a vector splat handling bug in transformZExtICmp.	2017-10-05 07:59:11 +00:00
InstCombineCompares.cpp	[InstCombine] improve folds for icmp gt/lt (shr X, C1), C2	2017-10-05 21:11:49 +00:00
InstCombineInternal.h	Revert r314928 to investigate thinLTO bootstrap failure	2017-10-05 01:40:13 +00:00
InstCombineLoadStoreAlloca.cpp	Update getMergedLocation to check the instruction type and merge properly.	2017-10-02 18:13:14 +00:00
InstCombineMulDivRem.cpp	[InstCombine] Add select simplifications	2017-09-20 17:32:16 +00:00
InstCombinePHI.cpp	Revert r314928 to investigate thinLTO bootstrap failure	2017-10-05 01:40:13 +00:00
InstCombineSelect.cpp	[InstCombine] Move foldSelectICmpAnd helper function earlier in the file to enable reuse in a future patch.	2017-09-05 05:26:37 +00:00
InstCombineShifts.cpp	[InstCombine] Added support for (X >>s C) << C --> X & (-1 << C)	2017-08-15 19:33:14 +00:00
InstCombineSimplifyDemanded.cpp	[InstCombine] improve demanded vector elements analysis of insertelement	2017-08-31 15:57:17 +00:00
InstCombineVectorOps.cpp	[InstCombine] remove extract-of-select vector transform (2nd try)	2017-09-25 20:30:53 +00:00
InstructionCombining.cpp	[InstCombine] Gating select arithmetic optimization.	2017-09-27 17:16:51 +00:00
LLVMBuild.txt	…