[DAGCombiner] Fix wrong folding of AND dag nodes.

This patch fixes the logic in the DAGCombiner that folds an AND node according
to rule: (and (X (load V)), C) -> (X (load V))

An AND between a vector load 'X' and a constant build_vector 'C' can be folded
into the load itself only if we can prove that the AND operation is redundant.
The algorithm implemented by 'visitAND' first computes the splat value 'S'
from C, and then checks if S has the lower 'B' bits set (where B is the size in
bits of the vector element type). The algorithm also takes into account the
'undef' bits in the splat mask.
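
The redundancy check described above can be sketched roughly as follows. This
is a simplified model and not the code in DAGCombiner: plain 64-bit integers
stand in for APInt, the function name is hypothetical, and it assumes that
undef bits may be treated as ones when that helps the fold.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative model only: 64-bit integers stand in for APInt, and the
// function name is hypothetical (it does not exist in LLVM).
// Returns true when ANDing a lane of `bitWidth` bits with the splat value
// cannot clear any bit, assuming undef bits may be treated as ones.
inline bool andIsRedundantForLane(uint64_t splatValue, uint64_t splatUndef,
                                  unsigned bitWidth) {
  assert(bitWidth > 0 && bitWidth < 64);
  const uint64_t laneMask = (uint64_t{1} << bitWidth) - 1;
  // Undef bits are assumed to be free to take the value 1.
  const uint64_t effective = splatValue | splatUndef;
  // The AND is a no-op for the lane only if every low bit is set.
  return (effective & laneMask) == laneMask;
}
```

For an 8-bit lane, a splat of 0xF0 whose low nibble is undef passes the check,
while a fully defined splat of 0x5F does not.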

Unfortunately, the algorithm only worked under the assumption that the size of S
is a multiple of the size of the vector element type. With this patch, we
conservatively avoid folding the AND if the size of S is not a multiple of the
element size.
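
To see why the divisibility guard matters, here is a minimal sketch of the
per-lane constant computation before and after the change. It is a model under
stated assumptions, not the patched source: 64-bit integers replace APInt, the
function names are hypothetical, and "no usable constant" is modelled with
std::optional (C++17) rather than by leaving 'Constant' untouched.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: mask with the low `bits` bits set (bits < 64).
inline uint64_t lowBitsMask(unsigned bits) {
  return (uint64_t{1} << bits) - 1;
}

// Models the pre-patch loop: ANDs together floor(splatBitSize / bitWidth)
// lane-sized chunks, silently ignoring any leftover high bits.
inline uint64_t laneConstantUnguarded(uint64_t splatValue,
                                      unsigned splatBitSize,
                                      unsigned bitWidth) {
  assert(bitWidth > 0 && bitWidth < 64 && splatBitSize <= 64);
  uint64_t constant = lowBitsMask(bitWidth);
  for (unsigned i = 0, n = splatBitSize / bitWidth; i < n; ++i)
    constant &= (splatValue >> (i * bitWidth)) & lowBitsMask(bitWidth);
  return constant;
}

// Models the patched logic: only produce a constant when the splat size is
// an exact multiple of the lane width; otherwise report "unknown".
inline std::optional<uint64_t> laneConstantGuarded(uint64_t splatValue,
                                                   unsigned splatBitSize,
                                                   unsigned bitWidth) {
  assert(bitWidth > 0 && bitWidth < 64 && splatBitSize <= 64);
  if (splatBitSize % bitWidth != 0)
    return std::nullopt;
  return laneConstantUnguarded(splatValue, splatBitSize, bitWidth);
}
```

With illustrative inputs of a 12-bit splat 0x5FF and 8-bit lanes, the unguarded
version inspects only the low 8 bits and returns 0xFF, which would make the AND
look redundant even though the upper 4 bits were never checked. The guarded
version returns no constant, so the fold is skipped.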

Added X86 test and-load-fold.ll

Differential Revision: http://reviews.llvm.org/D8085

llvm-svn: 231563
Andrea Di Biagio 2015-03-07 12:24:55 +00:00
parent 133b615558
commit c9d79e8103
2 changed files with 22 additions and 3 deletions

@@ -2912,9 +2912,13 @@ SDValue DAGCombiner::visitAND(SDNode *N) {
              SplatBitSize = SplatBitSize * 2)
           SplatValue |= SplatValue.shl(SplatBitSize);
-      Constant = APInt::getAllOnesValue(BitWidth);
-      for (unsigned i = 0, n = SplatBitSize/BitWidth; i < n; ++i)
-        Constant &= SplatValue.lshr(i*BitWidth).zextOrTrunc(BitWidth);
+      // Make sure that variable 'Constant' is only set if 'SplatBitSize' is a
+      // multiple of 'BitWidth'. Otherwise, we could propagate a wrong value.
+      if (SplatBitSize % BitWidth == 0) {
+        Constant = APInt::getAllOnesValue(BitWidth);
+        for (unsigned i = 0, n = SplatBitSize/BitWidth; i < n; ++i)
+          Constant &= SplatValue.lshr(i*BitWidth).zextOrTrunc(BitWidth);
+      }
     }
   }

@@ -0,0 +1,15 @@
+; RUN: llc -mtriple=x86_64-unknown-unknown -mcpu=generic < %s | FileCheck %s
+; Verify that the DAGCombiner doesn't wrongly remove the 'and' from the dag.
+define i8 @foo(<4 x i8>* %V) {
+; CHECK-LABEL: foo:
+; CHECK: pand
+; CHECK: ret
+entry:
+  %Vp = bitcast <4 x i8>* %V to <3 x i8>*
+  %V3i8 = load <3 x i8>, <3 x i8>* %Vp, align 4
+  %0 = and <3 x i8> %V3i8, <i8 undef, i8 undef, i8 95>
+  %1 = extractelement <3 x i8> %0, i64 2
+  ret i8 %1
+}