Commit Graph

74 Commits

Author SHA1 Message Date
Benjamin Kramer 50a281a871 While SimplifyDemandedBits constant folds this, we can't rely on it here.
It's possible to craft an input that hits the recursion limits in a way
that SimplifyDemandedBits doesn't simplify the icmp but ComputeMaskedBits
can infer which bits are zero.

No test case as it depends on too many other things. Fixes PR9609.

llvm-svn: 128777
2011-04-02 18:50:58 +00:00
Benjamin Kramer 8b94c295c3 Fix comment.
llvm-svn: 128745
2011-04-01 22:29:18 +00:00
Benjamin Kramer 5cad45307e Tweaks to the icmp+sext-to-shifts optimization to address Frits' comments:
- Localize the check if an icmp has one use to a place where we know we're
  introducing something that's likely more expensive than a sext from i1.
- Add an assert to make sure a case that would lead to a miscompilation is
  folded away earlier.
- Fix a typo.

llvm-svn: 128744
2011-04-01 22:22:11 +00:00
Benjamin Kramer ac2d5657a6 Fix build.
llvm-svn: 128733
2011-04-01 20:15:16 +00:00
Benjamin Kramer d121765e64 InstCombine: Turn icmp + sext into bitwise/integer ops when the input has only one unknown bit.
int test1(unsigned x) { return (x&8) ? 0 : -1; }
int test3(unsigned x) { return (x&8) ? -1 : 0; }

before (x86_64):
_test1:
	andl	$8, %edi
	cmpl	$1, %edi
	sbbl	%eax, %eax
	ret
_test3:
	andl	$8, %edi
	cmpl	$1, %edi
	sbbl	%eax, %eax
	notl	%eax
	ret

after:
_test1:
	shrl	$3, %edi
	andl	$1, %edi
	leal	-1(%rdi), %eax
	ret
_test3:
	shll	$28, %edi
	movl	%edi, %eax
	sarl	$31, %eax
	ret

llvm-svn: 128732
2011-04-01 20:09:10 +00:00
Benjamin Kramer 398b8c5faf InstCombine: Move (sext icmp) transforms into their own method. No intended functionality change.
llvm-svn: 128731
2011-04-01 20:09:03 +00:00
Jay Foad 52131344a2 Remove PHINode::reserveOperandSpace(). Instead, add a parameter to
PHINode::Create() giving the (known or expected) number of operands.

llvm-svn: 128537
2011-03-30 11:28:46 +00:00
Jay Foad e0938d8a87 (Almost) always call reserveOperandSpace() on newly created PHINodes.
llvm-svn: 128535
2011-03-30 11:19:20 +00:00
Devang Patel fbb482b314 llvm.dbg.declare intrinsic does not use any llvm::Values. It's magic!
llvm-svn: 127282
2011-03-08 22:12:11 +00:00
Chris Lattner 69229316aa convert ConstantVector::get to use ArrayRef.
llvm-svn: 125537
2011-02-15 00:14:00 +00:00
Chris Lattner 34442e6ebf revert my ConstantVector patch, it seems to have made the llvm-gcc
builders unhappy.

llvm-svn: 125504
2011-02-14 18:15:46 +00:00
Chris Lattner d9f5b88548 Switch ConstantVector::get to use ArrayRef instead of a pointer+size
idiom.  Change various clients to simplify their code.

llvm-svn: 125487
2011-02-14 07:55:32 +00:00
Chris Lattner 9c10d587f6 implement an instcombine xform that canonicalizes casts outside of and-with-constant operations.
This fixes rdar://8808586 which observed that we used to compile:


union xy {
        struct x { _Bool b[15]; } x;
        __attribute__((packed))
        struct y {
                __attribute__((packed)) unsigned long b0to7;
                __attribute__((packed)) unsigned int b8to11;
                __attribute__((packed)) unsigned short b12to13;
                __attribute__((packed)) unsigned char b14;
        } y;
};

struct x
foo(union xy *xy)
{
        return xy->x;
}

into:

_foo:                                   ## @foo
	movq	(%rdi), %rax
	movabsq	$1095216660480, %rcx    ## imm = 0xFF00000000
	andq	%rax, %rcx
	movabsq	$-72057594037927936, %rdx ## imm = 0xFF00000000000000
	andq	%rax, %rdx
	movzbl	%al, %esi
	orq	%rdx, %rsi
	movq	%rax, %rdx
	andq	$65280, %rdx            ## imm = 0xFF00
	orq	%rsi, %rdx
	movq	%rax, %rsi
	andq	$16711680, %rsi         ## imm = 0xFF0000
	orq	%rdx, %rsi
	movl	%eax, %edx
	andl	$-16777216, %edx        ## imm = 0xFFFFFFFFFF000000
	orq	%rsi, %rdx
	orq	%rcx, %rdx
	movabsq	$280375465082880, %rcx  ## imm = 0xFF0000000000
	movq	%rax, %rsi
	andq	%rcx, %rsi
	orq	%rdx, %rsi
	movabsq	$71776119061217280, %r8 ## imm = 0xFF000000000000
	andq	%r8, %rax
	orq	%rsi, %rax
	movzwl	12(%rdi), %edx
	movzbl	14(%rdi), %esi
	shlq	$16, %rsi
	orl	%edx, %esi
	movq	%rsi, %r9
	shlq	$32, %r9
	movl	8(%rdi), %edx
	orq	%r9, %rdx
	andq	%rdx, %rcx
	movzbl	%sil, %esi
	shlq	$32, %rsi
	orq	%rcx, %rsi
	movl	%edx, %ecx
	andl	$-16777216, %ecx        ## imm = 0xFFFFFFFFFF000000
	orq	%rsi, %rcx
	movq	%rdx, %rsi
	andq	$16711680, %rsi         ## imm = 0xFF0000
	orq	%rcx, %rsi
	movq	%rdx, %rcx
	andq	$65280, %rcx            ## imm = 0xFF00
	orq	%rsi, %rcx
	movzbl	%dl, %esi
	orq	%rcx, %rsi
	andq	%r8, %rdx
	orq	%rsi, %rdx
	ret

We now compile this into:

_foo:                                   ## @foo
## BB#0:                                ## %entry
	movzwl	12(%rdi), %eax
	movzbl	14(%rdi), %ecx
	shlq	$16, %rcx
	orl	%eax, %ecx
	shlq	$32, %rcx
	movl	8(%rdi), %edx
	orq	%rcx, %rdx
	movq	(%rdi), %rax
	ret

A small improvement :-)

llvm-svn: 123520
2011-01-15 06:32:33 +00:00
Bill Wendling 5e3605552e Whitespace fixes. No functionality change.
llvm-svn: 122110
2010-12-17 23:27:41 +00:00
Nate Begeman 7aa18bf46a Add vector versions of some existing scalar transforms to aid codegen in matching psign & pblend operations to the IR produced by clang/gcc for their C idioms.
llvm-svn: 122105
2010-12-17 23:12:19 +00:00
Chris Lattner 6e27b3e004 Fix a serious performance regression introduced by r108687 on linux:
turning (fptrunc (sqrt (fpext x))) -> (sqrtf x)  is great, but we have
to delete the original sqrt as well.  Not doing so causes us to do 
two sqrt's when building with -fmath-errno (the default on linux).

llvm-svn: 113260
2010-09-07 20:01:38 +00:00
Chris Lattner 50df36ac0a for completeness, allow undef also.
llvm-svn: 112351
2010-08-28 03:36:51 +00:00
Chris Lattner d0214f3efe handle the constant case of vector insertion. For something
like this:

struct S { float A, B, C, D; };

struct S g;
struct S bar() { 
  struct S A = g;
  ++A.B;
  A.A = 42;
  return A;
}

we now generate:

_bar:                                   ## @bar
## BB#0:                                ## %entry
	movq	_g@GOTPCREL(%rip), %rax
	movss	12(%rax), %xmm0
	pshufd	$16, %xmm0, %xmm0
	movss	4(%rax), %xmm2
	movss	8(%rax), %xmm1
	pshufd	$16, %xmm1, %xmm1
	unpcklps	%xmm0, %xmm1
	addss	LCPI1_0(%rip), %xmm2
	pshufd	$16, %xmm2, %xmm2
	movss	LCPI1_1(%rip), %xmm0
	pshufd	$16, %xmm0, %xmm0
	unpcklps	%xmm2, %xmm0
	ret

instead of:

_bar:                                   ## @bar
## BB#0:                                ## %entry
	movq	_g@GOTPCREL(%rip), %rax
	movss	12(%rax), %xmm0
	pshufd	$16, %xmm0, %xmm0
	movss	4(%rax), %xmm2
	movss	8(%rax), %xmm1
	pshufd	$16, %xmm1, %xmm1
	unpcklps	%xmm0, %xmm1
	addss	LCPI1_0(%rip), %xmm2
	movd	%xmm2, %eax
	shlq	$32, %rax
	addq	$1109917696, %rax       ## imm = 0x42280000
	movd	%rax, %xmm0
	ret

llvm-svn: 112345
2010-08-28 01:50:57 +00:00
Chris Lattner dd6601048e optimize bitcasts from large integers to vector into vector
element insertion from the pieces that feed into the vector.
This handles a pattern that occurs frequently due to code
generated for the x86-64 abi.  We now compile something like
this:

struct S { float A, B, C, D; };
struct S g;
struct S bar() { 
  struct S A = g;
  ++A.A;
  ++A.C;
  return A;
}

into all nice vector operations:

_bar:                                   ## @bar
## BB#0:                                ## %entry
	movq	_g@GOTPCREL(%rip), %rax
	movss	LCPI1_0(%rip), %xmm1
	movss	(%rax), %xmm0
	addss	%xmm1, %xmm0
	pshufd	$16, %xmm0, %xmm0
	movss	4(%rax), %xmm2
	movss	12(%rax), %xmm3
	pshufd	$16, %xmm2, %xmm2
	unpcklps	%xmm2, %xmm0
	addss	8(%rax), %xmm1
	pshufd	$16, %xmm1, %xmm1
	pshufd	$16, %xmm3, %xmm2
	unpcklps	%xmm2, %xmm1
	ret

instead of icky integer operations:

_bar:                                   ## @bar
	movq	_g@GOTPCREL(%rip), %rax
	movss	LCPI1_0(%rip), %xmm1
	movss	(%rax), %xmm0
	addss	%xmm1, %xmm0
	movd	%xmm0, %ecx
	movl	4(%rax), %edx
	movl	12(%rax), %esi
	shlq	$32, %rdx
	addq	%rcx, %rdx
	movd	%rdx, %xmm0
	addss	8(%rax), %xmm1
	movd	%xmm1, %eax
	shlq	$32, %rsi
	addq	%rax, %rsi
	movd	%rsi, %xmm1
	ret

This resolves rdar://8360454

llvm-svn: 112343
2010-08-28 01:20:38 +00:00
Chris Lattner 18d7fc8fc6 Implement a pretty general logical shift propagation
framework, which is good at ripping through bitfield
operations.  This generalize a bunch of the existing
xforms that instcombine does, such as 
  (x << c) >> c -> and
to handle intermediate logical nodes.  This is useful for
ripping up the "promote to large integer" code produced by
SRoA.

llvm-svn: 112304
2010-08-27 22:24:38 +00:00
Chris Lattner 7398434675 teach the truncation optimization that an entire chain of
computation can be truncated if it is fed by a sext/zext that doesn't
have to be exactly equal to the truncation result type.

llvm-svn: 112285
2010-08-27 20:32:06 +00:00
Chris Lattner 90cd746e63 Add an instcombine to clean up a common pattern produced
by the SRoA "promote to large integer" code, eliminating
some type conversions like this:

   %94 = zext i16 %93 to i32                       ; <i32> [#uses=2]
   %96 = lshr i32 %94, 8                           ; <i32> [#uses=1]
   %101 = trunc i32 %96 to i8                      ; <i8> [#uses=1]

This also unblocks other xforms from happening, now clang is able to compile:

struct S { float A, B, C, D; };
float foo(struct S A) { return A.A + A.B+A.C+A.D; }

into:

_foo:                                   ## @foo
## BB#0:                                ## %entry
	pshufd	$1, %xmm0, %xmm2
	addss	%xmm0, %xmm2
	movdqa	%xmm1, %xmm3
	addss	%xmm2, %xmm3
	pshufd	$1, %xmm1, %xmm0
	addss	%xmm3, %xmm0
	ret

on x86-64, instead of:

_foo:                                   ## @foo
## BB#0:                                ## %entry
	movd	%xmm0, %rax
	shrq	$32, %rax
	movd	%eax, %xmm2
	addss	%xmm0, %xmm2
	movapd	%xmm1, %xmm3
	addss	%xmm2, %xmm3
	movd	%xmm1, %rax
	shrq	$32, %rax
	movd	%eax, %xmm0
	addss	%xmm3, %xmm0
	ret

This seems pretty close to optimal to me, at least without
using horizontal adds.  This also triggers in lots of other
code, including SPEC.

llvm-svn: 112278
2010-08-27 18:31:05 +00:00
Chris Lattner bfd2228182 optimize "integer extraction out of the middle of a vector" as produced
by SRoA.  This is part of rdar://7892780, but needs another xform to
expose this.

llvm-svn: 112232
2010-08-26 22:14:59 +00:00
Chris Lattner d4ebd6df5a optimize bitcast(trunc(bitcast(x))) where the result is a float and 'x'
is a vector to be a vector element extraction.  This allows clang to
compile:

struct S { float A, B, C, D; };
float foo(struct S A) { return A.A + A.B+A.C+A.D; }

into:

_foo:                                   ## @foo
## BB#0:                                ## %entry
	movd	%xmm0, %rax
	shrq	$32, %rax
	movd	%eax, %xmm2
	addss	%xmm0, %xmm2
	movapd	%xmm1, %xmm3
	addss	%xmm2, %xmm3
	movd	%xmm1, %rax
	shrq	$32, %rax
	movd	%eax, %xmm0
	addss	%xmm3, %xmm0
	ret

instead of:

_foo:                                   ## @foo
## BB#0:                                ## %entry
	movd	%xmm0, %rax
	movd	%eax, %xmm0
	shrq	$32, %rax
	movd	%eax, %xmm2
	addss	%xmm0, %xmm2
	movd	%xmm1, %rax
	movd	%eax, %xmm1
	addss	%xmm2, %xmm1
	shrq	$32, %rax
	movd	%eax, %xmm0
	addss	%xmm1, %xmm0
	ret

... eliminating half of the horribleness.

llvm-svn: 112227
2010-08-26 21:55:42 +00:00
Owen Anderson 84774eda4b Tweak per Chris' comments.
llvm-svn: 108736
2010-07-19 19:23:32 +00:00
Owen Anderson 32a58342ed Reimplement r108639 in InstCombine rather than DAGCombine.
llvm-svn: 108687
2010-07-19 08:09:34 +00:00
Dan Gohman 05a6555acb Fix instcombine's handling of alloca to accept non-i32 types.
llvm-svn: 104935
2010-05-28 04:33:04 +00:00
Dan Gohman a4abd035ea Fix a missing newline in debug output.
llvm-svn: 104644
2010-05-25 21:50:35 +00:00
Chris Lattner 02b0df5338 Teach instcombine to transform a bitcast/(zext|trunc)/bitcast sequence
with a vector input and output into a shuffle vector.  This sort of 
sequence happens when the input code stores with one type and reloads
with another type and then SROA promotes to i96 integers, which make
everyone sad.

This fixes rdar://7896024

llvm-svn: 103354
2010-05-08 21:50:26 +00:00
Dan Gohman eb7111b98f Say bitcast instead of bitconvert.
llvm-svn: 100720
2010-04-07 23:22:42 +00:00
Duncan Sands 19d0b47b1f There are two ways of checking for a given type, for example isa<PointerType>(T)
and T->isPointerTy().  Convert most instances of the first form to the second form.
Requested by Chris.

llvm-svn: 96344
2010-02-16 11:11:14 +00:00
Duncan Sands 9dff9bec31 Uniformize the names of type predicates: rather than having isFloatTy and
isInteger, we now have isFloatTy and isIntegerTy.  Requested by Chris!

llvm-svn: 96223
2010-02-15 16:12:20 +00:00
Chris Lattner 4e8137d678 Rename ValueRequiresCast to ShouldOptimizeCast, to better reflect
what it does.  Enhance it to return false to optimizing vector
sign extensions from vector comparisions, which is the idiom used
to get a splatted vector for a vector comparison.

Doing this breaks vector-casts.ll, add some compensating 
transformations to handle the important case they cover without
depending on this canonicalization.

This fixes rdar://7434900 a serious pessimization of vector compares.

llvm-svn: 95855
2010-02-11 06:26:33 +00:00
Dan Gohman 949458d014 LangRef.html says that inttoptr and ptrtoint always use zero-extension
when the cast is extending.

llvm-svn: 95046
2010-02-02 01:44:02 +00:00
Chris Lattner 1b35bbe813 change the canonical form of "cond ? -1 : 0" to be
"sext cond" instead of a select.  This simplifies some instcombine
code, matches the policy for zext (cond ? 1 : 0 -> zext), and allows
us to generate better code for a testcase on ppc.

llvm-svn: 94339
2010-01-24 00:09:49 +00:00
Chris Lattner 43f2fa6201 my instcombine transformations to make extension elimination more
aggressive changed the canonical form from sext(trunc(x)) to ashr(lshr(x)),
make sure to transform a couple more things into that canonical form,
and catch a case where we missed turning zext/shl/ashr into a single sext.

llvm-svn: 93787
2010-01-18 22:19:16 +00:00
Chris Lattner d1a3efedd8 reenable the piece that turns trunc(zext(x)) -> x even if zext has multiple uses,
codegen has no apparent problem with the trunc version of this, because it turns
into a simple subreg idiom

llvm-svn: 93202
2010-01-11 22:49:40 +00:00
Chris Lattner a6b1356cf9 Disable folding sext(trunc(x)) -> x (and other similar cast/cast cases) when the
trunc has multiple uses.  Codegen is not able to coalesce the subreg case 
correctly and so this leads to higher register pressure and spilling (see PR5997).

This speeds up 256.bzip2 from 8.60 -> 8.04s on my machine, ~7%.

llvm-svn: 93200
2010-01-11 22:45:25 +00:00
Chris Lattner 0a85420409 Extend CanEvaluateZExtd to handle and/or/xor more aggressively in the
BitsToClear case.  This allows it to promote expressions which have an
and/or/xor after the lshr, promoting cases like test2 (from PR4216) 
and test3 (random extample extracted from a spec benchmark).

clang now compiles the code in PR4216 into:

_test_bitfield:                                             ## @test_bitfield
	movl	%edi, %eax
	orl	$194, %eax
	movl	$4294902010, %ecx
	andq	%rax, %rcx
	orl	$32768, %edi
	andq	$39936, %rdi
	movq	%rdi, %rax
	orq	%rcx, %rax
	ret

instead of:

_test_bitfield:                                             ## @test_bitfield
	movl	%edi, %eax
	orl	$194, %eax
	movl	$4294902010, %ecx
	andq	%rax, %rcx
	shrl	$8, %edi
	orl	$128, %edi
	shlq	$8, %rdi
	andq	$39936, %rdi
	movq	%rdi, %rax
	orq	%rcx, %rax
	ret

which is still not great, but is progress.

llvm-svn: 93145
2010-01-11 04:05:13 +00:00
Chris Lattner 12bd8992b3 Remove the dead TD argument to CanEvaluateZExtd, and add a
new BitsToClear result which allows us to start promoting
expressions that end with a lshr-by-constant.  This is
conservatively correct and better than what we had before
(see testcases) but still needs to be extended further.

llvm-svn: 93144
2010-01-11 03:32:00 +00:00
Chris Lattner 172630abd2 improve comments, remove dead TD argument to CanEvaluateSExtd.
llvm-svn: 93143
2010-01-11 02:43:35 +00:00
Chris Lattner 7dd540ee24 teach sext optimization to handle truncs from types that are not
the dest of the sext.

llvm-svn: 93128
2010-01-10 20:30:41 +00:00
Chris Lattner 39d2daa94c teach zext optimization how to deal with truncs that don't come from
the zext dest type.  This allows us to handle test52/53 in cast.ll,
and allows llvm-gcc to generate much better code for PR4216 in -m64
mode:

_test_bitfield:                                             ## @test_bitfield
	orl	$32962, %edi
	movl	%edi, %eax
	andl	$-25350, %eax
	ret

This also fixes a bug handling vector extends, ensuring that the
mask produced is a vector constant, not an integer constant.

llvm-svn: 93127
2010-01-10 20:25:54 +00:00
Chris Lattner 1a05fddcdc simplify CanEvaluateSExtd to return a bool now that we have a
simpler profitability predicate.

llvm-svn: 93111
2010-01-10 07:57:20 +00:00
Chris Lattner d7816780e2 the NumCastsRemoved argument to CanEvaluateSExtd is dead, remove it.
llvm-svn: 93110
2010-01-10 07:42:21 +00:00
Chris Lattner 2fff10c424 now that the cost model has changed, we can always consider
elimination of a sign extend to be a win, which simplifies 
the client of CanEvaluateSExtd, and allows us to eliminate
more casts (examples taken from real code).

llvm-svn: 93109
2010-01-10 07:40:50 +00:00
Chris Lattner d8509424a4 change the preferred canonical form for a sign extension to be
lshr+ashr instead of trunc+sext.  We want to avoid type 
conversions whenever possible, it is easier to codegen expressions
without truncates and extensions.

llvm-svn: 93107
2010-01-10 07:08:30 +00:00
Chris Lattner 127bbc715e fix pasto that broke bootstrap.
llvm-svn: 93105
2010-01-10 06:50:04 +00:00
Chris Lattner b7be7cc486 simplify CanEvaluateZExtd now that we don't care about the number of
bits known clear in the result and don't care about the # casts 
eliminated.  TD is also dead but keeping it for now.

llvm-svn: 93098
2010-01-10 02:50:04 +00:00
Chris Lattner 49d2c9764d two changes:
1) don't try to optimize a sext or zext that is only used by a trunc, let
   the trunc get optimized first.  This avoids some pointless effort in
   some common cases since instcombine scans down a block in the first pass.
2) Change the cost model for zext elimination to consider an 'and' cheaper
   than a zext.  This allows us to do it more aggressively, and for the next
   patch to simplify the code quite a bit.

llvm-svn: 93097
2010-01-10 02:39:31 +00:00