Commit Graph

14630 Commits

Reid Spencer f9fdfa7aa5 Add the --with-automake option to AutoRegen.sh and provide the automake
version of the configure script. This is an early commit of the automake
support so that it can be tested on multiple platforms. Many additional
Makefile.am files need to be added to LLVM before this is of any use.
Please wait until automake support is announced on the llvmdev list before
using the --with-automake option.

llvm-svn: 16837
2004-10-08 05:33:35 +00:00
Chris Lattner bff91d9a2e Instcombine (X & FF00) + xx00 -> (X+xx00) & FF00, implementing and.ll:test27
This comes up when doing adds to bitfield elements.
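
For illustration (a sketch that is not part of the commit; the struct and field names are invented), this is the kind of C source that produces the pattern: a read-modify-write of a bitfield member lowers to a masked load, an add of a shifted constant, and a re-mask.

struct Packed {
    unsigned low  : 8;
    unsigned mid  : 8;    /* lives in bits 8-15, i.e. under the 0xFF00 mask */
    unsigned rest : 16;
};

void bump_mid(struct Packed *p) {
    p->mid += 3;          /* an "add to a bitfield element" */
}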

llvm-svn: 16836
2004-10-08 05:07:56 +00:00
Chris Lattner 9467062bcf New testcase
llvm-svn: 16835
2004-10-08 05:03:25 +00:00
Chris Lattner 44bd392cbf Little patch to turn (shl (add X, 123), 4) -> (add (shl X, 4), 123 << 4)
This triggers in cases of bitfield additions, opening opportunities for
future improvements.
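
As a quick sanity check of the identity (a hedged sketch, not part of the commit): with unsigned arithmetic the wraparound is well defined, so the two forms agree for every input.

#include <assert.h>

unsigned before(unsigned x) { return (x + 123u) << 4; }           /* shl (add X, 123), 4      */
unsigned after(unsigned x)  { return (x << 4) + (123u << 4); }    /* add (shl X, 4), 123 << 4 */

int main(void) {
    for (unsigned x = 0; x < 1000000u; ++x)
        assert(before(x) == after(x));
    return 0;
}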

llvm-svn: 16834
2004-10-08 03:46:20 +00:00
Chris Lattner 7bfe4032fc New testcase
llvm-svn: 16833
2004-10-08 03:41:59 +00:00
Nate Begeman b58dd6799f Implement logical and with an immediate that consists of a contiguous block
of one or more 1 bits (which may wrap from the least significant bit to the most
significant bit) as a single rlwinm rather than andi., andis., or some longer
instruction sequence.

int andn4(int z) { return z & -4; }
int clearhi(int z) { return z & 0x0000FFFF; }
int clearlo(int z) { return z & 0xFFFF0000; }
int clearmid(int z) { return z & 0x00FFFF00; }
int clearwrap(int z) { return z & 0xFF0000FF; }

_andn4:
        rlwinm r3, r3, 0, 0, 29
        blr

_clearhi:
        rlwinm r3, r3, 0, 16, 31
        blr

_clearlo:
        rlwinm r3, r3, 0, 0, 15
        blr

_clearmid:
        rlwinm r3, r3, 0, 8, 23
        blr

_clearwrap:
        rlwinm r3, r3, 0, 24, 7
        blr
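
For reference, a hypothetical helper (not the LLVM selection code) that computes the rlwinm mask-begin/mask-end fields from such an immediate; IBM bit numbering is assumed, where bit 0 is the most significant bit.

#include <stdint.h>

static int clz32(uint32_t v) { int n = 0; while (!(v & 0x80000000u)) { v <<= 1; ++n; } return n; }
static int ctz32(uint32_t v) { int n = 0; while (!(v & 1u)) { v >>= 1; ++n; } return n; }

/* Returns 1 and fills mb/me if mask is a single contiguous run of 1 bits,
 * possibly wrapping from the least significant bit to the most significant
 * bit; returns 0 otherwise.  Callers must pass a non-zero mask. */
static int mask_to_mb_me(uint32_t mask, int *mb, int *me) {
    uint32_t m = mask;
    int wrapped = 0;
    if ((m & 1u) && (m & 0x80000000u) && ~m != 0u) {
        m = ~m;                               /* a wrapped run of 1s is a plain run of 0s */
        wrapped = 1;
    }
    uint32_t run = m >> ctz32(m);             /* shift the run down to bit 0 */
    if (((run + 1u) & run) != 0u) return 0;   /* not one contiguous run */
    int hi = clz32(m);                        /* IBM index of the run's first 1 bit */
    int lo = 31 - ctz32(m);                   /* IBM index of the run's last 1 bit  */
    if (!wrapped) { *mb = hi; *me = lo; }
    else          { *mb = lo + 1; *me = hi - 1; }
    return 1;
}

/* e.g. clearwrap's 0xFF0000FF yields mb = 24, me = 7, matching the rlwinm above */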

llvm-svn: 16832
2004-10-08 02:49:24 +00:00
Misha Brukman 3e0a20ebb8 Fix usage description typo
llvm-svn: 16831
2004-10-08 01:11:15 +00:00
Misha Brukman f5a92bda7b Make comment header span the entire line
llvm-svn: 16830
2004-10-08 01:10:52 +00:00
Misha Brukman ed3dc439e1 Describe how to configure tests to work with f2c
llvm-svn: 16829
2004-10-08 00:55:43 +00:00
Misha Brukman 8d08d36199 * Reformat to fit 80 cols
* Add missing <li> tags

llvm-svn: 16828
2004-10-08 00:41:27 +00:00
Nate Begeman 6e6514c47e Several fixes and enhancements to the PPC32 backend.
1. Fix an illegal argument to getClassB when deciding whether or not to
   sign extend a byte load.

2. Initial addition of isLoad and isStore flags to the instruction .td file
   for eventual use in a scheduler.

3. Rewrite of how constants are handled in emitSimpleBinaryOperation so
   that we can emit the PowerPC shifted immediate instructions far more
   often.  This allows us to emit the following code:

int foo(int x) { return x | 0x00F0000; }

_foo:
.LBB_foo_0:     ; entry
        ; IMPLICIT_DEF
        oris r3, r3, 15
        blr
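
As a hedged sketch of the immediate-selection idea (a hypothetical helper, not the backend code): an OR with a constant needs only one instruction when the constant fits entirely in one halfword, which is why 0x000F0000 above becomes a single oris.

#include <stdint.h>

enum OrForm { OR_ORI, OR_ORIS, OR_ORI_ORIS };

static enum OrForm classify_or_imm(uint32_t imm) {
    if ((imm & 0xFFFF0000u) == 0u) return OR_ORI;   /* ori  rD, rS, imm       */
    if ((imm & 0x0000FFFFu) == 0u) return OR_ORIS;  /* oris rD, rS, imm >> 16 */
    return OR_ORI_ORIS;                             /* needs both halves      */
}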

llvm-svn: 16826
2004-10-07 22:30:03 +00:00
Nate Begeman c6b63cd2ed Add ori reg, reg, 0 as a move instruction. This can be generated when
loading a 32-bit constant into a register whose low halfword is all zeroes.

We now omit the ori after the lis for the following C code:

int bar(int y) { return y * 0x00F0000; }

_bar:
.LBB_bar_0:     ; entry
        ; IMPLICIT_DEF
        lis r2, 15
        mullw r3, r3, r2
        blr
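
A minimal sketch of the materialization rule this relies on (illustrative only, not the emitter): lis sets the high halfword, and the trailing ori is needed only when the low halfword is non-zero.

#include <stdint.h>
#include <stdio.h>

static void emit_li32(uint32_t c) {
    printf("\tlis r2, %d\n", (int16_t)(c >> 16));              /* high halfword        */
    if ((c & 0xFFFFu) != 0u)
        printf("\tori r2, r2, %u\n", (unsigned)(c & 0xFFFFu)); /* omitted for 0x000F0000 */
}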

llvm-svn: 16825
2004-10-07 22:26:12 +00:00
Nate Begeman 70a9d9c0b1 Remove unnecessary header include
llvm-svn: 16824
2004-10-07 22:24:32 +00:00
Chris Lattner 617f1a34f1 Improve comments, no functionality changes
llvm-svn: 16814
2004-10-07 21:30:30 +00:00
Chris Lattner 3ae7bb6b7c Fix a nasty dangling pointer problem due to a freed pointer being left in
a map.  This caused problems if a later object happened to be allocated at
the freed object's address.
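
A minimal sketch of the failure mode (illustrative, not the actual code): the freed object's address stays behind as a map key, and a later allocation that reuses the address collides with the stale entry.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct Node { int id; };

int main(void) {
    struct Node *a = malloc(sizeof *a);
    uintptr_t stale_key = (uintptr_t)a;   /* left behind in a pointer-keyed map */
    free(a);                              /* ...but never erased from the map   */

    struct Node *b = malloc(sizeof *b);   /* may be placed at the same address  */
    if ((uintptr_t)b == stale_key)
        puts("new object aliases the stale map entry");
    free(b);
    return 0;
}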

llvm-svn: 16813
2004-10-07 20:01:31 +00:00
Chris Lattner b0b1cb2182 Get friendly with Type
llvm-svn: 16812
2004-10-07 19:21:43 +00:00
Chris Lattner 251093ca5d Unfortunately the fix for the previous bug reintroduced the earlier
exponential behavior (bork!).  This patch restructures the processing around
an explicit SCC finder, making the algorithm clearer, more
efficient, and also (as a bonus) correct!  This gets us back to taking
0.6s to disassemble my horrible .bc file that previously took something
> 30 mins.

llvm-svn: 16811
2004-10-07 19:20:48 +00:00
Chris Lattner d299ffee01 Change signature of this method again
llvm-svn: 16810
2004-10-07 19:19:12 +00:00
Chris Lattner 31d9e6f922 These files now live in Transforms/GlobalOpt
llvm-svn: 16809
2004-10-07 19:16:43 +00:00
Chris Lattner 5860106954 Move these files from Transforms/GlobalConstifier
llvm-svn: 16808
2004-10-07 19:16:26 +00:00
Chris Lattner cef3c06027 Fix a bug in my previous change. Unfortunately this reverts most of the
speedup, but has the advantage of not breaking a bunch of programs!

llvm-svn: 16806
2004-10-07 16:19:40 +00:00
Reid Spencer 50a425a56d Make these scripts work on SunOS too.
llvm-svn: 16805
2004-10-07 16:03:21 +00:00
Chris Lattner 02b6c918b7 Fix a bug in the safety analysis routine
llvm-svn: 16804
2004-10-07 06:01:25 +00:00
Chris Lattner f64799683e Comment cleanups
llvm-svn: 16803
2004-10-07 06:00:24 +00:00
Chris Lattner 25db58032d * Rename pass to globalopt, since we do more than just constify
* Instead of handling dead functions specially, just nuke them.
* Be more aggressive about cleaning up after constification, in
  particular, handle getelementptr instructions and constantexprs.
* Be a little bit more structured about how we process globals.

*** Delete globals that are only stored to, and never read.  These are
    clearly not useful, so they should go.  This implements deadglobal.llx

This last one triggers quite a few times.  In particular, 2208 in the
external tests, 1865 of which are in 252.eon.  This shrinks eon from
1995094 to 1732341 bytes of bytecode.
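
An illustrative C-level example of the store-only case (names invented): nothing ever reads the global, so both the stores and the global itself can be removed.

static int debug_last_seen;        /* written below, never read anywhere */

void record(int v) {
    debug_last_seen = v;           /* dead once the global is deleted */
}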

llvm-svn: 16802
2004-10-07 04:16:33 +00:00
Chris Lattner fa3cfd3955 Rename pass
llvm-svn: 16801
2004-10-07 04:12:02 +00:00
Chris Lattner b0c8aab038 This pass is not needed, as there is only ever one global: the stack
llvm-svn: 16800
2004-10-07 04:10:36 +00:00
Chris Lattner 381fbf1616 Add new testcase, rename pass
llvm-svn: 16799
2004-10-07 04:07:08 +00:00
Chris Lattner fc303099c9 Don't add libz or libbz2 to the USEDLIBS lists; those are for LLVM libraries.
llvm-svn: 16798
2004-10-07 00:03:11 +00:00
Chris Lattner 76319a83bd Don't call memset if malloc returns a null pointer
llvm-svn: 16797
2004-10-06 23:08:03 +00:00
Chris Lattner 1f849a08a3 Implement GlobalConstifier/trivialstore.llx, and also do some
simplifications of the resultant program to avoid making later passes
do it all.

This allows us to constify globals whose only stores write the same constant
value they are initialized with.

Surprisingly this comes up ALL of the freaking time, dozens of times in
SPEC, 30 times in vortex alone.

For example, on 256.bzip2, it allows us to constify these two globals:

%smallMode = internal global ubyte 0             ; <ubyte*> [#uses=8]
%verbosity = internal global int 0               ; <int*> [#uses=49]

Which (with later optimizations) results in the bytecode file shrinking
from 82286 to 69686 bytes!  Let's hear it for IPO :)

For the record, it's nuking lots of "if (verbosity > 2) { do lots of stuff }"
code.
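
An illustrative C-level shape of the trivial-store case (invented names, not from SPEC): the only store writes the value the global was initialized with, so the global can be treated as constant and the guarded code folded away.

static int verbosity = 0;          /* initializer is 0 */

void reset_options(void) {
    verbosity = 0;                 /* the only store writes the initializer value */
}

int report(void) {
    if (verbosity > 2)             /* after constification this folds to if (0 > 2) */
        return 1;                  /* "do lots of stuff" */
    return 0;
}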

llvm-svn: 16793
2004-10-06 20:57:02 +00:00
Chris Lattner 645bcf6c5d New testcase
llvm-svn: 16791
2004-10-06 20:42:51 +00:00
Chris Lattner af88fcd4c9 Don't let null nodes sneak past cast instructions
llvm-svn: 16779
2004-10-06 19:29:13 +00:00
Misha Brukman fe643e314f Undoxyfy internal method.
llvm-svn: 16774
2004-10-06 17:19:58 +00:00
Misha Brukman 74a1195bd6 Doxygen-ify comments
llvm-svn: 16773
2004-10-06 16:56:16 +00:00
Chris Lattner 43e03c9cdf Change Type::isAbstract to have better comments, a more correct name
(PromoteAbstractToConcrete), and to use a set to avoid recomputation.
In particular, this set eliminates the potentially exponential cases
from this little recursive algorithm.

On a particularly nasty testcase, llvm-dis on the .bc file went from 34
minutes (which is when I killed it; it still hadn't finished) to 0.57s.
Remember kids, exponential algorithms are bad.
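
A minimal sketch of the memoization idea (invented names, not Type's actual code; assumes the type graph is acyclic): caching each node's answer after its first visit turns the exponential walk over shared subtrees into a linear one.

#include <stdbool.h>

#define MAX_TYPES 1024

struct Ty {
    int id;                        /* index into the caches below */
    bool is_abstract_leaf;
    int num_operands;
    struct Ty *operands[2];        /* shared subtrees cause the blowup */
};

static bool visited[MAX_TYPES], cached[MAX_TYPES];

static bool contains_abstract(struct Ty *t) {
    if (visited[t->id]) return cached[t->id];     /* the "set" that kills the recomputation */
    bool result = t->is_abstract_leaf;
    for (int i = 0; i < t->num_operands && !result; ++i)
        result = contains_abstract(t->operands[i]);
    visited[t->id] = true;
    cached[t->id] = result;
    return result;
}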

llvm-svn: 16772
2004-10-06 16:36:46 +00:00
Chris Lattner f29560783a Rename method, change comment, add argument
llvm-svn: 16771
2004-10-06 16:34:23 +00:00
Chris Lattner f94f985bbd Correct some typos
llvm-svn: 16770
2004-10-06 16:28:24 +00:00
Chris Lattner 0aee4b7947 Instcombine: -(X sdiv C) -> (X sdiv -C), tested by sub.ll:test16
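
A quick check of the fold (sketch only; C's / already truncates toward zero like sdiv, and the equivalence holds whenever neither division overflows):

#include <assert.h>

int before(int x) { return -(x / 7); }   /* -(X sdiv 7) */
int after(int x)  { return x / -7; }     /* X sdiv -7   */

int main(void) {
    for (int x = -100000; x <= 100000; ++x)
        assert(before(x) == after(x));
    return 0;
}
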
llvm-svn: 16769
2004-10-06 15:08:25 +00:00
Chris Lattner 52783ab1d8 New testcase
llvm-svn: 16768
2004-10-06 15:07:56 +00:00
Chris Lattner 93867e516a Remove debugging code, fix encoding problem. This fixes the problems
the JIT had last night.

llvm-svn: 16766
2004-10-06 14:31:50 +00:00
Nate Begeman 9a1fbaf1e9 Turn on fsel code generation now that we can do so.
llvm-svn: 16765
2004-10-06 11:03:30 +00:00
Nate Begeman fac8529df8 Implement floating point select for lt, gt, le, ge using the PowerPC fsel
instruction.

Now, rather than emitting the following loop out of bisect:
.LBB_main_19:	; no_exit.0.i
	rlwinm r3, r2, 3, 0, 28
	lfdx f1, r3, r27
	addis r3, r30, ha16(.CPI_main_1-"L00000$pb")
	lfd f2, lo16(.CPI_main_1-"L00000$pb")(r3)
	fsub f2, f2, f1
	addis r3, r30, ha16(.CPI_main_1-"L00000$pb")
	lfd f4, lo16(.CPI_main_1-"L00000$pb")(r3)
	fcmpu cr0, f1, f4
	bge .LBB_main_64	; no_exit.0.i
.LBB_main_63:	; no_exit.0.i
	b .LBB_main_65	; no_exit.0.i
.LBB_main_64:	; no_exit.0.i
	fmr f2, f1
.LBB_main_65:	; no_exit.0.i
	addi r3, r2, 1
	rlwinm r3, r3, 3, 0, 28
	lfdx f1, r3, r27
	addis r3, r30, ha16(.CPI_main_1-"L00000$pb")
	lfd f4, lo16(.CPI_main_1-"L00000$pb")(r3)
	fsub f4, f4, f1
	addis r3, r30, ha16(.CPI_main_1-"L00000$pb")
	lfd f5, lo16(.CPI_main_1-"L00000$pb")(r3)
	fcmpu cr0, f1, f5
	bge .LBB_main_67	; no_exit.0.i
.LBB_main_66:	; no_exit.0.i
	b .LBB_main_68	; no_exit.0.i
.LBB_main_67:	; no_exit.0.i
	fmr f4, f1
.LBB_main_68:	; no_exit.0.i
	fadd f1, f2, f4
	addis r3, r30, ha16(.CPI_main_2-"L00000$pb")
	lfd f2, lo16(.CPI_main_2-"L00000$pb")(r3)
	fmul f1, f1, f2
	rlwinm r3, r2, 3, 0, 28
	lfdx f2, r3, r28
	fadd f4, f2, f1
	fcmpu cr0, f4, f0
	bgt .LBB_main_70	; no_exit.0.i
.LBB_main_69:	; no_exit.0.i
	b .LBB_main_71	; no_exit.0.i
.LBB_main_70:	; no_exit.0.i
	fmr f0, f4
.LBB_main_71:	; no_exit.0.i
	fsub f1, f2, f1
	addi r2, r2, -1
	fcmpu cr0, f1, f3
	blt .LBB_main_73	; no_exit.0.i
.LBB_main_72:	; no_exit.0.i
	b .LBB_main_74	; no_exit.0.i
.LBB_main_73:	; no_exit.0.i
	fmr f3, f1
.LBB_main_74:	; no_exit.0.i
	cmpwi cr0, r2, -1
	fmr f16, f0
	fmr f17, f3
	bgt .LBB_main_19	; no_exit.0.i

We emit this instead:
.LBB_main_19:	; no_exit.0.i
	rlwinm r3, r2, 3, 0, 28
	lfdx f1, r3, r27
	addis r3, r30, ha16(.CPI_main_1-"L00000$pb")
	lfd f2, lo16(.CPI_main_1-"L00000$pb")(r3)
	fsub f2, f2, f1
	fsel f1, f1, f1, f2
	addi r3, r2, 1
	rlwinm r3, r3, 3, 0, 28
	lfdx f2, r3, r27
	addis r3, r30, ha16(.CPI_main_1-"L00000$pb")
	lfd f4, lo16(.CPI_main_1-"L00000$pb")(r3)
	fsub f4, f4, f2
	fsel f2, f2, f2, f4
	fadd f1, f1, f2
	addis r3, r30, ha16(.CPI_main_2-"L00000$pb")
	lfd f2, lo16(.CPI_main_2-"L00000$pb")(r3)
	fmul f1, f1, f2
	rlwinm r3, r2, 3, 0, 28
	lfdx f2, r3, r28
	fadd f4, f2, f1
	fsub f5, f0, f4
	fsel f0, f5, f0, f4
	fsub f1, f2, f1
	addi r2, r2, -1
	fsub f2, f1, f3
	fsel f3, f2, f3, f1
	cmpwi cr0, r2, -1
	fmr f16, f0
	fmr f17, f3
	bgt .LBB_main_19	; no_exit.0.i
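
For reference, a C model of the selection (a sketch, ignoring the NaN and signed-zero caveats that restrict when the backend may use fsel): fsel fd, fa, fx, fy yields fx when fa >= 0.0 and fy otherwise, which is how the compare-and-branch max patterns above collapse to a subtract plus one fsel.

double fsel_model(double a, double x, double y) {
    return (a >= 0.0) ? x : y;             /* model only; real fsel has NaN/-0.0 caveats */
}

/* Branch-free max, matching "fsub f5, f0, f4 ; fsel f0, f5, f0, f4" above. */
double fmax_fsel(double x, double y) {
    return fsel_model(x - y, x, y);
}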

llvm-svn: 16764
2004-10-06 09:53:04 +00:00
Chris Lattner 6835dedb5b Codegen signed mod by 2 or -2 more efficiently. Instead of generating:
t:
        mov %EDX, DWORD PTR [%ESP + 4]
        mov %ECX, 2
        mov %EAX, %EDX
        sar %EDX, 31
        idiv %ECX
        mov %EAX, %EDX
        ret

Generate:
t:
        mov %ECX, DWORD PTR [%ESP + 4]
***     mov %EAX, %ECX
        cdq
        and %ECX, 1
        xor %ECX, %EDX
        sub %ECX, %EDX
***     mov %EAX, %ECX
        ret

Note that the two marked moves are redundant, and should be eliminated by the
register allocator, but aren't.

Compare this to GCC, which generates:

t:
        mov     %eax, DWORD PTR [%esp+4]
        mov     %edx, %eax
        shr     %edx, 31
        lea     %ecx, [%edx+%eax]
        and     %ecx, -2
        sub     %eax, %ecx
        ret

or ICC 8.0, which generates:

t:
        movl      4(%esp), %ecx                                 #3.5
        movl      $-2147483647, %eax                            #3.25
        imull     %ecx                                          #3.25
        movl      %ecx, %eax                                    #3.25
        sarl      $31, %eax                                     #3.25
        addl      %ecx, %edx                                    #3.25
        subl      %edx, %eax                                    #3.25
        addl      %eax, %eax                                    #3.25
        negl      %eax                                          #3.25
        subl      %eax, %ecx                                    #3.25
        movl      %ecx, %eax                                    #3.25
        ret                                                     #3.25

We would be in great shape if not for the moves.
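
The new sequence computes a branch-free remainder; here is a hedged C sketch of the identity it relies on (x % 2 and x % -2 agree under truncating division, and >> on a negative int is assumed to be arithmetic):

#include <assert.h>

int rem2_fast(int x) {
    int sign = x >> 31;                   /* all ones when x is negative (assumes arithmetic shift) */
    return ((x & 1) ^ sign) - sign;       /* the cdq/and/xor/sub sequence above */
}

int main(void) {
    for (int x = -100000; x <= 100000; ++x) {
        assert(rem2_fast(x) == x % 2);
        assert(rem2_fast(x) == x % -2);
    }
    return 0;
}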

llvm-svn: 16763
2004-10-06 05:01:07 +00:00
Chris Lattner e4c60eb704 Really fix FreeBSD, which apparently doesn't tolerate the extern.
Thanks to Jeff Cohen for pointing out my goof.

llvm-svn: 16762
2004-10-06 04:21:52 +00:00
Chris Lattner 7bd8f1332d Fix a scary bug with signed division by a power of two. We used to generate:
s:   ;; X / 4
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, %EAX
        sar %ECX, 1
        shr %ECX, 30
        mov %EDX, %EAX
        add %EDX, %ECX
        sar %EAX, 2
        ret

When we really meant:

s:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, %EAX
        sar %ECX, 1
        shr %ECX, 30
        add %EAX, %ECX
        sar %EAX, 2
        ret

Hey, this also reduces register pressure too :)
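
As a hedged C sketch of what the corrected sequence computes (assuming arithmetic right shift on negative ints): negative dividends get a bias of 3 added so the final shift rounds toward zero, matching C division.

#include <assert.h>

int div4_fast(int x) {
    int bias = (int)((unsigned)(x >> 1) >> 30);   /* 3 when x < 0, else 0 (the sar/shr pair above) */
    return (x + bias) >> 2;                       /* sar by 2 */
}

int main(void) {
    for (int x = -1000000; x <= 1000000; ++x)
        assert(div4_fast(x) == x / 4);
    return 0;
}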

llvm-svn: 16761
2004-10-06 04:19:43 +00:00
Chris Lattner 147edd2f7e Codegen signed divides by 2 and -2 more efficiently. In particular
instead of:

s:   ;; X / 2
        movl 4(%esp), %eax
        movl %eax, %ecx
        shrl $31, %ecx
        movl %eax, %edx
        addl %ecx, %edx
        sarl $1, %eax
        ret

t:   ;; X / -2
        movl 4(%esp), %eax
        movl %eax, %ecx
        shrl $31, %ecx
        movl %eax, %edx
        addl %ecx, %edx
        sarl $1, %eax
        negl %eax
        ret

Emit:

s:
        movl 4(%esp), %eax
        cmpl $-2147483648, %eax
        sbbl $-1, %eax
        sarl $1, %eax
        ret

t:
        movl 4(%esp), %eax
        cmpl $-2147483648, %eax
        sbbl $-1, %eax
        sarl $1, %eax
        negl %eax
        ret
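
As a hedged C sketch of what the cmpl/sbbl pair accomplishes (assuming arithmetic right shift): it adds 1 to negative inputs so the shift truncates toward zero, and the divide by -2 is the same followed by a negate.

#include <assert.h>

int div2_fast(int x)  { return (x + (int)((unsigned)x >> 31)) >> 1; }   /* the cmp/sbb/sar above */
int divm2_fast(int x) { return -div2_fast(x); }

int main(void) {
    for (int x = -1000000; x <= 1000000; ++x) {
        assert(div2_fast(x)  == x / 2);
        assert(divm2_fast(x) == x / -2);
    }
    return 0;
}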

llvm-svn: 16760
2004-10-06 04:02:39 +00:00
Chris Lattner e9bfa5a2a4 Add some new instructions. Fix the asm string for sbb32rr
llvm-svn: 16759
2004-10-06 04:01:02 +00:00
Chris Lattner 2ce32df8b0 Reduce code growth implied by the tail duplication pass by not duplicating
an instruction if it can be hoisted to a common dominator of the block.
This implements: test/Regression/Transforms/TailDup/MergeTest.ll

llvm-svn: 16758
2004-10-06 03:27:37 +00:00
Chris Lattner 7d83efbc0b When tail duplicating these functions, the add instruction should not be
duplicated, even though the block it is in is duplicated.

llvm-svn: 16757
2004-10-06 03:26:38 +00:00