Commit Graph

14303 Commits

Author SHA1 Message Date
Chris Lattner 936cd3390d Remove dead var
llvm-svn: 28250
2006-05-12 17:50:35 +00:00
Chris Lattner ce9ac0cdab Remove dead variable
llvm-svn: 28249
2006-05-12 17:41:45 +00:00
Chris Lattner f76c42776d remove dead variable.
llvm-svn: 28248
2006-05-12 17:33:59 +00:00
Chris Lattner 9cd2ef34e6 Remove dead variable.
llvm-svn: 28247
2006-05-12 17:31:21 +00:00
Chris Lattner 8c02c3f41a Compile:
%tmp152 = setgt uint %tmp144, %tmp149           ; <bool> [#uses=1]
        %tmp159 = setlt uint %tmp144, %tmp149           ; <bool> [#uses=1]
        %bothcond2 = or bool %tmp152, %tmp159           ; <bool> [#uses=1]

To setne, not setune, which causes an assertion fault.

llvm-svn: 28244
2006-05-12 17:03:46 +00:00
Chris Lattner a296339c87 Fix PowerPC/2006-05-12-rlwimi-crash.ll
Nate, please verify that if InsertMask is 0, rlwimi shouldn't be used.
This fixes the crash and causes no PPC testsuite regressions.

llvm-svn: 28243
2006-05-12 16:29:37 +00:00
Owen Anderson 5fea9f0a93 Add a method to generate a string representation from a TargetData.
This continues the work on PR 761.

llvm-svn: 28239
2006-05-12 07:01:44 +00:00
Owen Anderson 8c2c1e90c4 Refactor a bunch of includes so that TargetMachine.h doesn't have to include
TargetData.h.  This should make recompiles a bit faster with my current
TargetData tinkering.

llvm-svn: 28238
2006-05-12 06:33:49 +00:00
Owen Anderson d7c77b8c56 Fix some tabbing issues.
llvm-svn: 28237
2006-05-12 06:06:55 +00:00
Evan Cheng 6a6886185b Backing out fix for PR770. Need to re-apply it after live range splitting is possible
llvm-svn: 28236
2006-05-12 06:06:34 +00:00
Evan Cheng 095c9d9b7f Duh. That could take a long time.
llvm-svn: 28235
2006-05-12 06:05:18 +00:00
Owen Anderson 8d7774cb1d Add a new constructor to TargetData that builds a TargetData from its
string representation.

This is part of PR 761.

llvm-svn: 28234
2006-05-12 05:49:47 +00:00
Chris Lattner 66adee93aa Two simplifications for token factor nodes: simplify tf(x,x) -> x.
simplify tf(x,y,y,z) -> tf(x,y,z).

llvm-svn: 28233
2006-05-12 05:01:37 +00:00
Evan Cheng afed73eebe Add capability to scheduler to commute nodes for profit.
If a two-address code whose first operand has uses below, it should be commuted
when possible.

llvm-svn: 28230
2006-05-12 01:58:24 +00:00
Evan Cheng c30a5558f2 Typo! How did we commute nodes before?!
llvm-svn: 28229
2006-05-12 01:46:26 +00:00
Chris Lattner 22acb80971 For extra sanity checking, fill free'd memory with garbage so we know that
people aren't reusing machine code buffers at all.

llvm-svn: 28228
2006-05-12 00:03:12 +00:00
Chris Lattner 3c729b9e5d Fix some bugs in the freelist manipulation code.
Finally, implement ExecutionEngine::freeMachineCodeForFunction.

llvm-svn: 28227
2006-05-11 23:56:57 +00:00
Evan Cheng d38c22bdd3 Refactor scheduler code. Move register-reduction list scheduler to a
separate file. Added an initial implementation of top-down register pressure
reduction list scheduler.

llvm-svn: 28226
2006-05-11 23:55:42 +00:00
Chris Lattner 873ef133ce Significantly revamp allocation of machine code to use free lists, real
allocation policies and much more.  All this complexity, and we have no
functionality change, woo! :)

llvm-svn: 28225
2006-05-11 23:08:08 +00:00
Chris Lattner 1443bc52be Refactor some code, making it simpler.
When doing the initial pass of constant folding, if we get a constantexpr,
simplify the constant expr like we would do if the constant is folded in the
normal loop.

This fixes the missed-optimization regression in
Transforms/InstCombine/getelementptr.ll last night.

llvm-svn: 28224
2006-05-11 17:11:52 +00:00
Evan Cheng dd7230c9e0 Add MOV16_rm / MOV32_rm and MOV16_mr / MOV32_mr to isLoadFromStackSlot and isStoreToStackSlot
llvm-svn: 28223
2006-05-11 07:33:49 +00:00
Evan Cheng 47926aff96 Set weight of zero length intervals to infinite to prevent them from being
spilled.

llvm-svn: 28220
2006-05-11 07:29:24 +00:00
Evan Cheng db6aa4896b Backing out previous check-in.
llvm-svn: 28219
2006-05-11 07:28:16 +00:00
Evan Cheng 6ad040a6bc If the live interval legnth is essentially zero, i.e. in every live range
the use follows def immediately, it doesn't make sense to spill it and
hope it will be easier to allocate for this LI.

llvm-svn: 28217
2006-05-10 22:30:41 +00:00
Chris Lattner a36ee4ea34 Two changes:
1. Implement InstCombine/deadcode.ll by not adding instructions in unreachable
   blocks (due to constants in conditional branches/switches) to the worklist.
   This causes them to be deleted before instcombine starts up, leading to
   better optimization.

2. In the prepass over instructions, do trivial constprop/dce as we go.  This
   has the effect of improving the effectiveness of #1.  In addition, it
   *significantly* speeds up instcombine on test cases with large amounts of
   constant folding code (for example, that produced by code specialization
   or partial evaluation).  In one example, it speeds up instcombine from
   0.0589s to 0.0224s with a release build (a 2.6x speedup).

llvm-svn: 28215
2006-05-10 19:00:36 +00:00
Chris Lattner b25cb79604 Fix the PowerPC JIT-only failure on UnitTests/Vector/sumarray-dbl, which is
really a bad codegen bug that LLC happens to get lucky with. I must chat with
Nate for the proper fix.

llvm-svn: 28213
2006-05-10 06:38:32 +00:00
Evan Cheng 9665ba053f Templatify RegReductionPriorityQueue
llvm-svn: 28212
2006-05-10 06:16:44 +00:00
Chris Lattner bb7ff6690f Add an assertion for a common error
llvm-svn: 28210
2006-05-10 04:32:43 +00:00
Nate Begeman 1a225d23ae Fix PR773
llvm-svn: 28207
2006-05-09 18:20:51 +00:00
Chris Lattner f801792e08 Fix a regression in my patch from last night that broke the llvmgcc4 build on
ppc

llvm-svn: 28205
2006-05-09 16:41:59 +00:00
Chris Lattner 2814134a5d Indent .data/.text in the .s file
llvm-svn: 28204
2006-05-09 16:15:00 +00:00
Evan Cheng 7d693898ee Add pseudo dependency to force a def&use operand to be scheduled last (unless
the distance between the def and another use is much longer). This is under
option control for now "-sched-lower-defnuse".

llvm-svn: 28201
2006-05-09 07:13:34 +00:00
Evan Cheng 2c74848af1 Debugging info
llvm-svn: 28200
2006-05-09 06:55:15 +00:00
Evan Cheng fc532fe1b7 Remove a completed entry.
llvm-svn: 28199
2006-05-09 06:54:05 +00:00
Evan Cheng ae45020720 PR 770 - permit coallescing of registers in subset register classes.
llvm-svn: 28197
2006-05-09 06:37:48 +00:00
Chris Lattner 4ebc6a2311 Implement MASM sections correctly, without a "has masm sections flag" and a bunch of special case code.
llvm-svn: 28194
2006-05-09 05:33:48 +00:00
Chris Lattner 8c2bfc0659 Oh yeah, there are two of these now, unify both.
llvm-svn: 28192
2006-05-09 05:24:50 +00:00
Chris Lattner 6341df8069 Setting SwitchToSectionDirective properly in the MASM backend permits a bunch
of code to be unified.

llvm-svn: 28191
2006-05-09 05:23:12 +00:00
Chris Lattner 0b7acaf027 MASM doesn't have one of these.
llvm-svn: 28190
2006-05-09 05:21:47 +00:00
Chris Lattner d36cc2b610 Don't prefix section directives with a tab. Doing so causes blank lines to
be emitted to the .s file.

llvm-svn: 28189
2006-05-09 05:19:59 +00:00
Chris Lattner e64f764d25 Make the masm codepath work like the normal code path.
llvm-svn: 28188
2006-05-09 05:15:58 +00:00
Chris Lattner e0006c6794 Preserve prior behavior
llvm-svn: 28187
2006-05-09 05:15:24 +00:00
Chris Lattner c0f0dfa56f The MASM asmprinter has been fixed, these hacks are no longer needed.
llvm-svn: 28186
2006-05-09 05:13:34 +00:00
Chris Lattner d0201946ad Fix the MASM asmprinter's lies. It does not want to emit code to .text/.data
it wants it emitted to _text/_data.

llvm-svn: 28185
2006-05-09 05:12:53 +00:00
Chris Lattner 8488ba2e41 Split SwitchSection into SwitchTo{Text|Data}Section methods.
llvm-svn: 28184
2006-05-09 04:59:56 +00:00
Chris Lattner 8587f8885d Some notes and thoughts to myself
llvm-svn: 28182
2006-05-09 04:58:46 +00:00
Chris Lattner 4fe87d67c4 Patch to make some xforms preserve each other. Patch contributed by
Domagoj Babic!

llvm-svn: 28181
2006-05-09 04:13:41 +00:00
Chris Lattner 6d8dd189f6 Move some methods out of line so that MutexGuard.h isn't needed in a public header.
llvm-svn: 28179
2006-05-08 22:00:52 +00:00
Chris Lattner aa193d80a9 Another bad case I noticed
llvm-svn: 28177
2006-05-08 21:39:45 +00:00
Chris Lattner 5bcea612f4 add a note
llvm-svn: 28176
2006-05-08 21:24:21 +00:00
Chris Lattner 446e1ef26a Make the case I just checked in stronger. Now we compile this:
short test2(short X, short x) {
  int Y = (short)(X+x);
  return Y >> 1;
}

to:

_test2:
        add r2, r3, r4
        extsh r2, r2
        srawi r3, r2, 1
        blr

instead of:

_test2:
        add r2, r3, r4
        extsh r2, r2
        srwi r2, r2, 1
        extsh r3, r2
        blr

llvm-svn: 28175
2006-05-08 21:18:59 +00:00
Chris Lattner 29062da0ac Implement and_sext.ll:test3, generating:
_test4:
        srawi r3, r3, 16
        blr

instead of:

_test4:
        srwi r2, r3, 16
        extsh r3, r2
        blr

for:

short test4(unsigned X) {
  return (X >> 16);
}

llvm-svn: 28174
2006-05-08 20:59:41 +00:00
Nate Begeman ce6646c366 Yet more readme updating
llvm-svn: 28172
2006-05-08 20:54:02 +00:00
Chris Lattner 2935d8190c Compile this:
short test4(unsigned X) {
  return (X >> 16);
}

to:

_test4:
        movl 4(%esp), %eax
        sarl $16, %eax
        ret

instead of:

_test4:
        movl $-65536, %eax
        andl 4(%esp), %eax
        sarl $16, %eax
        ret

llvm-svn: 28171
2006-05-08 20:51:54 +00:00
Nate Begeman 68a45419cc New note about something bad happening in target independent optimizers
llvm-svn: 28170
2006-05-08 20:08:28 +00:00
Nate Begeman 0eb8f2e496 Proving once again that I am not as smart as the compiler
llvm-svn: 28169
2006-05-08 19:09:24 +00:00
Nate Begeman 9b6d4c2968 Fold more shifts into inserts, and update the README
llvm-svn: 28168
2006-05-08 17:38:32 +00:00
Chris Lattner 78da6792e7 Fold shifts with undef operands.
llvm-svn: 28167
2006-05-08 17:29:49 +00:00
Chris Lattner 10c653744e When tracking demanded bits, if any bits from the sext of an SRA are demanded,
then so is the input sign bit.  This fixes mediabench/g721 on X86.

llvm-svn: 28166
2006-05-08 17:22:53 +00:00
Nate Begeman d7a19102d1 Make emission of jump tables a bit less conservative; they are now required
to be only 31.25% dense, rather than 75% dense.

llvm-svn: 28165
2006-05-08 16:51:36 +00:00
Evan Cheng 9733bde74c Fixing truncate. Previously we were emitting truncate from r16 to r8 as
movw. That is we promote the destination operand to r16. So
        %CH = TRUNC_R16_R8 %BP
is emitted as
        movw %bp, %cx.

This is incorrect. If %cl is live, it would be clobbered.
Ideally we want to do the opposite, that is emitted it as
        movb ??, %ch
But this is not possible since %bp does not have a r8 sub-register.

We are now defining a new register class R16_ which is a subclass of R16
containing only those 16-bit registers that have r8 sub-registers (i.e.
AX - DX). We isel the truncate to two instructions, a MOV16to16_ to copy the
value to the R16_ class, followed by a TRUNC_R16_R8.

Due to bug 770, the register colaescer is not going to coalesce between R16 and
R16_. That will be fixed later so we can eliminate the MOV16to16_. Right now, it
can only be eliminated if we are lucky that source and destination registers are
the same.

llvm-svn: 28164
2006-05-08 08:01:26 +00:00
Nate Begeman dc996b3f6c Update some stuff now that the new rlwimi code has gone in
llvm-svn: 28162
2006-05-08 02:52:38 +00:00
Nate Begeman e5ce5bb6da Fix PR772
llvm-svn: 28161
2006-05-08 01:35:01 +00:00
Evan Cheng 6732dcd5b3 Typo's
llvm-svn: 28158
2006-05-07 10:10:20 +00:00
Jeff Cohen 04196acdcb Unlike Unix, Windows won't let a file be implicitly replaced via renaming without explicit permission.
llvm-svn: 28157
2006-05-07 02:51:51 +00:00
Nate Begeman 1333cead5b New rlwimi implementation, which is superior to the old one. There are
still a couple missed optimizations, but we now generate all the possible
rlwimis for multiple inserts into the same bitfield.  More regression tests
to come.

llvm-svn: 28156
2006-05-07 00:23:38 +00:00
Chris Lattner cd4a643728 Use ComputeMaskedBits to determine # sign bits as a fallback. This allows us
to handle all kinds of stuff, including silly things like:
sextinreg(setcc,i16) -> setcc.

llvm-svn: 28155
2006-05-06 23:48:13 +00:00
Chris Lattner 4f3de3e33c Add some more sign propagation cases
llvm-svn: 28154
2006-05-06 23:40:29 +00:00
Jeff Cohen 9978476c14 Apply bug fix supplied by Greg Pettyjohn for a bug he found: '<invalid>' is not a legal path on Windows.
llvm-svn: 28153
2006-05-06 23:25:53 +00:00
Chris Lattner 7e7bcf3a54 Simplify some code, add a couple minor missed folds
llvm-svn: 28152
2006-05-06 23:06:26 +00:00
Chris Lattner 751817c54f constant fold sign_extend_inreg
llvm-svn: 28151
2006-05-06 23:05:41 +00:00
Chris Lattner 2a4d7b845b remove cases handled elsewhere
llvm-svn: 28150
2006-05-06 22:43:44 +00:00
Chris Lattner f86075731d Add some more simple sign bit propagation cases.
llvm-svn: 28149
2006-05-06 22:39:59 +00:00
Jeff Cohen ce9b9fe6eb Fix some loose ends in MASM support.
llvm-svn: 28148
2006-05-06 21:27:14 +00:00
Chris Lattner 1ecb2a2dac Use the new TargetLowering::ComputeNumSignBits method to eliminate
sign_extend_inreg operations.  Though ComputeNumSignBits is still rudimentary,
this is enough to compile this:

short test(short X, short x) {
  int Y = X+x;
  return (Y >> 1);
}
short test2(short X, short x) {
  int Y = (short)(X+x);
  return Y >> 1;
}

into:

_test:
        add r2, r3, r4
        srawi r3, r2, 1
        blr
_test2:
        add r2, r3, r4
        extsh r2, r2
        srawi r3, r2, 1
        blr

instead of:

_test:
        add r2, r3, r4
        srawi r2, r2, 1
        extsh r3, r2
        blr
_test2:
        add r2, r3, r4
        extsh r2, r2
        srawi r2, r2, 1
        extsh r3, r2
        blr

llvm-svn: 28146
2006-05-06 09:30:03 +00:00
Chris Lattner 7206d74f0c Add some really really simple code for computing sign-bit propagation.
This will certainly be enhanced in the future.

llvm-svn: 28145
2006-05-06 09:27:13 +00:00
Chris Lattner 21cd99024a When inserting casts, be careful of where we put them. We cannot insert
a cast immediately before a PHI node.

This fixes Regression/CodeGen/Generic/2006-05-06-GEP-Cast-Sink-Crash.ll

llvm-svn: 28143
2006-05-06 09:10:37 +00:00
Chris Lattner 1d441adfbf Move some code around.
Make the "fold (and (cast A), (cast B)) -> (cast (and A, B))" transformation
only apply when both casts really will cause code to be generated.  If one or
both doesn't, then this xform doesn't remove a cast.

This fixes Transforms/InstCombine/2006-05-06-Infloop.ll

llvm-svn: 28141
2006-05-06 09:00:16 +00:00
Chris Lattner 6d4a2dc4ad Teach the X86 backend about non-i32 inline asm register classes.
llvm-svn: 28139
2006-05-06 00:29:37 +00:00
Chris Lattner 86a1467fc0 Fold (trunc (srl x, c)) -> (srl (trunc x), c)
llvm-svn: 28138
2006-05-06 00:11:52 +00:00
Chris Lattner 907e392dba Fold trunc(any_ext). This gives stuff like:
27,28c27
<       movzwl %di, %edi
<       movl %edi, %ebx
---
>       movw %di, %bx

llvm-svn: 28137
2006-05-05 22:56:26 +00:00
Chris Lattner 57f8c5a387 Shrink shifts when possible.
llvm-svn: 28136
2006-05-05 22:53:17 +00:00
Chris Lattner 0f64932a5c Implement ComputeMaskedBits/SimplifyDemandedBits for ISD::TRUNCATE
llvm-svn: 28135
2006-05-05 22:32:12 +00:00
Chris Lattner 8b9e11c110 Print a grouping around inline asm blocks so that we can tell when we are
using them.

llvm-svn: 28134
2006-05-05 21:50:04 +00:00
Chris Lattner c22d4bede5 Print *some* grouping around inline asm blocks so we know where they are.
llvm-svn: 28133
2006-05-05 21:48:50 +00:00
Chris Lattner a633c31319 Indent multiline asm strings more nicely
llvm-svn: 28132
2006-05-05 21:47:05 +00:00
Chris Lattner 44a73e9fa5 Teach the code generator to use cvtss2sd as extload f32 -> f64
llvm-svn: 28131
2006-05-05 21:35:18 +00:00
Chris Lattner 3d26577396 Fold (fpext (load x)) -> (extload x)
llvm-svn: 28130
2006-05-05 21:34:35 +00:00
Chris Lattner 3e3f2c63c3 More aggressively sink GEP offsets into loops. For example, before we
generated:

        movl 8(%esp), %eax
        movl %eax, %edx
        addl $4316, %edx
        cmpb $1, %cl
        ja LBB1_2       #cond_false
LBB1_1: #cond_true
        movl L_QuantizationTables720$non_lazy_ptr, %ecx
        movl %ecx, (%edx)
        movl L_QNOtoQuantTableShift720$non_lazy_ptr, %edx
        movl %edx, 4460(%eax)
        ret
...

Now we generate:

        movl 8(%esp), %eax
        cmpb $1, %cl
        ja LBB1_2       #cond_false
LBB1_1: #cond_true
        movl L_QuantizationTables720$non_lazy_ptr, %ecx
        movl %ecx, 4316(%eax)
        movl L_QNOtoQuantTableShift720$non_lazy_ptr, %ecx
        movl %ecx, 4460(%eax)
        ret

... which uses one fewer register.

llvm-svn: 28129
2006-05-05 21:17:49 +00:00
Chris Lattner e745c7de0e Fix an infinite loop compiling oggenc last night.
llvm-svn: 28128
2006-05-05 20:51:30 +00:00
Evan Cheng 52c22512b9 Need extload patterns after Chris' DAG combiner changes
llvm-svn: 28127
2006-05-05 08:23:07 +00:00
Chris Lattner 3af1053488 Implement InstCombine/cast.ll:test29
llvm-svn: 28126
2006-05-05 06:39:07 +00:00
Chris Lattner 25a5283a86 Fold some common code.
llvm-svn: 28124
2006-05-05 06:32:04 +00:00
Chris Lattner 002ee91457 Implement:
// fold (and (sext x), (sext y)) -> (sext (and x, y))
  // fold (or  (sext x), (sext y)) -> (sext (or  x, y))
  // fold (xor (sext x), (sext y)) -> (sext (xor x, y))
  // fold (and (aext x), (aext y)) -> (aext (and x, y))
  // fold (or  (aext x), (aext y)) -> (aext (or  x, y))
  // fold (xor (aext x), (aext y)) -> (aext (xor x, y))

llvm-svn: 28123
2006-05-05 06:31:05 +00:00
Chris Lattner 5ac4293606 Pull and through and/or/xor. This compiles some bitfield code to:
mov EAX, DWORD PTR [ESP + 4]
        mov ECX, DWORD PTR [EAX]
        mov EDX, ECX
        add EDX, EDX
        or EDX, ECX
        and EDX, -2147483648
        and ECX, 2147483647
        or EDX, ECX
        mov DWORD PTR [EAX], EDX
        ret

instead of:

        sub ESP, 4
        mov DWORD PTR [ESP], ESI
        mov EAX, DWORD PTR [ESP + 8]
        mov ECX, DWORD PTR [EAX]
        mov EDX, ECX
        add EDX, EDX
        mov ESI, ECX
        and ESI, -2147483648
        and EDX, -2147483648
        or EDX, ESI
        and ECX, 2147483647
        or EDX, ECX
        mov DWORD PTR [EAX], EDX
        mov ESI, DWORD PTR [ESP]
        add ESP, 4
        ret

llvm-svn: 28122
2006-05-05 06:10:43 +00:00
Chris Lattner 812646aa0c Implement a variety of simplifications for ANY_EXTEND.
llvm-svn: 28121
2006-05-05 05:58:59 +00:00
Chris Lattner 8d6fc20181 Factor some code, add these transformations:
// fold (and (trunc x), (trunc y)) -> (trunc (and x, y))
  // fold (or  (trunc x), (trunc y)) -> (trunc (or  x, y))
  // fold (xor (trunc x), (trunc y)) -> (trunc (xor x, y))

llvm-svn: 28120
2006-05-05 05:51:50 +00:00
Evan Cheng ddb6cc1d8e Better implementation of truncate. ISel matches it to a pseudo instruction
that gets emitted as movl (for r32 to i16, i8) or a movw (for r16 to i8). And
if the destination gets allocated a subregister of the source operand, then
the instruction will not be emitted at all.

llvm-svn: 28119
2006-05-05 05:40:20 +00:00
Chris Lattner 304bbf3b1d New note, Nate, please check to see if I'm full of it :)
llvm-svn: 28118
2006-05-05 05:36:15 +00:00
Jeff Cohen 78a7f0e05e Fix VC++ compilation error.
llvm-svn: 28117
2006-05-05 01:47:05 +00:00
Chris Lattner 7a3ecf7993 Sink noop copies into the basic block that uses them. This reduces the number
of cross-block live ranges, and allows the bb-at-a-time selector to always
coallesce these away, at isel time.

This reduces the load on the coallescer and register allocator.  For example
on a codec on X86, we went from:

   1643 asm-printer           - Number of machine instrs printed
    419 liveintervals         - Number of loads/stores folded into instructions
   1144 liveintervals         - Number of identity moves eliminated after coalescing
   1022 liveintervals         - Number of interval joins performed
    282 liveintervals         - Number of intervals after coalescing
   1304 liveintervals         - Number of original intervals
     86 regalloc              - Number of times we had to backtrack
1.90232 regalloc              - Ratio of intervals processed over total intervals
     40 spiller               - Number of values reused
    182 spiller               - Number of loads added
    121 spiller               - Number of stores added
    132 spiller               - Number of register spills
      6 twoaddressinstruction - Number of instructions commuted to coalesce
    360 twoaddressinstruction - Number of two-address instructions

to:

   1636 asm-printer           - Number of machine instrs printed
    403 liveintervals         - Number of loads/stores folded into instructions
   1155 liveintervals         - Number of identity moves eliminated after coalescing
   1033 liveintervals         - Number of interval joins performed
    279 liveintervals         - Number of intervals after coalescing
   1312 liveintervals         - Number of original intervals
     76 regalloc              - Number of times we had to backtrack
1.88998 regalloc              - Ratio of intervals processed over total intervals
      1 spiller               - Number of copies elided
     41 spiller               - Number of values reused
    191 spiller               - Number of loads added
    114 spiller               - Number of stores added
    128 spiller               - Number of register spills
      4 twoaddressinstruction - Number of instructions commuted to coalesce
    356 twoaddressinstruction - Number of two-address instructions

On this testcase, this change provides a modest reduction in spill code,
regalloc iterations, and total instructions emitted.  It increases the number
of register coallesces.

llvm-svn: 28115
2006-05-05 01:04:50 +00:00
Chris Lattner b6277c510c Adjust to use proper TargetData copy ctor
llvm-svn: 28112
2006-05-04 21:18:40 +00:00
Chris Lattner abdf4d569c Final pass of minor cleanups for MachineInstr
llvm-svn: 28110
2006-05-04 19:36:09 +00:00
Evan Cheng 9add880566 Initial support for register pressure aware scheduling. The register reduction
scheduler can go into a "vertical mode" (i.e. traversing up the two-address
chain, etc.) when the register pressure is low.
This does seem to reduce the number of spills in the cases I've looked at. But
with x86, it's no guarantee the performance of the code improves.
It can be turned on with -sched-vertically option.

llvm-svn: 28108
2006-05-04 19:16:39 +00:00
Chris Lattner 53af9da363 Remove redundancy and a level of indirection when creating machine operands
llvm-svn: 28107
2006-05-04 19:14:44 +00:00
Chris Lattner 469647bf38 Remove and simplify some more machineinstr/machineoperand stuff.
llvm-svn: 28105
2006-05-04 18:16:01 +00:00
Chris Lattner 10b71c0d08 Rename MO_VirtualRegister -> MO_Register. Clean up immediate handling.
llvm-svn: 28104
2006-05-04 18:05:43 +00:00
Chris Lattner 10d6341618 Move some methods out of MachineInstr into MachineOperand
llvm-svn: 28102
2006-05-04 17:52:23 +00:00
Chris Lattner fb29692055 Fix Transforms/InstCombine/2006-05-04-DemandedBitCrash.ll
llvm-svn: 28101
2006-05-04 17:33:35 +00:00
Chris Lattner fef7a2d0f5 There shalt be only one "immediate" operand type!
llvm-svn: 28099
2006-05-04 17:21:20 +00:00
Chris Lattner 15c52bda1d Change "value" in MachineOperand to be a GlobalValue, as that is the only
thing that can be in it.  Remove a dead method.

llvm-svn: 28098
2006-05-04 17:02:51 +00:00
Chris Lattner 13d5f3eb05 Revert Nate's CR patch from last night, which caused many regressions (e.g. fhourstones).
Loading and storing off R0 isn't what we wanted.  Also, taking some CR's out of
CRRC seems to cause failures as well.  Further investigation is required.

llvm-svn: 28097
2006-05-04 16:56:45 +00:00
Jeff Cohen 06041abeb6 Make external globals public; other minor cleanup.
llvm-svn: 28096
2006-05-04 16:20:22 +00:00
Jeff Cohen f812a4fa75 Make Intel syntax the default when LLVM is built with VC++.
llvm-svn: 28095
2006-05-04 16:19:27 +00:00
Chris Lattner ee64b6b40f Remove a bunch more dead V9 specific stuff
llvm-svn: 28094
2006-05-04 01:26:39 +00:00
Chris Lattner 940cc978ef Remove a bunch more SparcV9 specific stuff
llvm-svn: 28093
2006-05-04 01:15:02 +00:00
Chris Lattner 6e663f1c1e Remove some more V9-specific stuff.
llvm-svn: 28092
2006-05-04 00:49:59 +00:00
Chris Lattner 9f6639b64d Remove some more unused stuff from MachineInstr that was leftover from V9.
llvm-svn: 28091
2006-05-04 00:44:25 +00:00
Chris Lattner 2aef59f123 Simplify handling of relocations
llvm-svn: 28090
2006-05-04 00:42:08 +00:00
Evan Cheng 8b1cde2bbe Use movsd to shuffle in the lowest two elements of a v4f32 / v4i32 vector when
movlps cannot be used (e.g. when load from m64 has multiple uses).

llvm-svn: 28089
2006-05-03 20:32:03 +00:00
Chris Lattner e3a9c70ba0 Change from using MachineRelocation ctors to using static methods
in MachineRelocation to create Relocations.

llvm-svn: 28088
2006-05-03 20:30:20 +00:00
Chris Lattner 005d7174c8 minor cleanups, no functionality change
llvm-svn: 28087
2006-05-03 18:55:56 +00:00
Chris Lattner 9e68942d78 inline a simple method
llvm-svn: 28083
2006-05-03 17:21:32 +00:00
Chris Lattner 1d8ee1fc80 Suck block address tracking out of targets into the JIT Emitter. This
simplifies the MachineCodeEmitter interface just a little bit and makes
BasicBlocks work like constant pools and jump tables.

llvm-svn: 28082
2006-05-03 17:10:41 +00:00
Chris Lattner 9954bc9c19 Fix a bug in Owen's checkin that broke the CBE on all non sparc v9 platforms.
llvm-svn: 28081
2006-05-03 05:48:41 +00:00
Nate Begeman 43b1ed7e3d Teach the x86 jit how to handle jump tables not directly used by a jump
instruction.

llvm-svn: 28080
2006-05-03 04:52:47 +00:00
Nate Begeman df4883971e Finish up the initial jump table implementation by allowing jump tables to
not be 100% dense.  Increase the minimum threshold for the number of cases
in a switch statement from 4 to 6 in order to create a jump table.

llvm-svn: 28079
2006-05-03 03:48:02 +00:00
Evan Cheng ffef8b9412 Bottom up register pressure reduction work: clean up some hacks and enhanced
the heuristic to further reduce spills for several test cases. (Note, it may
not necessarily translate to runtime win!)

llvm-svn: 28076
2006-05-03 02:10:45 +00:00
Owen Anderson 20a631fde7 Refactor TargetMachine, pushing handling of TargetData into the target-specific subclasses. This has one caller-visible change: getTargetData() now returns a pointer instead of a reference.
This fixes PR 759.

llvm-svn: 28074
2006-05-03 01:29:57 +00:00
Chris Lattner 37c39b9c62 Align function bodies correctly.
llvm-svn: 28073
2006-05-03 01:03:20 +00:00
Chris Lattner 77fe5b4459 Simplify some code. Don't add memory blocks to the Blocks list twice.
llvm-svn: 28071
2006-05-03 00:54:49 +00:00
Chris Lattner 667a056e11 Add assertions that verify that the actual arguments to a call or invoke match
the prototype of the called function.

llvm-svn: 28070
2006-05-03 00:48:22 +00:00
Chris Lattner d8b192ba3b Change the BasicBlockAddrs map to be a vector, indexed by MBB number.
llvm-svn: 28069
2006-05-03 00:32:55 +00:00
Chris Lattner 0267807ddc Keep the alpha JIT similar to the PPC/X86 jits
llvm-svn: 28068
2006-05-03 00:31:21 +00:00
Chris Lattner 0574e47dca Simplify some code
llvm-svn: 28066
2006-05-03 00:13:06 +00:00
Chris Lattner b8065a9a3a Several related changes:
1. Change several methods in the MachineCodeEmitter class to be pure virtual.
2. Suck emitConstantPool/initJumpTableInfo into startFunction, removing them
   from the MachineCodeEmitter interface, and reducing the amount of target-
   specific code.
3. Change the JITEmitter so that it allocates constantpools and jump tables
   *right* next to the functions that they belong to, instead of in a separate
   pool of memory.  This makes all memory for a function be contiguous, and
   means the JITEmitter only tracks one block of memory now.

llvm-svn: 28065
2006-05-02 23:22:24 +00:00
Nate Begeman 233391f5f5 Remove some stuff from the README
llvm-svn: 28063
2006-05-02 22:43:31 +00:00
Chris Lattner 23621fe8f4 Do not make the JIT memory manager manage the memory for globals. Instead
just have the JIT malloc them.

llvm-svn: 28062
2006-05-02 21:57:51 +00:00
Chris Lattner 25b95ed868 Minor cleanups, no functionality change.
llvm-svn: 28061
2006-05-02 21:44:14 +00:00
Chris Lattner e1c96369e2 Fix a purely hypothetical problem (for now): emitWord emits in the host
byte format.  This doesn't work when using the code emitter in a cross target
environment.  Since the code emitter is only really used by the JIT, this
isn't a current problem, but if we ever start emitting .o files, it would be.

llvm-svn: 28060
2006-05-02 19:14:47 +00:00
Chris Lattner c9aa3715e8 Refactor the machine code emitter interface to pull the pointers for the current
code emission location into the base class, instead of being in the derived classes.

This change means that low-level methods like emitByte/emitWord now are no longer
virtual (yaay for speed), and we now have a framework to support growable code
segments.  This implements feature request #1 of PR469.

llvm-svn: 28059
2006-05-02 18:27:26 +00:00
Nate Begeman bbcbf48aab Since we don't handle callee-save CRs right yet, don't allocate them. Also
don't step on R11 in the middle of a function when saving and restoring CRs

llvm-svn: 28058
2006-05-02 17:37:31 +00:00
Nate Begeman 4971ba5f41 Print function number instead of name
llvm-svn: 28057
2006-05-02 17:36:46 +00:00
Nate Begeman 287dc5be0d Hooray, everyone now uses the same printBasicBlockLabel implementation
llvm-svn: 28056
2006-05-02 17:34:51 +00:00
Chris Lattner 67a3aa2aaa Remove dead method
llvm-svn: 28055
2006-05-02 17:20:28 +00:00
Chris Lattner 5bc9c583e3 There is no reason to use a virtual method to store this word.
llvm-svn: 28053
2006-05-02 17:16:20 +00:00
Chris Lattner bb1c345ec6 Remove the debug machine code emitter. The "FilePrinterEmitter" is more
useful for debugging.

llvm-svn: 28051
2006-05-02 16:59:24 +00:00
Nate Begeman b9d4f8324d Extend printBasicBlockLabel a bit so that it can be used to print all
basic block labels, consolidating the code to do so in one place for each
target.

llvm-svn: 28050
2006-05-02 05:37:32 +00:00
Nate Begeman 01364fbba8 Update the PPC compilation callback code to not need weird abi-violating
prologs and epilogs, keep all the asm in one place, and remove use of
compiler builtin functions.

llvm-svn: 28049
2006-05-02 04:50:05 +00:00
Chris Lattner 2d3a02725d Add pass ID's for various passes, so they can be AddRequiredID. Patch by
Domagoj Babic!

llvm-svn: 28048
2006-05-02 04:24:36 +00:00
Jeff Cohen 470f431f44 De-virtualize SwitchSection.
llvm-svn: 28047
2006-05-02 03:58:45 +00:00
Jeff Cohen f34ddb1e0d De-virtualize EmitZeroes.
llvm-svn: 28046
2006-05-02 03:46:13 +00:00
Jeff Cohen bfe9ffb449 Finish support for Microsoft ML/MASM. May still be a few rough edges.
llvm-svn: 28045
2006-05-02 03:11:50 +00:00
Jeff Cohen 24a62a9bc1 Make Intel syntax mode friendlier to Microsoft ML assembler (still needs more work).
llvm-svn: 28044
2006-05-02 01:16:28 +00:00
Chris Lattner fd0a5478a1 Fix a latent bug that my spiller patch last week exposed: we were leaving
instructions in the virtregfolded map that were deleted.  Because they
were deleted, newly allocated instructions could end up at the same address,
magically finding themselves in the map.  The solution is to remove entries
from the map when we delete the instructions.

llvm-svn: 28041
2006-05-01 22:03:24 +00:00
Chris Lattner ab7dbe0cc9 When promoting a load to a reg-reg copy, where the load was a previous
instruction folded with spill code, make sure the remove the load from
the virt reg folded map.

llvm-svn: 28040
2006-05-01 21:17:10 +00:00
Chris Lattner 4dee67c2cd Remove previous patch, which wasn't quite right.
llvm-svn: 28039
2006-05-01 21:16:03 +00:00
Chris Lattner 85e9909755 Put PHI/INLINEASM into the correct namespace.
llvm-svn: 28037
2006-05-01 17:00:49 +00:00
Evan Cheng 0d084fb9ca Dis-favor stores more
llvm-svn: 28035
2006-05-01 09:20:44 +00:00
Evan Cheng 24e795496d Bottom up register-pressure reduction scheduler now pushes store operations
up the schedule. This helps code that looks like this:

loads ...
computations (first set) ...
stores (first set) ...
loads
computations (seccond set) ...
stores (seccond set) ...

Without this change, the stores and computations are more likely to
interleave:

loads ...
loads ...
computations (first set) ...
computations (second set) ...
computations (first set) ...
stores (first set) ...
computations (second set) ...
stores (stores set) ...

This can increase the number of spills if we are unlucky.

llvm-svn: 28033
2006-05-01 09:14:40 +00:00
Evan Cheng 10ff7b27ce Didn't mean ScheduleDAGList.cpp to make the last checkin.
llvm-svn: 28030
2006-05-01 08:56:34 +00:00
Evan Cheng a656242690 Remove temp. option -spiller-check-liveout, it didn't cause any failure nor performance regressions.
llvm-svn: 28029
2006-05-01 08:54:57 +00:00
Chris Lattner 563f0417d2 Remove %'s from register names when in intel mode.
llvm-svn: 28027
2006-05-01 05:53:50 +00:00
Chris Lattner 25f55ae74a Format #APP lines a bit nicer
llvm-svn: 28026
2006-05-01 04:11:03 +00:00
Evan Cheng f71f0f2e0b Local spiller kills a store if the folded restore is turned into a copy.
But this is incorrect if the spilled value live range extends beyond the
current BB.
It is currently controlled by a temporary option -spiller-check-liveout.

llvm-svn: 28024
2006-04-30 08:41:47 +00:00
Jeff Cohen 71c2e0f262 Mingw32 patches supplied by Anton Korobeynikov.
llvm-svn: 28023
2006-04-29 18:41:44 +00:00
Chris Lattner 2b48a94413 Remove a bogus transformation. This fixes SingleSource/UnitTests/2006-01-23-InitializedBitField.c
with some changes I have to the new CFE.

llvm-svn: 28022
2006-04-28 23:33:20 +00:00
Evan Cheng d369603df9 I can't spell: Register, not Regsiter.
llvm-svn: 28021
2006-04-28 23:19:39 +00:00
Evan Cheng b244b80172 Implemented x86 inline asm b, h, w, k modifiers.
llvm-svn: 28020
2006-04-28 23:11:40 +00:00
Chris Lattner 655d08fda8 Fix InstCombine/2006-04-28-ShiftShiftLongLong.ll
llvm-svn: 28019
2006-04-28 22:21:41 +00:00
Chris Lattner 84b49d51be Fix CodeGen/Generic/2006-04-28-Sign-extend-bool.ll
llvm-svn: 28017
2006-04-28 21:56:10 +00:00
Evan Cheng 88decded82 Initial caller side support (for CCC only, not FastCC) of 128-bit vector
passing by value.

llvm-svn: 28015
2006-04-28 21:29:37 +00:00
Evan Cheng 68a44dc445 Bare-bone X86 inline asm printer support.
llvm-svn: 28014
2006-04-28 21:19:05 +00:00
Evan Cheng c5e8ce8b8c Remove the temporary option: -no-isel-fold-inflight
llvm-svn: 28012
2006-04-28 18:54:11 +00:00
Evan Cheng 3cd4362ade Implement four-wide shuffle with 2 shufps if no more than two elements come
from each vector. e.g.
        shuffle(G1, G2, 7, 1, 5, 2)
==>
        movaps _G2, %xmm0
        shufps $151, _G1, %xmm0
        shufps $216, %xmm0, %xmm0

llvm-svn: 28011
2006-04-28 07:03:38 +00:00
Chris Lattner 1b7a51520c Fix PR743: emit -help output of a tool to cout, not cerr.
llvm-svn: 28010
2006-04-28 05:36:25 +00:00
Evan Cheng d43c5c6046 TargetLowering::LowerArguments should return a VBIT_CONVERT of
FORMAL_ARGUMENTS SDOperand in the return result vector.

llvm-svn: 28009
2006-04-28 05:25:15 +00:00
Chris Lattner 79c50d96c9 Mapping of physregs can make it so that the designated and input physregs are
the same.  In this case, don't emit a noop copy.

llvm-svn: 28008
2006-04-28 04:43:18 +00:00
Chris Lattner e63d808b6e Fix Transforms/Reassociate/2006-04-27-ReassociateVector.ll
llvm-svn: 28007
2006-04-28 04:14:49 +00:00
Evan Cheng f0157cb0bc Use movaps instead of movapd for spill / restore.
llvm-svn: 28005
2006-04-28 02:23:35 +00:00
Evan Cheng 51ab4498e7 Added a temporary option -no-isel-fold-inflight to control whether a "inflight"
node can be folded.

llvm-svn: 28003
2006-04-28 02:09:19 +00:00
Chris Lattner 84e95d00b5 When we have a two-address instruction where the input cannot be clobbered
and is already available, instead of falling back to emitting a load, fall
back to emitting a reg-reg copy.  This generates significantly better code
for some SSE testcases, as SSE has lots of two-address instructions and
none of them are read/modify/write.  As one example, this change does:

        pshufd %XMM5, XMMWORD PTR [%ESP + 84], 255
        xorps %XMM2, %XMM5
        cmpltps %XMM1, %XMM0
-       movaps XMMWORD PTR [%ESP + 52], %XMM0
-       movapd %XMM6, XMMWORD PTR [%ESP + 52]
+       movaps %XMM6, %XMM0
        cmpltps %XMM6, XMMWORD PTR [%ESP + 68]
        movapd XMMWORD PTR [%ESP + 52], %XMM6
        movaps %XMM6, %XMM0
        cmpltps %XMM6, XMMWORD PTR [%ESP + 36]
        cmpltps %XMM3, %XMM0
-       movaps XMMWORD PTR [%ESP + 20], %XMM0
-       movapd %XMM7, XMMWORD PTR [%ESP + 20]
+       movaps %XMM7, %XMM0
        cmpltps %XMM7, XMMWORD PTR [%ESP + 4]
        movapd XMMWORD PTR [%ESP + 20], %XMM7
        cmpltps %XMM4, %XMM0

... which is far better than a store followed by a load!

llvm-svn: 28001
2006-04-28 01:46:50 +00:00
Chris Lattner a4c2c4a276 Add a note
llvm-svn: 27999
2006-04-28 00:04:05 +00:00
Chris Lattner b209131b56 Add a note
llvm-svn: 27998
2006-04-27 21:40:57 +00:00
Chris Lattner b6cb64b7e6 Add support for inserting undef into a vector. This implements
Transforms/InstCombine/vec_insert_to_shuffle.ll

llvm-svn: 27997
2006-04-27 21:14:21 +00:00
Evan Cheng f4f3f0d25f Make x86 isel lowering produce tailcall nodes. They are match to normal calls
for now.

Patch contributed by Alexander Friedman.

llvm-svn: 27994
2006-04-27 08:40:39 +00:00
Evan Cheng ec04a37edd A couple of new entries.
llvm-svn: 27993
2006-04-27 08:31:33 +00:00
Evan Cheng 89001ad729 Support for passing 128-bit vector arguments via XMM registers.
llvm-svn: 27992
2006-04-27 08:31:10 +00:00
Evan Cheng 3784f3c57c Insert a VBIT_CONVERT between a FORMAL_ARGUMENT node and its vector uses
(VAND, VADD, etc.). Legalizer will assert otherwise.

llvm-svn: 27991
2006-04-27 08:29:42 +00:00
Evan Cheng a0374e1bed Oops
llvm-svn: 27989
2006-04-27 05:44:50 +00:00
Evan Cheng 24eb3f4765 Bug fix: not updating NumIntRegs.
llvm-svn: 27988
2006-04-27 05:35:28 +00:00
Chris Lattner 393d96a56c Fix Regression/CodeGen/Generic/2006-04-26-SetCCAnd.ll and
PR748.

llvm-svn: 27987
2006-04-27 05:01:07 +00:00
Evan Cheng 48940d16b2 - Clean up formal argument lowering code. Prepare for vector pass by value work.
- Fixed vararg support.

llvm-svn: 27985
2006-04-27 01:32:22 +00:00
Chris Lattner f98b4aa2e7 Fix some nondeterminstic behavior in the mem2reg pass that (in addition to
nondeterminism being bad) could cause some trivial missed optimizations (dead
phi nodes being left around for later passes to clean up).

With this, llvm-gcc4 now bootstraps and correctly compares.  I don't know
why I never tried to do it before... :)

llvm-svn: 27984
2006-04-27 01:14:43 +00:00
Chris Lattner e8cbdbf314 Implement Transforms/IndVarsSimplify/complex-scev.ll, a case where we didn't
recognize some simple affine IV's.

llvm-svn: 27982
2006-04-26 18:34:07 +00:00
Evan Cheng 1c39903297 Fix fastcc failures.
llvm-svn: 27980
2006-04-26 18:21:31 +00:00
Evan Cheng e0bcfbe811 Switching over FORMAL_ARGUMENTS mechanism to lower call arguments.
llvm-svn: 27975
2006-04-26 01:20:17 +00:00
Evan Cheng 9618df1190 Don't forget return void.
llvm-svn: 27974
2006-04-25 23:03:35 +00:00
Nate Begeman 4530327c04 Keep the stack from on darwin 16-byte aligned. This fixes many JIT
failres.

llvm-svn: 27973
2006-04-25 20:54:26 +00:00
Evan Cheng a9467aab0a Separate LowerOperation() into multiple functions, one per opcode.
llvm-svn: 27972
2006-04-25 20:13:52 +00:00
Andrew Lenharth 3c775bcd86 slightly more useful error message
llvm-svn: 27971
2006-04-25 19:33:41 +00:00
Andrew Lenharth f5a713d273 better c99 struct handling
llvm-svn: 27970
2006-04-25 19:33:23 +00:00
Evan Cheng 4cc3e0b05f Fix a typo.
llvm-svn: 27968
2006-04-25 17:48:41 +00:00
Nate Begeman 48ccd3f826 Fix a warning
llvm-svn: 27967
2006-04-25 17:46:32 +00:00
Nate Begeman 318bb96f9e No functionality changes, but cleaner code with correct comments.
llvm-svn: 27966
2006-04-25 04:45:59 +00:00
Evan Cheng fb46b2bf5d Explicitly specify result type for def : Pat<> patterns (if it produces a vector
result). Otherwise tblgen will pick the default (v16i8 for 128-bit vector).

llvm-svn: 27965
2006-04-25 00:50:01 +00:00
Evan Cheng 25b09295f8 Added X86 SSE2 intrinsics which can be represented as vector_shuffles. This is
a temporary workaround for the 2-wide vector_shuffle problem (i.e. its mask
would have type v2i32 which is not legal).

llvm-svn: 27964
2006-04-24 23:34:56 +00:00
Evan Cheng d03631ee76 Add a new entry.
llvm-svn: 27963
2006-04-24 23:30:10 +00:00
Evan Cheng 5c2bfb069e Special case handling two wide build_vector(0, x).
llvm-svn: 27961
2006-04-24 22:58:52 +00:00
Evan Cheng 63bd4d3730 Some missing movlps, movhps, movlpd, and movhpd patterns.
llvm-svn: 27960
2006-04-24 21:58:20 +00:00
Evan Cheng b0461080e4 A little bit more build_vector enhancement for v8i16 cases.
llvm-svn: 27959
2006-04-24 18:01:45 +00:00
Evan Cheng 2f9b0bcbd5 Remove a completed entry.
llvm-svn: 27958
2006-04-24 17:38:16 +00:00
Evan Cheng ab0ee6340c MakeMIInst() should handle jump table index operands.
llvm-svn: 27955
2006-04-24 05:37:35 +00:00
Chris Lattner f110527a29 Add a note
llvm-svn: 27954
2006-04-23 19:47:09 +00:00
Evan Cheng b4f31dd1a8 MOVL shuffle (i.e. movd or movss / movsd from memory) of undef, V2 == V2
llvm-svn: 27953
2006-04-23 06:35:19 +00:00
Nate Begeman 866b4b4d45 Fix the updating of the machine CFG when a PHI node was in a successor of
the jump table's range check block.  This re-enables 100% dense jump tables
by default on PPC & x86

llvm-svn: 27952
2006-04-23 06:26:20 +00:00
Nate Begeman 3e04bb482b Code cleanup associated with jump tables, thanks to Chris for noticing
these.

llvm-svn: 27950
2006-04-22 23:52:35 +00:00
Nate Begeman ecb1dafd3d Turn of jump tables for a bit, there are still some issues to work out with
updating the machine CFG.

llvm-svn: 27949
2006-04-22 23:51:56 +00:00
Nate Begeman 9f0b13c885 Optimized stores to the constant pool, while cool, are unnecessary.
llvm-svn: 27948
2006-04-22 22:31:45 +00:00
Nate Begeman 4ca2ea5b43 JumpTable support! What this represents is working asm and jit support for
x86 and ppc for 100% dense switch statements when relocations are non-PIC.
This support will be extended and enhanced in the coming days to support
PIC, and less dense forms of jump tables.

llvm-svn: 27947
2006-04-22 18:53:45 +00:00
Evan Cheng e728efdfce Don't do all the lowering stuff for 2-wide build_vector's. Also, minor optimization for shuffle of undef.
llvm-svn: 27946
2006-04-22 08:34:05 +00:00
Evan Cheng 16ef94f4e8 Fix a performance regression. Use {p}shuf* when there are only two distinct elements in a build_vector.
llvm-svn: 27945
2006-04-22 06:21:46 +00:00
Chris Lattner c8afdfec52 Teach the JIT how to relocate LI, this fixes the JIT on Prolangs-C/TimberWolfMC
llvm-svn: 27943
2006-04-22 06:17:56 +00:00
Chris Lattner fe36eaebda Fix JIT support for static ctors, which was apparently completely broken!
This allows Prolangs-C++/city and probably a bunch of other stuff to work
well with the new front-end

llvm-svn: 27941
2006-04-22 05:02:46 +00:00
Evan Cheng 14215c36b6 Revamp build_vector lowering to take advantage of movss and movd instructions.
movd always clear the top 96 bits and movss does so when it's loading the
value from memory.
The net result is codegen for 4-wide shuffles is much improved. It is near
optimal if one or more elements is a zero. e.g.

__m128i test(int a, int b) {
  return _mm_set_epi32(0, 0, b, a);
}

compiles to

_test:
	movd 8(%esp), %xmm1
	movd 4(%esp), %xmm0
	punpckldq %xmm1, %xmm0
	ret

compare to gcc:

_test:
	subl	$12, %esp
	movd	20(%esp), %xmm0
	movd	16(%esp), %xmm1
	punpckldq	%xmm0, %xmm1
	movq	%xmm1, %xmm0
	movhps	LC0, %xmm0
	addl	$12, %esp
	ret

or icc:

_test:
        movd      4(%esp), %xmm0                                #5.10
        movd      8(%esp), %xmm3                                #5.10
        xorl      %eax, %eax                                    #5.10
        movd      %eax, %xmm1                                   #5.10
        punpckldq %xmm1, %xmm0                                  #5.10
        movd      %eax, %xmm2                                   #5.10
        punpckldq %xmm2, %xmm3                                  #5.10
        punpckldq %xmm3, %xmm0                                  #5.10
        ret                                                     #5.10

There are still room for improvement, for example the FP variant of the above example:

__m128 test(float a, float b) {
  return _mm_set_ps(0.0, 0.0, b, a);
}

_test:
	movss 8(%esp), %xmm1
	movss 4(%esp), %xmm0
	unpcklps %xmm1, %xmm0
	xorps %xmm1, %xmm1
	movlhps %xmm1, %xmm0
	ret

The xorps and movlhps are unnecessary. This will require post legalizer optimization to handle.

llvm-svn: 27939
2006-04-21 23:03:30 +00:00
Nate Begeman 57a32f0bc1 Fix the comment
llvm-svn: 27938
2006-04-21 22:11:27 +00:00
Nate Begeman 516b393992 Change the PPC JIT to use a Static relocation model
llvm-svn: 27937
2006-04-21 22:04:15 +00:00
Chris Lattner 3e62d4b289 fix thinko
llvm-svn: 27935
2006-04-21 21:05:22 +00:00
Chris Lattner e1f9ab7d53 add some low-prio notes
llvm-svn: 27934
2006-04-21 21:03:21 +00:00
Chris Lattner b21d3bfd1f The BFS scheduler is apparently nondeterminstic (causes many llvmgcc bootstrap
miscompares).  Switch RISC targets to use the list-td scheduler, which isn't.

llvm-svn: 27933
2006-04-21 17:16:16 +00:00
Chris Lattner 28ead23d1c Remove a hack required by V9.
llvm-svn: 27931
2006-04-21 15:33:35 +00:00
Chris Lattner 662e940f73 Fix a couple more memory issues
llvm-svn: 27930
2006-04-21 15:32:26 +00:00
Evan Cheng e8b5180044 Now generating perfect (I think) code for "vector set" with a single non-zero
scalar value.

e.g.
        _mm_set_epi32(0, a, 0, 0);
==>
	movd 4(%esp), %xmm0
	pshufd $69, %xmm0, %xmm0

        _mm_set_epi8(0, 0, 0, 0, 0, a, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
==>
	movzbw 4(%esp), %ax
	movzwl %ax, %eax
	pxor %xmm0, %xmm0
	pinsrw $5, %eax, %xmm0

llvm-svn: 27923
2006-04-21 01:05:10 +00:00
Chris Lattner cc47ab3305 Fix a really subtle and obnoxious memory bug that caused issues with an
llvm-gcc4 boostrap.  Whenever a node is deleted by the dag combiner, it
*must* be returned by the visit function, or the dag combiner will not
know that the node has been processed (and will, e.g., send it to the
target dag combine xforms).

llvm-svn: 27922
2006-04-20 23:55:59 +00:00
Chris Lattner dae49df407 Fix Transforms/ScalarRepl/2006-04-20-PromoteCrash.ll
llvm-svn: 27912
2006-04-20 20:48:50 +00:00
Chris Lattner 99d3da9d2c Fix the CodeGen/PowerPC/buildvec_canonicalize.ll regression last night.
llvm-svn: 27908
2006-04-20 19:01:30 +00:00
Chris Lattner d1c3a067ee add a note
llvm-svn: 27907
2006-04-20 18:49:28 +00:00
Chris Lattner 3e5521799c remove some v9 specific code
llvm-svn: 27900
2006-04-20 18:33:11 +00:00
Chris Lattner dcc1f995eb This field no longer exists
llvm-svn: 27899
2006-04-20 18:32:41 +00:00
Chris Lattner 2a875285f7 Remove this obsolete file
llvm-svn: 27895
2006-04-20 18:16:45 +00:00
Chris Lattner a38c3580bd Remove some of the obvious V9-specific cruft
llvm-svn: 27893
2006-04-20 18:08:53 +00:00
Chris Lattner ac61195539 This target is no longer built. The ,v files now live in the reoptimizer.
llvm-svn: 27885
2006-04-20 17:15:44 +00:00
Andrew Lenharth f89e630b2f Make code match cvs commit message :)
llvm-svn: 27881
2006-04-20 15:41:37 +00:00
Andrew Lenharth 61eae29ad6 If we can convert the return pointer type into an integer that IntPtrType
can be converted to losslessly, we can continue the conversion to a direct call.

llvm-svn: 27880
2006-04-20 14:56:47 +00:00
Evan Cheng 60f0b8998e - Added support to turn "vector clear elements", e.g. pand V, <-1, -1, 0, -1>
to a vector shuffle.
- VECTOR_SHUFFLE lowering change in preparation for more efficient codegen
of vector shuffle with zero (or any splat) vector.

llvm-svn: 27875
2006-04-20 08:58:49 +00:00
Evan Cheng a320abc494 Turn a VAND into a VECTOR_SHUFFLE is applicable.
DAG combiner can turn a VAND V, <-1, 0, -1, -1>, i.e. vector clear elements,
into a vector shuffle with a zero vector. It only does so when TLI tells it
the xform is profitable.

llvm-svn: 27874
2006-04-20 08:56:16 +00:00
Chris Lattner 0cd0065c58 Make sure that the new instructions selected have the right type. This fixes
CodeGen/PowerPC/2006-04-19-vmaddfp-crash.ll

llvm-svn: 27868
2006-04-20 05:58:10 +00:00
Chris Lattner bc1b262725 Implement folding of a bunch of binops with undef
llvm-svn: 27863
2006-04-20 05:39:12 +00:00
Evan Cheng 15c264b753 Handle v2i64 BUILD_VECTOR custom lowering correctly. v2i64 is a legal type,
but i64 is not. If possible, change a i64 op to a f64 (e.g. load, constant)
and then cast it back.

llvm-svn: 27849
2006-04-20 00:11:39 +00:00
Evan Cheng 4a1b0d3292 isSplatMask() bug: first element can be an undef.
llvm-svn: 27847
2006-04-19 23:28:59 +00:00
Chris Lattner 73eb58e1a2 Simplify some code
llvm-svn: 27846
2006-04-19 23:17:50 +00:00
Evan Cheng a3caaee503 - Added support to do aribitrary 4 wide shuffle with no more than three
instructions.
- Fixed a commute vector_shuff bug.

llvm-svn: 27845
2006-04-19 22:48:17 +00:00
Evan Cheng 6d5297dac3 Prefer {p}unpack* and mov*dup over {p}shuf* as well.
llvm-svn: 27844
2006-04-19 21:15:24 +00:00
Evan Cheng 52df74000a Renamed AddedCost to AddedComplexity.
llvm-svn: 27843
2006-04-19 20:38:28 +00:00
Evan Cheng b416a25174 - Renamed AddedCost to AddedComplexity.
- Added more movhlps and movlhps patterns.

llvm-svn: 27842
2006-04-19 20:37:34 +00:00
Evan Cheng 7855e4d032 Commute vector_shuffle to match more movlhps, movlp{s|d} cases.
llvm-svn: 27840
2006-04-19 20:35:22 +00:00
Evan Cheng cc7abc6c38 More mov{h|l}p{d|s} patterns.
llvm-svn: 27836
2006-04-19 18:20:17 +00:00
Evan Cheng aeb09ccdd3 - More mov{h|l}ps patterns.
- Increase cost (complexity) of patterns which match mov{h|l}ps ops. These
  are preferred over shufps in most cases.

llvm-svn: 27835
2006-04-19 18:11:52 +00:00
Evan Cheng aa3325e925 Allow "let AddedCost = n in" to increase pattern complexity.
llvm-svn: 27834
2006-04-19 18:07:24 +00:00
Chris Lattner 05bbec5020 add a note
llvm-svn: 27832
2006-04-19 16:22:38 +00:00
Andrew Lenharth 02f9df3b7b Another simple case type merge case to try
llvm-svn: 27831
2006-04-19 15:34:34 +00:00
Andrew Lenharth edf349aba6 deal with memchr
llvm-svn: 27830
2006-04-19 15:34:02 +00:00
Andrew Lenharth 7f2cee3d3e friendlier error message
llvm-svn: 27829
2006-04-19 15:33:35 +00:00
Chris Lattner a922a516b0 add a note
llvm-svn: 27828
2006-04-19 05:55:06 +00:00
Chris Lattner bfab82817a Add a note.
llvm-svn: 27827
2006-04-19 05:53:27 +00:00
Andrew Lenharth 7c8be502e9 stupid stuff
llvm-svn: 27821
2006-04-19 03:45:25 +00:00
Andrew Lenharth 3e642d012a I understand now. Shoot.
llvm-svn: 27819
2006-04-18 22:36:11 +00:00
Evan Cheng 3823aa1d0f - PEXTRW cannot take a memory location as its first source operand.
- PINSRWrmi encoding bug.

llvm-svn: 27818
2006-04-18 21:59:43 +00:00
Evan Cheng 43f4ef4ffb SHUFP{S|D}, PSHUF* encoding bugs. Left out the mask immediate operand.
llvm-svn: 27817
2006-04-18 21:56:36 +00:00
Evan Cheng a179ea631d Name change for clarity sake
llvm-svn: 27816
2006-04-18 21:55:35 +00:00
Evan Cheng 09e36ef710 Encoding bug: CMPPSrmi, CMPPDrmi dropped operand 2 (condtion immediate).
llvm-svn: 27815
2006-04-18 21:31:08 +00:00
Evan Cheng d799d680f4 Name change for clarity sake
llvm-svn: 27814
2006-04-18 21:29:50 +00:00
Evan Cheng 0ee281f37c Left a pattern out
llvm-svn: 27813
2006-04-18 21:29:08 +00:00
Andrew Lenharth f70cb84083 llvm.memc* improvements. helps PA a lot in some specmarks
llvm-svn: 27812
2006-04-18 20:59:52 +00:00
Andrew Lenharth 49e188d7f7 llvm.memc* improvements. helps PA a lot in some specmarks
llvm-svn: 27811
2006-04-18 19:54:11 +00:00
Chris Lattner 34c901b50e These are correctly encoded by the JIT. I checked :)
llvm-svn: 27810
2006-04-18 19:03:38 +00:00
Chris Lattner 197d762232 add a note
llvm-svn: 27809
2006-04-18 18:30:19 +00:00
Chris Lattner 518834c67e Fix a crash on:
void foo2(vector float *A, vector float *B) {
  vector float C = (vector float)vec_cmpeq(*A, *B);
  if (!vec_any_eq(*A, *B))
    *B = (vector float){0,0,0,0};
  *A = C;
}

llvm-svn: 27808
2006-04-18 18:28:22 +00:00
Evan Cheng e2d25a1a50 Fixed an encoding bug: movd from XMM to R32.
llvm-svn: 27807
2006-04-18 18:19:00 +00:00
Chris Lattner 1e174c87c3 pretty print node name
llvm-svn: 27806
2006-04-18 18:05:58 +00:00
Chris Lattner 9754d142a4 Implement an important entry from README_ALTIVEC:
If an altivec predicate compare is used immediately by a branch, don't
use a (serializing) MFCR instruction to read the CR6 register, which requires
a compare to get it back to CR's.  Instead, just branch on CR6 directly. :)

For example, for:
void foo2(vector float *A, vector float *B) {
  if (!vec_any_eq(*A, *B))
    *B = (vector float){0,0,0,0};
}

We now generate:

_foo2:
        mfspr r2, 256
        oris r5, r2, 12288
        mtspr 256, r5
        lvx v2, 0, r4
        lvx v3, 0, r3
        vcmpeqfp. v2, v3, v2
        bne cr6, LBB1_2 ; UnifiedReturnBlock
LBB1_1: ; cond_true
        vxor v2, v2, v2
        stvx v2, 0, r4
        mtspr 256, r2
        blr
LBB1_2: ; UnifiedReturnBlock
        mtspr 256, r2
        blr

instead of:

_foo2:
        mfspr r2, 256
        oris r5, r2, 12288
        mtspr 256, r5
        lvx v2, 0, r4
        lvx v3, 0, r3
        vcmpeqfp. v2, v3, v2
        mfcr r3, 2
        rlwinm r3, r3, 27, 31, 31
        cmpwi cr0, r3, 0
        beq cr0, LBB1_2 ; UnifiedReturnBlock
LBB1_1: ; cond_true
        vxor v2, v2, v2
        stvx v2, 0, r4
        mtspr 256, r2
        blr
LBB1_2: ; UnifiedReturnBlock
        mtspr 256, r2
        blr

This implements CodeGen/PowerPC/vec_br_cmp.ll.

llvm-svn: 27804
2006-04-18 17:59:36 +00:00
Chris Lattner 68c16a201e move some stuff around, clean things up
llvm-svn: 27802
2006-04-18 17:52:36 +00:00
Chris Lattner bfc2c68386 Teach the codegen about instructions used for SSE spill code, allowing it
to optimize cases where it has to spill a lot

llvm-svn: 27801
2006-04-18 16:44:51 +00:00
Chris Lattner 96d50487c9 Use vmladduhm to do v8i16 multiplies which is faster and simpler than doing
even/odd halves.  Thanks to Nate telling me what's what.

llvm-svn: 27793
2006-04-18 04:28:57 +00:00
Chris Lattner d6d82aa889 Implement v16i8 multiply with this code:
vmuloub v5, v3, v2
        vmuleub v2, v3, v2
        vperm v2, v2, v5, v4

This implements CodeGen/PowerPC/vec_mul.ll.  With this, v16i8 multiplies are
6.79x faster than before.

Overall, UnitTests/Vector/multiplies.c is now 2.45x faster with LLVM than with
GCC.

Remove the 'integer multiplies' todo from the README file.

llvm-svn: 27792
2006-04-18 03:57:35 +00:00
Evan Cheng 4d36a36900 Correct comments
llvm-svn: 27790
2006-04-18 03:45:01 +00:00
Chris Lattner 7e439874cb Lower v8i16 multiply into this code:
li r5, lo16(LCPI1_0)
        lis r6, ha16(LCPI1_0)
        lvx v4, r6, r5
        vmulouh v5, v3, v2
        vmuleuh v2, v3, v2
        vperm v2, v2, v5, v4

where v4 is:
LCPI1_0:                                        ;  <16 x ubyte>
        .byte   2
        .byte   3
        .byte   18
        .byte   19
        .byte   6
        .byte   7
        .byte   22
        .byte   23
        .byte   10
        .byte   11
        .byte   26
        .byte   27
        .byte   14
        .byte   15
        .byte   30
        .byte   31

This is 5.07x faster on the G5 (measured) than lowering to scalar code +
loads/stores.

llvm-svn: 27789
2006-04-18 03:43:48 +00:00
Chris Lattner a2cae1bb10 Custom lower v4i32 multiplies into a cute sequence, instead of having legalize
scalarize the sequence into 4 mullw's and a bunch of load/store traffic.

This speeds up v4i32 multiplies 4.1x (measured) on a G5.  This implements
PowerPC/vec_mul.ll

llvm-svn: 27788
2006-04-18 03:24:30 +00:00
Evan Cheng 0ef233509b Another entry
llvm-svn: 27786
2006-04-18 01:22:57 +00:00
Evan Cheng e008bd3d27 Another entry.
llvm-svn: 27784
2006-04-18 00:21:01 +00:00
Evan Cheng 5421206c4b Use movss to insert_vector_elt(v, s, 0).
llvm-svn: 27782
2006-04-17 22:45:49 +00:00
Chris Lattner 36dd7c98d1 Turn x86 unaligned load/store intrinsics into aligned load/store instructions
if the pointer is known aligned.

llvm-svn: 27781
2006-04-17 22:26:56 +00:00
Chris Lattner 916ae0775e Fix handling of calls in functions that use vectors. This fixes a crash on
the code in GCC PR26546.

llvm-svn: 27780
2006-04-17 22:10:08 +00:00
Evan Cheng 6e5e205841 Use two pinsrw to insert an element into v4i32 / v4f32 vector.
llvm-svn: 27779
2006-04-17 22:04:06 +00:00
Chris Lattner 63a5cdc423 remove done item
llvm-svn: 27778
2006-04-17 21:52:03 +00:00
Chris Lattner 6bd68ae81e Don't diddle VRSAVE if no registers need to be added/removed from it. This
allows us to codegen functions as:

_test_rol:
        vspltisw v2, -12
        vrlw v2, v2, v2
        blr

instead of:

_test_rol:
        mfvrsave r2, 256
        mr r3, r2
        mtvrsave r3
        vspltisw v2, -12
        vrlw v2, v2, v2
        mtvrsave r2
        blr

Testcase here: CodeGen/PowerPC/vec_vrsave.ll

llvm-svn: 27777
2006-04-17 21:48:13 +00:00
Chris Lattner bec79b4a59 Add a MachineInstr::eraseFromParent convenience method.
llvm-svn: 27775
2006-04-17 21:35:41 +00:00
Evan Cheng 22c06f054b Encoding bug
llvm-svn: 27773
2006-04-17 21:33:57 +00:00
Chris Lattner 72d7c27069 Vectors that are known live-in and live-out are clearly already marked in
the vrsave register for the caller.  This allows us to codegen a function as:

_test_rol:
        mfspr r2, 256
        mr r3, r2
        mtspr 256, r3
        vspltisw v2, -12
        vrlw v2, v2, v2
        mtspr 256, r2
        blr

instead of:

_test_rol:
        mfspr r2, 256
        oris r3, r2, 40960
        mtspr 256, r3
        vspltisw v0, -12
        vrlw v2, v0, v0
        mtspr 256, r2
        blr

llvm-svn: 27772
2006-04-17 21:22:06 +00:00
Chris Lattner 14c4972b6d Prefer to allocate V2-V5 before V0,V1. This lets us generate code like this:
vspltisw v2, -12
        vrlw v2, v2, v2

instead of:

        vspltisw v0, -12
        vrlw v2, v0, v0

when a function is returning a value.

llvm-svn: 27771
2006-04-17 21:19:12 +00:00