Chris Lattner
1b20615173
We can fold promoted and non-promoted loads into divs also!
llvm-svn: 19835
2005-01-25 20:35:10 +00:00
Chris Lattner
30607ec66e
Fold promoted loads into binary ops for FP, allowing us to generate m32 forms
of FP ops.
llvm-svn: 19834
2005-01-25 20:03:11 +00:00
Chris Lattner
0e1de101a1
Silence a warning.
llvm-svn: 19798
2005-01-23 23:20:06 +00:00
Chris Lattner
e70eb9da7d
Speed up folding operations into loads.
llvm-svn: 19733
2005-01-21 21:43:02 +00:00
Chris Lattner
e1e844c416
The ever-important vanity pass name :)
llvm-svn: 19731
2005-01-21 21:35:14 +00:00
Chris Lattner
c78776d209
Fix a FIXME: realize that argument stores are all independent (don't alias)
llvm-svn: 19728
2005-01-21 19:46:38 +00:00
Chris Lattner
2a631fa406
Implement ADD_PARTS/SUB_PARTS so that 64-bit integer add/sub work. This
fixes most of the remaining llc-beta failures.
llvm-svn: 19716
2005-01-20 18:53:00 +00:00
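As a worked illustration of what an ADD_PARTS-style expansion computes on a 32-bit target, here is a minimal C++ sketch (the Parts struct and function are illustrative, not LLVM code):

    #include <cassert>
    #include <cstdint>

    // Lo/hi halves of a 64-bit integer, as a 32-bit target sees them.
    struct Parts { uint32_t lo, hi; };

    // Add the low halves, then add the high halves plus the carry out of
    // the low addition (on x86: add followed by adc).
    Parts addParts(Parts a, Parts b) {
        uint32_t lo = a.lo + b.lo;
        uint32_t carry = lo < a.lo;   // unsigned wraparound means a carry
        return {lo, a.hi + b.hi + carry};
    }

    int main() {
        uint64_t x = 0x00000001FFFFFFFFull, y = 1;
        Parts r = addParts({uint32_t(x), uint32_t(x >> 32)},
                           {uint32_t(y), uint32_t(y >> 32)});
        assert(((uint64_t(r.hi) << 32) | r.lo) == x + y);
    }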
Chris Lattner
5b04f33405
Fix a crash compiling 134.perl.
llvm-svn: 19711
2005-01-20 16:50:16 +00:00
Chris Lattner
474aac4da9
Fix a problem where we were literally selecting for INCREASED register
pressure, not decreased register pressure. Also fix a problem where we
accidentally swapped the operands of SHLD, which caused fourinarow to fail.
This fixes fourinarow.
llvm-svn: 19697
2005-01-19 17:24:34 +00:00
Chris Lattner
de87d146ab
Implement Regression/CodeGen/X86/rotate.ll: emit rotate instructions (which
typically cost 1 cycle) instead of shld/shrd instructions (which typically take
6 or more cycles). This also saves code space.
For example, instead of emitting:
rotr:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %CL, BYTE PTR [%ESP + 8]
        shrd %EAX, %EAX, %CL
        ret
rotli:
        mov %EAX, DWORD PTR [%ESP + 4]
        shrd %EAX, %EAX, 27
        ret
Emit:
rotr32:
        mov %CL, BYTE PTR [%ESP + 8]
        mov %EAX, DWORD PTR [%ESP + 4]
        ror %EAX, %CL
        ret
rotli32:
        mov %EAX, DWORD PTR [%ESP + 4]
        ror %EAX, 27
        ret
We also emit byte rotate instructions which do not have a sh[lr]d counterpart
at all.
llvm-svn: 19692
2005-01-19 08:07:05 +00:00
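The single-cycle rotates correspond to the classic shift-and-or idiom; a small self-contained C++ version for reference (the function name is ours):

    #include <cassert>
    #include <cstdint>

    // Rotate right: on x86 this lowers to a single ror, where the
    // shld/shrd pair costs several cycles.
    uint32_t rotr32(uint32_t x, unsigned n) {
        n &= 31;                                    // keep shifts in range
        return n ? (x >> n) | (x << (32 - n)) : x;
    }

    int main() {
        assert(rotr32(0x80000001u, 1) == 0xC0000000u);
        assert(rotr32(0x12345678u, 0) == 0x12345678u);
    }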
Chris Lattner
29f5819158
Match 16-bit shld/shrd instructions as well, implementing shift-double.llx:test5
llvm-svn: 19689
2005-01-19 07:37:26 +00:00
Chris Lattner
41fe201b61
Codegen long >> 2 to this:
foo:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %EDX, DWORD PTR [%ESP + 8]
        shrd %EAX, %EDX, 2
        sar %EDX, 2
        ret
instead of this:
test1:
        mov %ECX, DWORD PTR [%ESP + 4]
        shr %ECX, 2
        mov %EDX, DWORD PTR [%ESP + 8]
        mov %EAX, %EDX
        shl %EAX, 30
        or %EAX, %ECX
        sar %EDX, 2
        ret
and long << 2 to this:
foo:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
***     mov %EDX, %EAX
        shrd %EDX, %ECX, 30
        shl %EAX, 2
        ret
instead of this:
foo:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, %EAX
        shr %ECX, 30
        mov %EDX, DWORD PTR [%ESP + 8]
        shl %EDX, 2
        or %EDX, %ECX
        shl %EAX, 2
        ret
The extra copy (marked ***) can be eliminated when I teach the code generator
that shrd32rri8 is really commutative.
llvm-svn: 19681
2005-01-19 06:18:43 +00:00
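For reference, the arithmetic that the shrd/sar pair implements, as a C++ sketch (assumes the usual arithmetic right shift of negative signed values, which mainstream compilers provide):

    #include <cassert>
    #include <cstdint>

    // 64-bit arithmetic shift right by a constant 0 < n < 32 on a 32-bit
    // target: the low word takes its top n bits from the high word (what
    // shrd does), and the high word is shifted arithmetically (sar).
    void sar64(uint32_t &lo, int32_t &hi, unsigned n) {
        lo = (lo >> n) | (uint32_t(hi) << (32 - n));  // shrd %lo, %hi, n
        hi = hi >> n;                                 // sar %hi, n
    }

    int main() {
        int64_t v = -1234567891234LL;
        uint32_t lo = uint32_t(v);
        int32_t hi = int32_t(v >> 32);
        sar64(lo, hi, 2);
        int64_t expect = v >> 2;
        assert(lo == uint32_t(expect) && hi == int32_t(expect >> 32));
    }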
Chris Lattner
d8d306601a
X86 shifts mask the amount.
llvm-svn: 19678
2005-01-19 03:36:30 +00:00
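Concretely, the hardware masks a 32-bit shift count to its low 5 bits, so shift amounts can be reduced mod 32. A small demonstration (the helper is ours; the unmasked shift would be undefined behavior in C++, which is why the mask is explicit):

    #include <cassert>
    #include <cstdint>

    // x86 SHL/SHR/SAR on 32-bit operands use only the low 5 bits of the count.
    uint32_t shlX86(uint32_t x, unsigned count) {
        return x << (count & 31);     // the hardware's masking, made explicit
    }

    int main() {
        assert(shlX86(1, 37) == shlX86(1, 5));  // 37 & 31 == 5
        assert(shlX86(1, 32) == 1);             // a count of 32 masks to 0
    }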
Chris Lattner
14947c34cc
Code to handle FP_EXTEND is dead now, since X86 has no data types to
FP_EXTEND from!
llvm-svn: 19674
2005-01-18 20:05:56 +00:00
Chris Lattner
c6e928cba5
Remove more dead code.
llvm-svn: 19673
2005-01-18 19:50:08 +00:00
Chris Lattner
0616fa6b9b
The selection dag code handles the promotions from F32 to F64 for us, so we
don't need to even think about F32 in the X86 code anymore.
llvm-svn: 19672
2005-01-18 19:46:54 +00:00
Chris Lattner
479c7118e4
Fix 124.m88ksim.
llvm-svn: 19667
2005-01-18 17:35:28 +00:00
Chris Lattner
ed246ec0d2
Do not emit loads multiple times, potentially in the wrong places.
llvm-svn: 19661
2005-01-18 04:18:32 +00:00
Chris Lattner
28a205e01b
Eliminate bad assertions.
llvm-svn: 19659
2005-01-18 04:00:54 +00:00
Chris Lattner
78d3028350
* Eliminate the TokenSet and just use the ExprMap for both tokens and values.
* Insert some really pedantic assertions that will notice when we emit the
same loads more than once, exposing bugs. This turns a miscompilation in
bzip2 into a compile-fail. yaay.
llvm-svn: 19658
2005-01-18 03:51:59 +00:00
Chris Lattner
d7f93950aa
Rely on the code in MatchAddress to do this work. Otherwise we fail to
match (X+Y)+(Z << 1): we match the X+Y first, consuming the index
register, and then there is no place to put the Z.
llvm-svn: 19652
2005-01-18 02:25:52 +00:00
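The constraint at work: an x86 address offers exactly one base register, one scaled index register, and a displacement. A hypothetical sketch of the slots MatchAddress has to fill (field and function names are ours, not LLVM's):

    #include <optional>

    // An x86 address: base + scale*index + disp.
    struct X86AddressMode {
        std::optional<int> base;   // base register slot (ints stand in for
        std::optional<int> index;  // registers in this sketch)
        unsigned scale = 1;        // 1, 2, 4, or 8
        int disp = 0;
    };

    // A scaled term like Z << 1 (i.e. 2*Z) can only live in the index slot.
    bool claimIndex(X86AddressMode &am, int reg, unsigned scale) {
        if (am.index) return false;  // slot already consumed: no place for Z
        am.index = reg;
        am.scale = scale;
        return true;
    }

    // A plain register can take the base slot, or the index slot at scale 1.
    // Matching X+Y first claims both slots, which is exactly the failure above.
    bool claimReg(X86AddressMode &am, int reg) {
        if (!am.base) { am.base = reg; return true; }
        return claimIndex(am, reg, 1);
    }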
Chris Lattner
a7acdda064
Fix a problem where probing for addressing modes caused expressions to be
emitted too early. In particular, this fixes
Regression/CodeGen/X86/regpressure.ll:regpressure3.
This also improves the 2nd basic block in 164.gzip:flush_block, which went from
.LBBflush_block_1: # loopentry.1.i
        movzx %EAX, WORD PTR [dyn_ltree + 20]
        movzx %ECX, WORD PTR [dyn_ltree + 16]
        mov DWORD PTR [%ESP + 32], %ECX
        movzx %ECX, WORD PTR [dyn_ltree + 12]
        movzx %EDX, WORD PTR [dyn_ltree + 8]
        movzx %EBX, WORD PTR [dyn_ltree + 4]
        mov DWORD PTR [%ESP + 36], %EBX
        movzx %EBX, WORD PTR [dyn_ltree]
        add DWORD PTR [%ESP + 36], %EBX
        add %EDX, DWORD PTR [%ESP + 36]
        add %ECX, %EDX
        add DWORD PTR [%ESP + 32], %ECX
        add %EAX, DWORD PTR [%ESP + 32]
        movzx %ECX, WORD PTR [dyn_ltree + 24]
        add %EAX, %ECX
        mov %ECX, 0
        mov %EDX, %ECX
to
.LBBflush_block_1: # loopentry.1.i
        movzx %EAX, WORD PTR [dyn_ltree]
        movzx %ECX, WORD PTR [dyn_ltree + 4]
        add %ECX, %EAX
        movzx %EAX, WORD PTR [dyn_ltree + 8]
        add %EAX, %ECX
        movzx %ECX, WORD PTR [dyn_ltree + 12]
        add %ECX, %EAX
        movzx %EAX, WORD PTR [dyn_ltree + 16]
        add %EAX, %ECX
        movzx %ECX, WORD PTR [dyn_ltree + 20]
        add %ECX, %EAX
        movzx %EAX, WORD PTR [dyn_ltree + 24]
        add %ECX, %EAX
        mov %EAX, 0
        mov %EDX, %EAX
... which results in less spilling in the function.
This change alone speeds up 164.gzip from 37.23s to 36.24s on apoc. The
default isel takes 37.31s.
llvm-svn: 19650
2005-01-18 01:06:26 +00:00
Chris Lattner
a5d137f471
Don't bother using max here.
llvm-svn: 19647
2005-01-17 23:02:13 +00:00
Chris Lattner
ca318edb94
Do not give token factor nodes outrageous weights.
llvm-svn: 19645
2005-01-17 22:56:09 +00:00
Chris Lattner
e86c933df7
Two changes:
1. Fold [mem] += (1|-1) into inc [mem]/dec [mem] to save some icache space.
2. Do not let token factor nodes prevent forming '[mem] op= val' folds.
llvm-svn: 19643
2005-01-17 22:10:42 +00:00
Chris Lattner
96113fd08f
Refactor load/op/store folding into its own method; no functionality changes.
llvm-svn: 19641
2005-01-17 19:25:26 +00:00
Chris Lattner
9098879472
Fix a major regression from last night that prevented us from producing
[mem] op= reg operations. The body of the if is less indented but otherwise
unmodified in this patch.
llvm-svn: 19638
2005-01-17 17:49:14 +00:00
Chris Lattner
b72ea1b719
Codegen this:
int %foo(int %X) {
        %T = add int %X, 13
        %S = mul int %T, 3
        ret int %S
}
as this:
        mov %ECX, DWORD PTR [%ESP + 4]
        lea %EAX, DWORD PTR [%ECX + 2*%ECX + 39]
        ret
instead of this:
        mov %ECX, DWORD PTR [%ESP + 4]
        mov %EAX, %ECX
        add %EAX, 13
        imul %EAX, %EAX, 3
        ret
llvm-svn: 19633
2005-01-17 06:48:02 +00:00
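The algebra behind the lea: 3*(X+13) = 3X + 39 = X + 2*X + 39, which is exactly lea's base + scale*index + disp form with X serving as both base and index. A quick C++ check of the equivalence:

    #include <cassert>
    #include <initializer_list>

    int foo(int x)    { return (x + 13) * 3; }
    int fooLea(int x) { return x + 2 * x + 39; }  // lea [ECX + 2*ECX + 39]

    int main() {
        for (int x : {-100, 0, 7, 12345})
            assert(foo(x) == fooLea(x));
    }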
Chris Lattner
a56d29d517
Fix test/Regression/CodeGen/X86/2005-01-17-CycleInDAG.ll and 132.ijpeg.
Do not fold a load into an operation if it will induce a cycle in the DAG.
Repeat after me: dAg.
llvm-svn: 19631
2005-01-17 06:26:58 +00:00
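Why folding can create a cycle: fusing the load into its user collapses two nodes into one, so if any other operand of the user can reach the load, the fused node would depend on itself. A toy reachability check (a model of the idea, not LLVM's implementation):

    #include <cassert>
    #include <unordered_set>
    #include <vector>

    struct Node { std::vector<Node*> operands; };

    // True if dest is reachable from src by following operand edges.
    bool reaches(Node *src, Node *dest, std::unordered_set<Node*> &seen) {
        if (src == dest) return true;
        if (!seen.insert(src).second) return false;  // already visited
        for (Node *op : src->operands)
            if (reaches(op, dest, seen)) return true;
        return false;
    }

    // Folding is unsafe if some operand of user, other than the load
    // itself, can reach the load.
    bool foldWouldCreateCycle(Node *load, Node *user) {
        for (Node *op : user->operands) {
            if (op == load) continue;
            std::unordered_set<Node*> seen;
            if (reaches(op, load, seen)) return true;
        }
        return false;
    }

    int main() {
        Node load, other, viaLoad{{&load}};
        Node safeUser{{&load, &other}};      // other operand avoids the load
        Node cyclicUser{{&load, &viaLoad}};  // other operand reaches the load
        assert(!foldWouldCreateCycle(&load, &safeUser));
        assert(foldWouldCreateCycle(&load, &cyclicUser));
    }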
Chris Lattner
3be6cd57c9
Do not fold a load into a comparison that is used by more than one place.
The comparison will probably be folded, so this is not ok to do.
This fixed 197.parser.
llvm-svn: 19624
2005-01-17 01:34:14 +00:00
Chris Lattner
0cd6b9ae1e
Do not codegen 'xor bool, true' as 'not reg': 'not reg' inverts the upper bits
of the byte register too. This fixes yacr2, 300.twolf, and probably others.
llvm-svn: 19622
2005-01-17 00:23:16 +00:00
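The underlying arithmetic: a bool lives in a byte register as 0 or 1, and 'not' flips every bit of the register, so it turns 1 into 0xFE rather than 0. A two-line C++ demonstration:

    #include <cassert>
    #include <cstdint>

    int main() {
        uint8_t b = 1;                // a bool in a byte register
        assert(uint8_t(~b) == 0xFE);  // what 'not reg' produces: not a bool
        assert((b ^ 1) == 0);         // what 'xor bool, true' must produce
    }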
Chris Lattner
c1f386c7b8
Set up the shift and setcc types.
If we emit a load because we followed a token chain to get to it, try to
fold it into its single user if possible.
llvm-svn: 19620
2005-01-17 00:00:33 +00:00
Chris Lattner
b14a63aa1c
* Adjust to changes in TargetLowering interfaces.
* Remove custom promotion for bool and byte select ops. Legalize now
promotes them for us.
* Allow folding ConstantPoolIndexes into EXTLOADs, useful for float immediates.
* Better declare which operations are not supported.
* Add some hacky code for TRUNCSTORE to pretend that we have truncstore
for i16 types. This is useful for testing promotion code because I can
just remove 16-bit registers altogether and verify that programs work.
llvm-svn: 19614
2005-01-16 07:34:08 +00:00
Chris Lattner
e18a4c4c19
Add support for truncstore and *extload.
llvm-svn: 19566
2005-01-15 05:22:24 +00:00
Chris Lattner
720a62e8c7
Adjust to CopyFromReg changes.
llvm-svn: 19561
2005-01-14 22:37:41 +00:00
Chris Lattner
e727af06c8
Add new ImplicitDef node, rename CopyRegSDNode class to RegSDNode.
llvm-svn: 19535
2005-01-13 20:50:02 +00:00
Chris Lattner
15bd19dd76
Codegen factor nodes more intelligently according to perceived register pressure.
llvm-svn: 19532
2005-01-13 19:56:00 +00:00
Chris Lattner
c251fb6441
Initial trivial (but stupid) codegen for this node.
llvm-svn: 19529
2005-01-13 18:01:36 +00:00
Chris Lattner
3676cd6fe2
Add some really pedantic assertions to the load folding code. Fix a bunch
of cases where we accidentally emitted a load folded in one place and unfolded
in another.
llvm-svn: 19522
2005-01-13 05:53:16 +00:00
Chris Lattner
7b1dae8032
We can only fold a load into an op if there is exactly one use of the value.
Checking to see if the load has two uses is not equivalent, as the chain
value may have zero uses.
llvm-svn: 19518
2005-01-12 18:38:26 +00:00
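The distinction, modeled minimally: a DAG load produces two results, the loaded value and a chain, so "the node has two uses" is not the same claim as "the value has one use". A toy C++ model (the struct is ours):

    #include <cassert>

    struct LoadUses {
        int valueUses;  // uses of result 0, the loaded value
        int chainUses;  // uses of result 1, the chain
        int totalUses() const { return valueUses + chainUses; }
    };

    // Folding is only legal when exactly one node consumes the value.
    bool canFold(const LoadUses &l) { return l.valueUses == 1; }

    int main() {
        LoadUses deadChain{2, 0};  // two uses total, but both of the value
        assert(deadChain.totalUses() == 2 && !canFold(deadChain));
        LoadUses ok{1, 1};         // also two uses total, yet foldable
        assert(ok.totalUses() == 2 && canFold(ok));
    }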
Chris Lattner
1755360db6
Try both ways to fold an add together. This allows us to generate this code
        imul %EAX, %EAX, 400
        add %ECX, %EAX
        add %ESI, DWORD PTR [%ECX + 4*%EDX]
        inc %EDX
        cmp %EDX, 100
instead of this:
        imul %EAX, %EAX, 400
        add %ECX, %EAX
        mov %EAX, %EDX
        shl %EAX, 2
        add %ECX, %EAX
        add %ESI, DWORD PTR [%ECX]
        inc %EDX
        cmp %EDX, 100
llvm-svn: 19513
2005-01-12 18:08:53 +00:00
Chris Lattner
42e7e98908
Fix a major miscompilation where we were overwriting the scale reg.
llvm-svn: 19511
2005-01-12 07:33:20 +00:00
Chris Lattner
db4b67c81e
Do not use the type of the RHS constant to determine the type of the operation.
This fails for shifts because the constant is always 8 bits.
llvm-svn: 19508
2005-01-12 05:22:07 +00:00
Jeff Cohen
407aa0198c
Fix more C++ compilation errors.
llvm-svn: 19504
2005-01-12 04:29:05 +00:00
Chris Lattner
efe90209ef
Fix a compile error with VC++, which thinks that static const arrays need
to be dynamically initialized. :(
llvm-svn: 19503
2005-01-12 04:23:22 +00:00
Chris Lattner
6fba62d6ec
Fix a bug that caused us to crash on povray. We weren't emitting an FP_REG_KILL into a block that had a successor with an FP PHI node.
llvm-svn: 19502
2005-01-12 04:21:28 +00:00
Chris Lattner
f8f79c4192
Fix a crash compiling povray on UINT_TO_FP from i16.
llvm-svn: 19499
2005-01-12 04:00:00 +00:00
Chris Lattner
3278ce8871
There are no [mem] op= reg instructions for FP, so remove their entries.
llvm-svn: 19496
2005-01-12 03:16:09 +00:00
Chris Lattner
e49a335797
Fix a bug where we didn't insert FP_REG_KILL instructions into MBBs that
contain FP PHI nodes but no other FP-defining instructions. This fixes
183.equake.
llvm-svn: 19495
2005-01-12 02:57:10 +00:00
Chris Lattner
b7fe57a0f1
Fold TRUNCATE (LOAD P) into a smaller load from P.
llvm-svn: 19494
2005-01-12 02:19:06 +00:00
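Why this is sound on a little-endian target like X86: the low-order bytes of the wide value sit at the same address, so a narrower load reads exactly the truncated bits. A small C++ demonstration (assumes little-endian):

    #include <cassert>
    #include <cstdint>
    #include <cstring>

    int main() {
        uint32_t wide = 0xDEADBEEFu;   // the value at P
        uint32_t loaded32;
        std::memcpy(&loaded32, &wide, 4);          // LOAD P (32 bits)
        uint16_t truncated = uint16_t(loaded32);   // TRUNCATE (LOAD P)
        uint16_t loaded16;
        std::memcpy(&loaded16, &wide, 2);          // the smaller load from P
        assert(truncated == loaded16);             // both are 0xBEEF
    }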