Chris Lattner
ba3ba8fa1f
some peepholes that should match horizontal add/sub operations.
...
llvm-svn: 163103
2012-09-03 02:58:21 +00:00
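Illustrative aside, not from the commit: SSE3's haddps/hsubps are the horizontal ops such peepholes would match. A sketch in AT&T syntax:
haddps %xmm1, %xmm0   # xmm0 = { a0+a1, a2+a3, b0+b1, b2+b3 }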
Benjamin Kramer
57003a6768
Add a note for -ffast-math optimization of vector norm.
...
llvm-svn: 153031
2012-03-19 00:43:34 +00:00
Benjamin Kramer
863683c590
This is now implemented.
...
llvm-svn: 146258
2011-12-09 15:45:57 +00:00
Lang Hames
de7ab801cc
Add a natural stack alignment field to TargetData, and prevent InstCombine from
...
promoting allocas to preferred alignments that exceed the natural
alignment. This avoids some potentially expensive dynamic stack realignments.
The natural stack alignment is set in target data strings via the "S<size>"
option. Size is in bits and must be a multiple of 8. The natural stack alignment
defaults to "unspecified" (represented by a zero value), and the "unspecified"
value does not prevent any alignment promotions. Target maintainers that care
about avoiding promotions should explicitly add the "S<size>" option to their
target data strings.
llvm-svn: 141599
2011-10-10 23:42:08 +00:00
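A sketch of the option described above, with an illustrative value: appending S128 declares a 128-bit (16-byte) natural stack alignment, as on typical x86-64 targets; the rest of the string is only an example:
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"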
Benjamin Kramer
69affe6a94
Add a note about SSE4.1 roundss/roundsd.
...
llvm-svn: 125438
2011-02-12 17:58:16 +00:00
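For reference, an illustrative use, not from the commit: the SSE4.1 round instructions take an immediate rounding-mode control. In AT&T syntax:
roundss $1, %xmm0, %xmm0   # imm 1 = round toward -infinity (floor) on the scalar single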
Chris Lattner
5cac0f71ca
update this.
...
llvm-svn: 113116
2010-09-05 20:22:09 +00:00
Chris Lattner
aecf47a5cb
we should pattern match the SSE complex arithmetic ops.
...
llvm-svn: 112109
2010-08-25 23:31:42 +00:00
Chris Lattner
a42202e0e4
random improvement for variable shift codegen.
...
llvm-svn: 111813
2010-08-23 17:30:29 +00:00
Jakob Stoklund Olesen
98ee37d878
Remove obsolete README_SSE note.
...
We are generating movaps for all XMM register copies, including scalar
floating point values. This is known to be at least as good as movss and movsd
for all known architectures up to and including Nehalem because it avoids a
partial register stall.
The SSEDomainFix pass will switch movaps to movdqa when appropriate (i.e., when
operands come from the integer unit). We don't know that switching movaps to
movapd has any benefit.
The same applies to andps -> pand.
llvm-svn: 108096
2010-07-11 17:13:42 +00:00
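A sketch of the copy choice described above, in AT&T syntax:
movaps %xmm1, %xmm0   # full-register copy, even for a scalar; no partial register stall
movss  %xmm1, %xmm0   # merges into the low lane of %xmm0, creating the dependency movaps avoids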
Chris Lattner
7b909ac785
some notes about suboptimal insertps's
...
llvm-svn: 107613
2010-07-05 05:48:41 +00:00
Eli Friedman
ceb13f2af3
Remove some already-fixed README entries.
...
llvm-svn: 105377
2010-06-03 01:47:31 +00:00
Eli Friedman
a59b7a72b9
Remove README entry which no longer compiles to something sane.
...
llvm-svn: 105376
2010-06-03 01:16:51 +00:00
Dan Gohman
6f34abd092
Floating-point add, sub, and mul are now spelled fadd, fsub, and fmul,
...
respectively.
llvm-svn: 97531
2010-03-02 01:11:08 +00:00
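The new spellings, for reference:
%sum = fadd float %a, %b   ; was: add float %a, %b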
Dan Gohman
4a618827de
Fix "the the" and similar typos.
...
llvm-svn: 95781
2010-02-10 16:03:48 +00:00
Chris Lattner
cf11e602a2
add a note from PR6194
...
llvm-svn: 95649
2010-02-09 05:45:29 +00:00
Chris Lattner
fb5670fc16
move the PR6214 microoptzn to this file.
...
llvm-svn: 95299
2010-02-04 07:32:01 +00:00
Chris Lattner
3eb76c23dd
this is an SSE-specific issue.
...
llvm-svn: 93373
2010-01-13 23:29:11 +00:00
Chris Lattner
e84a7911c4
Bill implemented this.
...
llvm-svn: 63752
2009-02-04 19:09:07 +00:00
Chris Lattner
553fd7e1eb
add a note, this is why we're faster at SciMark-MonteCarlo with
...
SSE disabled.
llvm-svn: 63751
2009-02-04 19:08:01 +00:00
Evan Cheng
f31f288863
The memory alignment requirement on some of the mov{h|l}p{d|s} patterns is 16 bytes. That is overly strict. These instructions read / write f64 memory locations without an alignment requirement.
...
llvm-svn: 63195
2009-01-28 08:35:02 +00:00
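An illustration of the relaxed requirement, in AT&T syntax:
movlpd (%eax), %xmm0   # 8-byte f64 load into the low half; needs no 16-byte alignment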
Chris Lattner
9a8eb0d534
add a note
...
llvm-svn: 56391
2008-09-20 19:17:53 +00:00
Chris Lattner
f076d5eea8
add a note
...
llvm-svn: 54964
2008-08-19 00:41:02 +00:00
Evan Cheng
3fc2372d3a
- Fix an x86 vector isel bug: illegal transformation of a vector_shuffle into a
...
shift.
- Add a readme entry for a missing vector_shuffle optimization that results in
awful codegen.
llvm-svn: 52740
2008-06-25 20:52:59 +00:00
Evan Cheng
8647b875cc
This is done.
...
llvm-svn: 51526
2008-05-24 00:10:13 +00:00
Evan Cheng
04d24edcbb
Use movlps / movhps to modify the low / high half of a 16-byte memory location.
...
llvm-svn: 51501
2008-05-23 21:23:16 +00:00
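A sketch of the idea, in AT&T syntax:
movlps %xmm0, (%eax)    # update only the low 8 bytes in place
movhps %xmm0, 8(%eax)   # update only the high 8 bytes in place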
Dan Gohman
66eea1b9b3
Elaborate on the entry on integer vector multiplication by constants.
...
llvm-svn: 51491
2008-05-23 18:05:39 +00:00
Evan Cheng
d25cb8e0d2
New entry.
...
llvm-svn: 51487
2008-05-23 17:28:11 +00:00
Chris Lattner
3546c2b4e4
we compile multiply-by-constant into horrible code. Doesn't SSE4 have some
...
instruction for doing this?
llvm-svn: 51473
2008-05-23 04:29:53 +00:00
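Aside, not from the commit: SSE4.1's pmulld, a packed 32-bit multiply, is the likely candidate:
pmulld %xmm1, %xmm0   # each 32-bit lane: xmm0[i] *= xmm1[i]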
Chris Lattner
03ce206143
add a note
...
llvm-svn: 51062
2008-05-13 19:56:20 +00:00
Chris Lattner
d17f58ae6e
add a note
...
llvm-svn: 51060
2008-05-13 18:48:54 +00:00
Evan Cheng
1120279ae6
Instead of a vector load, shuffle, and element extract, load the element from the address with an offset.
...
pshufd $1, (%rdi), %xmm0
movd %xmm0, %eax
=>
movl 4(%rdi), %eax
llvm-svn: 51026
2008-05-13 08:35:03 +00:00
Evan Cheng
3f40c69083
On x86, it's safe to treat i32 load anyext as a normal i32 load. Ditto for i8 anyext load to i16.
...
llvm-svn: 51019
2008-05-13 00:54:02 +00:00
Evan Cheng
b980f6fb3d
Xform bitconvert(build_pair(load a, load b)) to a single load if the load locations are at the right offset from each other.
...
llvm-svn: 51008
2008-05-12 23:04:07 +00:00
Anton Korobeynikov
a38e72d247
Add note
...
llvm-svn: 50959
2008-05-11 14:33:15 +00:00
Chris Lattner
aeb23a8a34
add a note, this is actually not too bad to implement.
...
llvm-svn: 49466
2008-04-10 05:54:50 +00:00
Chris Lattner
c692188075
move the x86-32 part of PR2108 here.
...
llvm-svn: 49465
2008-04-10 05:37:47 +00:00
Chris Lattner
b6387c8a74
Finish implementing a readme entry: when inserting an i64 variable
...
into a vector of zeros or undef, and when the top part is obviously
zero, we can just use movd + shuffle. This allows us to compile
vec_set-B.ll into:
_test3:
movl $1234567, %eax
andl 4(%esp), %eax
movd %eax, %xmm0
ret
instead of:
_test3:
subl $28, %esp
movl $1234567, %eax
andl 32(%esp), %eax
movl %eax, (%esp)
movl $0, 4(%esp)
movq (%esp), %xmm0
addl $28, %esp
ret
llvm-svn: 48090
2008-03-09 05:42:06 +00:00
Chris Lattner
93930dc28c
add a note
...
llvm-svn: 48064
2008-03-09 01:08:22 +00:00
Chris Lattner
eef374c197
Implement a readme entry, compiling
...
#include <xmmintrin.h>
__m128i doload64(short x) {return _mm_set_epi16(0,0,0,0,0,0,0,1);}
into:
movl $1, %eax
movd %eax, %xmm0
ret
instead of a constant pool load.
llvm-svn: 48063
2008-03-09 01:05:04 +00:00
Chris Lattner
35adf46967
This one looks easy, add a note.
...
llvm-svn: 48055
2008-03-08 22:32:39 +00:00
Chris Lattner
a76e23a935
move these to the appropriate file
...
llvm-svn: 48054
2008-03-08 22:28:45 +00:00
Chris Lattner
7c08a01698
Evan implemented this.
...
llvm-svn: 47948
2008-03-05 17:11:51 +00:00
Chris Lattner
2acd0c25f6
add a note
...
llvm-svn: 47939
2008-03-05 07:22:39 +00:00
Chris Lattner
a70df9e2ee
Evan implemented these.
...
llvm-svn: 47828
2008-03-02 18:05:14 +00:00
Chris Lattner
eb63b09206
upgrade some entries, remove stuff that is done.
...
llvm-svn: 47109
2008-02-14 06:19:02 +00:00
Nate Begeman
eea32990a9
readme updates
...
llvm-svn: 47051
2008-02-13 07:06:12 +00:00
Nate Begeman
2d77e8e446
Enable SSE4 codegen and pattern matching.
...
Add some notes to the README.
llvm-svn: 46949
2008-02-11 04:19:36 +00:00
Chris Lattner
2e4719ec55
add a note
...
llvm-svn: 46413
2008-01-27 07:31:41 +00:00
Chris Lattner
2dd23b9f32
Add some notes.
...
llvm-svn: 46405
2008-01-26 20:12:07 +00:00
Chris Lattner
d2b8a36f0e
One readme entry is done, one is really easy (Evan, want to investigate
...
eliminating the llvm.x86.sse2.loadl.pd intrinsic?), one shuffle optzn
may be done (if shufps is better than pinsrw, Evan, please review), and
we already know about LICM of simple instructions.
llvm-svn: 45407
2007-12-29 19:31:47 +00:00