Bruno Cardoso Lopes
02a05a6a89
Move insertps mask decoding to header file
...
llvm-svn: 112896
2010-09-02 22:43:39 +00:00
Anton Korobeynikov
a689c5b2c0
Revert win64 changes. They seem to be incomplete
...
llvm-svn: 112885
2010-09-02 22:31:32 +00:00
Anton Korobeynikov
56291f7e53
Properly allocate win64 shadow reg area.
...
Patch by Jan Sjodin!
llvm-svn: 112875
2010-09-02 22:16:28 +00:00
Bruno Cardoso Lopes
814a69c330
Move decoding of insertps back to avoid unused warnings in x86 isel lowering, and fix movlhps/movhlps to decode 4 elements shuffles
...
llvm-svn: 112869
2010-09-02 21:51:11 +00:00
Dan Gohman
3c9b5f394b
Don't narrow the load and store in a load+twiddle+store sequence unless
...
there are clearly no stores between the load and the store. This fixes
this miscompile reported as PR7833.
This breaks the test/CodeGen/X86/narrow_op-2.ll optimization, which is
safe, but awkward to prove safe. Move it to X86's README.txt.
llvm-svn: 112861
2010-09-02 21:18:42 +00:00
Bruno Cardoso Lopes
c79f50170a
Move x86 specific shuffle mask decoding to its own header, it's also going to be used elsewhere. Also trim trailing whitespaces
...
llvm-svn: 112846
2010-09-02 18:40:13 +00:00
Bruno Cardoso Lopes
489613f1e5
Replace unpckl_undef and unpckh_undef matching with target specific opcodes
...
llvm-svn: 112806
2010-09-02 05:23:12 +00:00
Bruno Cardoso Lopes
e4e4be3885
Move condition out to prepare for more matching
...
llvm-svn: 112805
2010-09-02 04:20:26 +00:00
Bruno Cardoso Lopes
bf7fd146c7
Remove checking for isUNPCKL_v_undef_Mask, the specific node is already emitted for it
...
llvm-svn: 112804
2010-09-02 03:57:58 +00:00
Bruno Cardoso Lopes
6a7f634487
become more strict about when it's safe to use X86ISD::MOVLPS
...
llvm-svn: 112799
2010-09-02 02:35:51 +00:00
Bruno Cardoso Lopes
04c25c15c7
Revert r112689, avoid those kind of checks cause they mess up with mmx
...
llvm-svn: 112760
2010-09-01 22:59:03 +00:00
Bruno Cardoso Lopes
fea81b4831
Using target specific nodes for shuffle nodes makes the mask
...
check more strict, breaking some cases not checked in the
testsuite, but also exposes some foldings not done before,
as this example:
movaps (%rdi), %xmm0
movaps (%rax), %xmm1
movaps %xmm0, %xmm2
movss %xmm1, %xmm2
shufps $36, %xmm2, %xmm0
now is generated as:
movaps (%rdi), %xmm0
movaps %xmm0, %xmm1
movlps (%rax), %xmm1
shufps $36, %xmm1, %xmm0
llvm-svn: 112753
2010-09-01 22:33:20 +00:00
Bruno Cardoso Lopes
b3825216ce
Use movlps, movlpd, movss and movsd specific nodes instead of pattern matching with movlp pattern fragment
...
llvm-svn: 112694
2010-09-01 05:08:25 +00:00
Bruno Cardoso Lopes
6aaebe877b
minor change, simplify some logic
...
llvm-svn: 112689
2010-09-01 00:57:08 +00:00
Bruno Cardoso Lopes
2b025707a2
Move some functions around so they can be used for some other to come function
...
llvm-svn: 112687
2010-09-01 00:51:36 +00:00
Bruno Cardoso Lopes
4b56d87290
Use x86 specific MOVSLDUP node, add more patterns to match it and remove useless load nodes
...
llvm-svn: 112661
2010-08-31 22:35:05 +00:00
Bruno Cardoso Lopes
61996ef835
Use x86 specific MOVSHDUP node and add more patterns to match it
...
llvm-svn: 112657
2010-08-31 22:22:11 +00:00
Jakob Stoklund Olesen
33e9fce2d6
Make %EFLAGS unallocatable.
...
No CCR virtual registers should exist, and %EFLAGS is used in ways that can
surprise RegAllocFast.
llvm-svn: 112650
2010-08-31 21:51:07 +00:00
Bruno Cardoso Lopes
5de15ce468
Use MOVHLPS node instead of matching using movhlps and movhlps_undef pattern fragments
...
llvm-svn: 112644
2010-08-31 21:38:49 +00:00
Bruno Cardoso Lopes
03e4c35302
Use MOVLHPS and MOVHLPS x86 nodes whenever possible. Also remove some useless nodes
...
llvm-svn: 112642
2010-08-31 21:15:21 +00:00
Bruno Cardoso Lopes
dfd9dd5d75
Use X86ISD::MOVSS and MOVSD to represent the movl mask pattern, also fix the handling of those nodes when seeking for scalars inside vector shuffles
...
llvm-svn: 112570
2010-08-31 02:26:40 +00:00
Eli Friedman
f75de6eae7
A couple of small missed optimizations.
...
llvm-svn: 112411
2010-08-29 05:07:40 +00:00
Chris Lattner
38ccc8b884
add a bunch more common shuffles to the instprinter.
...
llvm-svn: 112397
2010-08-29 03:08:08 +00:00
Chris Lattner
7a05e6dca2
I have manually decoded the imm field of an insertps one too many
...
times. This patch causes llc and llvm-mc (which both default to
verbose-asm) to print out comments after a few common shuffle
instructions which indicates the shuffle mask, e.g.:
insertps $113, %xmm3, %xmm0 ## xmm0 = zero,xmm0[1,2],xmm3[1]
unpcklps %xmm1, %xmm0 ## xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
pshufd $1, %xmm1, %xmm1 ## xmm1 = xmm1[1,0,0,0]
This is carefully factored to keep the information extraction (of the
shuffle mask) separate from the printing logic. I plan to move the
extraction part out somewhere else at some point for other parts of
the x86 backend that want to introspect on the behavior of shuffles.
llvm-svn: 112387
2010-08-28 20:42:31 +00:00
Chris Lattner
94656b1c8c
fix the buildvector->insertp[sd] logic to not always create a redundant
...
insertp[sd] $0, which is a noop. Before:
_f32: ## @f32
pshufd $1, %xmm1, %xmm2
pshufd $1, %xmm0, %xmm3
addss %xmm2, %xmm3
addss %xmm1, %xmm0
## kill: XMM0<def> XMM0<kill> XMM0<def>
insertps $0, %xmm0, %xmm0
insertps $16, %xmm3, %xmm0
ret
after:
_f32: ## @f32
movdqa %xmm0, %xmm2
addss %xmm1, %xmm2
pshufd $1, %xmm1, %xmm1
pshufd $1, %xmm0, %xmm3
addss %xmm1, %xmm3
movdqa %xmm2, %xmm0
insertps $16, %xmm3, %xmm0
ret
The extra movs are due to a random (poor) scheduling decision.
llvm-svn: 112379
2010-08-28 17:59:08 +00:00
Chris Lattner
bcb6090ad0
fix the BuildVector -> unpcklps logic to not do pointless shuffles
...
when the top elements of a vector are undefined. This happens all
the time for X86-64 ABI stuff because only the low 2 elements of
a 4 element vector are defined. For example, on:
_Complex float f32(_Complex float A, _Complex float B) {
return A+B;
}
We used to produce (with SSE2, SSE4.1+ uses insertps):
_f32: ## @f32
movdqa %xmm0, %xmm2
addss %xmm1, %xmm2
pshufd $16, %xmm2, %xmm2
pshufd $1, %xmm1, %xmm1
pshufd $1, %xmm0, %xmm0
addss %xmm1, %xmm0
pshufd $16, %xmm0, %xmm1
movdqa %xmm2, %xmm0
unpcklps %xmm1, %xmm0
ret
We now produce:
_f32: ## @f32
movdqa %xmm0, %xmm2
addss %xmm1, %xmm2
pshufd $1, %xmm1, %xmm1
pshufd $1, %xmm0, %xmm3
addss %xmm1, %xmm3
movaps %xmm2, %xmm0
unpcklps %xmm3, %xmm0
ret
This implements rdar://8368414
llvm-svn: 112378
2010-08-28 17:28:30 +00:00
Chris Lattner
96db6e66f4
improve comments in the unpcklps generating logic, introduce
...
a new EltStride variable instead of reusing NumElems variable
for a non-obvious purpose. No functionality change.
llvm-svn: 112377
2010-08-28 17:15:43 +00:00
Bruno Cardoso Lopes
a982aa24ef
Clean up the logic of vector shuffles -> vector shifts.
...
Also teach this logic how to handle target specific shuffles if
needed, this is necessary while searching recursively for zeroed
scalar elements in vector shuffle operands.
llvm-svn: 112348
2010-08-28 02:46:39 +00:00
Anton Korobeynikov
c0b36921c2
Properly handle passing of FP stuff to varargs function on Win64:
...
value should be copied to the corresponding shadow reg as well.
Patch by Cameron Esfahani!
llvm-svn: 112262
2010-08-27 14:43:06 +00:00
Daniel Dunbar
1844a71e66
X86: Fix an encoding issue with LOCK_ADD64mr, which could lead to very hard to find miscompiles with the integrated assembler.
...
llvm-svn: 112250
2010-08-27 01:30:14 +00:00
Jim Grosbach
6a77066913
Simplify eliminateFrameIndex() interface back down now that PEI doesn't need
...
to try to re-use scavenged frame index reference registers. rdar://8277890
llvm-svn: 112241
2010-08-26 23:32:16 +00:00
Bruno Cardoso Lopes
e25ba0c7c2
zap the now unused MVT::getIntVectorWithNumElements
...
llvm-svn: 112218
2010-08-26 20:53:12 +00:00
Bob Wilson
a967c42a3d
Fix comment typos.
...
llvm-svn: 112202
2010-08-26 18:08:11 +00:00
Chris Lattner
eb2cc0ce0e
implement SplitVecOp_CONCAT_VECTORS, fixing the included testcase with SSE1.
...
llvm-svn: 112171
2010-08-26 05:51:22 +00:00
Chris Lattner
cc60609cb4
fix sse1 only codegen in x86-64 mode, which is something we
...
apparently try to support.
llvm-svn: 112168
2010-08-26 05:24:29 +00:00
Bruno Cardoso Lopes
184eaea855
Fix PR7748 without using microsoft extensions
...
llvm-svn: 112128
2010-08-26 01:02:53 +00:00
Chris Lattner
aecf47a5cb
we should pattern match the SSE complex arithmetic ops.
...
llvm-svn: 112109
2010-08-25 23:31:42 +00:00
Bruno Cardoso Lopes
d4085f6e91
Revert this for now, PUNPCKLDQ dont operate on v4f32
...
llvm-svn: 112090
2010-08-25 21:26:37 +00:00
Daniel Dunbar
3d148ac089
X86: Fix misencode of RI64mi8. This fixes OpenSSL / x86_64-apple-darwin10 / clang -O3.
...
llvm-svn: 112089
2010-08-25 21:11:02 +00:00
Benjamin Kramer
f1f2133ac0
Remove dead recursive function. Yay for clang -Wunused-function.
...
llvm-svn: 112060
2010-08-25 17:27:58 +00:00
Anton Korobeynikov
b3b53ecac0
Fix nasty mingw32 bug, which e.g. prevented llvm-gcc bootstrap there.
...
Mark _alloca call as clobberring EFLAGS, otherwise some DCE might remove
other flags-clobberring stuff (e.g. cmp instructions) occuring after
_alloca call.
llvm-svn: 112034
2010-08-25 07:50:11 +00:00
Bruno Cardoso Lopes
0770d25758
PUNPCKLDQ should also be used for v4f32
...
llvm-svn: 112020
2010-08-25 02:55:40 +00:00
Bruno Cardoso Lopes
2e45d522c1
teach lowering to get target specific nodes for pshufd, emulating the same isel behavior for now, so we can pass all vector shuffle tests
...
llvm-svn: 112017
2010-08-25 02:35:37 +00:00
Daniel Dunbar
1c8d777c93
MC/X86: Tweak imul recognition, previous hack only applies for the imul form
...
taking immediates.
llvm-svn: 111950
2010-08-24 19:37:56 +00:00
Daniel Dunbar
09392785b4
MC/X86: Add custom hack for recognizing "imul $12, %eax" and friends.
...
llvm-svn: 111947
2010-08-24 19:24:18 +00:00
Daniel Dunbar
94b84a19b9
MC/X86: Warn on scale factors > 1 without index register, instead of erroring,
...
for 'as' compatibility.
llvm-svn: 111945
2010-08-24 19:13:38 +00:00
Dan Gohman
c88fda477a
Fix X86's isLegalAddressingMode to recognize that static addresses
...
need not be RIP-relative in small mode.
llvm-svn: 111917
2010-08-24 15:55:12 +00:00
Bruno Cardoso Lopes
758d7b1f5c
Use pshufhw and pshuflw in more cases and fix getTargetShuffleNode number of arguments
...
llvm-svn: 111890
2010-08-24 01:16:15 +00:00
Bruno Cardoso Lopes
264d90fff7
Start using target speficic nodes for shuffles: pshufhw and pshuflw
...
llvm-svn: 111837
2010-08-23 20:41:02 +00:00
Gabor Greif
21fed6616c
tyops
...
llvm-svn: 111835
2010-08-23 20:30:51 +00:00