"For ARM stack frames that utilize variable sized objects and have either
large local stack areas or require dynamic stack realignment, allocate a
base register via which to access the local frame. This allows efficient
access to frame indices not accessible via the FP (either due to being out
of range or due to dynamic realignment) or the SP (due to variable sized
object allocation). In particular, this greatly improves efficiency of access
to spill slots in Thumb functions which contain VLAs."
r112986 fixed a latent bug exposed by the above.
llvm-svn: 112989
vabd intrinsic and add and/or zext operations. In the case of vaba, this
also avoids the need for a DAG combine pattern to combine vabd with add.
Update tests. Auto-upgrade the old intrinsics.
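For reference (an illustrative scalar sketch, not part of the commit; the element
width and function name are arbitrary), vaba is an accumulating absolute
difference, which is why vabd plus an ordinary add is enough to express it:

#include <stdint.h>
/* Per-element vaba semantics: acc + |a - b|, sketched for 8-bit lanes. */
void vaba_ref(uint8_t *acc, const uint8_t *a, const uint8_t *b, int n) {
  for (int i = 0; i < n; ++i) {
    int d = (int)a[i] - (int)b[i];                   /* absolute difference (vabd) */
    acc[i] = (uint8_t)(acc[i] + (d < 0 ? -d : d));   /* accumulate (add) */
  }
}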
llvm-svn: 112941
large local stack areas or require dynamic stack realignment, allocate a
base register via which to access the local frame. This allows efficient
access to frame indices not accessible via the FP (either due to being out
of range or due to dynamic realignment) or the SP (due to variable sized
object allocation). In particular, this greatly improves efficiency of access
to spill slots in Thumb functions which contain VLAs.
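As an illustration (a hypothetical example, not taken from the commit), a Thumb
function along these lines hits both conditions at once -- a variable sized
object plus a large fixed local area:

/* The VLA makes SP-relative offsets unknown, and the large array pushes
   spill slots beyond FP's limited Thumb offset range, so a separate base
   register is needed to reach the local frame. */
int f(int n) {
  char big[4096];   /* large local stack area */
  int vla[n];       /* variable sized object */
  for (int i = 0; i < n; ++i)
    vla[i] = i;
  big[0] = (char)n;
  return vla[n - 1] + big[0];
}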
rdar://7352504
rdar://8374540
rdar://8355680
llvm-svn: 112883
there are clearly no stores between the load and the store. This fixes
the miscompile reported as PR7833.
This breaks the test/CodeGen/X86/narrow_op-2.ll optimization, which is
safe, but awkward to prove safe. Move it to X86's README.txt.
llvm-svn: 112861
add, and subtract operations with zero-extended or sign-extended vectors.
Update tests. Add auto-upgrade support for the old intrinsics.
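For context (a purely illustrative scalar model, with arbitrarily chosen widths),
a "lengthening" add of this kind is just sign-extension of both inputs followed
by a plain add in the wider type:

#include <stdint.h>
/* Widening add: extend each 8-bit input to 16 bits, then add normally. */
void addl_ref(int16_t *out, const int8_t *a, const int8_t *b, int n) {
  for (int i = 0; i < n; ++i)
    out[i] = (int16_t)a[i] + (int16_t)b[i];
}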
llvm-svn: 112773
check more strict, breaking some cases not checked in the
testsuite, but also exposing some foldings not done before,
as in this example:
movaps (%rdi), %xmm0
movaps (%rax), %xmm1
movaps %xmm0, %xmm2
movss %xmm1, %xmm2
shufps $36, %xmm2, %xmm0
is now generated as:
movaps (%rdi), %xmm0
movaps %xmm0, %xmm1
movlps (%rax), %xmm1
shufps $36, %xmm1, %xmm0
llvm-svn: 112753
int x(int t) {
if (t & 256)
return -26;
return 0;
}
We generate this:
tst.w r0, #256
mvn r0, #25
it eq
moveq r0, #0
while gcc generates this:
ands r0, r0, #256
it ne
mvnne r0, #25
bx lr
Scandalous really!
During ISel time, we can look for this particular pattern. One where we have a
"MOVCC" that uses the flag off of a CMPZ that itself is comparing an AND
instruction to 0. Something like this (greatly simplified):
%r0 = ISD::AND ...
ARMISD::CMPZ %r0, 0 @ sets [CPSR]
%r0 = ARMISD::MOVCC 0, -26 @ reads [CPSR]
All we have to do is convert the "ISD::AND" into an "ARM::ANDS" that sets [CPSR]
when it's zero. The zero value will already be in the %r0 register, and we only
need to change it if the AND wasn't zero. Easy!
llvm-svn: 112664
1) nuke ConstDataCoalSection, which is dead.
2) revise my previous patch for rdar://8018335,
which was completely wrong. Specifically, it doesn't
make sense to mark __TEXT,__const_coal as PURE_INSTRUCTIONS,
because it is for read-only data. Templates (it turns out)
go to const_coal_nt. The real fix for rdar://8018335 was
to give ConstTextCoalSection a section kind of ReadOnly
instead of Text.
llvm-svn: 112496
when the top elements of a vector are undefined. This happens all
the time for X86-64 ABI stuff because only the low 2 elements of
a 4 element vector are defined. For example, on:
_Complex float f32(_Complex float A, _Complex float B) {
return A+B;
}
We used to produce (with SSE2, SSE4.1+ uses insertps):
_f32: ## @f32
movdqa %xmm0, %xmm2
addss %xmm1, %xmm2
pshufd $16, %xmm2, %xmm2
pshufd $1, %xmm1, %xmm1
pshufd $1, %xmm0, %xmm0
addss %xmm1, %xmm0
pshufd $16, %xmm0, %xmm1
movdqa %xmm2, %xmm0
unpcklps %xmm1, %xmm0
ret
We now produce:
_f32: ## @f32
movdqa %xmm0, %xmm2
addss %xmm1, %xmm2
pshufd $1, %xmm1, %xmm1
pshufd $1, %xmm0, %xmm3
addss %xmm1, %xmm3
movaps %xmm2, %xmm0
unpcklps %xmm3, %xmm0
ret
This implements rdar://8368414
llvm-svn: 112378
all the other LDM/STM instructions. This fixes asm printer crashes when
compiling with -O0. I've changed one of the NEON tests (vst3.ll) to run
with -O0 to check this in the future.
Prior to this change VLDM/VSTM used addressing mode #5, but not really.
The offset field was used to hold a count of the number of registers being
loaded or stored, and the AM5 opcode field was expanded to specify the IA
or DB mode, instead of the standard ADD/SUB specifier. Much of the backend
was not aware of these special cases. The crashes occurred when rewriting
a frameindex caused the AM5 offset field to be changed so that it did not
have a valid submode. I don't know exactly what changed to expose this now.
Maybe we've never done much with -O0 and NEON. Regardless, there's no longer
any reason to keep a count of the VLDM/VSTM registers, so we can use
addressing mode #4 and clean things up in a lot of places.
llvm-svn: 112322
expanding: e.g. <2 x float> -> <4 x float> instead of -> 2 floats. This
affects two places in the code: handling cross block values and handling
function return values and arguments. Since vectors are already widened by
LegalizeTypes, this gives us much better code and unblocks the x86-64 ABI
and SPU ABI work.
For example, this (which is a silly example of a cross-block value):
define <4 x float> @test2(<4 x float> %A) nounwind {
%B = shufflevector <4 x float> %A, <4 x float> undef, <2 x i32> <i32 0, i32 1>
%C = fadd <2 x float> %B, %B
br label %BB
BB:
%D = fadd <2 x float> %C, %C
%E = shufflevector <2 x float> %D, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
ret <4 x float> %E
}
Now compiles into:
_test2: ## @test2
## BB#0:
addps %xmm0, %xmm0
addps %xmm0, %xmm0
ret
Previously it compiled into:
_test2: ## @test2
## BB#0:
addps %xmm0, %xmm0
pshufd $1, %xmm0, %xmm1
## kill: XMM0<def> XMM0<kill> XMM0<def>
insertps $0, %xmm0, %xmm0
insertps $16, %xmm1, %xmm0
addps %xmm0, %xmm0
ret
This implements rdar://8230384
llvm-svn: 112101
comparison that would overflow.
- The other under/overflow cases can't actually happen because the immediates
which would trigger them are legal (so we don't enter this code), but
I've adjusted the style to make it clear the transform is always valid.
llvm-svn: 112053
comparison is in a different basic block from the branch. In such
cases, the comparison's operands may not have initialized virtual
registers available.
llvm-svn: 111709
The "half vectors" are now widened to full size by the legalizer.
The only exception is in parameter passing, where half vectors are
expanded. This causes changes to some dejagnu tests.
llvm-svn: 111360
where the step value is an induction variable from an outer loop, to
avoid trouble trying to re-expand such expressions. This effectively
hides such expressions from indvars and lsr, which prevents them
from getting into trouble.
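For illustration (a hypothetical loop nest, not from the commit), this is the
shape in question: the inner loop's recurrence has a step that is itself the
outer loop's induction variable.

/* The inner stride is 'i', the outer induction variable, so the inner
   recurrence's step is an add-rec of the outer loop. */
void f(int *a, int n) {
  for (int i = 1; i < n; ++i)
    for (int j = 0; j < n; j += i)
      a[j] += 1;
}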
llvm-svn: 111317
- Make foldMemoryOperandImpl aware of folding 256-bit zero vectors, and also support their 128-bit AVX counterparts.
- Make sure MOV[AU]PS instructions are only selected when SSE1 is enabled, and duplicate the patterns to match AVX.
- Add a testcase for a simple 128-bit zero vector creation.
llvm-svn: 110946
term goal here is to be able to match enough of vector_shuffle and build_vector
so all avx intrinsics which aren't mapped to their own built-ins but to
shufflevector calls can be codegen'd. This is the first (baby) step, support
building zeroed vectors.
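For example (an illustrative source-level construct, not a test from this
commit), a 256-bit zeroed vector can originate from code as simple as:

#include <immintrin.h>
/* _mm256_setzero_ps() yields an all-zero 256-bit vector; in IR this shows
   up as a zero build_vector the backend must be able to select. */
__m256 zero256(void) {
  return _mm256_setzero_ps();
}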
llvm-svn: 110897
platform. It's apparently "bl __muldf3" on linux, for example. Since that's
not what we're checking here, it's more robust to just force a triple. We
just want to check that the inline FP instructions are only generated
on cpus that have them."
llvm-svn: 110830
float t1(int argc) {
return (argc == 1123) ? 1.234f : 2.38213f;
}
We would generate truly awful code on ARM (those with a weak stomach should look
away):
_t1:
movw r1, #1123
movs r2, #1
movs r3, #0
cmp r0, r1
mov.w r0, #0
it eq
moveq r0, r2
movs r1, #4
cmp r0, #0
it ne
movne r3, r1
adr r0, #LCPI1_0
ldr r0, [r0, r3]
bx lr
The problem was that legalization was creating a cascade of SELECT_CC nodes
for the comparison of "argc == 1123", which was fed into a SELECT node for the ?:
statement, which was itself converted to a SELECT_CC node. This is because the
ARM back-end doesn't have custom lowering for SELECT nodes, so it used the
default "Expand".
I added a fairly simple "LowerSELECT" to the ARM back-end. It takes care of this
testcase, but can obviously be expanded to include more cases.
Now we generate this, which looks optimal to me:
_t1:
movw r1, #1123
movs r2, #0
cmp r0, r1
adr r0, #LCPI0_0
it eq
moveq r2, #4
ldr r0, [r0, r2]
bx lr
.align 2
LCPI0_0:
.long 1075344593 @ float 2.382130e+00
.long 1067316150 @ float 1.234000e+00
llvm-svn: 110799
memory and synchronization barrier dmb and dsb instructions.
- Change instruction names to something more sensible (matching the names of
the actual instructions).
- Added tests for memory barrier codegen.
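For context (a hedged example, not from the patch), a full barrier at the source
level can come from the __sync_synchronize builtin, which on ARMv7 is typically
lowered to a dmb instruction:

/* Request a full memory barrier before publishing the flag. */
void publish(int *flag) {
  __sync_synchronize();
  *flag = 1;
}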
llvm-svn: 110785
Also added a test case to check for the added benefit of this patch: it's optimizing away the unnecessary restore of sp from fp for some non-leaf functions.
llvm-svn: 110707
reserved, not available for general allocation. This eliminates all the
extra checks for Darwin.
This change also fixes the use of FP to access frame indices in leaf
functions and cleans up some confusing code in epilogue emission.
llvm-svn: 110655
Without this, what was happening was (see the sketch after this list):
* R3 is not marked as "used"
* ARM backend thinks it has to save it to the stack because of vaarg
* Offset computation correctly ignores it
* Offsets are wrong
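A minimal variadic function of this shape (a hypothetical example) is enough to
hit the path: the argument registers after the last named parameter must be
spilled so va_arg can find the variadic values.

#include <stdarg.h>
/* Only 'count' is named, so the remaining argument registers carry the
   first variadic values and must be saved to the stack for va_arg; the
   bug was in how their stack offsets were computed. */
int sum(int count, ...) {
  va_list ap;
  int total = 0;
  va_start(ap, count);
  for (int i = 0; i < count; ++i)
    total += va_arg(ap, int);
  va_end(ap);
  return total;
}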
llvm-svn: 110446
nice farm in the country where it can play with other tests. And bunnies.
It is not clear what is being tested, and the revision history shows a bunch of
random changes to the expected instruction count. Clearly, we are just fudging
it to pass whenever it fails.
llvm-svn: 110118
such registers in SPU, this support boils down to "emulating"
them by duplicating instructions on the general purpose registers.
This adds the most basic operations on v2i32: passing parameters,
addition, subtraction, multiplication and a few others.
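At the source level (illustrative only, using the generic vector extension
rather than anything SPU-specific), the kind of v2i32 operations being added
look like:

/* A 2 x i32 vector type; element-wise add/sub/mul get "emulated" on SPU by
   duplicating the scalar instructions across the lanes. */
typedef int v2i32 __attribute__((vector_size(8)));

v2i32 add2(v2i32 a, v2i32 b) { return a + b; }
v2i32 sub2(v2i32 a, v2i32 b) { return a - b; }
v2i32 mul2(v2i32 a, v2i32 b) { return a * b; }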
llvm-svn: 110035
away from a computer now.
--- Reverse-merging r109881 into '.':
D test/CodeGen/X86/avx-intrinsics-x86.ll
D test/CodeGen/X86/avx-intrinsics-x86_64.ll
llvm-svn: 109959
have 4 bits per register in the operand encoding), but have undefined
behavior when the operand value is 13 or 15 (SP and PC, respectively).
The trivial coalescer in linear scan sometimes will merge a copy from
SP into a subsequent instruction which uses the copy, and if that
instruction cannot legally reference SP, we get bad code such as:
mls r0,r9,r0,sp
instead of:
mov r2, sp
mls r0, r9, r0, r2
This patch adds a new register class for use by Thumb2 that excludes
the problematic registers (SP and PC) and is used instead of GPR
for those operands which cannot legally reference PC or SP. The
trivial coalescer explicitly requires that the register class
of the destination for the COPY instruction contain the source
register for the COPY to be considered for coalescing. This prevents
errant instructions like that above.
PR7499
llvm-svn: 109842
integers with mov + vdup. 8003375. This is
currently disabled by default because LICM will
not hoist a VDUP, so it pessimizes the code if
the construct occurs inside a loop (8248029).
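For reference (a hedged source-level example, not from the commit), the construct
in question is an integer splat of a run-time value, e.g. via the NEON intrinsic
vdupq_n_s32:

#include <arm_neon.h>
/* Materialize x in a core register (mov) and broadcast it to all lanes (vdup). */
int32x4_t splat(int32_t x) {
  return vdupq_n_s32(x);
}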
llvm-svn: 109799
This assumption is not satisfied due to global merging.
Work around the issue by temporarily disabling merging of const globals.
Also, ignore LLVM "special" globals. This fixes PR7716.
llvm-svn: 109423
it's too late to start backing off aggressive latency scheduling when most
of the registers are in use so the threshold should be a bit tighter.
- Correctly handle live-outs, extract_subreg, etc.
- Enable register pressure aware scheduling by default for hybrid scheduler.
For ARM, this is almost always a win on # of instructions. It's runtime
neutral for most of the tests. But for some kernels with high register
pressure it can be a huge win. e.g. 464.h264ref reduced number of spills by
54 and sped up by 20%.
llvm-svn: 109279
- Fix a typo for PIC check during jmp table lowering
- Also fix the "first jump table basic block is not
considered only reachable by fall through" problem; use this
ad-hoc solution until I come up with something better.
Patch by stetorvs@gmail.com
llvm-svn: 108820
void foo() { __builtin_unreachable(); }
It will output the following on Darwin X86:
_func1:
Leh_func_begin0:
pushq %rbp
Ltmp0:
movq %rsp, %rbp
Ltmp1:
Leh_func_end0:
This prolog adds a new Call Frame Information (CFI) row to the FDE with an
address that is not within the address range of the code it describes -- the
address is equal to the end of the function -- and therefore results in an invalid EH
frame. If we emit a nop in this situation, then the CFI row is now within the
address range.
llvm-svn: 108568