Chris Lattner
96d50487c9
Use vmladduhm to do v8i16 multiplies, which is faster and simpler than doing
...
even/odd halves. Thanks to Nate for telling me what's what.
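As I read the AltiVec spec, vmladduhm is a multiply-low-and-add-modulo: with a zero addend it is exactly a v8i16 multiply. A minimal C sketch of the per-lane semantics (hypothetical helper name):

#include <stdint.h>

/* Per-lane model of vmladduhm: d[i] = (a[i] * b[i] + c[i]) mod 2^16.
   With c = 0 this is a plain v8i16 multiply, so one instruction
   replaces the whole even/odd-halves dance. */
void vmladduhm_ref(uint16_t d[8], const uint16_t a[8],
                   const uint16_t b[8], const uint16_t c[8]) {
    for (int i = 0; i < 8; ++i)
        d[i] = (uint16_t)((uint32_t)a[i] * b[i] + c[i]);
}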
llvm-svn: 27793
2006-04-18 04:28:57 +00:00
Chris Lattner
d6d82aa889
Implement v16i8 multiply with this code:
...
vmuloub v5, v3, v2
vmuleub v2, v3, v2
vperm v2, v2, v5, v4
This implements CodeGen/PowerPC/vec_mul.ll. With this, v16i8 multiplies are
6.79x faster than before.
Overall, UnitTests/Vector/multiplies.c is now 2.45x faster with LLVM than with
GCC.
Remove the 'integer multiplies' todo from the README file.
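For reference, a hedged scalar model of what the three-instruction sequence computes (hypothetical helper name): the even/odd byte multiplies widen each product to 16 bits, and the vperm keeps each product's low byte.

#include <stdint.h>

/* Scalar model of vmuleub/vmuloub + vperm: even and odd byte lanes
   are multiplied into 16-bit products, and the permute selects the
   low byte of each product back into a v16i8 result. */
void v16i8_mul_ref(uint8_t d[16], const uint8_t a[16], const uint8_t b[16]) {
    for (int i = 0; i < 16; ++i) {
        uint16_t prod = (uint16_t)a[i] * b[i];
        d[i] = (uint8_t)prod;
    }
}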
llvm-svn: 27792
2006-04-18 03:57:35 +00:00
Chris Lattner
48786e4887
Add tests for v8i16 and v16i8
...
llvm-svn: 27791
2006-04-18 03:54:50 +00:00
Evan Cheng
4d36a36900
Correct comments
...
llvm-svn: 27790
2006-04-18 03:45:01 +00:00
Chris Lattner
7e439874cb
Lower v8i16 multiply into this code:
...
li r5, lo16(LCPI1_0)
lis r6, ha16(LCPI1_0)
lvx v4, r6, r5
vmulouh v5, v3, v2
vmuleuh v2, v3, v2
vperm v2, v2, v5, v4
where v4 is:
LCPI1_0: ; <16 x ubyte>
.byte 2
.byte 3
.byte 18
.byte 19
.byte 6
.byte 7
.byte 22
.byte 23
.byte 10
.byte 11
.byte 26
.byte 27
.byte 14
.byte 15
.byte 30
.byte 31
This is 5.07x faster on the G5 (measured) than lowering to scalar code +
loads/stores.
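A hedged scalar model of the sequence (hypothetical helper name), which also explains why the mask reads 2,3,18,19,...: vmuleuh/vmulouh produce 32-bit products of the even/odd halfword lanes, on big-endian PPC the low 16 bits of each 32-bit product sit in its bytes 2-3, and vperm indices 0-15 select from the even-product register while 16-31 select from the odd one.

#include <stdint.h>

/* Even/odd halfword multiplies, then select the low halfword of each
   32-bit product back into place: bytes 4*i+2..4*i+3 of the even
   register (vperm indices 2,3 / 6,7 / ...) interleaved with bytes
   16+4*i+2..16+4*i+3 of the odd one (18,19 / 22,23 / ...). */
void v8i16_mul_ref(uint16_t d[8], const uint16_t a[8], const uint16_t b[8]) {
    for (int i = 0; i < 4; ++i) {
        uint32_t even = (uint32_t)a[2*i]     * b[2*i];      /* vmuleuh */
        uint32_t odd  = (uint32_t)a[2*i + 1] * b[2*i + 1];  /* vmulouh */
        d[2*i]     = (uint16_t)even;
        d[2*i + 1] = (uint16_t)odd;
    }
}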
llvm-svn: 27789
2006-04-18 03:43:48 +00:00
Chris Lattner
a2cae1bb10
Custom lower v4i32 multiplies into a cute sequence, instead of having legalize
...
scalarize the sequence into 4 mullw's and a bunch of load/store traffic.
This speeds up v4i32 multiplies 4.1x (measured) on a G5. This implements
PowerPC/vec_mul.ll
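The log elides the sequence itself; as a hedged sketch (not necessarily the exact instructions the commit emits), the standard decomposition of a 32-bit modulo multiply into the 16-bit multiplies AltiVec provides is:

#include <stdint.h>

/* a*b mod 2^32 from halfword products: low x low plus the two cross
   products shifted up 16 bits; the high x high product only affects
   bits >= 32 and drops out of the modulo result. */
uint32_t mul32_via_halfwords(uint32_t a, uint32_t b) {
    uint32_t a_lo = a & 0xFFFF, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFF, b_hi = b >> 16;
    uint32_t cross = (a_lo * b_hi + a_hi * b_lo) << 16;
    return a_lo * b_lo + cross;
}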
llvm-svn: 27788
2006-04-18 03:24:30 +00:00
Chris Lattner
2dea154035
new testcase
...
llvm-svn: 27787
2006-04-18 03:22:16 +00:00
Evan Cheng
0ef233509b
Another entry
...
llvm-svn: 27786
2006-04-18 01:22:57 +00:00
Chris Lattner
3db2056315
Fix a build failure on Vladimir's tester.
...
llvm-svn: 27785
2006-04-18 00:21:25 +00:00
Evan Cheng
e008bd3d27
Another entry.
...
llvm-svn: 27784
2006-04-18 00:21:01 +00:00
Evan Cheng
5421206c4b
Use movss to insert_vector_elt(v, s, 0).
...
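A hedged intrinsics sketch of the idea (hypothetical helper name): movss with a register source replaces element 0 and leaves the upper three elements alone, which is exactly insert_vector_elt at index 0.

#include <xmmintrin.h>

/* insert_vector_elt(v, s, 0) as a single movss: the result keeps
   elements 1-3 of v and takes s in element 0. */
__m128 insert_elt0(__m128 v, float s) {
    return _mm_move_ss(v, _mm_set_ss(s));
}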
llvm-svn: 27782
2006-04-17 22:45:49 +00:00
Chris Lattner
36dd7c98d1
Turn x86 unaligned load/store intrinsics into aligned load/store instructions
...
if the pointer is known aligned.
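A hedged illustration of the case being optimized (hypothetical function): once the pointer's 16-byte alignment is provable, the unaligned-load intrinsic carries no information the aligned movaps form doesn't.

#include <xmmintrin.h>

/* p is assumed 16-byte aligned here, so the loadu intrinsic can be
   lowered as a plain aligned load. */
__m128 load_when_aligned(const float *p /* known 16-byte aligned */) {
    return _mm_loadu_ps(p);
}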
llvm-svn: 27781
2006-04-17 22:26:56 +00:00
Chris Lattner
916ae0775e
Fix handling of calls in functions that use vectors. This fixes a crash on
...
the code in GCC PR26546.
llvm-svn: 27780
2006-04-17 22:10:08 +00:00
Evan Cheng
6e5e205841
Use two pinsrw to insert an element into a v4i32 / v4f32 vector.
...
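A hedged sketch of the two-pinsrw idea for one fixed lane (hypothetical helper; dword lane 1 lives in halfword lanes 2 and 3 on little-endian x86):

#include <emmintrin.h>
#include <stdint.h>

/* Insert a 32-bit scalar into dword lane 1 of a v4i32 as two 16-bit
   inserts into halfword lanes 2 and 3. */
__m128i insert_dword1(__m128i v, uint32_t s) {
    v = _mm_insert_epi16(v, (int)(s & 0xFFFF), 2);  /* low 16 bits  */
    v = _mm_insert_epi16(v, (int)(s >> 16), 3);     /* high 16 bits */
    return v;
}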
llvm-svn: 27779
2006-04-17 22:04:06 +00:00
Chris Lattner
63a5cdc423
remove done item
...
llvm-svn: 27778
2006-04-17 21:52:03 +00:00
Chris Lattner
6bd68ae81e
Don't diddle VRSAVE if no registers need to be added/removed from it. This
...
allows us to codegen functions as:
_test_rol:
vspltisw v2, -12
vrlw v2, v2, v2
blr
instead of:
_test_rol:
mfvrsave r2, 256
mr r3, r2
mtvrsave r3
vspltisw v2, -12
vrlw v2, v2, v2
mtvrsave r2
blr
Testcase here: CodeGen/PowerPC/vec_vrsave.ll
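For context, VRSAVE is just a 32-bit mask with one bit per vector register marked live. A hedged sketch of the condition this change adds:

#include <stdint.h>

/* If the function introduces no new live vector registers, the old
   mask already equals the updated one, and the whole mfvrsave /
   mtvrsave save-update-restore sequence can be skipped. */
int vrsave_needs_update(uint32_t vrsave_on_entry, uint32_t regs_used) {
    uint32_t updated = vrsave_on_entry | regs_used;
    return updated != vrsave_on_entry;
}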
llvm-svn: 27777
2006-04-17 21:48:13 +00:00
Chris Lattner
efe2b3f2fc
New testcase, shouldn't touch vrsave
...
llvm-svn: 27776
2006-04-17 21:48:03 +00:00
Chris Lattner
bec79b4a59
Add a MachineInstr::eraseFromParent convenience method.
...
llvm-svn: 27775
2006-04-17 21:35:41 +00:00
Chris Lattner
9fcad09b1b
Add some convenience methods.
...
llvm-svn: 27774
2006-04-17 21:35:08 +00:00
Evan Cheng
22c06f054b
Encoding bug
...
llvm-svn: 27773
2006-04-17 21:33:57 +00:00
Chris Lattner
72d7c27069
Vectors that are known live-in and live-out are clearly already marked in
...
the vrsave register for the caller. This allows us to codegen a function as:
_test_rol:
mfspr r2, 256
mr r3, r2
mtspr 256, r3
vspltisw v2, -12
vrlw v2, v2, v2
mtspr 256, r2
blr
instead of:
_test_rol:
mfspr r2, 256
oris r3, r2, 40960
mtspr 256, r3
vspltisw v0, -12
vrlw v2, v0, v0
mtspr 256, r2
blr
llvm-svn: 27772
2006-04-17 21:22:06 +00:00
Chris Lattner
14c4972b6d
Prefer to allocate V2-V5 before V0,V1. This lets us generate code like this:
...
vspltisw v2, -12
vrlw v2, v2, v2
instead of:
vspltisw v0, -12
vrlw v2, v0, v0
when a function is returning a value.
llvm-svn: 27771
2006-04-17 21:19:12 +00:00
Chris Lattner
6df094b4ab
Move some knowledge about registers out of the code emitter into the register info.
...
llvm-svn: 27770
2006-04-17 21:07:20 +00:00
Chris Lattner
0f28d48da2
Use a small table instead of macros to do this conversion.
...
llvm-svn: 27769
2006-04-17 20:59:25 +00:00
Evan Cheng
5022b3426e
Implement v8i16, v16i8 splat using unpckl + pshufd.
...
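A hedged intrinsics sketch of the technique for the v16i8 case, splatting element 0 (hypothetical helper; my reading is that the byte case needs two unpacks while the halfword case needs only the second):

#include <emmintrin.h>

/* Splat byte 0 of v: unpack bytes, then words, so byte 0 fills the
   low dword, then pshufd broadcasts that dword to all four lanes. */
__m128i splat_byte0(__m128i v) {
    v = _mm_unpacklo_epi8(v, v);     /* b0 b0 b1 b1 ...   (unpcklbw) */
    v = _mm_unpacklo_epi16(v, v);    /* b0 x4, b1 x4, ... (unpcklwd) */
    return _mm_shuffle_epi32(v, 0);  /* broadcast dword 0 (pshufd)   */
}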
llvm-svn: 27768
2006-04-17 20:43:08 +00:00
Chris Lattner
c070c621ac
implement returns of a vector, testcase here: CodeGen/X86/vec_return.ll
...
llvm-svn: 27767
2006-04-17 20:32:50 +00:00
Chris Lattner
e757ae6534
New testcase
...
llvm-svn: 27766
2006-04-17 20:32:27 +00:00
Chris Lattner
326870b40b
Codegen insertelement with constant insertion points as scalar_to_vector
...
and a shuffle. For this:
void %test2(<4 x float>* %F, float %f) {
%tmp = load <4 x float>* %F ; <<4 x float>> [#uses=2]
%tmp3 = add <4 x float> %tmp, %tmp ; <<4 x float>> [#uses=1]
%tmp2 = insertelement <4 x float> %tmp3, float %f, uint 2 ; <<4 x float>> [#uses=2]
%tmp6 = add <4 x float> %tmp2, %tmp2 ; <<4 x float>> [#uses=1]
store <4 x float> %tmp6, <4 x float>* %F
ret void
}
we now get this on X86 (which will get better):
_test2:
movl 4(%esp), %eax
movaps (%eax), %xmm0
addps %xmm0, %xmm0
movaps %xmm0, %xmm1
shufps $3, %xmm1, %xmm1
movaps %xmm0, %xmm2
shufps $1, %xmm2, %xmm2
unpcklps %xmm1, %xmm2
movss 8(%esp), %xmm1
unpcklps %xmm1, %xmm0
unpcklps %xmm2, %xmm0
addps %xmm0, %xmm0
movaps %xmm0, (%eax)
ret
instead of:
_test2:
subl $28, %esp
movl 32(%esp), %eax
movaps (%eax), %xmm0
addps %xmm0, %xmm0
movaps %xmm0, (%esp)
movss 36(%esp), %xmm0
movss %xmm0, 8(%esp)
movaps (%esp), %xmm0
addps %xmm0, %xmm0
movaps %xmm0, (%eax)
addl $28, %esp
ret
llvm-svn: 27765
2006-04-17 19:21:01 +00:00
Chris Lattner
e54133cfba
Make sure to check splats of every constant we can, handle splat(31) by
...
being a bit more clever, and add support for odd splats from -31 to -17.
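Presumably the identity behind the new negative range is this (hedged sketch; both pieces fit the 5-bit signed vspltisw immediate):

#include <assert.h>

/* For odd v in [-31,-17]: v = (v + 16) + (-16), with v+16 in
   [-15,-1], so two vspltisw plus one vadduwm materialize the splat
   without a constant pool load. */
int odd_neg_splat(int v) {
    assert(v >= -31 && v <= -17 && v % 2 != 0);
    int c = v + 16;
    return c + (-16);
}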
llvm-svn: 27764
2006-04-17 18:09:22 +00:00
Evan Cheng
bf0d13c54f
Incorrect foldMemoryOperand entries
...
llvm-svn: 27763
2006-04-17 18:06:12 +00:00
Evan Cheng
5112b5c544
Errors in patterns preventing load folding
...
llvm-svn: 27762
2006-04-17 18:05:01 +00:00
Jeff Cohen
e3955a05e4
Add checks for __OpenBSD__.
...
llvm-svn: 27761
2006-04-17 17:55:41 +00:00
Chris Lattner
264c908e3a
Teach the ppc backend to use rol and vsldoi to generate splatted constants.
...
This implements vec_constants.ll:test_vsldoi and test_rol
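A hedged scalar model of the rol half of the trick, as seen in the _test_rol output elsewhere in this log: vspltisw splats a sign-extended 5-bit immediate, then vrlw rotates each element left by its own low 5 bits.

#include <stdint.h>

/* One element's worth of: vspltisw v2, imm5 ; vrlw v2, v2, v2 */
uint32_t splat_then_rol(int imm5) {          /* imm5 in [-16,15]   */
    uint32_t x = (uint32_t)imm5;             /* sign-extended splat */
    unsigned n = x & 31;                     /* vrlw rotate amount  */
    return n ? (x << n) | (x >> (32 - n)) : x;
}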
llvm-svn: 27760
2006-04-17 17:55:10 +00:00
Chris Lattner
8cdba16d5e
Some more cases that can be generated with two instructions
...
llvm-svn: 27759
2006-04-17 17:54:18 +00:00
Chris Lattner
26fb8d9393
add a note
...
llvm-svn: 27758
2006-04-17 17:29:41 +00:00
Evan Cheng
b3b41c4f3d
FP SETOLT, SETOLE, SETUGE, SETUGT conditions were implemented incorrectly
...
llvm-svn: 27755
2006-04-17 07:24:10 +00:00
Chris Lattner
1b3806ace5
Make some code more general, adding support for constant formation of several
...
new patterns.
llvm-svn: 27754
2006-04-17 06:58:41 +00:00
Chris Lattner
9a3859b339
New testcases
...
llvm-svn: 27753
2006-04-17 06:58:16 +00:00
Chris Lattner
f8dd76df5b
Learn how to make odd splatted constants in range [17,29]. This implements
...
PowerPC/vec_constants.ll:test_29.
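Presumably the identity in play (hedged sketch): for odd v in [17,29], v-16 lands in [1,13], so v = (v-16) - (-16) with both immediates in vspltisw range.

#include <assert.h>

/* Two vspltisw plus one vsubuwm instead of a constant pool load. */
int odd_splat_17_29(int v) {
    assert(v >= 17 && v <= 29 && v % 2 == 1);
    int c = v - 16;
    return c - (-16);
}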
llvm-svn: 27752
2006-04-17 06:07:44 +00:00
Chris Lattner
02440a996b
new testcase
...
llvm-svn: 27751
2006-04-17 06:06:50 +00:00
Chris Lattner
2a099c04c1
Pull some code out into a helper function.
...
Efficiently codegen even splats in the range [-32,30].
This allows us to codegen <30,30,30,30> as:
vspltisw v0, 15
vadduwm v2, v0, v0
instead of as a constant pool load.
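The identity is simple enough to state directly: any even v in [-32,30] is a+a for a = v/2 in [-16,15], exactly the range of the vspltisw immediate. A hedged one-liner:

#include <assert.h>

/* vspltisw v0, v/2 ; vadduwm v2, v0, v0 */
int even_splat(int v) {
    assert(v >= -32 && v <= 30 && v % 2 == 0);
    int a = v / 2;
    return a + a;
}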
llvm-svn: 27750
2006-04-17 06:00:21 +00:00
Chris Lattner
31b7d89e66
New testcase
...
llvm-svn: 27749
2006-04-17 05:58:22 +00:00
Chris Lattner
071ad01ceb
Implement a TODO: for any shuffle that can be viewed as a v4[if]32 shuffle,
...
if it can be implemented in 3 or fewer discrete altivec instructions, codegen
it as such. This implements Regression/CodeGen/PowerPC/vec_perf_shuffle.ll
llvm-svn: 27748
2006-04-17 05:28:54 +00:00
Chris Lattner
6e98b49b54
New testcase: these shuffles can be implemented with discrete instructions,
...
and shouldn't be lowered to vperm.
llvm-svn: 27747
2006-04-17 05:27:31 +00:00
Chris Lattner
85bfa3c2bc
Regenerate with adjusted costs
...
llvm-svn: 27746
2006-04-17 05:26:20 +00:00
Chris Lattner
e2e2cc5b28
Encode a cost of zero as a cost of 1.
...
llvm-svn: 27745
2006-04-17 05:25:16 +00:00
Chris Lattner
aac2a200cd
Regenerate with correct offset
...
llvm-svn: 27744
2006-04-17 05:08:46 +00:00
Chris Lattner
3dcfef6310
Really, I can count!
...
llvm-svn: 27743
2006-04-17 05:05:52 +00:00
Chris Lattner
311b1a6e23
Increase the opcodes by one each to disambiguate COPY from VMRGHW.
...
llvm-svn: 27742
2006-04-17 00:47:48 +00:00
Chris Lattner
895dba9714
assign stable opcodes to the various altivec ops.
...
llvm-svn: 27741
2006-04-17 00:47:18 +00:00