llvm-project

Commit Graph

Author	SHA1	Message	Date
Bill Schmidt	cb34fd09cd	[PPC64] VSX indexed-form loads use wrong instruction format The VSX instruction definitions for lxsdx, lxvd2x, lxvdsx, and lxvw4x incorrectly use the XForm_1 instruction format, rather than the XX1Form instruction format. This is likely a pasto when creating these instructions, which were based on lvx and so forth. This patch uses the correct format. The existing reformatting test (test/MC/PowerPC/vsx.s) missed this because the two formats differ only in that XX1Form has an extension to the target register field in bit 31. The tests for these instructions used a target register of 7, so the default of 0 in bit 31 for XForm_1 didn't expose a problem. For register numbers 32-63 this would be noticeable. I've changed the test to use higher register numbers to verify my change is effective. llvm-svn: 219416	2014-10-09 17:51:35 +00:00
Eric Christopher	1b8e763630	Reset the subtarget for DAGToDAG on every iteration of runOnMachineFunction. This required updating the generated functions and TD file accordingly to be pointers rather than const references. llvm-svn: 209375	2014-05-22 01:07:24 +00:00
Hal Finkel	9e0baa6d3a	[PowerPC] Add some missing VSX bitcast patterns llvm-svn: 205352	2014-04-01 19:24:27 +00:00
Hal Finkel	5c0d1454d6	[PowerPC] Handle VSX v2i64 SIGN_EXTEND_INREG sitofp from v2i32 to v2f64 ends up generating a SIGN_EXTEND_INREG v2i64 node (and similarly for v2i16 and v2i8). Even though there are no sign-extension (or algebraic shifts) for v2i64 types, we can handle v2i32 sign extensions by converting two and from v2i64. The small trick necessary here is to shift the i32 elements into the right lanes before the i32 -> f64 step. This is because of the big Endian nature of the system, we need the i32 portion in the high word of the i64 elements. For v2i16 and v2i8 we can do the same, but we first use the default Altivec shift-based expansion from v2i16 or v2i8 to v2i32 (by casting to v4i32) and then apply the above procedure. llvm-svn: 205146	2014-03-30 13:22:59 +00:00
Hal Finkel	e8fba98735	[PowerPC] VSX instruction latency corrections The vector divide and sqrt instructions have high latencies, and the scalar comparisons are like all of the others. On the P7, permutations take an extra cycle over purely-simple vector ops. llvm-svn: 205096	2014-03-29 13:20:31 +00:00
Hal Finkel	19be506a5e	[PowerPC] Add subregister classes for f64 VSX values We had stored both f64 values and v2f64, etc. values in the VSX registers. This worked, but was suboptimal because we would always spill 16-byte values even through we almost always had scalar 8-byte values. This resulted in an increase in stack-size use, extra memory bandwidth, etc. To fix this, I've added 64-bit subregisters of the Altivec registers, and combined those with the existing scalar floating-point registers to form a class of VSX scalar floating-point registers. The ABI code has also been enhanced to use this register class and some other necessary improvements have been made. llvm-svn: 205075	2014-03-29 05:29:01 +00:00
Hal Finkel	82569b6366	[PowerPC] Fix v2f64 vector extract and related patterns First, v2f64 vector extract had not been declared legal (and so the existing patterns were not being used). Second, the patterns for that, and for scalar_to_vector, should really be a regclass copy, not a subregister operation, because the VSX registers directly hold both the vector and scalar data. llvm-svn: 204971	2014-03-27 22:22:48 +00:00
Hal Finkel	df3e34d944	[PowerPC] Generate VSX permutations for v2[fi]64 vectors llvm-svn: 204873	2014-03-26 22:58:37 +00:00
Hal Finkel	7279f4b00d	[PowerPC] Use v2f64 <-> v2i64 VSX conversion instructions llvm-svn: 204843	2014-03-26 19:13:54 +00:00
Hal Finkel	ea76a44584	[PowerPC] Remove some dead VSX v4f32 store patterns These patterns are dead (because v4f32 stores are currently promoted to v4i32 and stored using Altivec instructions), and also are likely not correct (because they'd store the vector elements in the opposite order from that assumed by the rest of the Altivec code). llvm-svn: 204839	2014-03-26 18:26:36 +00:00
Hal Finkel	9281c9a38b	[PowerPC] Use VSX vector load/stores for v2[fi]64 These instructions have access to the complete VSX register file. In addition, they "swap" the order of the elements so that element 0 (the scalar part) comes first in memory and element 1 follows at a higher address. llvm-svn: 204838	2014-03-26 18:26:30 +00:00
Hal Finkel	a6c8b51212	[PowerPC] Add v2i64 as a legal VSX type v2i64 needs to be a legal VSX type because it is the SetCC result type from v2f64 comparisons. We need to expand all non-arithmetic v2i64 operations. This fixes the lowering for v2f64 VSELECT. llvm-svn: 204828	2014-03-26 16:12:58 +00:00
Hal Finkel	bd4de9d478	[PowerPC] Generate logical vector VSX instructions These instructions are essentially the same as their Altivec counterparts, but have access to the larger VSX register file. llvm-svn: 204782	2014-03-26 04:55:40 +00:00
Hal Finkel	25e0454f10	[PowerPC] Add a TableGen relation for A-type and M-type VSX FMA instructions TableGen will create a lookup table for the A-type FMA instructions providing their corresponding M-form opcodes. This will be used by upcoming commits. llvm-svn: 204746	2014-03-25 18:55:11 +00:00
Hal Finkel	e01d32107c	[PowerPC] Mark many instructions as commutative I'm under the impression that we used to infer the isCommutable flag from the instruction-associated pattern. Regardless, we don't seem to do this (at least by default) any more. I've gone through all of our instruction definitions, and marked as commutative all of those that should be trivial to commute (by exchanging the first two operands). There has been special code for the RL* instructions, and that's not changed. Before this change, we had the following commutative instructions: RLDIMI RLDIMIo RLWIMI RLWIMI8 RLWIMI8o RLWIMIo XSADDDP XSMULDP XVADDDP XVADDSP XVMULDP XVMULSP After: ADD4 ADD4o ADD8 ADD8o ADDC ADDC8 ADDC8o ADDCo ADDE ADDE8 ADDE8o ADDEo AND AND8 AND8o ANDo CRAND CREQV CRNAND CRNOR CROR CRXOR EQV EQV8 EQV8o EQVo FADD FADDS FADDSo FADDo FMADD FMADDS FMADDSo FMADDo FMSUB FMSUBS FMSUBSo FMSUBo FMUL FMULS FMULSo FMULo FNMADD FNMADDS FNMADDSo FNMADDo FNMSUB FNMSUBS FNMSUBSo FNMSUBo MULHD MULHDU MULHDUo MULHDo MULHW MULHWU MULHWUo MULHWo MULLD MULLDo MULLW MULLWo NAND NAND8 NAND8o NANDo NOR NOR8 NOR8o NORo OR OR8 OR8o ORo RLDIMI RLDIMIo RLWIMI RLWIMI8 RLWIMI8o RLWIMIo VADDCUW VADDFP VADDSBS VADDSHS VADDSWS VADDUBM VADDUBS VADDUHM VADDUHS VADDUWM VADDUWS VAND VAVGSB VAVGSH VAVGSW VAVGUB VAVGUH VAVGUW VMADDFP VMAXFP VMAXSB VMAXSH VMAXSW VMAXUB VMAXUH VMAXUW VMHADDSHS VMHRADDSHS VMINFP VMINSB VMINSH VMINSW VMINUB VMINUH VMINUW VMLADDUHM VMULESB VMULESH VMULEUB VMULEUH VMULOSB VMULOSH VMULOUB VMULOUH VNMSUBFP VOR VXOR XOR XOR8 XOR8o XORo XSADDDP XSMADDADP XSMAXDP XSMINDP XSMSUBADP XSMULDP XSNMADDADP XSNMSUBADP XVADDDP XVADDSP XVMADDADP XVMADDASP XVMAXDP XVMAXSP XVMINDP XVMINSP XVMSUBADP XVMSUBASP XVMULDP XVMULSP XVNMADDADP XVNMADDASP XVNMSUBADP XVNMSUBASP XXLAND XXLNOR XXLOR XXLXOR This is a by-inspection change, and I'm not sure how to write a reliable test case. I would like advice on this, however. llvm-svn: 204609	2014-03-24 15:07:28 +00:00
Hal Finkel	4a912250fa	[PowerPC] Make use of VSX f64 <-> i64 conversion instructions When VSX is available, these instructions should be used in preference to the older variants that only have access to the scalar floating-point registers. llvm-svn: 204559	2014-03-23 05:35:00 +00:00
Hal Finkel	27774d9274	[PowerPC] Initial support for the VSX instruction set VSX is an ISA extension supported on the POWER7 and later cores that enhances floating-point vector and scalar capabilities. Among other things, this adds <2 x double> support and generally helps to reduce register pressure. The interesting part of this ISA feature is the register configuration: there are 64 new 128-bit vector registers, the 32 of which are super-registers of the existing 32 scalar floating-point registers, and the second 32 of which overlap with the 32 Altivec vector registers. This makes things like vector insertion and extraction tricky: this can be free but only if we force a restriction to the right register subclass when needed. A new "minipass" PPCVSXCopy takes care of this (although it could do a more-optimal job of it; see the comment about unnecessary copies below). Please note that, currently, VSX is not enabled by default when targeting anything because it is not yet ready for that. The assembler and disassembler are fully implemented and tested. However: - CodeGen support causes miscompiles; test-suite runtime failures: MultiSource/Benchmarks/FreeBench/distray/distray MultiSource/Benchmarks/McCat/08-main/main MultiSource/Benchmarks/Olden/voronoi/voronoi MultiSource/Benchmarks/mafft/pairlocalalign MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4 SingleSource/Benchmarks/CoyoteBench/almabench SingleSource/Benchmarks/Misc/matmul_f64_4x4 - The lowering currently falls back to using Altivec instructions far more than it should. Worse, there are some things that are scalarized through the stack that shouldn't be. - A lot of unnecessary copies make it past the optimizers, and this needs to be fixed. - Many more regression tests are needed. Normally, I'd fix these things prior to committing, but there are some students and other contributors who would like to work this, and so it makes sense to move this development process upstream where it can be subject to the regular code-review procedures. llvm-svn: 203768	2014-03-13 07:58:58 +00:00

17 Commits