llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	20a6db5a84	[X86] Create PSUBUS from (add (umax X, C), -C) InstCombine seems to canonicalize or PSUB patter into a max with the cosntant and an add with an inverse of the constant. This patch recognizes this pattern and turns it into PSUBUS. Future work could improve undef element handling. Fixes some of PR40053 Differential Revision: https://reviews.llvm.org/D55780 llvm-svn: 349519	2018-12-18 18:26:25 +00:00
Craig Topper	6a6f6109c4	[X86] Add baseline tests for D55780 This adds tests for (add (umax X, C), -C) as part of fixing PR40053 llvm-svn: 349416	2018-12-17 23:20:14 +00:00
Sanjay Patel	f24900b934	[DAGCombiner] allow hoisting vector bitwise logic ahead of truncates The transform performs a bitwise logic op in a wider type followed by truncate when both inputs are truncated from the same source type: logic_op (truncate x), (truncate y) --> truncate (logic_op x, y) There are a bunch of other checks that should prevent doing this when it might be harmful. We already do this transform for scalars in this spot. The vector limitation was shared with a check for the case when the operands are extended. I'm not sure if that limit is needed either, but that would be a separate patch. Differential Revision: https://reviews.llvm.org/D55448 llvm-svn: 349303	2018-12-16 14:57:04 +00:00
Simon Pilgrim	1e1fd9c761	[TargetLowering] Add ISD::OR + ISD::XOR handling to SimplifyDemandedVectorElts Differential Revision: https://reviews.llvm.org/D55600 llvm-svn: 349264	2018-12-15 11:36:36 +00:00
Simon Pilgrim	ee8b96f253	[X86][SSE] Add computeKnownBits/ComputeNumSignBits support for PACKSS/PACKUS instructions. Pull out getPackDemandedElts demanded elts remapping helper from computeKnownBitsForTargetNode and use in computeKnownBits/ComputeNumSignBits. llvm-svn: 347303	2018-11-20 13:23:37 +00:00
Craig Topper	bc8148f7b0	[X86] Lower v16i16->v8i16 truncate using an 'and' with 255, an extract_subvector, and a packuswb instruction. Summary: This is an improvement over the two pshufbs and punpcklqdq we'd get otherwise. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54671 llvm-svn: 347171	2018-11-18 17:59:28 +00:00
Craig Topper	60c202a494	[X86] Don't emit *_extend_vector_inreg nodes when both the input and output types are legal with AVX1 We already have custom lowering for the AVX case in LegalizeVectorOps. So its better to keep the regular extend op around as long as possible. I had to qualify one place in DAG combine that created illegal vector extending load operations. This change by itself had no effect on any tests which is why its included here. I've made a few cleanups to the custom lowering. The sign extend code no longer creates an identity shuffle with undef elements. The zero extend code now emits a zero_extend_vector_inreg instead of an unpckl with a zero vector. For the high half of the custom lowering of zero_extend/any_extend, we're now using an unpckh with a zero vector or undef. Previously we used used a pshufd to move the upper 64-bits to the lower 64-bits and then used a zero_extend_vector_inreg. I think the zero vector should require less execution resources and be smaller code size. Differential Revision: https://reviews.llvm.org/D54024 llvm-svn: 346043	2018-11-02 21:09:49 +00:00
Craig Topper	8315d9990c	[X86] Stop promoting vector and/or/xor/andn to vXi64. These promotions add additional bitcasts to the SelectionDAG that can pessimize computeKnownBits/computeNumSignBits. It also seems to interfere with broadcast formation. This patch removes the promotion and adds isel patterns instead. The increased table size is more than I would like, but hopefully we can find some canonicalizations or other tricks to start pruning out patterns going forward. Differential Revision: https://reviews.llvm.org/D53268 llvm-svn: 345408	2018-10-26 17:21:26 +00:00
Craig Topper	06aea1720a	[X86] Move promotion of vector and/or/xor from legalization to DAG combine Summary: I've noticed that the bitcasts we introduce for these make computeKnownBits and computeNumSignBits not work well in LegalizeVectorOps. LegalizeVectorOps legalizes bottom up while LegalizeDAG legalizes top down. The bottom up strategy for LegalizeVectorOps means operands are legalized before their uses. So we promote and/or/xor before we legalize the operands that use them making computeKnownBits/computeNumSignBits in places like LowerTruncate suboptimal. I looked at changing LegalizeVectorOps to be top down as well, but that was more disruptive and caused some regressions. I also looked at just moving promotion of binops to LegalizeDAG, but that had a few issues one around matching AND,ANDN,OR into VSELECT because I had to create ANDN as vXi64, but the other nodes hadn't legalized yet, I didn't look too hard at fixing that. This patch seems to produce better results overall than my other attempts. We now form broadcasts of constants better in some cases. For at least some of them the AND was being introduced in LegalizeDAG, promoted to vXi64, and the BUILD_VECTOR was also legalized there. I think we got bad ordering of that. Now the promotion is out of the legalizer so we handle this better. In the longer term I think we really should evaluate whether we should be doing this promotion at all. It's really there to reduce isel pattern count, but I'm wondering if we'd be better served just eating the pattern cost or doing C++ based isel for vector and/or/xor in X86ISelDAGToDAG. The masked and/or/xor will definitely be difficult in patterns if a bitcast gets between the vselect and the and/or/xor node. That becomes a lot of permutations to cover. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53107 llvm-svn: 344487	2018-10-15 01:51:58 +00:00
Craig Topper	f6d8400869	[X86] When lowering unsigned v2i64 setcc without SSE42, flip the sign bits in the v2i64 type then bitcast to v4i32. This may give slightly better opportunities for DAG combine to simplify with the operations before the setcc. It also matches the type the xors will eventually be promoted to anyway so it saves a legalization step. Almost all of the test changes are because our constant pool entry is now v2i64 instead of v4i32 on 64-bit targets. On 32-bit targets getConstant should be emitting a v4i32 build_vector and a v4i32->v2i64 bitcast. There are a couple test cases where it appears we now combine a bitwise not with one of these xors which caused a new constant vector to be generated. This prevented a constant pool entry from being shared. But if that's an issue we're concerned about, it seems we need to address it another way that just relying a bitcast to hide it. This came about from experiments I've been trying with pushing the promotion of and/or/xor to vXi64 later than LegalizeVectorOps where it is today. We run LegalizeVectorOps in a bottom up order. So the and/or/xor are promoted before their users are legalized. The bitcasts added for the promotion act as a barrier to computeKnownBits if we try to use it during vector legalization of a later operation. So by moving the promotion out we can hopefully get better results from computeKnownBits/computeNumSignBits like in LowerTruncate on AVX512. I've also looked at running LegalizeVectorOps in a top down order like LegalizeDAG, but thats showing some other issues. llvm-svn: 344071	2018-10-09 19:05:50 +00:00
Simon Pilgrim	720db8ed7b	[X86][AVX1] Enable _EXTEND_VECTOR_INREG lowering of 256-bit vectors As discussed on D52964, this adds 256-bit _EXTEND_VECTOR_INREG lowering support for AVX1 targets to help improve SimplifyDemandedBits handling. Differential Revision: https://reviews.llvm.org/D52980 llvm-svn: 344019	2018-10-09 07:42:01 +00:00
Craig Topper	7af5e333e7	[X86] Create paddus/psubus from narrower vectors with i8/i16 element types. Summary: This patch allows vectors with a power of 2 number of elements and i8/i16 element type to select paddus/psubus instructions. ReplaceNodeResults has been updated to custom widen these operations up to 128 bits like we already do for PAVG. Another step towards fixing PR38691 Reviewers: RKSimon, spatel Reviewed By: RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51818 llvm-svn: 341753	2018-09-08 19:32:58 +00:00
Craig Topper	fa535c027e	[X86] Add codegen tests for narrow PADDUS/PSUBUS patterns for PR38691. llvm-svn: 341711	2018-09-07 21:28:46 +00:00
Craig Topper	40c9559b74	[X86] Add support for using 512-bit PSUBUS to combineSelect. The code already support 128 and 256 and even knows to split 256 for AVX1. So we really just needed to stop looking for specific VTs and subtarget features and just look for legal VTs with i8/i16 elements. While there, add some curly braces around outer if statement bodies that contain only another if. It makes all the closing curly braces look more regular. llvm-svn: 340128	2018-08-18 18:51:03 +00:00
Craig Topper	b40a1d5f84	[X86] Add test cases to show missed opportunities to use 512-bit PSUBUS. llvm-svn: 340127	2018-08-18 18:50:59 +00:00
Simon Pilgrim	a6afa310c9	[X86][SSE] Simplify combineVectorTruncationWithPACKUS to reduce code duplication Simplify combineVectorTruncationWithPACKUS to mask the upper bits followed by calling truncateVectorWithPACK instead of duplicating with similar code. This results in the codegen using (V)PACKUSDW on SSE41+ targets for vXi64/vXi32 inputs where before it always used PACKUSWB (along with a lot more bitcasting). I've raised PR37749 as until we avoid unnecessary concats back to 256-bit for bitwise ops, we can't avoid splitting the input value into 128-bit subvectors for masking. llvm-svn: 334289	2018-06-08 13:59:11 +00:00
Simon Pilgrim	fef9b6eea6	[X86][SSE] Add target shuffle support to X86TargetLowering::computeKnownBitsForTargetNode Ideally we'd use resolveTargetShuffleInputs to handle faux shuffles as well but: (a) that code path doesn't handle general/pre-legalized ops/types very well. (b) I'm concerned about the compute time as they recurse to calls to computeKnownBits/ComputeNumSignBits which would need depth limiting somehow. llvm-svn: 334007	2018-06-05 10:52:29 +00:00
Simon Pilgrim	a4fb569483	[X86][SSE] combineSubToSubus - support v8i64 handling from SSSE3 Our UMIN/UMAX, vector truncation and shuffle combining is good enough to efficiently handle v8i64 with the number of leading zeros that are necessary for PSUBUS. llvm-svn: 326034	2018-02-24 14:06:39 +00:00
Simon Pilgrim	8ad91261e8	[X86][SSE] combineSubToSubus - support v8i32 handling from SSSE3 (not SSE41) Now that UMIN etc are Legal/Custom for SSE2+, we can efficiently match SUBUS v8i32 cases from SSSE3 which can perform efficient truncation with PSHUFB. llvm-svn: 326033	2018-02-24 13:39:13 +00:00
Simon Pilgrim	744f008a75	[X86][SSE] combineSubToSubus - begun generalizing to work with any type sizes with SplitBinaryOpsAndApply llvm-svn: 326030	2018-02-24 12:44:12 +00:00
Simon Pilgrim	425965be0f	[X86][SSE] Generalize x > C-1 ? x+-C : 0 --> subus x, C combine for non-uniform constants llvm-svn: 325944	2018-02-23 19:58:44 +00:00
Simon Pilgrim	14686059d5	[X86][SSE] Add x > C-1 ? x+-C : 0 --> subus x, C test caaes for non-uniform constants llvm-svn: 325936	2018-02-23 18:57:26 +00:00
Simon Pilgrim	0d8c4bfc2a	[X86][SSE] Use SplitBinaryOpsAndApply to recognise PSUBUS patterns before they're split on AVX1 This needs to be generalised further to support AVX512BW cases but I want to add non-uniform constants first. llvm-svn: 324844	2018-02-11 17:29:42 +00:00
Craig Topper	ca5a340171	[X86] Use min/max for vector ult/ugt compares if avoids a sign flip. Summary: Currently we only use min/max to help with ule/uge compares because it removes an invert of the result that would otherwise be needed. But we can also use it for ult/ugt compares if it will prevent the need for a sign bit flip needed to use pcmpgt at the cost of requiring an invert after the compare. I also refactored the code so that the max/min code is self contained and does its own return instead of setting up a flag to manipulate the rest of the function's behavior. Most of the test cases look ok with this. I did notice that we added instructions when one of the operands being sign flipped is a constant vector that we were able to constant fold the flip into. I also noticed that sometimes the SSE min/max clobbers a register that is needed after the compare. This resulted in an extra move being inserted before the min/max to preserve the register. We could try to detect this and switch from min to max and change the compare operands to use the operand that gets reused in the compare. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42935 llvm-svn: 324842	2018-02-11 17:11:40 +00:00
Simon Pilgrim	0be5567a89	[X86][SSE] Enable SMIN/SMAX/UMIN/UMAX custom lowering for all legal types This allows us to recognise more saturation patterns and also simplify some MINMAX codegen that was failing to combine CMPGE comparisons to a legal CMPGT. Differential Revision: https://reviews.llvm.org/D43014 llvm-svn: 324837	2018-02-11 10:52:37 +00:00
Geoff Berry	94503c7bc3	[MachineCopyPropagation] Extend pass to do COPY source forwarding Summary: This change extends MachineCopyPropagation to do COPY source forwarding and adds an additional run of the pass to the default pass pipeline just after register allocation. This version of this patch uses the newly added MachineOperand::isRenamable bit to avoid forwarding registers is such a way as to violate constraints that aren't captured in the Machine IR (e.g. ABI or ISA constraints). This change is a continuation of the work started in D30751. Reviewers: qcolombet, javed.absar, MatzeB, jonpa, tstellar Subscribers: tpr, mgorny, mcrosier, nhaehnle, nemanjai, jyknight, hfinkel, arsenm, inouehrs, eraman, sdardis, guyblank, fedor.sergeev, aheejin, dschuff, jfb, myatsina, llvm-commits Differential Revision: https://reviews.llvm.org/D41835 llvm-svn: 323991	2018-02-01 18:54:01 +00:00
Puyan Lotfi	43e94b15ea	Followup on Proposal to move MIR physical register namespace to '$' sigil. Discussed here: http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html In preparation for adding support for named vregs we are changing the sigil for physical registers in MIR to '$' from '%'. This will prevent name clashes of named physical register with named vregs. llvm-svn: 323922	2018-01-31 22:04:26 +00:00
Simon Pilgrim	17682a86da	[X86][SSE] Add ISD::VECTOR_SHUFFLE to faux shuffle decoding (Reapplied) Primarily, this allows us to use the aggressive extraction mechanisms in combineExtractWithShuffle earlier and make use of UNDEF elements that may be lost during lowering. Reapplied after rL322279 was reverted at rL322335 due to PR35918, underlying issue was fixed at rL322644. llvm-svn: 323104	2018-01-22 12:05:17 +00:00
David L. Jones	8c87213c26	Revert r322279 due to Skylake miscompile. Summary: This revision causes Skylake (and apparently, only Skylake) codegen to fail in certain cases. Details: https://bugs.llvm.org/show_bug.cgi?id=35918 Subscribers: sanjoy, llvm-commits Differential Revision: https://reviews.llvm.org/D41972 llvm-svn: 322335	2018-01-12 00:17:38 +00:00
Simon Pilgrim	6e6da3f449	[X86][SSE] Add ISD::VECTOR_SHUFFLE to faux shuffle decoding Primarily, this allows us to use the aggressive extraction mechanisms in combineExtractWithShuffle earlier and make use of UNDEF elements that may be lost during lowering. llvm-svn: 322279	2018-01-11 14:25:18 +00:00
Zvi Rackover	72b0bb1405	X86 Tests: Update more isel tests with FastVariableShuffle feature Summary: Added the FastVariableShuffle feature to cases that resembled processors for which this fearure is on. For AVX2 there are processors with and w/o this fearue enable. For AVX512 only KNL does enable this feature so cases which only have +avx512f were left without the FastVariableShuffle enabled. Reviewers: RKSimon, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41851 llvm-svn: 322090	2018-01-09 16:26:06 +00:00
Zvi Rackover	b11e84c1d8	X86 Tests: Add common check prefix to test-case. NFC. As suggested in D41851 llvm-svn: 322089	2018-01-09 16:14:15 +00:00
Simon Pilgrim	998180dad3	[DAG] Fix for Bug PR34620 - Allow SimplifyDemandedBits to look through bitcasts Allow SimplifyDemandedBits to use TargetLoweringOpt::computeKnownBits to look through bitcasts. This can help simplifying in some cases where bitcasts of constants generated during or after legalization can't be folded away, and thus didn't get picked up by SimplifyDemandedBits. This fixes PR34620, where a redundant pand created during legalization from lowering and lshr <16xi8> wasn't being simplified due to the presence of a bitcasted build_vector as an operand. Committed on the behalf of @sameconrad (Sam Conrad) Differential Revision: https://reviews.llvm.org/D41643 llvm-svn: 321969	2018-01-07 19:09:40 +00:00
Simon Pilgrim	c701596e86	[X86][SSE] Match PSHUFLW/PSHUFHW + PSHUFD vXi16 shuffle patterns (PR34686) As noted in PR34686, we are relying on a PSHUFD+PSHUFLW+PSHUFHW shuffle chain for most general vXi16 unary shuffles. This patch checks for simpler PSHUFLW+PSHUFD and PSHUFHW+PSHUFD cases beforehand, building on some existing code that just handled splat shuffles. By doing so we also prevent premature use of PSHUFB shuffles which can be slower and require the creation/loading of constant shuffle masks. We now have the 'fast-variable-shuffle' option for hardware that prefers combining 2 or more shuffles to VPSHUFB etc. Differential Revision: https://reviews.llvm.org/D38318 llvm-svn: 321553	2017-12-29 14:41:50 +00:00
Francis Visoiu Mistrih	a8a83d150f	[CodeGen] Use MachineOperand::print in the MIRPrinter for MO_Register. Work towards the unification of MIR and debug output by refactoring the interfaces. For MachineOperand::print, keep a simple version that can be easily called from `dump()`, and a more complex one which will be called from both the MIRPrinter and MachineInstr::print. Add extra checks inside MachineOperand for detached operands (operands with getParent() == nullptr). https://reviews.llvm.org/D40836 * find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+)<def> ([^ ]+)/kill: \1 def \2 \3/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: \1 \2 def \3/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/kill: def ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: def \1 \2 def \3/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/<def>//g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<kill>/killed \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-use,kill>/implicit killed \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<dead>/dead \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<def[ ],[ ]dead>/dead \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-def[ ],[ ]dead>/implicit-def dead \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-def>/implicit-def \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-use>/implicit \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<internal>/internal \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name "*.s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<undef>/undef \1/g' llvm-svn: 320022	2017-12-07 10:40:31 +00:00
Francis Visoiu Mistrih	25528d6de7	[CodeGen] Unify MBB reference format in both MIR and debug output As part of the unification of the debug format and the MIR format, print MBB references as '%bb.5'. The MIR printer prints the IR name of a MBB only for block definitions. * find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" $ -type f -print0 \| xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber/" << printMBBReference(\1)/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" $ -type f -print0 \| xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber/" << printMBBReference(\1)/g' * find . $ -name ".txt" -o -name ".s" -o -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" $ -type f -print0 \| xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g' * grep -nr 'BB#' and fix Differential Revision: https://reviews.llvm.org/D40422 llvm-svn: 319665	2017-12-04 17:18:51 +00:00
Francis Visoiu Mistrih	9d7bb0cb40	[CodeGen] Print register names in lowercase in both MIR and debug output As part of the unification of the debug format and the MIR format, always print registers as lowercase. * Only debug printing is affected. It now follows MIR. Differential Revision: https://reviews.llvm.org/D40417 llvm-svn: 319187	2017-11-28 17:15:09 +00:00
Simon Pilgrim	e152c2c447	[X86][SSE] Add PACKUS support to LowerTruncate Similar to the existing code to lower to PACKSS, we can use PACKUS if the input vector's leading zero bits extend all the way to the packed/truncated value. We have to account for pre-SSE41 targets not supporting PACKUSDW llvm-svn: 317128	2017-11-01 21:52:29 +00:00
Simon Pilgrim	0a12c239b6	[X86] truncateVectorCompareWithPACKSS - use PACKSSDW/PACKSSWB instead of just PACKSSWB. By using the widest type possible for PACKSS truncation we have a better chance of being able to peek through bitcasts and improves other combines driven by ComputeNumSignBits. llvm-svn: 316448	2017-10-24 15:38:16 +00:00
Zvi Rackover	c1d5955684	[X86] Unsigned saturation subtraction canonicalization [the backend part] Summary: On behalf of julia.koval@intel.com The patch transforms canonical version of unsigned saturation, which is sub(max(a,b),a) or sub(a,min(a,b)) to special psubus insturuction on targets, which support it(8bit and 16bit uints). umax(a,b) - b -> subus(a,b) a - umin(a,b) -> subus(a,b) There is also extra case handled, when right part of sub is 32 bit and can be truncated, using UMIN(this transformation was discussed in https://reviews.llvm.org/D25987). The example of special case code: ``` void foo(unsigned short p, int max, int n) { int i; unsigned m; for (i = 0; i < n; i++) { m = --p; p = (unsigned short)(m >= max ? m-max : 0); } } ``` Max in this example is truncated to max_short value, if it is greater than m, or just truncated to 16 bit, if it is not. It is vaid transformation, because if max > max_short, result of the expression will be zero. Here is the table of types, I try to support, special case items are bold: \| Size \| 128 \| 256 \| 512 \| ----- \| ----- \| ----- \| ----- \| i8 \| v16i8 \| v32i8 \| v64i8 \| i16 \| v8i16 \| v16i16 \| v32i16 \| i32 \| \| v8i32* \| v16i32 \| i64 \| \| \| v8i64 Reviewers: zvi, spatel, DavidKreitzer, RKSimon Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37534 llvm-svn: 315237	2017-10-09 20:01:10 +00:00
Simon Pilgrim	b47b3f2564	[X86][SSE] Add support for lowering v8i16 binary shuffles to PACKSS/PACKUS Missed in D38472 llvm-svn: 314916	2017-10-04 17:31:28 +00:00
Craig Topper	6fb55716e9	[X86] Redefine MOVSS/MOVSD instructions to take VR128 regclass as input instead of FR32/FR64 This patch redefines the MOVSS/MOVSD instructions to take VR128 as its second input. This allows the MOVSS/SD->BLEND commute to work without requiring a COPY to be inserted. This should fix PR33079 Overall this looks to be an improvement in the generated code. I haven't checked the EXPENSIVE_CHECKS build but I'll do that and update with results. Differential Revision: https://reviews.llvm.org/D38449 llvm-svn: 314914	2017-10-04 17:20:12 +00:00
Geoff Berry	fabedbad11	Revert "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"" This reverts commit r314729. Another bug has been encountered in an out-of-tree target reported by Quentin. llvm-svn: 314814	2017-10-03 16:59:13 +00:00
Simon Pilgrim	f5f291d129	[X86][SSE] Add support for lowering shuffles to PACKSS/PACKUS If the upper bits of a truncation shuffle patterns have at least the minimum number of sign/zero bits on their inputs then we can safely use PACKSS/PACKUS as shuffles. Partial fix for https://bugs.llvm.org/show_bug.cgi?id=34773 Differential Revision: https://reviews.llvm.org/D38472 llvm-svn: 314788	2017-10-03 12:01:31 +00:00
Geoff Berry	bfc5fb4571	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" Issues addressed since original review: - Avoid bug in regalloc greedy/machine verifier when forwarding to use in an instruction that re-defines the same virtual register. - Fixed bug when forwarding to use in EarlyClobber instruction slot. - Fixed incorrect forwarding to register definitions that showed up in explicit_uses() iterator (e.g. in INLINEASM). - Moved removal of dead instructions found by LiveIntervals::shrinkToUses() outside of loop iterating over instructions to avoid instructions being deleted while pointed to by iterator. - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. llvm-svn: 314729	2017-10-02 22:01:37 +00:00
Zvi Rackover	eb7a0bf847	X86 Tests: Unsigned saturation subtraction tests. NFC. Summary: Adding tests for D37534. Commit on behalf of julia.koval@intel.com Reviewers: n.bozhenov, zvi, spatel, DavidKreitzer Reviewed By: zvi Differential Revision: https://reviews.llvm.org/D37510 llvm-svn: 314305	2017-09-27 14:38:05 +00:00
Simon Pilgrim	00161c9961	[X86][SSE] Improve support for vselect(Cond, 0, X) -> ANDN(Cond, X) As discussed on PR28925 and D37849. Differential Revision: https://reviews.llvm.org/D37975 llvm-svn: 313532	2017-09-18 14:23:23 +00:00
Sanjay Patel	e96f875deb	[x86] fix triple and regenerate checks for psubus; NFC Patch by Yulia Koval! Differential Revision: https://reviews.llvm.org/D37523 llvm-svn: 312662	2017-09-06 19:05:20 +00:00
Sam McCall	f71bb198ed	Revert "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"" This crashes on boringSSL on PPC (will send reduced testcase) This reverts commit r312328. llvm-svn: 312490	2017-09-04 15:47:00 +00:00
Geoff Berry	65528f2991	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" Issues addressed since original review: - Moved removal of dead instructions found by LiveIntervals::shrinkToUses() outside of loop iterating over instructions to avoid instructions being deleted while pointed to by iterator. - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. llvm-svn: 312328	2017-09-01 14:27:20 +00:00

1 2

72 Commits