llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	fd2eda4f64	[X86][AVX2] Fix v16i16 SHL lowering (PR27730) The AVX2 v16i16 shift lowering works by unpacking to 2 x v8i32, performing the shift and then truncating the result. The unpacking is used to place the values in the upper 16-bits so that we can correctly sign-extend for SRA shifts. Unfortunately we weren't ensuring that the lower 16-bits were zero to ensure that SHL correctly shifts in zero bits. llvm-svn: 271796	2016-06-04 16:45:33 +00:00
Simon Pilgrim	971abe8256	[X86][AVX2] Regenerate avx2 vector shift tests llvm-svn: 270756	2016-05-25 21:00:40 +00:00
Simon Pilgrim	5965680d53	[X86][SSE] Vectorized i8 and i16 shift operators This patch ensures that SHL/SRL/SRA shifts for i8 and i16 vectors avoid scalarization. It builds on the existing i8 SHL vectorized implementation of moving the shift bits up to the sign bit position and separating the 4, 2 & 1 bit shifts with several improvements: 1 - SSE41 targets can use (v)pblendvb directly with the sign bit instead of performing a comparison to feed into a VSELECT node. 2 - pre-SSE41 targets were masking + comparing with an 0x80 constant - we avoid this by using the fact that a set sign bit means a negative integer which can be compared against zero to then feed into VSELECT, avoiding the need for a constant mask (zero generation is much cheaper). 3 - SRA i8 needs to be unpacked to the upper byte of a i16 so that the i16 psraw instruction can be correctly used for sign extension - we have to do more work than for SHL/SRL but perf tests indicate that this is still beneficial. The i16 implementation is similar but simpler than for i8 - we have to do 8, 4, 2 & 1 bit shifts but less shift masking is involved. SSE41 use of (v)pblendvb requires that the i16 shift amount is splatted to both bytes however. Tested on SSE2, SSE41 and AVX machines. Differential Revision: http://reviews.llvm.org/D9474 llvm-svn: 239509	2015-06-11 07:46:37 +00:00
Simon Pilgrim	7de557e5a9	[X86][AVX2] Added tests for v32i8 vector shifts Currently still scalarized, but D9474 should remedy that. llvm-svn: 239146	2015-06-05 12:35:36 +00:00
Simon Pilgrim	0be4fa761f	[X86][AVX2] Vectorized i16 shift operators Part of D9474, this patch extends AVX2 v16i16 types to 2 x 8i32 vectors and uses i32 shift variable shifts before packing back to i16. Adds AVX2 tests for v8i16 and v16i16 llvm-svn: 238149	2015-05-25 17:49:13 +00:00
Jim Grosbach	cad4cd6c9e	SelectionDAG: Don't constant fold target-specific nodes. FoldConstantArithmetic() only knows how to deal with a few target independent ISD opcodes. Bail early if it sees a target-specific ISD node. These node do funny things with operand types which may break the assumptions of the code that follows, and there's no actual folding that can be done anyway. For example, non-constant 256 bit vector shifts on X86 have a shift-amount operand that's a 128-bit v4i32 vector regardless of what the first operand type is and that breaks the assumption that the operand types must match. rdar://16530923 llvm-svn: 205937	2014-04-09 23:28:11 +00:00
Matt Arsenault	985b9de485	Make DAGCombiner work on vector bitshifts with constant splat vectors. llvm-svn: 204071	2014-03-17 18:58:01 +00:00
Matt Arsenault	532db69984	Fix undefined behavior in vector shift tests. These were all shifting the same amount as the bitwidth. llvm-svn: 203519	2014-03-11 00:01:41 +00:00
Lang Hames	2783993fca	X86 vector element shift-by-immediate instructions take i8 immediates. Make the instruction defenitions and ISEL reflect this. Prior to this patch these instructions took an i32i8imm, and the high bits were dropped during encoding. This led to incorrect behavior for shifts by immediates higher than 255. This patch fixes that issue by detecting large immediate shifts and returning constant zero (for logical shifts) or capping the shift amount at an encodable value (for arithmetic shifts). Fixes <rdar://problem/14968098> llvm-svn: 193096	2013-10-21 17:51:24 +00:00
Stephen Lin	f799e3f944	Convert CodeGen//.ll tests to use the new CHECK-LABEL for easier debugging. No functionality change and all tests pass after conversion. This was done with the following sed invocation to catch label lines demarking function boundaries: sed -i '' "s/^;$ $$[A-Z0-9_]$:$ $test$[A-Za-z0-9_-]$:$ $$/;\1\2-LABEL:\3test\4:\5/g" test/CodeGen//*.ll which was written conservatively to avoid false positives rather than false negatives. I scanned through all the changes and everything looks correct. llvm-svn: 186258	2013-07-13 20:38:47 +00:00
Stephen Lin	fda967fdea	X86: fold SSE2/AVX2 logical shift by immediate amount into zero vector when possible Patch by Andrea Di Biagio llvm-svn: 186165	2013-07-12 15:31:36 +00:00

11 Commits