llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	13fa33012b	[X86] Only accept SM_SentinelUndef (-1) as an undefined shuffle mask in range As discussed on D23027 we should be trying to be more strict on what is an undefined mask value. llvm-svn: 279435	2016-08-22 13:18:56 +00:00
Simon Pilgrim	2279e59573	[X86][SSE] Avoid specifying unused arguments in SHUFPD lowering As discussed on PR26491, we are missing the opportunity to make use of the smaller MOVHLPS instruction because we set both arguments of a SHUFPD when using it to lower a single input shuffle. This patch sets the lowered argument to UNDEF if that shuffle element is undefined. This in turn makes it easier for target shuffle combining to decode UNDEF shuffle elements, allowing combines to MOVHLPS to occur. A fix to match against MOVHPD stores was necessary as well. This builds on the improved MOVLHPS/MOVHLPS lowering and memory folding support added in D16956 Adding similar support for SHUFPS will have to wait until have better support for target combining of binary shuffles. Differential Revision: https://reviews.llvm.org/D23027 llvm-svn: 279430	2016-08-22 12:56:54 +00:00
Craig Topper	5f8419da34	[X86] Create a new instruction format to handle 4VOp3 encoding. This saves one bit in TSFlags and simplifies MRMSrcMem/MRMSrcReg format handling. llvm-svn: 279424	2016-08-22 07:38:50 +00:00
Craig Topper	9b20fece81	[X86] Create a new instruction format to handle MemOp4 encoding. This saves one bit in TSFlags and simplifies MRMSrcMem/MRMSrcReg format handling. llvm-svn: 279423	2016-08-22 07:38:45 +00:00
Craig Topper	61b62e56b7	[X86] Space out the encodings of X86 instruction formats. I plan to add some new encodings in future commits and this will reduce the size of those commits. NFC This tries to keep all the ModRM memory and register forms in their own regions of the encodings. Hoping to make it simple on some of the switch statements that operate on these encodings. llvm-svn: 279422	2016-08-22 07:38:41 +00:00
Craig Topper	ca0eda3e6a	[X86] Merge hasVEX_i8ImmReg into the ImmFormat type which had extra unused encodings. This saves one bit in TSFlags. NFC llvm-svn: 279412	2016-08-22 01:37:19 +00:00
Craig Topper	522541231a	[X86] Remove ignoreVEX_L from TSFlags. Only the disassembler needs it and the disassembler doesn't use TSFlags. NFC llvm-svn: 279411	2016-08-22 01:37:16 +00:00
Simon Pilgrim	67e7e22462	[X86][AVX] Dropped combineShuffle256 - this can now be performed by EltsFromConsecutiveLoads llvm-svn: 279397	2016-08-21 15:39:45 +00:00
Guy Blank	9ae797a798	[AVX512][FastISel] Do not use K registers in TEST instructions In some cases, FastIsel was emitting TEST instruction with K reg input, which is illegal. Changed to using KORTEST when dealing with K regs. Differential Revision: https://reviews.llvm.org/D23163 llvm-svn: 279393	2016-08-21 08:02:27 +00:00
Simon Pilgrim	d7a3782ae4	[X86][SSE] Generalised combining to VZEXT_MOVL to any vector size This doesn't change tests codegen as we already combined to blend+zero which is what we lower VZEXT_MOVL to on SSE41+ targets, but it does put us in a better position when we improve shuffling for optsize. llvm-svn: 279273	2016-08-19 17:02:00 +00:00
Simon Pilgrim	f1b8fdc074	[X86][SSE] Add support for matching commuted insertps patterns INSERTPS doesn't fit well with our shuffle mask canonicalization, so we need to attempt both the original mask and the commuted mask to more likely get a match llvm-svn: 279230	2016-08-19 10:31:53 +00:00
Dean Michael Berris	1dd1ca9727	[XRay] Synthesize a reference to the xray_instr_map Without the synthesized reference to a symbol in the xray_instr_map, linker section garbage collection will helpfully remove the whole xray_instr_map section from the final executable (or archive). This will cause the runtime to not be able to identify the sleds and hot-patch the calls/jumps into the runtime trampolines. This change adds a reference from the text section at the end of the function to keep around the associated xray_instr_map section as well. We also make sure that we catch this reference in the test. Reviewers: chandlerc, echristo, majnemer, mehdi_amini Subscribers: mehdi_amini, llvm-commits, dberris Differential Revision: https://reviews.llvm.org/D23398 llvm-svn: 279204	2016-08-19 04:44:30 +00:00
Andrew Kaylor	81901d658f	Include X86CallFrameOptimization in the opt-bisect process. Differential Revision: https://reviews.llvm.org/D23683 llvm-svn: 279175	2016-08-18 22:49:51 +00:00
Michael Kuperstein	2bc3d4d46c	[SelectionDAG] Rename fextend -> fpextend, fround -> fpround, frnd -> fround The names of the tablegen defs now match the names of the ISD nodes. This makes the world a slightly saner place, as previously "fround" matched ISD::FP_ROUND and not ISD::FROUND. Differential Revision: https://reviews.llvm.org/D23597 llvm-svn: 279129	2016-08-18 20:08:15 +00:00
Eugene Zelenko	61a72d8850	[LLVM] Fix some Clang-tidy modernize-use-using and Include What You Use warnings Differential revision: https://reviews.llvm.org/D23675 llvm-svn: 279102	2016-08-18 17:56:27 +00:00
Sanjay Patel	7d37b221a2	fix typo; NFC llvm-svn: 279068	2016-08-18 14:17:34 +00:00
Simon Pilgrim	916485c765	Remove trailing whitespace llvm-svn: 279054	2016-08-18 11:22:22 +00:00
Justin Bogner	cd1d5aaf2e	Replace a few more "fall through" comments with LLVM_FALLTHROUGH Follow up to r278902. I had missed "fall through", with a space. llvm-svn: 278970	2016-08-17 20:30:52 +00:00
Justin Bogner	b03fd12cef	Replace "fallthrough" comments with LLVM_FALLTHROUGH This is a mechanical change of comments in switches like fallthrough, fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead. llvm-svn: 278902	2016-08-17 05:10:15 +00:00
Sanjay Patel	904cd39b05	[x86] Allow merging multiple instances of an immediate within a basic block for code size savings, for 64-bit constants. This patch handles 64-bit constants which can be encoded as 32-bit immediates. It extends the functionality added by https://reviews.llvm.org/D11363 for 32-bit constants to 64-bit constants. Patch by Sunita Marathe! Differential Revision: https://reviews.llvm.org/D23391 llvm-svn: 278857	2016-08-16 21:35:16 +00:00
Simon Pilgrim	cc316f013a	[X86][SSE] Add support for combining v2f64 target shuffles to VZEXT_MOVL byte rotations The combine was only matching v2i64 as it assumed lowering to MOVQ - but we have v2f64 patterns that match in a similar fashion llvm-svn: 278794	2016-08-16 12:52:06 +00:00
Simon Pilgrim	f16cd361d4	[X86][SSE] Add support for combining target shuffles to PALIGNR byte rotations llvm-svn: 278787	2016-08-16 10:03:23 +00:00
Guy Blank	722caebdae	[X86] Add xgetbv/xsetbv intrinsics to non-windows platforms Differential Revision: https://reviews.llvm.org/D21958 llvm-svn: 278782	2016-08-16 06:41:00 +00:00
Craig Topper	f774de6d54	[X86] PADDUSB/W instructions should be commutable. llvm-svn: 278654	2016-08-15 06:31:57 +00:00
Craig Topper	80c8b80919	[X86] Mark some of the X86 SDNodes as commutative. llvm-svn: 278653	2016-08-15 04:47:30 +00:00
Craig Topper	dbc387cfc9	[X86] X86ISD::FANDN is not commutative or associative. llvm-svn: 278652	2016-08-15 04:47:28 +00:00
Craig Topper	37e8c5443c	[AVX-512] Mark VPMADDWD as commutable to match SSE/AVX version. llvm-svn: 278629	2016-08-14 17:57:22 +00:00
Craig Topper	c677e97dff	[AVX-512] Add masked commutable floating point max/min instructions to folding tables. llvm-svn: 278628	2016-08-14 17:57:19 +00:00
Craig Topper	29fbdc309a	[AVX-512] Add masked logical operations to memory folding tables. llvm-svn: 278627	2016-08-14 17:57:16 +00:00
Igor Breger	505f2cc468	[AVX512] Fix VFPCLASSSD/VFPCLASSSS intrinsic lowering. The i1 result should be zero extended according to SPEC. Differential Revision: http://reviews.llvm.org/D23489 llvm-svn: 278626	2016-08-14 13:58:57 +00:00
Igor Breger	8672408db0	[AVX512] Fix insertelement i1 lowering. 1. Use shuffle to insert element i1 into vector. The previous implementation was incorrect (dest_bit OR src_bit, which doesn't clear the bit if src_bit=0). 2. Improve shuffle i1 vector, use CVT2MASK if supported instead of TRUNCATE. Differential Revision: http://reviews.llvm.org/D23347 llvm-svn: 278623	2016-08-14 05:25:07 +00:00
Craig Topper	8c372a31b7	[X86] Add a check of isCommutable at the top of X86InstrInfo::findCommutedOpIndices. Most callers don't check if the instruction is commutable before calling. This saves us the trouble of ending up in the default of the switch and having to determine if this is an FMA or not. llvm-svn: 278597	2016-08-13 06:48:44 +00:00
Craig Topper	eafdbecc44	[AVX-512] Add isCommutable to scalar FMA3 instructions. llvm-svn: 278596	2016-08-13 06:48:41 +00:00
Craig Topper	5f2441d8f3	[AVX-512] Add commutable flags to 132 form FMA3 instructions. llvm-svn: 278595	2016-08-13 06:48:39 +00:00
Craig Topper	e5115aa4ca	[X86] Remove patterns for (vzmovl (insert_subvector undef, (scalar_to_vector))) as the (vzmovl VR256) pattern has higher priority. NFC llvm-svn: 278594	2016-08-13 06:02:19 +00:00
Craig Topper	3f8126e6fa	[AVX-512] Remove an AddedComplexity that was prioritizing basic vzmovl patterns over more complex ones that produce better code. llvm-svn: 278593	2016-08-13 05:43:20 +00:00
Craig Topper	600685d510	[AVX-512] Add patterns to support VZEXT_MOVL from 512-bit vectors with 64-bit and 32-bit elements. Fixes PR28961. llvm-svn: 278592	2016-08-13 05:33:12 +00:00
Hans Wennborg	0dd9ed1d45	Fix more dereferenced end() iterators after r278532 llvm-svn: 278587	2016-08-13 01:12:49 +00:00
Hans Wennborg	2d87ccfd58	X86: Fix another dereferenced end() iterator after r278532 llvm-svn: 278577	2016-08-12 23:35:59 +00:00
Duncan P. N. Exon Smith	69b0650548	X86: Stop dereferencing end() in X86FrameLowering::emitEpilogue On a Windows build of Chromium, r278532 (up to r278539) X86FrameLowering::emitEpilogue because it wasn't wary enough of the return of MachineBasicBlock::getFirstTerminator. Guard all the uses here. Note that r278532 looks like an NFC commit (just an API change), but it removes a couple of layers of abstraction and is probably causing optimization differences in MSVC. llvm-svn: 278572	2016-08-12 22:43:33 +00:00
Artur Pilipenko	87e4038a91	[x86] X86ISelLowering zext(add_nuw(x, C)) --> add(zext(x), C_zext) Currently X86ISelLowering has a similar transformation for sexts: sext(add_nsw(x, C)) --> add(sext(x), C_sext) In this change I extend this code to handle zexts as well. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D23359 llvm-svn: 278520	2016-08-12 16:08:30 +00:00
Simon Pilgrim	687d71e877	[X86][SSE] Add support for combining target shuffles to PSLLDQ/PSRLDQ byte shifts llvm-svn: 278502	2016-08-12 11:24:34 +00:00
Simon Pilgrim	ed96b9adfb	[X86][SSE] Fixed PALIGNR target shuffle decode The PALIGNR target shuffle decode was not taking into account that DecodePALIGNRMask (rather oddly) expects the operands to be in reverse order, nor was it detecting unary patterns, causing combines to combine with the incorrect input. The cgbuiltin, auto upgrade and instruction comments code correctly swap the operands so are not affected. llvm-svn: 278494	2016-08-12 10:10:51 +00:00
David Majnemer	42531260b3	Use the range variant of find/find_if instead of unpacking begin/end If the result of the find is only used to compare against end(), just use is_contained instead. No functionality change is intended. llvm-svn: 278469	2016-08-12 03:55:06 +00:00
David Majnemer	562e82945e	Use the range variant of find_if instead of unpacking begin/end No functionality change is intended. llvm-svn: 278443	2016-08-12 00:18:03 +00:00
David Majnemer	0d955d0bf5	Use the range variant of find instead of unpacking begin/end If the result of the find is only used to compare against end(), just use is_contained instead. No functionality change is intended. llvm-svn: 278433	2016-08-11 22:21:41 +00:00
Vyacheslav Klochkov	6daefcf626	X86-FMA3: Implemented commute transformation for EVEX/AVX512 FMA3 opcodes. This helped to improve memory-folding and register coalescing optimizations. Also, this patch fixed the tracker #17229. Reviewer: Craig Topper. Differential Revision: https://reviews.llvm.org/D23108 llvm-svn: 278431	2016-08-11 22:07:33 +00:00
David Majnemer	0a16c22846	Use range algorithms instead of unpacking begin/end No functionality change is intended. llvm-svn: 278417	2016-08-11 21:15:00 +00:00
Duncan P. N. Exon Smith	62e351f5a4	X86: Use operator lookup for operator==, NFC Avoid relying on the MachineInstrBundleIterator operator== being implemented as a member function. llvm-svn: 278347	2016-08-11 15:51:29 +00:00
Igor Breger	a77b14d02c	[AVX512] Fix extractelement i1 lowering. The previous implementation (not custom) doesn't enforce zeroing off upper bits. The assumption is that i1 PRODUCER (truncate and extractelement) must zero all upper bits, so i1 CONSUMER instructions ( test, zext, save, etc) can be done without additional zeroing. Make extractelement i1 lowering custom for all vector i1. Differential Revision: http://reviews.llvm.org/D23246 llvm-svn: 278328	2016-08-11 12:13:46 +00:00
Marina Yatsina	88f0c31f13	Avoid false dependencies of undef machine operands This patch helps avoid false dependencies on undef registers by updating the machine instructions' undef operand to use a register that the instruction is truly dependent on, or use a register with clearance higher than Pref. Pseudo example: loop: xmm0 = ... xmm1 = vcvtsi2sdl eax, xmm0<undef> ... = inst xmm0 jmp loop In this example, selecting xmm0 as the undef register creates false dependency between loop iterations. This false dependency cannot be solved by inserting an xor before vcvtsi2sdl because xmm0 is alive at the point of the vcvtsi2sdl instruction. Selecting a different register instead of xmm0, especially a register that is not used in the loop, will eliminate this problem. Differential Revision: https://reviews.llvm.org/D22466 llvm-svn: 278321	2016-08-11 07:32:08 +00:00
Craig Topper	a78b768ed4	[AVX-512] Promote 512-bit integer loads to v8i64 similar to what is done for 128/256-bit vectors for overall consistency. llvm-svn: 278318	2016-08-11 06:04:07 +00:00
Craig Topper	14aa2665d3	[AVX-512] Add patterns to allow EVEX encoded stores of v16i16/v8i16/v16i8/v32i8 even when BWI is not supported. llvm-svn: 278317	2016-08-11 06:04:04 +00:00
Craig Topper	3563d0f622	[AVX-512] Fix the 128-bit and 256-bit nontemporal load patterns with elements type other than i64. These loads have all been promoted to v2i64/v4i64 loads so we need bitcasts or we end up selecting VMOVDQA32/VMOVDQU32 instead. llvm-svn: 278316	2016-08-11 06:04:00 +00:00
Sanjay Patel	5ccc85fe83	[x86, AVX] allow FP vector select folding to bitwise logic ops (PR28895) This handles the case in: https://llvm.org/bugs/show_bug.cgi?id=28895 ...but we are not getting all of the possibilities yet. Eg, we use 'X86::FANDN' for scalar FP select combines. That enhancement is filed as: https://llvm.org/bugs/show_bug.cgi?id=28925 Differential Revision: https://reviews.llvm.org/D23337 llvm-svn: 278270	2016-08-10 19:00:11 +00:00
Simon Pilgrim	675c257a32	[X86][SSE] Dropped blend(insertps(x,y),zero) combine - this is now handled by target shuffle chain combining llvm-svn: 278260	2016-08-10 18:10:29 +00:00
Simon Pilgrim	ac8fa6c2c6	[X86][SSE] Add support for combining target shuffles to MOVSS/MOVSD Only do this on pre-SSE41 targets where we should be lowering to BLENDPS/BLENDPD instead llvm-svn: 278228	2016-08-10 14:15:41 +00:00
Simon Pilgrim	9811e98495	[X86][SSE] Only treat SM_SentinelUndef as UNDEF in shuffle mask predicates isUndefOrEqual and isUndefOrInRange treated all -ve shuffle mask values as UNDEF, now it has to be SM_SentinelUndef (-1) We already have asserts to check that lowered SHUFFLE_VECTOR indices are in the range -1 <= index < 2*masksize (or masksize for unary shuffles) llvm-svn: 278218	2016-08-10 12:55:25 +00:00
Simon Pilgrim	cb419a896c	[X86][SSE] Reorder shuffle mask undef helper predicates. NFCI To make it easier for a more complex helper to use a simpler one llvm-svn: 278216	2016-08-10 12:34:23 +00:00
David Majnemer	adc688ce9c	[X86] Don't model UD2/UD2B as a terminator A UD2 might make its way into the program via a call to @llvm.trap. Obviously, calls are not terminators. However, we modeled the X86 instruction, UD2, as a terminator. Later on, this confuses the epilogue insertion machinery which results in the epilogue getting inserted before the UD2. For some platforms, like x64, the result is a violation of the ABI. Instead, model UD2/UD2B as a side effecting instruction which may observe memory. llvm-svn: 278144	2016-08-09 17:55:12 +00:00
Simon Pilgrim	27740d038c	[X86][XOP] Add support for combining target shuffles to VPERMIL2PD/VPERMIL2PS llvm-svn: 278120	2016-08-09 12:56:15 +00:00
Simon Pilgrim	aae7d4a1b6	[X86][XOP] Add support for combining target shuffles to VPPERM llvm-svn: 278114	2016-08-09 10:56:29 +00:00
Dean Michael Berris	3a25d84a51	[XRay] Test for xray_instr_map in object file. (NFC) This makes a trivial change in the emission of the per-function XRay tables, and makes sure that the xray_instr_map section does show up in the object file. llvm-svn: 278113	2016-08-09 10:42:11 +00:00
Simon Pilgrim	54c32ddf55	[X86][SSE] Fix memory folding of (v)roundsd / (v)roundss We only had partial memory folding support for the intrinsic definitions, and (as noted on PR27481) was causing FR32/FR64/VR128 mismatch errors with the machine verifier. This patch adds missing memory folding support for both intrinsics and the ffloor/fnearbyint/fceil/frint/ftrunc patterns and in doing so fixes the failing machine verifier stack folding tests from PR27481. Differential Revision: https://reviews.llvm.org/D23276 llvm-svn: 278106	2016-08-09 09:32:34 +00:00
Craig Topper	a10549d3e9	[X86] Reduce duplicated code in the execution domain lookup functions by passing tables as an argument. llvm-svn: 278098	2016-08-09 05:26:09 +00:00
Craig Topper	92a4ff1294	[AVX-512] Add support for execution domain switching masked logical ops between floating point and integer domain. This switches PS<->D and PD<->Q. llvm-svn: 278097	2016-08-09 05:26:07 +00:00
Craig Topper	9bd6241106	[X86] Remove the Fv packed logical operation alias instructions. Replace them with patterns to the regular instructions. This enables execution domain fixing which is why the tests changed. llvm-svn: 278090	2016-08-09 03:06:33 +00:00
Craig Topper	c09273b42b	[X86] Cleanup patterns for AVX/SSE for PS operations. Always try to look for bitcasts from floating point types. If only AVX1 is supported we also need to handle integer types with floating point ops without looking for bitcasts. Previously SSE1 had a pattern that looked for integer types without bitcasts, but the type wasn't legal with only SSE1 and SSE2 add an identical pattern for the integer instructions. llvm-svn: 278089	2016-08-09 03:06:28 +00:00
Craig Topper	de06b51d3d	[X86] Remove unnecessary bitcast from the front of AVX1Only 256-bit logical operation patterns. llvm-svn: 278088	2016-08-09 03:06:26 +00:00
Matthias Braun	7313ca6dbf	X86InstrInfo: Update liveness in classifyLea() We need to update liveness information when we create COPYs in classifyLea(). This fixes http://llvm.org/28301 llvm-svn: 278086	2016-08-09 01:47:26 +00:00
Sanjay Patel	06ba09af67	[x86] split combineVSelectWithAllOnesOrZeros into a helper function; NFCI llvm-svn: 278074	2016-08-09 00:01:11 +00:00
Charles Davis	e9c32c7ed3	Revert "[X86] Support the "ms-hotpatch" attribute." This reverts commit r278048. Something changed between the last time I built this--it takes awhile on my ridiculously slow and ancient computer--and now that broke this. llvm-svn: 278053	2016-08-08 21:20:15 +00:00
Charles Davis	0822aa118e	[X86] Support the "ms-hotpatch" attribute. Summary: Based on two patches by Michael Mueller. This is a target attribute that causes a function marked with it to be emitted as "hotpatchable". This particular mechanism was originally devised by Microsoft for patching their binaries (which they are constantly updating to stay ahead of crackers, script kiddies, and other ne'er-do-wells on the Internet), but is now commonly abused by Windows programs to hook API functions. This mechanism is target-specific. For x86, a two-byte no-op instruction is emitted at the function's entry point; the entry point must be immediately preceded by 64 (32-bit) or 128 (64-bit) bytes of padding. This padding is where the patch code is written. The two byte no-op is then overwritten with a short jump into this code. The no-op is usually a `movl %edi, %edi` instruction; this is used as a magic value indicating that this is a hotpatchable function. Reviewers: majnemer, sanjoy, rnk Subscribers: dberris, llvm-commits Differential Revision: https://reviews.llvm.org/D19908 llvm-svn: 278048	2016-08-08 21:01:39 +00:00
Nirav Dave	f45fd2ba87	[X86] Improve code size on X86 segment moves A move of a value to a segment register from a 16-bit register is equivalent to one from its corresponding 32-bit register. Match gas's behavior and rewrite instructions to the shorter of equivalent forms. Reviewers: rnk, ab Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23166 llvm-svn: 278031	2016-08-08 18:01:04 +00:00
Simon Pilgrim	33fc788374	[X86][SSE] Assert if the shuffle mask indices are not -1 or within a valid input range As discussed in post-review rL277959 llvm-svn: 277993	2016-08-08 11:07:34 +00:00
Craig Topper	f44423120f	[AVX-512] Improve lowering of inserting a single element into lowest element of a 512-bit vector of zeroes by using vmovq/vmovd/vmovss/vmovsd. llvm-svn: 277965	2016-08-07 21:52:59 +00:00
Craig Topper	2c51c74d52	[AVX-512] Add 512-bit logical operations to load folding tables. Add avx512f stack folding test and move some tests from the avx512vl test. llvm-svn: 277961	2016-08-07 17:14:09 +00:00
Craig Topper	938e7ab9e1	[AVX-512] Add EVEX encoded floating point MAX/MIN instructions to the load folding tables. llvm-svn: 277960	2016-08-07 17:14:05 +00:00
Simon Pilgrim	21c61fba45	[X86] lowerVectorShuffle - ensure that undefined mask elements only use SM_SentinelUndef Help lowering and combining (which can specify SM_SentinelZero mask elements) share more shuffle matching code. llvm-svn: 277959	2016-08-07 15:29:12 +00:00
Elena Demikhovsky	dca03bebd3	AVX-512: Changed lowering of BITCAST between i1 vectors and i8/i16/i32 integer values Optimized lowering of BITCAST node. The BITCAST node can be replaced with COPY_TO_REG instead of KMOV. This allows suppressing two opposite BITCAST operations and avoids redundant "movs". Differential Revision: https://reviews.llvm.org/D23247 llvm-svn: 277958	2016-08-07 13:05:58 +00:00
Craig Topper	49841c3812	[X86] Add commutable floating point max/min instructions to the load folding tables. llvm-svn: 277949	2016-08-07 05:39:51 +00:00
Craig Topper	c4d757093e	[X86] Simplify a shuffle mask copy. NFC llvm-svn: 277947	2016-08-07 05:39:46 +00:00
Simon Pilgrim	bc573ca1b8	[X86][AVX2] Improve sign/zero extension on AVX2 targets Split extensions to large vectors into 256-bit chunks - the equivalent of what we do with pre-AVX2 into 128-bit chunks llvm-svn: 277939	2016-08-06 21:21:12 +00:00
Craig Topper	9d8676acc0	[AVX-512] Add SQRT/RCP14/RNDSCALE to hasUndefRegUpdate. llvm-svn: 277934	2016-08-06 19:31:52 +00:00
Craig Topper	19505bc354	[AVX-512] Add AVX-512 scalar CVT instructions to hasUndefRegUpdate. llvm-svn: 277933	2016-08-06 19:31:50 +00:00
Craig Topper	f5d05fb0ce	[X86] Add VRCPSSr_Int, VRSQRTSSr_Int, VSQRTSSr_Int, and VSQRTSDr_Int to hasUndefRegUpdate. llvm-svn: 277931	2016-08-06 19:31:44 +00:00
Simon Pilgrim	7d168e19e8	[X86][SSE] Enable commutation between MOVHLPS and UNPCKHPD Assuming SSE2 is available then we can safely commute between these, removing some unnecessary register moves and improving memory folding opportunities. VEX encoded versions don't benefit so I haven't added support to them. llvm-svn: 277930	2016-08-06 18:40:28 +00:00
Simon Pilgrim	f56309f11a	[X86][SSE] Add 2 input shuffle support to matchBinaryVectorShuffle Not actually used yet... llvm-svn: 277919	2016-08-06 11:22:39 +00:00
Benjamin Kramer	b7d3311c77	Move helpers into anonymous namespaces. NFC. llvm-svn: 277916	2016-08-06 11:13:10 +00:00
Simon Pilgrim	69b6a70834	[X86][SSE] Add initial support for 2 input target shuffle combining. At the moment only the INSERTPS matching can actually use 2 inputs but the plumbing is now in place. llvm-svn: 277839	2016-08-05 17:36:14 +00:00
Simon Pilgrim	24dc1e7a90	[X86][SSE] Update the target shuffle matches to use the effective mask's value type directly instead of via the input value type. Preparation for adding 2 input support so we want to avoid unnecessary references to the input value type. llvm-svn: 277817	2016-08-05 14:33:11 +00:00
Simon Pilgrim	7080005e67	[X86][SSE] Consistently use the target shuffle root value type for vector size calculations. NFCI. Preparation for adding 2 input support so we want to avoid unnecessary references to the input value type. llvm-svn: 277814	2016-08-05 13:02:53 +00:00
Simon Pilgrim	6f7b0cd530	[X86][SSE] Added target shuffle combine binary compute matching function. NFCI. Added matchBinaryPermuteVectorShuffle and moved the blend+zero and insertps matching code into it. llvm-svn: 277808	2016-08-05 11:16:53 +00:00
Michael Kuperstein	3ceac2bbd5	[LV, X86] Be more optimistic about vectorizing shifts. Shifts with a uniform but non-constant count were considered very expensive to vectorize, because the splat of the uniform count and the shift would tend to appear in different blocks. That made the splat invisible to ISel, and we'd scalarize the shift at codegen time. Since r201655, CodeGenPrepare sinks those splats to be next to their use, and we are able to select the appropriate vector shifts. This updates the cost model to take this into account by making shifts by a uniform count cheap again. Differential Revision: https://reviews.llvm.org/D23049 llvm-svn: 277782	2016-08-04 22:48:03 +00:00
Simon Pilgrim	3dbce52c16	[X86][SSE] Rename target shuffle unary permute matching function. NFCI. In preparation for adding a binary permute matching function. llvm-svn: 277737	2016-08-04 17:16:50 +00:00
Simon Pilgrim	c2370b810d	[X86][SSE] Split off shuffle mask canonicalization from lowerVectorShuffle. NFCI. The new function now returns true if the shuffle should be commuted. This will allow target shuffle combines to share the code. llvm-svn: 277728	2016-08-04 14:21:32 +00:00
Nikolai Bozhenov	f679530ba1	[X86] Heuristic to selectively build Newton-Raphson SQRT estimation On modern Intel processors hardware SQRT in many cases is faster than RSQRT followed by Newton-Raphson refinement. The patch introduces a simple heuristic to choose between hardware SQRT instruction and Newton-Raphson software estimation. The patch treats scalars and vectors differently. The heuristic is that for scalars the compiler should optimize for latency while for vectors it should optimize for throughput. It is based on the assumption that throughput bound code is likely to be vectorized. Basically, the patch disables scalar NR for big cores and disables NR completely for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores. Secondly, vector SQRT has been greatly improved in Skylake and has better throughput compared to NR. Differential Revision: https://reviews.llvm.org/D21379 llvm-svn: 277725	2016-08-04 12:47:28 +00:00
Simon Pilgrim	5d5ca9c0cb	[X86][SSE] Add initial costs for vector CTTZ/CTLZ llvm-svn: 277716	2016-08-04 10:51:41 +00:00
Simon Pilgrim	8ae6dad49b	[X86][SSE] Don't decide when to scalarize CTTZ/CTLZ for performance at lowering - this is what cost models are for Improved CTTZ/CTLZ costings will be added shortly llvm-svn: 277713	2016-08-04 10:14:39 +00:00
Dean Michael Berris	7e9abea2ae	[XRay] Align entry and return sleds to 2 byte boundaries This should ensure that we can atomically write two bytes (on top of the retq and the one past it) and have those two bytes not straddle cache lines. We also move the label past the alignment instruction so that we can refer to the actual first instruction, as opposed to potential padding before the aligned instruction. Update the tests to allow us to reflect the new order of assembly. Reviewers: rSerge, echristo, majnemer Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D23101 llvm-svn: 277701	2016-08-04 07:37:28 +00:00
Simon Pilgrim	898f030f70	[X86][SSE] Enable target shuffle combining to combine multiple shuffle inputs. We currently only support combining target shuffles that consist of a single source input (plus elements known to be undef/zero). This patch generalizes the recursive combining of the target shuffle to collect all the inputs, merging any duplicates along the way, into a full set of src ops and its shuffle mask. We uncover a number of cases where we have failed to combine a unary shuffle because the input has been duplicated and separated during lowering. This will allow us to combine to 2-input shuffles in a future patch. Differential Revision: https://reviews.llvm.org/D22859 llvm-svn: 277631	2016-08-03 19:08:24 +00:00
Igor Breger	c59b3a2236	[AVX512] Add aliases for vcvttss2si{l|q}, vcvttsd2si{l|q}, vcvttss2usi{l|q}, vcvttsd2usi{l|q} instructions. Differential Revision: http://reviews.llvm.org/D23111 llvm-svn: 277586	2016-08-03 10:58:05 +00:00
Dean Michael Berris	0b8f6c8777	[XRay] Make the xray_instr_map section specification more correct Summary: We also add a test to show what currently happens when we create a section per function and emit an xray_instr_map. This illustrates the relationship (or lack thereof) between the per-function section and the xray_instr_map section. We also change the code generation slightly so that we don't always create group sections, but rather only do so if a function where the table is associated with is in a group. Also in this change: - Remove the "merge" flag on the xray_instr_map section. - Test that we're generating the right table for comdat and non-comdat functions. Reviewers: echristo, majnemer Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D23104 llvm-svn: 277580	2016-08-03 07:21:55 +00:00
Nirav Dave	8601ac11aa	[MC] Fix Intel Operand assembly parsing for .set ids Recommitting after fixing overaggressive fastpath return in parsing. Fix intel syntax special case identifier operands that refer to a constant (e.g. .set <ID> n) to be interpreted as immediate not memory in parsing. Associated commit to fix clang test committed shortly. Reviewers: rnk Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D22585 llvm-svn: 277489	2016-08-02 17:56:03 +00:00
Igor Breger	f44b79d08e	[AVX512] Don't use i128 masked gather/scatter/load/store. Do a more accurate dataWidth check. Differential Revision: http://reviews.llvm.org/D23055 llvm-svn: 277435	2016-08-02 09:15:28 +00:00
Craig Topper	9433f975d0	[AVX-512] Mark VADDPS/PD and VMULPS/PD as commutable. This necessitated adding itineraries to all of the instructions that use the avx512_fp_binop_p class. llvm-svn: 277422	2016-08-02 06:16:53 +00:00
Craig Topper	553535848f	[AVX-512] Use SSE_MUL_ITINS_S/SSE_DIV_ITINS_S for the scalar FMUL/FDIV instructions to match SSE/AVX. llvm-svn: 277421	2016-08-02 06:16:51 +00:00
Craig Topper	05948fb36c	[AVX-512] Correct ExeDomain for many AVX-512 instructions. llvm-svn: 277416	2016-08-02 05:11:15 +00:00
Hans Wennborg	7a3a49b18a	Revert r276895 "[MC][X86] Fix Intel Operand assembly parsing for .set ids" This caused PR28805. Adding a regression test. llvm-svn: 277402	2016-08-01 23:00:01 +00:00
Simon Pilgrim	46f119a59f	[X86] Use implicit masking of SHLD/SHRD shift double instructions Similar to the regular shift instructions, SHLD/SHRD only use the bottom bits of the shift value llvm-svn: 277341	2016-08-01 12:11:43 +00:00
Craig Topper	c48c029610	[AVX-512] Fix duplicate column in AVX512 execution dependency table that was preventing VMOVDQU32/VMOVDQA32 from being recognized. Fix a bug in the code that stops execution dependency fix from turning operations on 32-bit integer element types into operations on 64-bit integer element types. llvm-svn: 277327	2016-08-01 07:55:33 +00:00
Craig Topper	749a111f1e	[AVX-512] Teach X86InstrInfo::getLargestLegalSuperClass to inflate to FR32X/FR64X if AVX512 is supported and VR128X/VR256X if VLX is supported. Had to update a stack folding test to clobber the other 16 registers since this now made them get used instead of spilling. llvm-svn: 277321	2016-08-01 05:31:50 +00:00
Craig Topper	3946176314	[AVX-512] Use FR32X/FR64X/VR128X/VR256X register classes in addRegisterClass if AVX512 (for FR32X/FR64) or VLX (for VR128X/VR256) is supported. This is a minimal requirement to be able to allocate all 32 registers. llvm-svn: 277319	2016-08-01 04:29:13 +00:00
Craig Topper	da50eec26d	[X86] Move mask register handling into the main switch of getLoadStoreRegOpcode. No functional change intended. llvm-svn: 277318	2016-08-01 04:29:11 +00:00
Craig Topper	c0097dc7d0	[X86] Simplify code for determining GR or FR reg classes by querying for super classes instead of manually listing individual classes. llvm-svn: 277306	2016-07-31 20:20:08 +00:00
Craig Topper	7afdc0fb25	[AVX512] Always use EVEX encodings for 128/256-bit move instructions in getLoadStoreRegOpcode if VLX is supported. llvm-svn: 277305	2016-07-31 20:20:05 +00:00
Craig Topper	4c53e60360	[AVX512] Add VLX packed move instructions to the execution dependency fix pass and update tests. llvm-svn: 277304	2016-07-31 20:20:01 +00:00
Craig Topper	eb1cc981a5	[AVX512] Move FR32X/FR64X handling in getLoadStoreRegOpcode into the main switch. No functional change intended. llvm-svn: 277303	2016-07-31 20:19:55 +00:00
Craig Topper	338ec9a0cb	[AVX512] Stop treating VR512 specially in getLoadStoreRegOpcode and use the regular switch which already tried to handle it, but was unreachable. This has the added benefit of enabling aligned loads/stores if the stack is aligned. llvm-svn: 277302	2016-07-31 20:19:53 +00:00
Craig Topper	2a6bbb8203	[AVX512] Add X86::VR512RegClassID to X86RegisterInfo::getLargestLegalSuperClass. llvm-svn: 277301	2016-07-31 20:19:50 +00:00
Simon Pilgrim	6be48e4aa7	[X86] Improve 64-bit shifts on 32-bit targets (PR14593) As discussed on PR14593, this patch adds support for lowering to SHLD/SHRD from the patterns generated by DAGTypeLegalizer::ExpandShiftWithKnownAmountBit. Differential Revision: https://reviews.llvm.org/D23000 llvm-svn: 277299	2016-07-31 19:50:45 +00:00
Craig Topper	00d34ed64f	[AVX-512] Don't let ExeDependencyFix pass convert VPANDD/Q to VPANDPS/PD unless DQI instructions are supported. Same for ANDN, OR, and XOR. Thanks to Igor Breger for pointing out my mistake. llvm-svn: 277292	2016-07-31 17:15:07 +00:00
Elena Demikhovsky	6e9b16054f	AVX-512: Removed AssertZext node before TRUNCATE Removed AssertZext node, which was inserted between X86ISD::SETCC and "truncate to i1". Differential Revision: https://reviews.llvm.org/D22850 llvm-svn: 277289	2016-07-31 06:48:01 +00:00
Simon Pilgrim	5e0d6b509a	Strip trailing whitespace llvm-svn: 277280	2016-07-30 20:53:21 +00:00
Simon Pilgrim	8bbd3650a6	[X86] Use peekThroughOneUseBitcasts helper function llvm-svn: 277279	2016-07-30 20:51:26 +00:00
Simon Pilgrim	cf49fa3251	[X86][SSE] Let 64-bit targets use the fast 2i32-2f32 UINT_TO_FP conversion as well as 32-bit The 2i32-2i64 legalization means that we can use the slightly quicker double bits + fptrunc approach for the same results llvm-svn: 277271	2016-07-30 14:06:59 +00:00
Benjamin Kramer	205159c628	[X86] Fix lifetime of SMRange temporaries. Found by asan -fsanitize-address-use-after-scope. llvm-svn: 277266	2016-07-30 11:31:24 +00:00
Michael Kuperstein	f396b4c40d	[X86] Match PSADBW in straight-line code Up until now, we only had code to match PSADBW patterns that look like what comes out of the loop vectorizer - a partial reduction inside the loop body that gets fed into a horizontal operation in a different basic block. This adds support for straight-line patterns, like those generated by the SLP vectorizer. Differential Revision: https://reviews.llvm.org/D22889 llvm-svn: 277219	2016-07-29 21:45:51 +00:00
Simon Pilgrim	f107ffa8f0	[X86][AVX] Fix VBROADCASTF128 selection bug (PR28770) Support for lowering to VBROADCASTF128 etc. in D22460 was not correctly ensuring that the only users of the 128-bit vector load were the insertions of the vector into the lower/upper subvectors. llvm-svn: 277214	2016-07-29 21:05:10 +00:00
David L Kreitzer	8b959e5cfa	Avoid unnecessary 32-bit to 64-bit zero extensions following 32-bit CMOV instructions on x86_64. The 32-bit CMOV implicitly zero extends. Differential Revision: https://reviews.llvm.org/D22941 llvm-svn: 277148	2016-07-29 15:09:54 +00:00
Simon Pilgrim	cb780b32a3	[X86][SSE] Optimize the truncation of vector comparison results with PACKSS We currently default to using either generic shuffles or MASK+PACKUS/PACKSS to truncate all integer vectors. For vector comparisons, we know that the result will be either all or zero bits in every element, which can be efficiently truncated by directly using PACKSS to repeatedly halve the size of each element. Due to the limited input values (-1 or 0) we don't need to account for vector element size, so for simplicity we just use the PACKSS(vXi16,vXi16) implementation in all cases. Additionally for AVX2 PACKSS of 256bit data we must perform a PERMQ shuffle to reorder the data into the correct order. I did investigate performing a single shuffle after all the PACKSS calls but the need to cross 128bit lanes makes this difficult to achieve efficiently. We avoid performing this on AVX512 as it should have better alternative truncation instructions. Differential Revision: https://reviews.llvm.org/D22814 llvm-svn: 277132	2016-07-29 10:23:10 +00:00
Craig Topper	e4f868ea16	[AVX512] Mark EVEX VMOVSSrm and VMOVSDrm as canFoldAsLoad and isReMaterializable. llvm-svn: 277120	2016-07-29 06:06:04 +00:00
Craig Topper	5625d24977	[AVX512] Copy the patterns that recognize scalar arithmetic operations inserting into the lower element of a packed vector from AVX/SSE so that we can use EVEX encoded instructions. llvm-svn: 277119	2016-07-29 06:06:00 +00:00
Craig Topper	c7de3a1018	[AVX512] Remove the intrinsic forms of VMOVSS/VMOVSD. We don't need two different forms of 'rr' and 'rm'. This matches SSE/AVX. I'm not convinced the patterns for the rm_Int were correct anyway. It had a tied source that shouldn't exist for the unmasked version. The load form of MOVSS always zeros the most significant bits. I've left the patterns off the masked load instructions as I'm not sure what the correct pattern should be and we don't have any tests currently. Nor do we implement masked scalar load intrinsics in clang currently. llvm-svn: 277098	2016-07-29 02:49:08 +00:00
Matthias Braun	941a705b7b	MachineFunction: Return reference for getFrameInfo(); NFC getFrameInfo() never returns nullptr so we should use a reference instead of a pointer. llvm-svn: 277017	2016-07-28 18:40:00 +00:00
Craig Topper	7e27885f69	[X86] Remove CustomInserter for FMA3 instructions. Looks like since we got full commuting support for FMAs after this was added, the coalescer can now get this right on its own. Differential Revision: https://reviews.llvm.org/D22799 llvm-svn: 276987	2016-07-28 15:28:56 +00:00
Michael Kuperstein	e7605e28f9	[X86] Factor out another piece of the SAD combine. NFCI. llvm-svn: 276918	2016-07-27 20:59:51 +00:00
Nirav Dave	06a99a46e2	[MC][X86] Fix Intel Operand assembly parsing for .set ids Fix intel syntax special case identifier operands that refer to a constant (e.g. .set <ID> n) to be interpreted as immediate not memory in parsing. Reviewers: rnk Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D22585 llvm-svn: 276895	2016-07-27 17:39:41 +00:00
Michael Kuperstein	2dc08f7df8	[X86] Split out absdiff detection from SAD combine. NFC. Preparation for supporting PSADBW emission for straight-line code. llvm-svn: 276798	2016-07-26 20:01:29 +00:00
Simon Pilgrim	019e102426	[X86][SSE] Fixed issue with memory folding of (v)cvtsd2ss intrinsics Fixed typo in the intrinsic definitions of (v)cvtsd2ss with memory folding. This was only unearthed when rL276102 started using the intrinsic again. llvm-svn: 276740	2016-07-26 10:41:28 +00:00
Simon Pilgrim	28c7d7093d	Fixed spelling in comment llvm-svn: 276738	2016-07-26 09:55:31 +00:00
Craig Topper	79011a660e	[X86] Remove isCommutable=1 from instructions that also load. Commuting such instructions isn't useful as it would unfold the load. The exception being FMA3 instructions. llvm-svn: 276733	2016-07-26 08:06:18 +00:00
Craig Topper	26000f8d90	[AVX512] Don't mark ADDSSZr_Int or MULSSZr_Int as commutable. The intrinsics have one of their arguments indicated as passing through the high bits and we can't commute that. llvm-svn: 276732	2016-07-26 08:06:14 +00:00
Joel Jones	373d7d30dd	[MC] Provide an MCTargetOptions to implementors of MCAsmBackendCtorTy, NFC Some targets, notably AArch64 for ILP32, have different relocation encodings based upon the ABI. This is an enabling change, so a future patch can use the ABIName from MCTargetOptions to choose which relocations to use. Tested using check-llvm. The corresponding change to clang is in: http://reviews.llvm.org/D16538 Patch by: Joel Jones Differential Revision: https://reviews.llvm.org/D16213 llvm-svn: 276654	2016-07-25 17:18:28 +00:00
Elena Demikhovsky	64e5f929d0	AVX-512: Fixed [US]INT_TO_FP selection for i1 vectors. It failed with an assertion before this patch. Differential Revision: https://reviews.llvm.org/D22735 llvm-svn: 276648	2016-07-25 16:51:00 +00:00
Craig Topper	ce415ff9c5	[AVX512] Add load folding support for the unmasked forms of the FMA instructions. llvm-svn: 276615	2016-07-25 07:20:35 +00:00
Craig Topper	318e40b6f7	[AVX512] Add some additional patterns so that we can fold broadcast loads in the first argument of an FMADD/FMSUB/FNMADD/FNMSUB/FMADDSUB/FMSUBADD node. Also add patterns to support all combinations of the broadcast input and the preserved input for masked versions. llvm-svn: 276614	2016-07-25 07:20:31 +00:00
Craig Topper	6bcbf5338c	[AVX512] Cleanup FMA operand order in patterns to match the VEX versions and to really be 213, 231, and 132. llvm-svn: 276613	2016-07-25 07:20:28 +00:00
Simon Pilgrim	381a0ade5a	[X86] Add 'FeatureSlowSHLD' to cpu 'bdver4' As with all AMD CPUs, excavator has poor SHLD/SHRD performance. Also added bdver3 to the test as it was missing. llvm-svn: 276569	2016-07-24 16:00:53 +00:00
Craig Topper	2dca3b287b	[X86] Make the FMA3 instruction names consistent between VEX and EVEX encoded versions. This places the 132/213/231 form number in front of the SS/SD/PS/PD. Move the Y for 256-bit versions to be after the PS/PD. Change the AVX512 scalar forms to include a Z in their name. This new format should be consistent with the general naming of instructions. llvm-svn: 276559	2016-07-24 08:26:38 +00:00
Craig Topper	05629d05c7	[X86] Replace CodeGenOnly VPSRAVW/D/Q_Int instructions with patterns since the operand types exactly match the normal VPSRAVW/D/Q instructions. llvm-svn: 276555	2016-07-24 07:32:45 +00:00
Craig Topper	8152b9cd96	[X86] Fix typo in comment. llvm-svn: 276528	2016-07-23 16:44:08 +00:00
Craig Topper	b6519db90d	[AVX512] Implement commuting support for EVEX encoded FMA3 instructions. llvm-svn: 276521	2016-07-23 07:16:56 +00:00
Craig Topper	6172b0b3e9	[X86] Make one of the FMA3 commuting methods static. Remove a call to isFMA3 just to get the IsIntrisic flag, instead get it during the first call and pass it along. NFC llvm-svn: 276520	2016-07-23 07:16:53 +00:00
Craig Topper	ca8f5f309c	[X86] Fix switch statement indentation per coding standards. llvm-svn: 276519	2016-07-23 07:16:50 +00:00
Simon Pilgrim	ea0d4f9962	[X86][AVX] Added support for lowering to VBROADCASTF128/VBROADCASTI128 (reapplied) As reported on PR26235, we don't currently make use of the VBROADCASTF128/VBROADCASTI128 instructions (or the AVX512 equivalents) to load+splat a 128-bit vector to both lanes of a 256-bit vector. This patch enables lowering from subvector insertion/concatenation patterns and auto-upgrades the llvm.x86.avx.vbroadcastf128.pd.256 / llvm.x86.avx.vbroadcastf128.ps.256 intrinsics to match. We could possibly investigate using VBROADCASTF128/VBROADCASTI128 to load repeated constants as well (similar to how we already do for scalar broadcasts). Reapplied with fix for PR28657 - removed intrinsic definitions (clang companion patch to be submitted shortly). Differential Revision: https://reviews.llvm.org/D22460 llvm-svn: 276416	2016-07-22 13:58:44 +00:00
Benjamin Kramer	5ba0e20315	Revert "[X86][AVX] Added support for lowering to VBROADCASTF128/VBROADCASTI128" It caused PR28657. This reverts commit r276281. llvm-svn: 276405	2016-07-22 11:03:10 +00:00
Craig Topper	52e2e8381b	[AVX512] Add ExeDomain to vector extend and truncate instructions. llvm-svn: 276394	2016-07-22 05:46:44 +00:00
Craig Topper	f4151bea72	[AVX512] Add initial support for the Execution Domain fixing pass to change some EVEX instructions. llvm-svn: 276393	2016-07-22 05:00:52 +00:00
Craig Topper	5ec33a9411	[AVX512] Fix the ExeDomain for some packed fp instructions. llvm-svn: 276392	2016-07-22 05:00:42 +00:00
Craig Topper	0b90756b0a	[AVX512] Add load folding for some AVX512VL logic and arithmetic instructions. llvm-svn: 276391	2016-07-22 05:00:39 +00:00
Craig Topper	ab13b33ded	[AVX512] Update X86InstrInfo::foldMemoryOperandCustom to handle the EVEX encoded instructions too. llvm-svn: 276390	2016-07-22 05:00:35 +00:00
Michael Kuperstein	c523333bbf	[X86] Do not use AND8ri8 in AVX512 pattern This variant is (as documented in the TD) for disassembler use only, and should not be used in patterns - it is longer, and is broken on 64-bit. llvm-svn: 276347	2016-07-21 22:24:08 +00:00
Simon Pilgrim	88e0940d3b	[X86][SSE] Allow folding of store/zext with PEXTRW of 0'th element Under normal circumstances we prefer the higher performance MOVD to extract the 0'th element of a v8i16 vector instead of PEXTRW. But as detailed on PR27265, this prevents the SSE41 implementation of PEXTRW from folding the store of the 0'th element. Additionally it prevents us from making use of the fact that the (SSE2) reg-reg version of PEXTRW implicitly zero-extends the i16 element to the i32/i64 destination register. This patch only preferentially lowers to MOVD if we will not be zero-extending the extracted i16, nor preventing a store from being folded (on SSE41). Fix for PR27265. Differential Revision: https://reviews.llvm.org/D22509 llvm-svn: 276289	2016-07-21 14:54:17 +00:00
Simon Pilgrim	b11bdd95f6	[X86][SSE] Pull out duplicate EXTRW lowering code. NFCI. As requested on D22509, I've pulled out the v8i16 extraction lowering as the SSE41 and pre-SSE41 implementations are effectively the same. llvm-svn: 276285	2016-07-21 14:30:17 +00:00
Simon Pilgrim	c8e20b1150	[X86][AVX] Added support for lowering to VBROADCASTF128/VBROADCASTI128 As reported on PR26235, we don't currently make use of the VBROADCASTF128/VBROADCASTI128 instructions (or the AVX512 equivalents) to load+splat a 128-bit vector to both lanes of a 256-bit vector. This patch enables lowering from subvector insertion/concatenation patterns and auto-upgrades the llvm.x86.avx.vbroadcastf128.pd.256 / llvm.x86.avx.vbroadcastf128.ps.256 intrinsics to match. We could possibly investigate using VBROADCASTF128/VBROADCASTI128 to load repeated constants as well (similar to how we already do for scalar broadcasts). Differential Revision: https://reviews.llvm.org/D22460 llvm-svn: 276281	2016-07-21 14:10:54 +00:00
Matthias Braun	ca8210a952	X86InstrInfo: No need for liveness analysis in classifyLEAReg() classifyLEAReg() deals with switching operands from 32bit to 64bit in order to use a LEA64_32 instruction (for three address code goodness). It currently performs a liveness analysis to determine the kill/undef flag for the newly added operand. This should not be necessary: - If the previous operand had a kill flag, then the 32bit part of the register gets killed, this will kill the super register as well. - If the previous operand had an undef flag then we didn't care what value we read, just use the same flag on the new operand. (No matter what an operand with an undef flag won't affect liveness) This makes the code independent of the presence of kill flags because it avoids a call to MachineBasicBlock::computeRegisterLiveness(). Differential Revision: http://reviews.llvm.org/D22283 llvm-svn: 276222	2016-07-21 00:33:38 +00:00
Simon Pilgrim	1b4f511aaa	[X86][SSE] Add cost model values for CTPOP of vectors This patch adds costs for the vectorized implementations of CTPOP. The default values were seriously underestimating the cost of these and were encouraging vectorization on targets where serialized use of POPCNT would be much better. Differential Revision: https://reviews.llvm.org/D22456 llvm-svn: 276104	2016-07-20 10:41:28 +00:00
Craig Topper	09b99c3a75	[AVX512] Add a missing NoVLX to give priority to the AVX512 version of the pattern. llvm-svn: 276088	2016-07-20 05:05:50 +00:00
Craig Topper	7a95428f7c	[X86] Use 'HasAVX1Only' to properly give priority to the AVX2 version without relying on file ordering. llvm-svn: 276087	2016-07-20 05:05:48 +00:00
Craig Topper	cde436a663	[X86] Create some multiclasses to reduce the repeated patterns for VEXTRACT(F/I)128/VINSERT(I/F)128. NFC llvm-svn: 276086	2016-07-20 05:05:46 +00:00
Craig Topper	55913ead3d	[X86] Create some wrapper multiclasses to create AVX and SSE shift instructions with less repeated code. NFC llvm-svn: 276085	2016-07-20 05:05:44 +00:00
Simon Pilgrim	0ea8d275cc	[X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead. It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems, INF/NAN/out of range values are guaranteed to result in a 0x80000000 value - which plays havoc with constant folding which converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match). This patch changes both scalar and packed versions back to using x86-specific builtins. It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding. A companion clang patch is at D22105 Differential Revision: https://reviews.llvm.org/D22106 llvm-svn: 275981	2016-07-19 15:07:43 +00:00
Elena Demikhovsky	2c0780b8e5	AVX-512: Fixed BT instruction selection. The following condition expression (a >> n) & 1 is converted to "bt a, n" instruction. It works on all Intel targets. But on AVX-512 it was broken because the expression is modified to (truncate (a >> n) to i1). I added the new sequence (truncate (a >> n) to i1) to the BT pattern. Differential Revision: https://reviews.llvm.org/D22354 llvm-svn: 275950	2016-07-19 07:14:21 +00:00
Craig Topper	d6ca1dc45e	[AVX512] Give priority to EVEX encoded PSHUFB over the VEX versions. llvm-svn: 275942	2016-07-19 02:00:38 +00:00
Craig Topper	592dc30708	[X86] Remove superfluous parameter from a multiclass. All instantiations passed the same value. llvm-svn: 275941	2016-07-19 02:00:35 +00:00
Craig Topper	6189d3ecd4	[X86] Rename VINSERTzrr to use a capital Z to match other instructions. NFC llvm-svn: 275939	2016-07-19 01:26:19 +00:00
Simon Pilgrim	c941f6b329	[X86][AVX] Add target shuffle decode support for VBROADCAST Currently we only decode broadcasts from a vector of the same size. llvm-svn: 275823	2016-07-18 17:32:59 +00:00
Craig Topper	a3c55f5915	[AVX512] Add EVEX versions of scalar ADD/SUB/MUL/DIV to load folding tables. llvm-svn: 275775	2016-07-18 06:49:32 +00:00
Craig Topper	16a0744955	[AVX512] Add KADD/KAND/KOR/KXOR to X86InstrInfo::isAssociativeAndCommutative. llvm-svn: 275771	2016-07-18 06:14:59 +00:00
Craig Topper	463f949a3a	[X86] Add VPMULLW/D/Q instructions to X86InstrInfo::isAssociativeAndCommutative. llvm-svn: 275770	2016-07-18 06:14:57 +00:00
Craig Topper	1af6cc00dc	[X86] Add VPADD instructions to X86InstrInfo::isAssociativeAndCommutative. llvm-svn: 275769	2016-07-18 06:14:54 +00:00
Craig Topper	ba9b93d7f2	[X86] Add floating point packed logical ops to X86InstrInfo::isAssociativeAndCommutative. llvm-svn: 275768	2016-07-18 06:14:50 +00:00
Craig Topper	3a99de4067	[X86] Add AVX512 instructions to X86InstrInfo::isAssociativeAndCommutative. llvm-svn: 275767	2016-07-18 06:14:47 +00:00
Craig Topper	fe5a6dc581	[X86] Add more AVX512 instructions to X86InstrInfo::isHighLatencyDef. Also add all packed fp division instructions. llvm-svn: 275766	2016-07-18 06:14:45 +00:00
Craig Topper	f7a06c29bc	[X86] Add AVX512 load opcodes and a couple AVX load opcodes to X86InstrInfo::areLoadsFromSameBasePtr. llvm-svn: 275765	2016-07-18 06:14:43 +00:00
Craig Topper	650a15e2b3	[X86] Add more opcodes to isFrameLoadOpcode/isFrameStoreOpcode. Mainly AVX-512 related. llvm-svn: 275764	2016-07-18 06:14:39 +00:00
Craig Topper	5c913e84df	[AVX512] Use VMOVAPSZ128rr/VMOVAPS256rr for VR128X/VR256X physreg moves when VLX is supported. Ideally we would use VEX encoded moves instead of EVEX if the high 16 registers aren't referenced, but this is a good first step. llvm-svn: 275763	2016-07-18 06:14:34 +00:00
Craig Topper	53f3d1b4d0	[X86] Fix 80-column violations. NFC llvm-svn: 275762	2016-07-18 06:14:26 +00:00
Simon Pilgrim	285d9e4d60	Strip trailing whitespace llvm-svn: 275726	2016-07-17 19:02:27 +00:00
Simon Pilgrim	1be1222293	[X86][SSE] lowerVectorShuffleAsPermuteAndUnpack tidyup. NFCI. Moved unpack type determination into TryUnpack lambda. Added missing comment describing lowerVectorShuffleAsPermuteAndUnpack call. llvm-svn: 275708	2016-07-17 15:48:25 +00:00
Guy Blank	3357ba36e2	test commit llvm-svn: 275703	2016-07-17 12:10:35 +00:00
Craig Topper	8093437f2e	[AVX512] Remove CodeGenOnly VBROADCAST m_Int instructions. They can be implemented with patterns selecting existing instructions. NFC llvm-svn: 275671	2016-07-16 03:42:59 +00:00
Nico Weber	8d66df15f4	Teach fast isel about the win64 calling convention. This mostly just works. Vectorcall rets are still not supported. The win64_eh test change is because fast isel doesn't use rsi for temporary computations, so it doesn't need to be pushed. The test case I'm changing was originally added to test pushes, but by now there are other test cases in that file exercising that code path. https://reviews.llvm.org/D22422 llvm-svn: 275607	2016-07-15 20:18:37 +00:00
Justin Lebar	9c375817ac	[SelectionDAG] Get rid of bool parameters in SelectionDAG::getLoad, getStore, and friends. Summary: Instead, we take a single flags arg (a bitset). Also add a default 0 alignment, and change the order of arguments so the alignment comes before the flags. This greatly simplifies many callsites, and fixes a bug in AMDGPUISelLowering, wherein the order of the args to getLoad was inverted. It also greatly simplifies the process of adding another flag to getLoad. Reviewers: chandlerc, tstellarAMD Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits Differential Revision: http://reviews.llvm.org/D22249 llvm-svn: 275592	2016-07-15 18:27:10 +00:00
Justin Lebar	0af80cd6f0	[CodeGen] Take a MachineMemOperand::Flags in MachineFunction::getMachineMemOperand. Summary: Previously we took an unsigned. Hooray for type-safety. Reviewers: chandlerc Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D22282 llvm-svn: 275591	2016-07-15 18:26:59 +00:00
Jacques Pienaar	71c30a14b7	Rename AnalyzeBranch* to analyzeBranch*. Summary: NFC. Rename AnalyzeBranch/AnalyzeBranchPredicate to analyzeBranch/analyzeBranchPredicate to follow LLVM coding style and be consistent with TargetInstrInfo's analyzeCompare and analyzeSelect. Reviewers: tstellarAMD, mcrosier Subscribers: mcrosier, jholewinski, jfb, arsenm, dschuff, jyknight, dsanders, nemanjai Differential Revision: https://reviews.llvm.org/D22409 llvm-svn: 275564	2016-07-15 14:41:04 +00:00
Simon Pilgrim	2683ad54ad	[X86][AVX2] Improve lowerShuffleAsRepeatedMaskAndLanePermute permutation of 64-bit sub-lanes As discussed on PR28136, lowerShuffleAsRepeatedMaskAndLanePermute was attempting to match repeated masks at the 128-bit level and then permute the resultant lanes at the 128-bit (AVX1) or 64-bit (AVX2) sub-lane level. This change allows us to create the repeated masks at the sub-lane level (and then concat them together to create a 128-bit repeated mask) and then select which sub-lane to permute. This has no effect on the AVX1 codegen. Fixes PR28136. llvm-svn: 275543	2016-07-15 09:49:12 +00:00
Simon Pilgrim	420b266d0a	[X86][AVX2] Allow VPERMPD/VPERMQ shuffles to call combineShuffle (reapplied) This improves the situation discussed in D19228 where we were forcing VPERMPD/VPERMQ where VPERM2F128/VPERM2I128 would have been better. This was incorrectly reverted in rL275421 during triage of PR28552. llvm-svn: 275497	2016-07-14 23:05:09 +00:00
Nirav Dave	a6c7595d0f	[X86][MC] Fix bracket expression parsing in intel-style assembly. Only perform struct field check on Identifier tokens. Fixes PR28547. Reviewers: rnk Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D22361 llvm-svn: 275445	2016-07-14 17:37:05 +00:00
Nico Weber	5bb284226b	Don't optimize movs to pushes in -O0 builds. https://reviews.llvm.org/D22362 llvm-svn: 275431	2016-07-14 15:40:22 +00:00
Nico Weber	ead8f8ffdd	Delete some trailing whitespace. llvm-svn: 275429	2016-07-14 15:07:44 +00:00
Ahmed Bougacha	85dc93c56b	[X86] Decode MPX BND registers. We were able to assemble, but not disassemble. Note that fixupRMValue was truncating EA_REG_BND0-3 because we hit the uint8_t max. The control registers were already squarely above it, but I don't think they ever go in .r/m, only in .reg. I also did notice an extra REX.W in our encoding, but I think that's fine. llvm-svn: 275427	2016-07-14 14:53:21 +00:00
Ahmed Bougacha	4f7a5e20ae	[X86] Don't mark addressing mode operands as "outs". NFC-ish. Nothing in-tree can tell the difference, but it's incorrect: the addressing mode registers aren't what's defined. llvm-svn: 275426	2016-07-14 14:53:17 +00:00
Nico Weber	3afaf16abc	Revert r275411, it caused PR28552. llvm-svn: 275421	2016-07-14 14:49:35 +00:00
Nico Weber	ecdf45b1e6	Teach fast isel calls and rets about stdcall. stdcall is callee-pop like thiscall, so the thiscall changes already did most of the work for this. This change only opts stdcall in and adds tests. llvm-svn: 275414	2016-07-14 13:54:26 +00:00
Simon Pilgrim	534e3240e8	Remove trailing whitespace. llvm-svn: 275412	2016-07-14 13:29:23 +00:00
Simon Pilgrim	3ecb6bdd5f	[X86][AVX2] Allow VPERMPD/VPERMQ shuffles to call combineShuffle This improves the situation discussed in D19228 where we were forcing VPERMPD/VPERMQ where VPERM2F128/VPERM2I128 would have been better. llvm-svn: 275411	2016-07-14 13:28:43 +00:00
Simon Pilgrim	053d32906f	[X86][AVX] Add support for narrowing 128-bit+ shuffle mask elements to 64-bits to allow combining Primarily this is to allow blend with zero instead of having to use vperm2f128, but we can use this in the future to deal with AVX512 cases where we need to keep the original element size to correctly fold masked operations. llvm-svn: 275406	2016-07-14 12:58:04 +00:00
Simon Pilgrim	a76a8e50e5	[X86][AVX] Add VBROADCASTF128/VBROADCASTI128 shuffle comments support llvm-svn: 275400	2016-07-14 12:07:43 +00:00
Simon Pilgrim	b8c261c931	[X86][AVX2] VBROADCASTSSrr/VBROADCASTSSYrr require AVX2 not AVX llvm-svn: 275391	2016-07-14 10:37:14 +00:00
Craig Topper	6840f1150f	[AVX512] Implement EXTLOAD lowering with patterns to select existing VPMOVZX instructions instead of creating CodeGenOnly instructions. llvm-svn: 275378	2016-07-14 06:41:34 +00:00
Eli Friedman	17e8ea18e9	[X86] Fix stupid typo in isel lowering. Apparently someone miscounted the number of zeros in the immediate. Fixes https://llvm.org/bugs/show_bug.cgi?id=28544 . llvm-svn: 275376	2016-07-14 05:48:25 +00:00
Dean Michael Berris	52735fc435	XRay: Add entry and exit sleds Summary: In this patch we implement the following parts of XRay: - Supporting a function attribute named 'function-instrument' which currently only supports 'xray-always'. We should be able to use this attribute for other instrumentation approaches. - Supporting a function attribute named 'xray-instruction-threshold' used to determine whether a function is instrumented with a minimum number of instructions (IR instruction counts). - X86-specific nop sleds as described in the white paper. - A machine function pass that adds the different instrumentation marker instructions at a very late stage. - A way of identifying which return opcode is considered "normal" for each architecture. There are some caveats here: 1) We don't handle PATCHABLE_RET in platforms other than x86_64 yet -- this means if IR used PATCHABLE_RET directly instead of a normal ret, instruction lowering for that platform might do the wrong thing. We think this should be handled at instruction selection time so that it is unpacked by default for platforms where XRay is not available yet. 2) The generated section for X86 is different from what is described in the white paper for the sole reason that LLVM allows us to do this neatly. We're taking the opportunity to deviate from the white paper from this perspective to allow us to get richer information from the runtime library. Reviewers: sanjoy, eugenis, kcc, pcc, echristo, rnk Subscribers: niravd, majnemer, atrick, rnk, emaste, bmakam, mcrosier, mehdi_amini, llvm-commits Differential Revision: http://reviews.llvm.org/D19904 llvm-svn: 275367	2016-07-14 04:06:33 +00:00
Nico Weber	af7e8465e1	Teach fast isel about thiscall (and callee-pop) calls. http://reviews.llvm.org/D22315 llvm-svn: 275360	2016-07-14 01:52:51 +00:00
Nico Weber	eb9488b151	Fix a TODO in X86CallFrameOptimization to not rely on a codegen artifact. This happens to make X86CallFrameOptimization work in -O0 / FastISel builds as well, but it's not clear if the pass should run in that setup. http://reviews.llvm.org/D22314 llvm-svn: 275320	2016-07-13 21:38:27 +00:00
Sanjay Patel	610a2f6525	[x86][SSE/AVX] optimize pcmp results better (PR28484) We know that pcmp produces all-ones/all-zeros bitmasks, so we can use that behavior to avoid unnecessary constant loading. One could argue that load+and is actually a better solution for some CPUs (Intel big cores) because shifts don't have the same throughput potential as load+and on those cores, but that should be handled as a CPU-specific later transformation if it ever comes up. Removing the load is the more general x86 optimization. Note that the uneven usage of vpbroadcast in the test cases is filed as PR28505: https://llvm.org/bugs/show_bug.cgi?id=28505 Differential Revision: http://reviews.llvm.org/D22225 llvm-svn: 275276	2016-07-13 16:04:07 +00:00
Simon Pilgrim	a99368fa35	[X86][AVX512] Add support for VPERMILPD/VPERMILPS variable shuffle mask comments llvm-svn: 275272	2016-07-13 15:45:36 +00:00
Simon Pilgrim	48d8340760	[X86][AVX] Add support for target shuffle combining to VPERMILPS variable shuffle mask Added AVX512F VPERMILPS shuffle decoding support llvm-svn: 275270	2016-07-13 15:10:43 +00:00
Simon Pilgrim	57548a6fa6	[X86][SSE] Check for lane crossing shuffles before trying to combine to PSHUFB Removes a return-on-fail that was making it tricky to add other variable mask shuffles. llvm-svn: 275262	2016-07-13 12:48:41 +00:00
Craig Topper	ff1c327ebb	[X86] Remove some seemingly unnecessary patterns that supported vector zext/sext with 256-bit source types producing a 256-bit result. These patterns just extracted the source down to 128-bits to use the instructions. AVX512 seems to have blindly copied them over for VLX, but did not create similar patterns for 512-bit sources. So I'm hoping the backend can't actually produce these cases. llvm-svn: 275240	2016-07-13 02:21:25 +00:00
Simon Pilgrim	6fa71da4a4	[X86][AVX] Add support for target shuffle combining to VPERM2F128/VPERM2I128 llvm-svn: 275212	2016-07-12 20:27:32 +00:00
Matthias Braun	96ec47db74	X86FixupBWInsts: No need for forward liveness analysis. With r274952 and r275201 in place there are no cases left where a forward liveness analysis yields different results than a backward one. So we can remove the forward stepping logic. Differential Revision: http://reviews.llvm.org/D22083 llvm-svn: 275204	2016-07-12 19:04:30 +00:00
Craig Topper	a6e6febe2c	[AVX512] Remove masked logic op intrinsics and autoupgrade them to native IR. llvm-svn: 275155	2016-07-12 05:27:53 +00:00
Duncan P. N. Exon Smith	7b4c18e8f3	X86: Avoid implicit iterator conversions, NFC Avoid implicit conversions from MachineInstrBundleIterator to MachineInstr, mainly by preferring MachineInstr& over MachineInstr and using range-based for loops. llvm-svn: 275149	2016-07-12 03:18:50 +00:00
Nico Weber	c7bf646a99	Teach FastISel about thiscall (and, hence, about callee-pop). http://reviews.llvm.org/D22115 llvm-svn: 275135	2016-07-12 01:30:35 +00:00
Michael Kuperstein	f0c59330e9	[X86] Make some cast costs more precise Make some AVX and AVX512 cast costs more precise. Based on part of a patch by Elena Demikhovsky (D15604). Differential Revision: http://reviews.llvm.org/D22064 llvm-svn: 275106	2016-07-11 21:39:44 +00:00
Quentin Colombet	fb82c7bc94	[X86] Fix tailcall return address clobber bug. This bug (llvm.org/PR28124) was introduced by r237977, which refactored the tail call sequence to be generated in two passes instead of one. Unfortunately, the stack adjustment produced by the first pass was not recognized by X86FrameLowering::mergeSPUpdates() in all cases, causing code such as the following, which clobbers the return address, to be generated: popl %edi; popl %edi; pushl %eax; jmp tailcallee # TAILCALL. To fix the problem, the entire stack adjustment is performed in X86ExpandPseudo::ExpandMI() for tail calls. Patch by Magnus Lång <margnus1@gmail.com> Differential Revision: http://reviews.llvm.org/D21325 llvm-svn: 275103	2016-07-11 21:03:03 +00:00
Michael Kuperstein	cfbac5f361	[X86] Disable FixupSetCC for CodeGenOpt::None It is an optimization pass, and should not run at -O0. Especially since Fast RA will not do the required register coalescing anyway, so it's a loss even from the optimization standpoint. This also works around (but doesn't quite fix) PR28489. llvm-svn: 275099	2016-07-11 20:40:44 +00:00
Nirav Dave	57033c6336	Add missing include from previous commit llvm-svn: 275069	2016-07-11 14:32:57 +00:00
Nirav Dave	8603062ee4	Fix branch relaxation in 16-bit mode. Thread through MCSubtargetInfo to relaxInstruction function allowing relaxation to generate jumps with 16-bit sized immediates in 16-bit mode. This fixes PR22097. Reviewers: dwmw2, tstellarAMD, craig.topper, jyknight Subscribers: jfb, arsenm, jyknight, llvm-commits, dsanders Differential Revision: http://reviews.llvm.org/D20830 llvm-svn: 275068	2016-07-11 14:23:53 +00:00
Simon Pilgrim	832463eada	[X86][SSE] Generalise target shuffle combine of shuffles using variable masks At present the only shuffle with a variable mask we recognise is PSHUFB, which influences whether it's worth the cost of mask creation/loading of a combined target shuffle with a variable mask. This change sets up the infrastructure to support other shuffles in the future but has no effect yet. llvm-svn: 275059	2016-07-11 12:49:35 +00:00
Elena Demikhovsky	d84f337953	AVX-512: DAG lowering for scalar MIN/MAX commutable ops DAG lowering was missing for the scalar FMINC, FMAXC nodes. The nodes are generated only in the "unsafe-fp-math" mode. Added tests. llvm-svn: 275048	2016-07-11 06:08:06 +00:00
Craig Topper	7ee070e7bc	[AVX512] Add support for 512-bit ANDN now that all ones build vectors survive long enough to allow the matching. llvm-svn: 275046	2016-07-11 05:36:53 +00:00
Craig Topper	516e14cd8e	[AVX512] Use vpternlog with an immediate of 0xff to create 512-bit all one vectors. llvm-svn: 275045	2016-07-11 05:36:48 +00:00
Craig Topper	8674849d6e	[X86] Add the AVX512 SET0 pseudos to foldMemoryOperandImpl since they are marked for CanFoldAsLoad. I don't really know how to test this. llvm-svn: 275044	2016-07-11 05:36:41 +00:00
Simon Pilgrim	ee4a33ae46	[X86][SSE] Relax type assertions for matchVectorShuffleAsInsertPS Calls to matchVectorShuffleAsInsertPS only need to ensure the inputs are 128-bit vectors. Only lowerVectorShuffleAsInsertPS needs to ensure that they are v4f32. llvm-svn: 275028	2016-07-10 22:26:05 +00:00
Simon Pilgrim	2191faa433	[X86][SSE] Add support for target shuffle combining to PSHUFLW/PSHUFHW llvm-svn: 275022	2016-07-10 21:02:47 +00:00
Craig Topper	0b0954570a	[AVX512] Add support for lowering to 512-bit SHUFPS. llvm-svn: 275011	2016-07-10 05:55:53 +00:00
Simon Pilgrim	606126e848	[X86][SSE] Add support for target shuffle combining to INSERTPS llvm-svn: 274990	2016-07-09 21:47:55 +00:00
Simon Pilgrim	59c6a211cd	[X86][SSE] Use scaleShuffleMask helper. NFCI. llvm-svn: 274988	2016-07-09 21:12:03 +00:00
Craig Topper	70610cf7b6	[X86] Remove and autoupgrade 512-bit non-temporal store intrinsics. llvm-svn: 274966	2016-07-09 04:38:27 +00:00
Simon Pilgrim	950419f948	[X86][AVX2] Add support for target shuffle combining to VPERMPD/VPERMQ llvm-svn: 274908	2016-07-08 19:23:29 +00:00
Simon Pilgrim	828c731880	[X86][SSE] Accept any shuffle mask that is all zeroes Until we have a better way to extract constants through bitcasted build vectors (and how to handle undefs of partial lanes etc.) at least accept build vectors that are all zeroes. llvm-svn: 274833	2016-07-08 10:39:12 +00:00
Craig Topper	f7bf6de0af	[AVX512] Remove and autoupgrade a duplicate set of 512-bit masked shift intrinsics. I'm not sure if clang ever used these builtin names or not. llvm-svn: 274827	2016-07-08 06:14:47 +00:00
Michael Kuperstein	3e3652aef2	Recommit r274692 - [X86] Transform setcc + movzbl into xorl + setcc xorl + setcc is generally the preferred sequence due to the partial register stall setcc + movzbl suffers from. As a bonus, it also encodes one byte smaller. This fixes PR28146. The original commit tried inserting an 8bit-subreg into a GR32 (not GR32_ABCD) which was not appreciated by fast regalloc on 32-bit. llvm-svn: 274802	2016-07-07 22:50:23 +00:00
Michael Kuperstein	edb38a94f8	Revert r274692 to check whether this is what breaks windows selfhost. llvm-svn: 274771	2016-07-07 16:55:35 +00:00
Rafael Espindola	b34cba97b7	Don't crash trying to relax 32 loads on COFF. Fixes pr28452. llvm-svn: 274754	2016-07-07 14:00:07 +00:00
Michael Kuperstein	1ef6c59b1d	[X86] Transform setcc + movzbl into xorl + setcc xorl + setcc is generally the preferred sequence due to the partial register stall setcc + movzbl suffers from. As a bonus, it also encodes one byte smaller. This fixes PR28146. Differential Revision: http://reviews.llvm.org/D21774 llvm-svn: 274692	2016-07-06 21:56:18 +00:00
Rafael Espindola	a29971faeb	Add initial support for R_386_GOT32X. This adds it only for movl mov@GOT(%reg), %reg. llvm-svn: 274678	2016-07-06 21:19:11 +00:00

13866 Commits