llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	f5d05fb0ce	[X86] Add VRCPSSr_Int, VRSQRTSSr_Int, VSQRTSSr_Int, and VSQRTSDr_Int to hasUndefRegUpdate. llvm-svn: 277931	2016-08-06 19:31:44 +00:00
Simon Pilgrim	7d168e19e8	[X86][SSE] Enable commutation between MOVHLPS and UNPCKHPD Assuming SSE2 is available then we can safely commute between these, removing some unnecessary register moves and improving memory folding opportunities. VEX encoded versions don't benefit so I haven't added support to them. llvm-svn: 277930	2016-08-06 18:40:28 +00:00
Simon Pilgrim	f56309f11a	[X86][SSE] Add 2 input shuffle support to matchBinaryVectorShuffle Not actually used yet... llvm-svn: 277919	2016-08-06 11:22:39 +00:00
Benjamin Kramer	b7d3311c77	Move helpers into anonymous namespaces. NFC. llvm-svn: 277916	2016-08-06 11:13:10 +00:00
Simon Pilgrim	69b6a70834	[X86][SSE] Add initial support for 2 input target shuffle combining. At the moment only the INSERTPS matching can actually use 2 inputs but the plumbing is now in place. llvm-svn: 277839	2016-08-05 17:36:14 +00:00
Simon Pilgrim	24dc1e7a90	[X86][SSE] Update the the target shuffle matches to use the effective mask's value type directly instead of via the input value type. Preparation for adding 2 input support so we want to avoid unnecessary references to the input value type. llvm-svn: 277817	2016-08-05 14:33:11 +00:00
Simon Pilgrim	7080005e67	[X86][SSE] Consistently use the target shuffle root value type for vector size calculations. NFCI. Preparation for adding 2 input support so we want to avoid unnecessary references to the input value type. llvm-svn: 277814	2016-08-05 13:02:53 +00:00
Simon Pilgrim	6f7b0cd530	[X86][SSE] Added target shuffle combine binary compute matching function. NFCI. Added matchBinaryPermuteVectorShuffle and moved the blend+zero and insertps matching code into it. llvm-svn: 277808	2016-08-05 11:16:53 +00:00
Michael Kuperstein	3ceac2bbd5	[LV, X86] Be more optimistic about vectorizing shifts. Shifts with a uniform but non-constant count were considered very expensive to vectorize, because the splat of the uniform count and the shift would tend to appear in different blocks. That made the splat invisible to ISel, and we'd scalarize the shift at codegen time. Since r201655, CodeGenPrepare sinks those splats to be next to their use, and we are able to select the appropriate vector shifts. This updates the cost model to to take this into account by making shifts by a uniform cheap again. Differential Revision: https://reviews.llvm.org/D23049 llvm-svn: 277782	2016-08-04 22:48:03 +00:00
Simon Pilgrim	3dbce52c16	[X86][SSE] Rename target shuffle unary permute matching function. NFCI. In preparation for adding a binary permute matching function. llvm-svn: 277737	2016-08-04 17:16:50 +00:00
Simon Pilgrim	c2370b810d	[X86][SSE] Split off shuffle mask canonicalization from lowerVectorShuffle. NFCI. The new function now returns true if the shuffle should be commuted. This will allow target shuffle combines to share the code. llvm-svn: 277728	2016-08-04 14:21:32 +00:00
Nikolai Bozhenov	f679530ba1	[X86] Heuristic to selectively build Newton-Raphson SQRT estimation On modern Intel processors hardware SQRT in many cases is faster than RSQRT followed by Newton-Raphson refinement. The patch introduces a simple heuristic to choose between hardware SQRT instruction and Newton-Raphson software estimation. The patch treats scalars and vectors differently. The heuristic is that for scalars the compiler should optimize for latency while for vectors it should optimize for throughput. It is based on the assumption that throughput bound code is likely to be vectorized. Basically, the patch disables scalar NR for big cores and disables NR completely for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores. Secondly, vector SQRT has been greatly improved in Skylake and has better throughput compared to NR. Differential Revision: https://reviews.llvm.org/D21379 llvm-svn: 277725	2016-08-04 12:47:28 +00:00
Simon Pilgrim	5d5ca9c0cb	[X86][SSE] Add initial costs for vector CTTZ/CTLZ llvm-svn: 277716	2016-08-04 10:51:41 +00:00
Simon Pilgrim	8ae6dad49b	[X86][SSE] Don't decide when to scalarize CTTZ/CTLZ for performance at lowering - this is what cost models are for Improved CTTZ/CTLZ costings will be added shortly llvm-svn: 277713	2016-08-04 10:14:39 +00:00
Dean Michael Berris	7e9abea2ae	[XRay] Align entry and return sleds to 2 byte boundaries This should ensure that we can atomically write two bytes (on top of the retq and the one past it) and have those two bytes not straddle cache lines. We also move the label past the alignment instruction so that we can refer to the actual first instruction, as opposed to potential padding before the aligned instruction. Update the tests to allow us to reflect the new order of assembly. Reviewers: rSerge, echristo, majnemer Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D23101 llvm-svn: 277701	2016-08-04 07:37:28 +00:00
Simon Pilgrim	898f030f70	[X86][SSE] Enable target shuffle combining to combine multiple shuffle inputs. We currently only support combining target shuffles that consist of a single source input (plus elements known to be undef/zero). This patch generalizes the recursive combining of the target shuffle to collect all the inputs, merging any duplicates along the way, into a full set of src ops and its shuffle mask. We uncover a number of cases where we have failed to combine a unary shuffle because the input has been duplicated and separated during lowering. This will allow us to combine to 2-input shuffles in a future patch. Differential Revision: https://reviews.llvm.org/D22859 llvm-svn: 277631	2016-08-03 19:08:24 +00:00
Igor Breger	c59b3a2236	[AVX512] Add aliases for vcvttss2si{l\|q}, vcvttsd2si{l\|q}, vcvttss2usi{l\|q}, vcvttsd2usi{l\|q} instructions. Differential Revision: http://reviews.llvm.org/D23111 llvm-svn: 277586	2016-08-03 10:58:05 +00:00
Dean Michael Berris	0b8f6c8777	[XRay] Make the xray_instr_map section specification more correct Summary: We also add a test to show what currently happens when we create a section per function and emit an xray_instr_map. This illustrates the relationship (or lack thereof) between the per-function section and the xray_instr_map section. We also change the code generation slightly so that we don't always create group sections, but rather only do so if a function where the table is associated with is in a group. Also in this change: - Remove the "merge" flag on the xray_instr_map section. - Test that we're generating the right table for comdat and non-comdat functions. Reviewers: echristo, majnemer Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D23104 llvm-svn: 277580	2016-08-03 07:21:55 +00:00
Nirav Dave	8601ac11aa	[MC] Fix Intel Operand assembly parsing for .set ids Recommitting after fixing overaggressive fastpath return in parsing. Fix intel syntax special case identifier operands that refer to a constant (e.g. .set <ID> n) to be interpreted as immediate not memory in parsing. Associated commit to fix clang test commited shortly. Reviewers: rnk Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D22585 llvm-svn: 277489	2016-08-02 17:56:03 +00:00
Igor Breger	f44b79d08e	[AVX512] Don't use i128 masked gather/scatter/load/store. Do more accurately dataWidth check. Differential Revision: http://reviews.llvm.org/D23055 llvm-svn: 277435	2016-08-02 09:15:28 +00:00
Craig Topper	9433f975d0	[AVX-512] Mark VADDPS/PD and VMULPS/PD as commutable. This necessitated adding itineraries to all of the instructions that use the avx512_fp_binop_p class. llvm-svn: 277422	2016-08-02 06:16:53 +00:00
Craig Topper	553535848f	[AVX-512] Use SSE_MUL_ITINS_S/SSE_DIV_ITINS_S for the scalar FMUL/FDIV instructions to match SSE/AVX. llvm-svn: 277421	2016-08-02 06:16:51 +00:00
Craig Topper	05948fb36c	[AVX-512] Correct ExeDomain for many AVX-512 instructions. llvm-svn: 277416	2016-08-02 05:11:15 +00:00
Hans Wennborg	7a3a49b18a	Revert r276895 "[MC][X86] Fix Intel Operand assembly parsing for .set ids" This caused PR28805. Adding a regression test. llvm-svn: 277402	2016-08-01 23:00:01 +00:00
Simon Pilgrim	46f119a59f	[X86] Use implicit masking of SHLD/SHRD shift double instructions Similar to the regular shift instructions, SHLD/SHRD only use the bottom bits of the shift value llvm-svn: 277341	2016-08-01 12:11:43 +00:00
Craig Topper	c48c029610	[AVX-512] Fix duplicate column in AVX512 execution dependency table that was preventing VMOVDQU32/VMOVDQA32 from being recognized. Fix a bug in the code that stops execution dependency fix from turning operations on 32-bit integer element types into operations on 64-bit integer element types. llvm-svn: 277327	2016-08-01 07:55:33 +00:00
Craig Topper	749a111f1e	[AVX-512] Teach X86InstrInfo::getLargestLegalSuperClass to inflate to FR32X/FR64X if AVX512 is supported and VR128X/VR256X if VLX is supported. Had to update a stack folding test to clobber the other 16 registers since this now made them get used instead of spilling. llvm-svn: 277321	2016-08-01 05:31:50 +00:00
Craig Topper	3946176314	[AVX-512] Use FR32X/FR64X/VR128X/VR256X register classes in addRegisterClass if AVX512(for FR32X/FR64) or VLX(for VR128X/VR256) is supported. This is a minimal requirement to be able to allocate all 32 registers. llvm-svn: 277319	2016-08-01 04:29:13 +00:00
Craig Topper	da50eec26d	[X86] Move mask register handling into the main switch of getLoadStoreRegOpcode. No functional change intended. llvm-svn: 277318	2016-08-01 04:29:11 +00:00
Craig Topper	c0097dc7d0	[X86] Simplify code for determing GR or FR reg classes by querying for super classes instead of manually listing individual classes. llvm-svn: 277306	2016-07-31 20:20:08 +00:00
Craig Topper	7afdc0fb25	[AVX512] Always use EVEX encodings for 128/256-bit move instructions in getLoadStoreRegOpcode if VLX is supported. llvm-svn: 277305	2016-07-31 20:20:05 +00:00
Craig Topper	4c53e60360	[AVX512] Add VLX packed move instructions to the execution dependency fix pass and update tests. llvm-svn: 277304	2016-07-31 20:20:01 +00:00
Craig Topper	eb1cc981a5	[AVX512] Move FR32X/FR64X handling in getLoadStoreRegOpcode into the main switch. No functional change intended. llvm-svn: 277303	2016-07-31 20:19:55 +00:00
Craig Topper	338ec9a0cb	[AVX512] Stop treating VR512 specially in getLoadStoreRegOpcode and use the regular switch which already tried to handle it, but was unreachable. This has the added benefit of enabling aligned loads/stores if the stack is aligned. llvm-svn: 277302	2016-07-31 20:19:53 +00:00
Craig Topper	2a6bbb8203	[AVX512] Add X86::VR512RegClassID to X86RegisterInfo::getLargestLegalSuperClass. llvm-svn: 277301	2016-07-31 20:19:50 +00:00
Simon Pilgrim	6be48e4aa7	[X86] Improve 64-bit shifts on 32-bit targets (PR14593) As discussed on PR14593, this patch adds support for lowering to SHLD/SHRD from the patterns generated by DAGTypeLegalizer::ExpandShiftWithKnownAmountBit. Differential Revision: https://reviews.llvm.org/D23000 llvm-svn: 277299	2016-07-31 19:50:45 +00:00
Craig Topper	00d34ed64f	[AVX-512] Don't let ExeDependencyFix pass convert VPANDD/Q to VPANDPS/PD unless DQI instructions are supported. Same for ANDN, OR, and XOR. Thanks to Igor Breger for pointing out my mistake. llvm-svn: 277292	2016-07-31 17:15:07 +00:00
Elena Demikhovsky	6e9b16054f	AVX-512: Removed AssertZext node before TRUNCATE Removed AssertZext node, which was inserted between X86ISD::SETCC and "truncate to i1". Differential Revision: https://reviews.llvm.org/D22850 llvm-svn: 277289	2016-07-31 06:48:01 +00:00
Simon Pilgrim	5e0d6b509a	Strip trailing whitespace llvm-svn: 277280	2016-07-30 20:53:21 +00:00
Simon Pilgrim	8bbd3650a6	[X86] Use peekThroughOneUseBitcasts helper function llvm-svn: 277279	2016-07-30 20:51:26 +00:00
Simon Pilgrim	cf49fa3251	[X86][SSE] Let 64-bit targets use the fast 2i32-2f32 UINT_TO_FP conversion as well as 32-bit The 2i32-2i64 legalization means that we can use the slightly quicker double bits + fptrunc approach for the same results llvm-svn: 277271	2016-07-30 14:06:59 +00:00
Benjamin Kramer	205159c628	[X86] Fix lifetime of SMRange temporaries. Found by asan -fsanitize-address-use-after-scope. llvm-svn: 277266	2016-07-30 11:31:24 +00:00
Michael Kuperstein	f396b4c40d	[X86] Match PSADBW in straight-line code Up until now, we only had code to match PSADBW patterns that look like what comes out of the loop vectorizer - a partial reduction inside the loop body that gets fed into a horizontal operation in a different basic block. This adds support for straight-line patterns, like those generated by the SLP vectorizer. Differential Revision: https://reviews.llvm.org/D22889 llvm-svn: 277219	2016-07-29 21:45:51 +00:00
Simon Pilgrim	f107ffa8f0	[X86][AVX] Fix VBROADCASTF128 selection bug (PR28770) Support for lowering to VBROADCASTF128 etc. in D22460 was not correctly ensuring that the only users of the 128-bit vector load were the insertions of the vector into the lower/upper subvectors. llvm-svn: 277214	2016-07-29 21:05:10 +00:00
David L Kreitzer	8b959e5cfa	Avoid unnecessary 32-bit to 64-bit zero extensions following 32-bit CMOV instructions on x86_64. The 32-bit CMOV implicitly zero extends. Differential Revision: https://reviews.llvm.org/D22941 llvm-svn: 277148	2016-07-29 15:09:54 +00:00
Simon Pilgrim	cb780b32a3	[X86][SSE] Optimize the truncation of vector comparison results with PACKSS We currently default to using either generic shuffles or MASK+PACKUS/PACKSS to truncate all integer vectors. For vector comparisons, we know that the result will be either all or zero bits in every element, which can be efficiently truncated by directly using PACKSS to repeatedly halve the size of each element. Due to the limited input values (-1 or 0) we don't need to account for vector element size, so for simplicity we just use the PACKSS(vXi16,vXi16) implementation in all cases. Additionally for AVX2 PACKSS of 256bit data we must perform a PERMQ shuffle to reorder the data into the correct order. I did investigate performing a single shuffle after all the PACKSS calls but the need to cross 128bit lanes makes this difficult to achieve efficiently. We avoid performing this on AVX512 as it should have better alternative truncation instructions. Differential Revision: https://reviews.llvm.org/D22814 llvm-svn: 277132	2016-07-29 10:23:10 +00:00
Craig Topper	e4f868ea16	[AVX512] Mark EVEX VMOVSSrm and VMOVSDrm as canFoldAsLoad and isReMaterializable. llvm-svn: 277120	2016-07-29 06:06:04 +00:00
Craig Topper	5625d24977	[AVX512] Copy the patterns that recognize scalar arimetic operations inserting into the lower element of a packed vector from AVX/SSE so that we can use EVEX encoded instructions. llvm-svn: 277119	2016-07-29 06:06:00 +00:00
Craig Topper	c7de3a1018	[AVX512] Remove the intrinsic forms of VMOVSS/VMOVSD. We don't need two different forms of 'rr' and 'rm'. This matches SSE/AVX. I'm not convinced the patterns for the rm_Int was correct anyway. It had a tied source that should't exist for the unmasked version. The load form of MOVSS always zeros the most significant bits. I've left the patterns off the masked load instructions as I'm not sure what the correct pattern should be and we don't have any tests currently. Nor do we implement masked scalar load intrinsics in clang currently. llvm-svn: 277098	2016-07-29 02:49:08 +00:00
Matthias Braun	941a705b7b	MachineFunction: Return reference for getFrameInfo(); NFC getFrameInfo() never returns nullptr so we should use a reference instead of a pointer. llvm-svn: 277017	2016-07-28 18:40:00 +00:00
Craig Topper	7e27885f69	[X86] Remove CustomInserter for FMA3 instructions. Looks like since we got full commuting support for FMAs after this was added, the coalescer can now get this right on its own. Differential Revision: https://reviews.llvm.org/D22799 llvm-svn: 276987	2016-07-28 15:28:56 +00:00
Michael Kuperstein	e7605e28f9	[X86] Factor out another piece of the SAD combine. NFCI. llvm-svn: 276918	2016-07-27 20:59:51 +00:00
Nirav Dave	06a99a46e2	[MC][X86] Fix Intel Operand assembly parsing for .set ids Fix intel syntax special case identifier operands that refer to a constant (e.g. .set <ID> n) to be interpreted as immediate not memory in parsing. Reviewers: rnk Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D22585 llvm-svn: 276895	2016-07-27 17:39:41 +00:00
Michael Kuperstein	2dc08f7df8	[X86] Split out absdiff detection from SAD combine. NFC. Preparation for supporting PSADBW emission for straight-line code. llvm-svn: 276798	2016-07-26 20:01:29 +00:00
Simon Pilgrim	019e102426	[X86][SSE] Fixed issue with memory folding of (v)cvtsd2ss intrinsics Fixed typo in the intrinsic definitions of (v)cvtsd2ss with memory folding. This was only unearthed when rL276102 started using the intrinsic again..... llvm-svn: 276740	2016-07-26 10:41:28 +00:00
Simon Pilgrim	28c7d7093d	Fixed spelling in comment llvm-svn: 276738	2016-07-26 09:55:31 +00:00
Craig Topper	79011a660e	[X86] Remove isCommutable=1 from instructions that also load. Commuting such instruction isn't useful as it would unfold the load. The exception being FMA3 instructions. llvm-svn: 276733	2016-07-26 08:06:18 +00:00
Craig Topper	26000f8d90	[AVX512] Don't mark ADDSSZr_Int or MULSSZr_Int as commutable. The intrinsics have one of their arguments indicated as passing through the high bits and we can't commute that. llvm-svn: 276732	2016-07-26 08:06:14 +00:00
Joel Jones	373d7d30dd	MC] Provide an MCTargetOptions to implementors of MCAsmBackendCtorTy, NFC Some targets, notably AArch64 for ILP32, have different relocation encodings based upon the ABI. This is an enabling change, so a future patch can use the ABIName from MCTargetOptions to chose which relocations to use. Tested using check-llvm. The corresponding change to clang is in: http://reviews.llvm.org/D16538 Patch by: Joel Jones Differential Revision: https://reviews.llvm.org/D16213 llvm-svn: 276654	2016-07-25 17:18:28 +00:00
Elena Demikhovsky	64e5f929d0	AVX-512: Fixed [US]INT_TO_FP selection for i1 vectors. It failed with assertion before this patch. Differential Revision: https://reviews.llvm.org/D22735 llvm-svn: 276648	2016-07-25 16:51:00 +00:00
Craig Topper	ce415ff9c5	[AVX512] Add load folding support for the unmasked forms of the FMA instructions. llvm-svn: 276615	2016-07-25 07:20:35 +00:00
Craig Topper	318e40b6f7	[AVX512] Add some additional patterns so that we can fold broadcast loads in the first argument of an FMADD/FMSUB/FNMADD/FNMSUB/FMADDSUB/FMSUBADD node. Also add patterns to support all combinations of the broadcast input and the preserved input for masked versions. llvm-svn: 276614	2016-07-25 07:20:31 +00:00
Craig Topper	6bcbf5338c	[AVX512] Cleanup FMA operand order in patterns to match the VEX versions and to really be 213, 231, and 132. llvm-svn: 276613	2016-07-25 07:20:28 +00:00
Simon Pilgrim	381a0ade5a	[X86] Add 'FeatureSlowSHLD' to cpu 'bdver4' As with all AMD CPUs, excavator has poor SHLD/SHRD performance. Also added bdver3 to the test as it was missing. llvm-svn: 276569	2016-07-24 16:00:53 +00:00
Craig Topper	2dca3b287b	[X86] Make the FMA3 instruction names consistent between VEX and EVEX encoded versions. This places the 132/213/231 form number in front of the SS/SD/PS/PD. Move the Y for 256-bit versions to be after the PS/PD. Change the AVX512 scalar forms to include a Z in the their name. This new format should be consistent with the general naming of instructions. llvm-svn: 276559	2016-07-24 08:26:38 +00:00
Craig Topper	05629d05c7	[X86] Replace CodeGenOnly VPSRAVW/D/Q_Int instructions with patterns since the operand types exactly match the normal VPSRAVW/D/Q instructions. llvm-svn: 276555	2016-07-24 07:32:45 +00:00
Craig Topper	8152b9cd96	[X86] Fix typo in comment. llvm-svn: 276528	2016-07-23 16:44:08 +00:00
Craig Topper	b6519db90d	[AVX512] Implement commuting support for EVEX encoded FMA3 instructions. llvm-svn: 276521	2016-07-23 07:16:56 +00:00
Craig Topper	6172b0b3e9	[X86] Make one of the FMA3 commuting methods static. Remove a call to isFMA3 just to get the IsIntrisic flag, instead get it during the first call and pass it along. NFC llvm-svn: 276520	2016-07-23 07:16:53 +00:00
Craig Topper	ca8f5f309c	[X86] Fix switch statement indentation per coding standards. llvm-svn: 276519	2016-07-23 07:16:50 +00:00
Simon Pilgrim	ea0d4f9962	[X86][AVX] Added support for lowering to VBROADCASTF128/VBROADCASTI128 (reapplied) As reported on PR26235, we don't currently make use of the VBROADCASTF128/VBROADCASTI128 instructions (or the AVX512 equivalents) to load+splat a 128-bit vector to both lanes of a 256-bit vector. This patch enables lowering from subvector insertion/concatenation patterns and auto-upgrades the llvm.x86.avx.vbroadcastf128.pd.256 / llvm.x86.avx.vbroadcastf128.ps.256 intrinsics to match. We could possibly investigate using VBROADCASTF128/VBROADCASTI128 to load repeated constants as well (similar to how we already do for scalar broadcasts). Reapplied with fix for PR28657 - removed intrinsic definitions (clang companion patch to be be submitted shortly). Differential Revision: https://reviews.llvm.org/D22460 llvm-svn: 276416	2016-07-22 13:58:44 +00:00
Benjamin Kramer	5ba0e20315	Revert "[X86][AVX] Added support for lowering to VBROADCASTF128/VBROADCASTI128" It caused PR28657. This reverts commit r276281. llvm-svn: 276405	2016-07-22 11:03:10 +00:00
Craig Topper	52e2e8381b	[AVX512] Add ExeDomain to vector extend and truncate instructions. llvm-svn: 276394	2016-07-22 05:46:44 +00:00
Craig Topper	f4151bea72	[AVX512] Add initial support for the Execution Domain fixing pass to change some EVEX instructions. llvm-svn: 276393	2016-07-22 05:00:52 +00:00
Craig Topper	5ec33a9411	[AVX512] Fix the ExeDomain for some packed fp instructions. llvm-svn: 276392	2016-07-22 05:00:42 +00:00
Craig Topper	0b90756b0a	[AVX512] Add load folding for some AVX512VL logic and arithmetic instructions. llvm-svn: 276391	2016-07-22 05:00:39 +00:00
Craig Topper	ab13b33ded	[AVX512] Update X86InstrInfo::foldMemoryOperandCustom to handle the EVEX encoded instructions too. llvm-svn: 276390	2016-07-22 05:00:35 +00:00
Michael Kuperstein	c523333bbf	[X86] Do not use AND8ri8 in AVX512 pattern This variant is (as documented in the TD) for disassembler use only, and should not be used in patterns - it is longer, and is broken on 64-bit. llvm-svn: 276347	2016-07-21 22:24:08 +00:00
Simon Pilgrim	88e0940d3b	[X86][SSE] Allow folding of store/zext with PEXTRW of 0'th element Under normal circumstances we prefer the higher performance MOVD to extract the 0'th element of a v8i16 vector instead of PEXTRW. But as detailed on PR27265, this prevents the SSE41 implementation of PEXTRW from folding the store of the 0'th element. Additionally it prevents us from making use of the fact that the (SSE2) reg-reg version of PEXTRW implicitly zero-extends the i16 element to the i32/i64 destination register. This patch only preferentially lowers to MOVD if we will not be zero-extending the extracted i16, nor prevent a store from being folded (on SSSE41). Fix for PR27265. Differential Revision: https://reviews.llvm.org/D22509 llvm-svn: 276289	2016-07-21 14:54:17 +00:00
Simon Pilgrim	b11bdd95f6	[X86][SSE] Pull out duplicate EXTRW lowering code. NFCI. As requested on D22509, I've pulled out the v8i16 extraction lowering as the SSE41 and pre-SSE41 implementations are effectively the same. llvm-svn: 276285	2016-07-21 14:30:17 +00:00
Simon Pilgrim	c8e20b1150	[X86][AVX] Added support for lowering to VBROADCASTF128/VBROADCASTI128 As reported on PR26235, we don't currently make use of the VBROADCASTF128/VBROADCASTI128 instructions (or the AVX512 equivalents) to load+splat a 128-bit vector to both lanes of a 256-bit vector. This patch enables lowering from subvector insertion/concatenation patterns and auto-upgrades the llvm.x86.avx.vbroadcastf128.pd.256 / llvm.x86.avx.vbroadcastf128.ps.256 intrinsics to match. We could possibly investigate using VBROADCASTF128/VBROADCASTI128 to load repeated constants as well (similar to how we already do for scalar broadcasts). Differential Revision: https://reviews.llvm.org/D22460 llvm-svn: 276281	2016-07-21 14:10:54 +00:00
Matthias Braun	ca8210a952	X86InstrInfo: No need for liveness analysis in classifyLEAReg() classifyLEAReg() deals with switching operands from 32bit to 64bit in order to use a LEA64_32 instruction (for three address code goodness). It currently performs a liveness analysis to determine the kill/undef flag for the newly added operand. This should not be necessary: - If the previous operand had a kill flag, then the 32bit part of the register gets killed, this will kill the super register as well. - If the previous operand had an undef flag then we didn't care what value we read, just use the same flag on the new operand. (No matter what an operand with an undef flag won't affect liveness) This makes the code independent of the presence of kill flags because it avoids a call to MachineBasicBlock::computeRegisterLiveness(). Differential Revision: http://reviews.llvm.org/D22283 llvm-svn: 276222	2016-07-21 00:33:38 +00:00
Simon Pilgrim	1b4f511aaa	[X86][SSE] Add cost model values for CTPOP of vectors This patch adds costs for the vectorized implementations of CTPOP, the default values were seriously underestimating the cost of these and was encouraging vectorization on targets where serialized use of POPCNT would be much better. Differential Revision: https://reviews.llvm.org/D22456 llvm-svn: 276104	2016-07-20 10:41:28 +00:00
Craig Topper	09b99c3a75	[AVX512] Add a missing NoVLX to give priority to the AVX512 version of the pattern. llvm-svn: 276088	2016-07-20 05:05:50 +00:00
Craig Topper	7a95428f7c	[X86] Use 'HasAVX1Only' to properly give priority to the AVX2 version without relying on file ordering. llvm-svn: 276087	2016-07-20 05:05:48 +00:00
Craig Topper	cde436a663	[X86] Create some multiclasses to reduce the repeated patterns for VEXTRACT(F/I)128/VINSERT(I/F)128. NFC llvm-svn: 276086	2016-07-20 05:05:46 +00:00
Craig Topper	55913ead3d	[X86] Create some wrapper multiclasses to create AVX and SSE shift instructions with less repeated code. NFC llvm-svn: 276085	2016-07-20 05:05:44 +00:00
Simon Pilgrim	0ea8d275cc	[X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead. It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems, INF/NAN/out of range values are guaranteed to result in a 0x80000000 value - which plays havoc with constant folding which converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match). This patch changes both scalar and packed versions back to using x86-specific builtins. It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding. A companion clang patch is at D22105 Differential Revision: https://reviews.llvm.org/D22106 llvm-svn: 275981	2016-07-19 15:07:43 +00:00
Elena Demikhovsky	2c0780b8e5	AVX-512: Fixed BT instruction selection. The following condition expression ( a >> n) & 1 is converted to "bt a, n" instruction. It works on all intel targets. But on AVX-512 it was broken because the expression is modified to (truncate (a >>n) to i1). I added the new sequence (truncate (a >>n) to i1) to the BT pattern. Differential Revision: https://reviews.llvm.org/D22354 llvm-svn: 275950	2016-07-19 07:14:21 +00:00
Craig Topper	d6ca1dc45e	[AVX512] Give priority to EVEX encoded PSHUFB over the VEX versions. llvm-svn: 275942	2016-07-19 02:00:38 +00:00
Craig Topper	592dc30708	[X86] Remove superfluous parameter from a multiclass. All instantiations passed the same value. llvm-svn: 275941	2016-07-19 02:00:35 +00:00
Craig Topper	6189d3ecd4	[X86] Rename VINSERTzrr to use a capital Z to match other instructions. NFC llvm-svn: 275939	2016-07-19 01:26:19 +00:00
Simon Pilgrim	c941f6b329	[X86][AVX] Add target shuffle decode support for VBROADCAST Currently we only decode broadcasts from a vector of the same size. llvm-svn: 275823	2016-07-18 17:32:59 +00:00
Craig Topper	a3c55f5915	[AVX512] Add EVEX versions of scalar ADD/SUB/MUL/DIV to load folding tables. llvm-svn: 275775	2016-07-18 06:49:32 +00:00
Craig Topper	16a0744955	[AVX512] Add KADD/KAND/KOR/KXOR to X86InstrInfo::isAssociativeAndCommutative. llvm-svn: 275771	2016-07-18 06:14:59 +00:00
Craig Topper	463f949a3a	[X86] Add VPMULLW/D/Q instructions to X86InstrInfo::isAssociativeAndCommutative. llvm-svn: 275770	2016-07-18 06:14:57 +00:00
Craig Topper	1af6cc00dc	[X86] Add VPADD instructions to X86InstrInfo::isAssociativeAndCommutative. llvm-svn: 275769	2016-07-18 06:14:54 +00:00
Craig Topper	ba9b93d7f2	[X86] Add floating point packed logical ops to X86InstrInfo::isAssociativeAndCommutative. llvm-svn: 275768	2016-07-18 06:14:50 +00:00
Craig Topper	3a99de4067	[X86] Add AVX512 instructions to X86InstrInfo::isAssociativeAndCommutative. llvm-svn: 275767	2016-07-18 06:14:47 +00:00
Craig Topper	fe5a6dc581	[X86] Add more AVX512 instructions to X86InstrInfo::isHighLatencyDef. Also add all packed fp division instructions. llvm-svn: 275766	2016-07-18 06:14:45 +00:00
Craig Topper	f7a06c29bc	[X86] Add AVX512 load opcodes and a couple AVX load opcodes to X86InstrInfo::areLoadsFromSameBasePtr. llvm-svn: 275765	2016-07-18 06:14:43 +00:00
Craig Topper	650a15e2b3	[X86] Add more opcodes to isFrameLoadOpcode/isFrameStoreOpcode. Mainly AVX-512 related. llvm-svn: 275764	2016-07-18 06:14:39 +00:00
Craig Topper	5c913e84df	[AVX512] Use VMOVAPSZ128rr/VMOVAPS256rr for VR128X/VR256X physreg moves when VLX is supported. Ideally we would use VEX encoded moves instead of EVEX if the high 16 registers aren't referenced, but this a good first step. llvm-svn: 275763	2016-07-18 06:14:34 +00:00
Craig Topper	53f3d1b4d0	[X86] Fix 80-column violations. NFC llvm-svn: 275762	2016-07-18 06:14:26 +00:00
Simon Pilgrim	285d9e4d60	Strip trailing whitespace llvm-svn: 275726	2016-07-17 19:02:27 +00:00
Simon Pilgrim	1be1222293	[X86][SSE] lowerVectorShuffleAsPermuteAndUnpack tidyup. NFCI. Moved unpack type determination into TryUnpack lambda. Added missing comment describing lowerVectorShuffleAsPermuteAndUnpack call. llvm-svn: 275708	2016-07-17 15:48:25 +00:00
Guy Blank	3357ba36e2	test commit llvm-svn: 275703	2016-07-17 12:10:35 +00:00
Craig Topper	8093437f2e	[AVX512] Remove CodeGenOnly VBROADCAST m_Int instructions. They can be implemented with patterns selecting existing instructions. NFC llvm-svn: 275671	2016-07-16 03:42:59 +00:00
Nico Weber	8d66df15f4	Teach fast isel about the win64 calling convention. This mostly just works. Vectorcall rets are still not supported. The win64_eh test change is because fast isel doesn't use rsi for temporary computations, so it doesn't need to be pushed. The test case I'm changing was originally added to test pushes, but by now there are other test cases in that file exercising that code path. https://reviews.llvm.org/D22422 llvm-svn: 275607	2016-07-15 20:18:37 +00:00
Justin Lebar	9c375817ac	[SelectionDAG] Get rid of bool parameters in SelectionDAG::getLoad, getStore, and friends. Summary: Instead, we take a single flags arg (a bitset). Also add a default 0 alignment, and change the order of arguments so the alignment comes before the flags. This greatly simplifies many callsites, and fixes a bug in AMDGPUISelLowering, wherein the order of the args to getLoad was inverted. It also greatly simplifies the process of adding another flag to getLoad. Reviewers: chandlerc, tstellarAMD Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits Differential Revision: http://reviews.llvm.org/D22249 llvm-svn: 275592	2016-07-15 18:27:10 +00:00
Justin Lebar	0af80cd6f0	[CodeGen] Take a MachineMemOperand::Flags in MachineFunction::getMachineMemOperand. Summary: Previously we took an unsigned. Hooray for type-safety. Reviewers: chandlerc Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D22282 llvm-svn: 275591	2016-07-15 18:26:59 +00:00
Jacques Pienaar	71c30a14b7	Rename AnalyzeBranch* to analyzeBranch*. Summary: NFC. Rename AnalyzeBranch/AnalyzeBranchPredicate to analyzeBranch/analyzeBranchPredicate to follow LLVM coding style and be consistent with TargetInstrInfo's analyzeCompare and analyzeSelect. Reviewers: tstellarAMD, mcrosier Subscribers: mcrosier, jholewinski, jfb, arsenm, dschuff, jyknight, dsanders, nemanjai Differential Revision: https://reviews.llvm.org/D22409 llvm-svn: 275564	2016-07-15 14:41:04 +00:00
Simon Pilgrim	2683ad54ad	[X86][AVX2] Improve lowerShuffleAsRepeatedMaskAndLanePermute permutation of 64-bit sub-lanes As discussed on PR28136, lowerShuffleAsRepeatedMaskAndLanePermute was attempting to match repeated masks at the 128-bit level and then permute the resultant lanes at the 128-bit (AVX1) or 64-bit (AVX2) sub-lane level. This change allows us to create the repeated masks at the sub-lane level (and then concat them together to create a 128-bit repeated mask) and then select which sub-lane to permute. This has no effect on the AVX1 codegen. Fixes PR28136. llvm-svn: 275543	2016-07-15 09:49:12 +00:00
Simon Pilgrim	420b266d0a	[X86][AVX2] Allow VPERMPD/VPERMQ shuffles to call combineShuffle (reapplied) This improves the situation discussed in D19228 where we were forcing VPERMPD/VPERMQ where VPERM2F128/VPERM2I128 would have been better. This was incorrectly reverted in rL275421 during triage of PR28552. llvm-svn: 275497	2016-07-14 23:05:09 +00:00
Nirav Dave	a6c7595d0f	[X86][MC] Fix bracket expression parsing in intel-style assembly. Only perform struct field check on Identifier tokens. Fixes PR28547. Reviewers: rnk Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D22361 llvm-svn: 275445	2016-07-14 17:37:05 +00:00
Nico Weber	5bb284226b	Don't optimize movs to pushes in -O0 builds. https://reviews.llvm.org/D22362 llvm-svn: 275431	2016-07-14 15:40:22 +00:00
Nico Weber	ead8f8ffdd	Delete some trailing whitespace. llvm-svn: 275429	2016-07-14 15:07:44 +00:00
Ahmed Bougacha	85dc93c56b	[X86] Decode MPX BND registers. We were able to assemble, but not disassemble. Note that fixupRMValue was truncating EA_REG_BND0-3 because we hit the uint8_t max. The control registers were already squarely above it, but I don't think they ever go in .r/m, only in .reg. I also did notice an extra REX.W in our encoding, but I think that's fine. llvm-svn: 275427	2016-07-14 14:53:21 +00:00
Ahmed Bougacha	4f7a5e20ae	[X86] Don't mark addressing mode operands as "outs". NFC-ish. Nothing in-tree can tell the difference, but it's incorrect: the addressing mode registers aren't what's defined. llvm-svn: 275426	2016-07-14 14:53:17 +00:00
Nico Weber	3afaf16abc	Revert r275411, it cause PR28552. llvm-svn: 275421	2016-07-14 14:49:35 +00:00
Nico Weber	ecdf45b1e6	Teach fast isel calls and rets about stdcall. stdcall is callee-pop like thiscall, so the thiscall changes already did most of the work for this. This change only opts stdcall in and adds tests. llvm-svn: 275414	2016-07-14 13:54:26 +00:00
Simon Pilgrim	534e3240e8	Remove trailing whitespace. llvm-svn: 275412	2016-07-14 13:29:23 +00:00
Simon Pilgrim	3ecb6bdd5f	[X86][AVX2] Allow VPERMPD/VPERMQ shuffles to call combineShuffle This improves the situation discussed in D19228 where we were forcing VPERMPD/VPERMQ where VPERM2F128/VPERM2I128 would have been better. llvm-svn: 275411	2016-07-14 13:28:43 +00:00
Simon Pilgrim	053d32906f	[X86][AVX] Add support for narrowing 128-bit+ shuffle mask elements to 64-bits to allow combining Primarily this is to allow blend with zero instead of having to use vperm2f128, but we can use this in the future to deal with AVX512 cases where we need to keep the original element size to correctly fold masked operations. llvm-svn: 275406	2016-07-14 12:58:04 +00:00
Simon Pilgrim	a76a8e50e5	[X86][AVX] Add VBROADCASTF128/VBROADCASTI128 shuffle comments support llvm-svn: 275400	2016-07-14 12:07:43 +00:00
Simon Pilgrim	b8c261c931	[X86][AVX2] VBROADCASTSSrr/VBROADCASTSSYrr require AVX2 not AVX llvm-svn: 275391	2016-07-14 10:37:14 +00:00
Craig Topper	6840f1150f	[AVX512] Implement EXTLOAD lowering with patterns to select existing VPMOVZX instructions instead of creating CodeGenOnly instructions. llvm-svn: 275378	2016-07-14 06:41:34 +00:00
Eli Friedman	17e8ea18e9	[X86] Fix stupid typo in isel lowering. Apparently someone miscounted the number of zeros in the immediate. Fixes https://llvm.org/bugs/show_bug.cgi?id=28544 . llvm-svn: 275376	2016-07-14 05:48:25 +00:00
Dean Michael Berris	52735fc435	XRay: Add entry and exit sleds Summary: In this patch we implement the following parts of XRay: - Supporting a function attribute named 'function-instrument' which currently only supports 'xray-always'. We should be able to use this attribute for other instrumentation approaches. - Supporting a function attribute named 'xray-instruction-threshold' used to determine whether a function is instrumented with a minimum number of instructions (IR instruction counts). - X86-specific nop sleds as described in the white paper. - A machine function pass that adds the different instrumentation marker instructions at a very late stage. - A way of identifying which return opcode is considered "normal" for each architecture. There are some caveats here: 1) We don't handle PATCHABLE_RET in platforms other than x86_64 yet -- this means if IR used PATCHABLE_RET directly instead of a normal ret, instruction lowering for that platform might do the wrong thing. We think this should be handled at instruction selection time to by default be unpacked for platforms where XRay is not availble yet. 2) The generated section for X86 is different from what is described from the white paper for the sole reason that LLVM allows us to do this neatly. We're taking the opportunity to deviate from the white paper from this perspective to allow us to get richer information from the runtime library. Reviewers: sanjoy, eugenis, kcc, pcc, echristo, rnk Subscribers: niravd, majnemer, atrick, rnk, emaste, bmakam, mcrosier, mehdi_amini, llvm-commits Differential Revision: http://reviews.llvm.org/D19904 llvm-svn: 275367	2016-07-14 04:06:33 +00:00
Nico Weber	af7e8465e1	Teach fast isel about thiscall (and callee-pop) calls. http://reviews.llvm.org/D22315 llvm-svn: 275360	2016-07-14 01:52:51 +00:00
Nico Weber	eb9488b151	Fix a TODO in X86CallFrameOptimization to not rely on a codegen artifact. This happens to make X86CallFrameOptimization in -O0 / FastISel builds as well, but it's not clear if the pass should run in that setup. http://reviews.llvm.org/D22314 llvm-svn: 275320	2016-07-13 21:38:27 +00:00
Sanjay Patel	610a2f6525	[x86][SSE/AVX] optimize pcmp results better (PR28484) We know that pcmp produces all-ones/all-zeros bitmasks, so we can use that behavior to avoid unnecessary constant loading. One could argue that load+and is actually a better solution for some CPUs (Intel big cores) because shifts don't have the same throughput potential as load+and on those cores, but that should be handled as a CPU-specific later transformation if it ever comes up. Removing the load is the more general x86 optimization. Note that the uneven usage of vpbroadcast in the test cases is filed as PR28505: https://llvm.org/bugs/show_bug.cgi?id=28505 Differential Revision: http://reviews.llvm.org/D22225 llvm-svn: 275276	2016-07-13 16:04:07 +00:00
Simon Pilgrim	a99368fa35	[X86][AVX512] Add support for VPERMILPD/VPERMILPS variable shuffle mask comments llvm-svn: 275272	2016-07-13 15:45:36 +00:00
Simon Pilgrim	48d8340760	[X86][AVX] Add support for target shuffle combining to VPERMILPS variable shuffle mask Added AVX512F VPERMILPS shuffle decoding support llvm-svn: 275270	2016-07-13 15:10:43 +00:00
Simon Pilgrim	57548a6fa6	[X86][SSE] Check for lane crossing shuffles before trying to combine to PSHUFB Removes a return-on-fail that was making it tricky to add other variable mask shuffles. llvm-svn: 275262	2016-07-13 12:48:41 +00:00
Craig Topper	ff1c327ebb	[X86] Remove some seemingly unnecessary patterns that supported vector zext/sext with 256-bit source types producing a 256-bit result. These patterns just extracted the source down to 128-bits to use the instructions. AVX512 seems to have blindly copied them over for VLX, but did not create similar patterns for 512-bit sources. So I'm hoping the backend can't actually produce these cases. llvm-svn: 275240	2016-07-13 02:21:25 +00:00
Simon Pilgrim	6fa71da4a4	[X86][AVX] Add support for target shuffle combining to VPERM2F128/VPERM2I128 llvm-svn: 275212	2016-07-12 20:27:32 +00:00
Matthias Braun	96ec47db74	X86FixupBWInsts: No need for forward liveness analysis. With r274952 and r275201 in place there are no cases left where a forward liveness analysis yields different results than a backward one. So we can remove the forward stepping logic. Differential Revision: http://reviews.llvm.org/D22083 llvm-svn: 275204	2016-07-12 19:04:30 +00:00
Craig Topper	a6e6febe2c	[AVX512] Remove masked logic op intrinsics and autoupgrade them to native IR. llvm-svn: 275155	2016-07-12 05:27:53 +00:00
Duncan P. N. Exon Smith	7b4c18e8f3	X86: Avoid implicit iterator conversions, NFC Avoid implicit conversions from MachineInstrBundleIterator to MachineInstr, mainly by preferring MachineInstr& over MachineInstr and using range-based for loops. llvm-svn: 275149	2016-07-12 03:18:50 +00:00
Nico Weber	c7bf646a99	Teach FastISel about thiscall (and, hence, about callee-pop). http://reviews.llvm.org/D22115 llvm-svn: 275135	2016-07-12 01:30:35 +00:00
Michael Kuperstein	f0c59330e9	[X86] Make some cast costs more precise Make some AVX and AVX512 cast costs more precise. Based on part of a patch by Elena Demikhovsky (D15604). Differential Revision: http://reviews.llvm.org/D22064 llvm-svn: 275106	2016-07-11 21:39:44 +00:00
Quentin Colombet	fb82c7bc94	[X86] Fix tailcall return address clobber bug. This bug (llvm.org/PR28124) was introduced by r237977, which refactored the tail call sequence to be generated in two passes instead of one. Unfortunately, the stack adjustment produced by the first pass was not recognized by X86FrameLowering::mergeSPUpdates() in all cases, causing code such as the following, which clobbers the return address, to be generated: popl %edi popl %edi pushl %eax jmp tailcallee # TAILCALL To fix the problem, the entire stack adjustment is performed in X86ExpandPseudo::ExpandMI() for tail calls. Patch by Magnus Lång <margnus1@gmail.com> Differential Revision: http://reviews.llvm.org/D21325 llvm-svn: 275103	2016-07-11 21:03:03 +00:00
Michael Kuperstein	cfbac5f361	[X86] Disable FixupSetCC for CodeGenOpt::None It is an optimization pass, and should not run at -O0. Especially since Fast RA will not do the required register coalescing anyway, so it's a loss even from the optimization standpoint. This also works around (but doesn't quite fix) PR28489. llvm-svn: 275099	2016-07-11 20:40:44 +00:00
Nirav Dave	57033c6336	Add missing include from previous commit llvm-svn: 275069	2016-07-11 14:32:57 +00:00
Nirav Dave	8603062ee4	Fix branch relaxation in 16-bit mode. Thread through MCSubtargetInfo to relaxInstruction function allowing relaxation to generate jumps with 16-bit sized immediates in 16-bit mode. This fixes PR22097. Reviewers: dwmw2, tstellarAMD, craig.topper, jyknight Subscribers: jfb, arsenm, jyknight, llvm-commits, dsanders Differential Revision: http://reviews.llvm.org/D20830 llvm-svn: 275068	2016-07-11 14:23:53 +00:00
Simon Pilgrim	832463eada	[X86][SSE] Generalise target shuffle combine of shuffles using variable masks At present the only shuffle with a variable mask we recognise is PSHUFB, which influences if its worth the cost of mask creation/loading of a combined target shuffle with a variable mask. This change sets up the infrastructure to support other shuffles in the future but has no effect yet. llvm-svn: 275059	2016-07-11 12:49:35 +00:00
Elena Demikhovsky	d84f337953	AVX-512: DAG lowering for scalar MIN/MAX commutable ops DAG lowering was missing for the scalar FMINC, FMAXC nodes. The nodes are generated only in the "unsafe-fp-math" mode. Added tests. llvm-svn: 275048	2016-07-11 06:08:06 +00:00
Craig Topper	7ee070e7bc	[AVX512] Add support for 512-bit ANDN now that all ones build vectors survive long enough to allow the matching. llvm-svn: 275046	2016-07-11 05:36:53 +00:00
Craig Topper	516e14cd8e	[AVX512] Use vpternlog with an immediate of 0xff to create 512-bit all one vectors. llvm-svn: 275045	2016-07-11 05:36:48 +00:00
Craig Topper	8674849d6e	[X86] Add the AVX512 SET0 pseudos to foldMemoryOperandImpl since they are marked for CanFoldAsLoad. I don't really know how to test this. llvm-svn: 275044	2016-07-11 05:36:41 +00:00
Simon Pilgrim	ee4a33ae46	[X86][SSE] Relax type assertions for matchVectorShuffleAsInsertPS Calls to matchVectorShuffleAsInsertPS only need to ensure the inputs are 128-bit vectors. Only lowerVectorShuffleAsInsertPS needs to ensure that they are v4f32. llvm-svn: 275028	2016-07-10 22:26:05 +00:00
Simon Pilgrim	2191faa433	[X86][SSE] Add support for target shuffle combining to PSHUFLW/PSHUFHW llvm-svn: 275022	2016-07-10 21:02:47 +00:00
Craig Topper	0b0954570a	[AVX512] Add support for lowering to 512-bit SHUFPS. llvm-svn: 275011	2016-07-10 05:55:53 +00:00
Simon Pilgrim	606126e848	[X86][SSE] Add support for target shuffle combining to INSERTPS llvm-svn: 274990	2016-07-09 21:47:55 +00:00
Simon Pilgrim	59c6a211cd	[X86][SSE] Use scaleShuffleMask helper. NFCI. llvm-svn: 274988	2016-07-09 21:12:03 +00:00
Craig Topper	70610cf7b6	[X86] Remove and autoupgrade 512-bit non-temporal store intrinsics. llvm-svn: 274966	2016-07-09 04:38:27 +00:00
Simon Pilgrim	950419f948	[X86][AVX2] Add support for target shuffle combining to VPERMPD/VPERMQ llvm-svn: 274908	2016-07-08 19:23:29 +00:00
Simon Pilgrim	828c731880	[X86][SSE] Accept any shuffle mask that is all zeroes Until we have a better way to extract constants through bitcasted build vectors (and how to handle undefs of partial lanes etc.) at least accept build vectors that are all zeroes. llvm-svn: 274833	2016-07-08 10:39:12 +00:00
Craig Topper	f7bf6de0af	[AVX512] Remove and autoupgrade a duplicate set of 512-bit masked shift intrinsics. I'm not sure if clang ever used these builtin names or not. llvm-svn: 274827	2016-07-08 06:14:47 +00:00
Michael Kuperstein	3e3652aef2	Recommit r274692 - [X86] Transform setcc + movzbl into xorl + setcc xorl + setcc is generally the preferred sequence due to the partial register stall setcc + movzbl suffers from. As a bonus, it also encodes one byte smaller. This fixes PR28146. The original commit tried inserting an 8bit-subreg into a GR32 (not GR32_ABCD) which was not appreciated by fast regalloc on 32-bit. llvm-svn: 274802	2016-07-07 22:50:23 +00:00
Michael Kuperstein	edb38a94f8	Revert r274692 to check whether this is what breaks windows selfhost. llvm-svn: 274771	2016-07-07 16:55:35 +00:00
Rafael Espindola	b34cba97b7	Don't crash trying to relax 32 loads on COFF. Fixes pr28452. llvm-svn: 274754	2016-07-07 14:00:07 +00:00
Michael Kuperstein	1ef6c59b1d	[X86] Transform setcc + movzbl into xorl + setcc xorl + setcc is generally the preferred sequence due to the partial register stall setcc + movzbl suffers from. As a bonus, it also encodes one byte smaller. This fixes PR28146. Differential Revision: http://reviews.llvm.org/D21774 llvm-svn: 274692	2016-07-06 21:56:18 +00:00
Rafael Espindola	a29971faeb	Add initial support for R_386_GOT32X. This adds it only for movl mov@GOT(%reg), %reg. llvm-svn: 274678	2016-07-06 21:19:11 +00:00
Sanjay Patel	04b3496d9b	[x86] fix cost of SINT_TO_FP for i32 --> float (PR21356, PR28434) This is "cvtdq2ps" which does not appear to be particularly slow on any CPU according to Agner's tables. Choosing "5" as a cost here as suggested in: https://llvm.org/bugs/show_bug.cgi?id=21356 ...but it seems very conservative given that the instruction is fully pipelined, and I think these costs are supposed to model throughput. Note that related costs are also most likely too high, but this fixes PR21356 and partly fixes PR28434. llvm-svn: 274658	2016-07-06 19:15:54 +00:00
Michael Kuperstein	1b62e0e91f	[X86] Sort cast cost tables. NFC. Cast cost tables are now sorted, for each cast type, lexicographically on [source base type, source vector width, dest base type, base vector width]. llvm-svn: 274653	2016-07-06 18:26:48 +00:00
Simon Pilgrim	8ff7157513	[X86][SSE] Fixed typo in insertps lowering. We were checking for 2 insertions (which is caught earlier in the pattern matching loop) instead of the case where we have no insertions. Turns out this code never fires as we always try to lower to insertps after trying to lower to blendps, which would catch these cases - I'm about to make some changes to support combining to insertps which could cause this to fire so I don't want to remove it. llvm-svn: 274648	2016-07-06 18:09:08 +00:00
Elena Demikhovsky	ad0a56f3da	Re-commit of 274613. The prev commit failed on compilation. A minor change in one pattern in lib/Target/X86/X86InstrAVX512.td fixes the failure. llvm-svn: 274626	2016-07-06 14:15:43 +00:00
Elena Demikhovsky	02ced295aa	Reverted 274613 due to compilation failue. llvm-svn: 274615	2016-07-06 09:11:49 +00:00
Elena Demikhovsky	5a4f2476fd	AVX-512: Optimization for patterns with i1 scalar type The patch removes redundant kmov instructions (not all, we still have a lot of work here) and redundant "and" instructions after "setcc". I use "AssertZero" marker between X86ISD::SETCC node and "truncate" to eliminate extra "and $1" instruction. I also changed zext, aext and trunc patterns in the .td file. It allows to remove extra "kmov" instruictions. This patch fixes https://llvm.org/bugs/show_bug.cgi?id=28173. Fast ISEL mode is not supported correctly for AVX-512. ICMP/FCMP scalar instruction should return result in k-reg. It will be fixed in one of the next patches. I redirected handling of "cmp" to the DAG builder mode. (The code looks worse in one specific test case, but without this fix the new patch fails). Differential revision: http://reviews.llvm.org/D21956 llvm-svn: 274613	2016-07-06 09:01:20 +00:00
Simon Pilgrim	7643b337a2	[X86][AVX2] Simplified BROADCAST combining to avoid repeated matching attempts llvm-svn: 274583	2016-07-05 22:41:04 +00:00
Simon Pilgrim	bec6543d17	[X86][AVX2] Add support for target shuffle combining to BROADCAST Only support broadcast from vector register so far - memory folding support will have to wait. llvm-svn: 274572	2016-07-05 20:11:29 +00:00
Simon Pilgrim	48adedffb7	[X86][AVX512] Fixed decoding of permd/permpd variable mask shuffles + enabled them for target shuffle combining Corrected element mask masking to extract the bottom index bits (now matches the perm2 implementation but for unary inputs). llvm-svn: 274571	2016-07-05 18:31:17 +00:00
Simon Pilgrim	9769428e08	[X86][AVX512] Remove vector BROADCAST builtins. llvm-svn: 274555	2016-07-05 14:49:58 +00:00
Michael Zuckerman	bdc5f40dca	[LLVM][INTRINSICS] adding intrinsics of CLFLUSHOPT Differential Revision: http://reviews.llvm.org/D21789 llvm-svn: 274553	2016-07-05 14:42:12 +00:00
Simon Pilgrim	3ad040909a	[X86][AVX512] Add support for lowering shuffles to VSHUFPD llvm-svn: 274520	2016-07-04 20:41:24 +00:00
Craig Topper	5d16cd9d63	[AVX512] Remove masked VPERMD/VPERMQ/VPERMILPS/VPERMILPD intrinsics. They were autoupgraded to native IR in r274506 and r274506. llvm-svn: 274519	2016-07-04 19:58:38 +00:00
Simon Pilgrim	c804751a18	[X86] Add shuffle mask rescaling helper function. NFCI. llvm-svn: 274476	2016-07-03 21:28:17 +00:00
Simon Pilgrim	8e84fcf118	[X86][AVX2] Merge unary permute matching behind the same V2.isUndef() condition. NFCI. llvm-svn: 274474	2016-07-03 20:39:42 +00:00
Simon Pilgrim	7f096de0b8	[X86][AVX512] Add support for 512-bit shuffle lowering to VPERMPD/VPERMQ llvm-svn: 274473	2016-07-03 19:50:06 +00:00
Simon Pilgrim	68ea80649b	[X86][AVX512] Add support for VPERMPD/VPERMQ masked shuffle comments llvm-svn: 274469	2016-07-03 18:40:24 +00:00
Simon Pilgrim	a0d73835b2	[X86][AVX512] Add support for 512-bit shuffle decoding of VPERMPD/VPERMQ llvm-svn: 274468	2016-07-03 18:27:37 +00:00
Simon Pilgrim	5080e7f56c	[X86][AVX] Renamed VPERMILPI shuffle comment macros to be more specific llvm-svn: 274467	2016-07-03 18:02:43 +00:00
Simon Pilgrim	dbd6db0dc7	[X86][AVX512] Add support for VPALIGNR/PSHUFD/PSHUFHW/PSHUFLW masked shuffle comments llvm-svn: 274466	2016-07-03 15:00:51 +00:00
Simon Pilgrim	598bdb6bfe	[X86][AVX512] Add support for UNPCK masked shuffle comments llvm-svn: 274464	2016-07-03 14:26:21 +00:00
Simon Pilgrim	1f59076196	[X86][AVX512] Add support for VPERM/VSHUF masked shuffle comments llvm-svn: 274462	2016-07-03 13:55:41 +00:00
Simon Pilgrim	68f438a036	[X86][AVX512] Add support for PMOVZX masked shuffle comments llvm-svn: 274461	2016-07-03 13:33:28 +00:00
Simon Pilgrim	7c2fbdc101	[X86][AVX512] Add support for masked shuffle comments This patch adds support for including the avx512 mask register information in the mask/maskz versions of shuffle instruction comments. This initial version just adds support for MOVDDUP/MOVSHDUP/MOVSLDUP to reduce the mass of test regenerations, other shuffle instructions can be added in due course. Differential Revision: http://reviews.llvm.org/D21953 llvm-svn: 274459	2016-07-03 13:08:29 +00:00
Simon Pilgrim	129b720c18	[X86][AVX512] Add support for lowering shuffles to VPERMILPS llvm-svn: 274458	2016-07-03 12:47:21 +00:00
Simon Pilgrim	a7329dac6f	Fix spelling. llvm-svn: 274451	2016-07-02 20:21:39 +00:00
Simon Pilgrim	99e8a1aa0b	[X86][AVX512] Add support for lowering shuffles to VPERMILPD llvm-svn: 274450	2016-07-02 20:20:12 +00:00
Simon Pilgrim	cde7c54baa	[X86][AVX512] Add support for 512-bit PSHUFB lowering llvm-svn: 274444	2016-07-02 18:14:31 +00:00
Simon Pilgrim	77dda7c2e0	[X86][AVX512] Converted the MOVDDUP/MOVSLDUP/MOVSHDUP masked intrinsics to generic IR llvm-svn: 274443	2016-07-02 17:16:41 +00:00
Simon Pilgrim	f040d8c061	[X86][AVX512] Add support for lowering shuffles to MOVDDUP/MOVSLDUP/MOVSHDUP llvm-svn: 274436	2016-07-02 12:45:03 +00:00
Dehao Chen	ad2b4e1334	Do not count debug instructions when counting number of uses to reorder frame objects. Summary: The code generation should be independent of the debug info. Reviewers: zansari, davidxl, mkuper, majnemer Subscribers: majnemer, llvm-commits Differential Revision: http://reviews.llvm.org/D21911 llvm-svn: 274357	2016-07-01 15:40:25 +00:00
Craig Topper	2bd8b4b180	[CodeGen,Target] Remove the version of DAG.getVectorShuffle that takes a pointer to a mask array. Convert all callers to use the ArrayRef version. No functional change intended. For the most part this simplifies all callers. There were two places in X86 that needed an explicit makeArrayRef to shorten a statically sized array. llvm-svn: 274337	2016-07-01 06:54:47 +00:00
Duncan P. N. Exon Smith	d26fdc83c9	CodeGen: Use MachineInstr& in LiveVariables API, NFC Change all the methods in LiveVariables that expect non-null MachineInstr* to take MachineInstr& and update the call sites. This clarifies the API, and designs away a class of iterator to pointer implicit conversions. llvm-svn: 274319	2016-07-01 01:51:32 +00:00
Duncan P. N. Exon Smith	e4f5e4f4d1	CodeGen: Use MachineInstr& in TargetLowering, NFC This is a mechanical change to make TargetLowering API take MachineInstr& (instead of MachineInstr), since the argument is expected to be a valid MachineInstr. In one case, changed a parameter from MachineInstr to MachineBasicBlock::iterator, since it was used as an insertion point. As a side effect, this removes a bunch of MachineInstr* to MachineBasicBlock::iterator implicit conversions, a necessary step toward fixing PR26753. llvm-svn: 274287	2016-06-30 22:52:52 +00:00
David L Kreitzer	29711c0d83	Test commit. llvm-svn: 274284	2016-06-30 21:43:11 +00:00
Rafael Espindola	d86e8bb0ed	Delete MCCodeGenInfo. MC doesn't really care about CodeGen stuff, so this was just complicating target initialization. llvm-svn: 274258	2016-06-30 18:25:11 +00:00
Rafael Espindola	db6bd02185	Delete unused includes. NFC. llvm-svn: 274225	2016-06-30 12:19:16 +00:00
Duncan P. N. Exon Smith	9cfc75c214	CodeGen: Use MachineInstr& in TargetInstrInfo, NFC This is mostly a mechanical change to make TargetInstrInfo API take MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator) when the argument is expected to be a valid MachineInstr. This is a general API improvement. Although it would be possible to do this one function at a time, that would demand a quadratic amount of churn since many of these functions call each other. Instead I've done everything as a block and just updated what was necessary. This is mostly mechanical fixes: adding and removing `` and `&` operators. The only non-mechanical change is to split ARMBaseInstrInfo::getOperandLatencyImpl out from ARMBaseInstrInfo::getOperandLatency. Previously, the latter took a `MachineInstr` which it updated to the instruction bundle leader; now, the latter calls the former either with the same `MachineInstr&` or the bundle leader. As a side effect, this removes a bunch of MachineInstr* to MachineBasicBlock::iterator implicit conversions, a necessary step toward fixing PR26753. Note: I updated WebAssembly, Lanai, and AVR (despite being off-by-default) since it turned out to be easy. I couldn't run tests for AVR since llc doesn't link with it turned on. llvm-svn: 274189	2016-06-30 00:01:54 +00:00
Nirav Dave	8e10380b73	Permit memory operands in ins/outs instructions [x86] (PR15455) While (ins\|outs)[bwld] instructions do not take %dx as a memory operand, various unofficial references do and objdump disassembles to this format. Extend special treatment of similar (in\|out)[bwld] operations. Reviewers: craig.topper, rnk, ab Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D18837 llvm-svn: 274152	2016-06-29 19:54:27 +00:00
Ahmed Bougacha	15a2f6d58c	[X86] Lower blended PACKUSes using appropriate types. When lowering two blended PACKUS, we used to disregard the types of the PACKUS inputs, indiscriminately generating a v16i8 PACKUS. This leads to non-selectable things like: (v16i8 (PACKUS (v4i32 v0), (v4i32 v1))) Instead, check that the PACKUSes have the same type, and use that as the final result type. llvm-svn: 274138	2016-06-29 16:56:09 +00:00
Rafael Espindola	a99ccfce1a	Drop support for creating $stubs. They are created by ld64 since OS X 10.5. llvm-svn: 274130	2016-06-29 14:59:50 +00:00
Dehao Chen	8cd84aaa6f	Relax the clearance calculating for breaking partial register dependency. Summary: LLVM assumes that large clearance will hide the partial register spill penalty. But in our experiment, 16 clearance is too small. As the inserted XOR is normally fairly cheap, we should have a higher clearance threshold to aggressively insert XORs that is necessary to break partial register dependency. Reviewers: wmi, davidxl, stoklund, zansari, myatsina, RKSimon, DavidKreitzer, mkuper, joerg, spatel Subscribers: davidxl, llvm-commits Differential Revision: http://reviews.llvm.org/D21560 llvm-svn: 274068	2016-06-28 21:19:34 +00:00
Matthias Braun	0b9a07883d	X86FrameLowering: Check subregs when deciding prolog kill flags llvm-svn: 274057	2016-06-28 20:31:56 +00:00
Michael Kuperstein	85de98fd24	[X86] Reorder source list alphabetically. NFC. llvm-svn: 274036	2016-06-28 17:11:15 +00:00
David Majnemer	1c7d532cde	[X86] Make WRPKRU/RDPKRU pass -verify-machineinstrs The original implementation attempted to zero registers using XOR %foo, %foo. This is problematic because it constitutes a read-modify-write of a register which might not be defined. Instead, use MOV32r0 to avoid these problems; expandPostRAPseudo does the right thing here. llvm-svn: 274024	2016-06-28 16:04:46 +00:00
Simon Pilgrim	5f71c909f0	[X86][AVX] Peek through bitcasts to find the source of broadcasts (reapplied) AVX1 can only broadcast vectors as floats/doubles, so for 256-bit vectors we insert bitcasts if we are shuffling v8i32/v4i64 types. Unfortunately the presence of these bitcasts prevents the current broadcast lowering code from peeking through cases where we have concatenated / extracted vectors to create the 256-bit vectors. This patch allows us to peek through bitcasts as long as the number of elements doesn't change (i.e. element bitwidth is the same) so the broadcast index is not affected. Note this bitcast peek is different from the stage later on which doesn't care about the type and is just trying to find a load node. As we're being more aggressive with bitcasts, we also need to ensure that the broadcast type is correctly bitcasted Differential Revision: http://reviews.llvm.org/D21660 llvm-svn: 274013	2016-06-28 13:24:05 +00:00
Simon Pilgrim	c15d217831	[X86][SSE] Added support for combining target shuffles to (V)PSHUFD/VPERMILPD/VPERMILPS immediate permutes This patch allows target shuffles to be combined to single input immediate permute instructions - (V)PSHUFD/VPERMILPD/VPERMILPS - allowing more general pattern matching than what we current do and improves the likelihood of memory folding compared to existing patterns which tend to reuse the input in multiple arguments. Further permute instructions (V)PSHUFLW/(V)PSHUFHW/(V)PERMQ/(V)PERMPD may be added in the future but its proven tricky to create tests cases for them so far. (V)PSHUFLW/(V)PSHUFHW is already handled quite well in combineTargetShuffle so it may be that removing some of that code may allow us to perform more of the combining in one place without duplication. Differential Revision: http://reviews.llvm.org/D21148 llvm-svn: 273999	2016-06-28 08:08:15 +00:00
Nick Lewycky	9980075133	NFC. Fix popular typo in comment 'deferencing' --> 'dereferencing'. Bonus changes, * placement in X86ISelLowering and 'exerce' -> 'exercise' in test. llvm-svn: 273984	2016-06-28 01:45:05 +00:00
Rafael Espindola	3beef8d6db	Move shouldAssumeDSOLocal to Target. Should fix the shared library build. llvm-svn: 273958	2016-06-27 23:15:57 +00:00
Rafael Espindola	f9e348bd59	Convert a few more comparisons to isPositionIndependent(). NFC. llvm-svn: 273945	2016-06-27 21:33:08 +00:00
Rafael Espindola	68760387df	Delete the IsStatic predicate. In all its uses it was equivalent to IsNotPIC. llvm-svn: 273943	2016-06-27 21:09:14 +00:00
Elena Demikhovsky	ad3929cc64	X86 Lowering - Fixed a crash in ICMP scalar instruction Fixed a bug in EmitTest() function in combining shl + icmp. https://llvm.org/bugs/show_bug.cgi?id=28119 llvm-svn: 273899	2016-06-27 18:07:16 +00:00
Nico Weber	1e058160dd	Revert 273848, it caused PR28329 llvm-svn: 273879	2016-06-27 14:36:46 +00:00
Simon Pilgrim	a45da385f8	[X86][AVX] Peek through bitcasts to find the source of broadcasts AVX1 can only broadcast vectors as floats/doubles, so for 256-bit vectors we insert bitcasts if we are shuffling v8i32/v4i64 types. Unfortunately the presence of these bitcasts prevents the current broadcast lowering code from peeking through cases where we have concatenated / extracted vectors to create the 256-bit vectors. This patch allows us to peek through bitcasts as long as the number of elements doesn't change (i.e. element bitwidth is the same) so the broadcast index is not affected. Note this bitcast peek is different from the stage later on which doesn't care about the type and is just trying to find a load node. Differential Revision: http://reviews.llvm.org/D21660 llvm-svn: 273848	2016-06-27 07:44:32 +00:00
Rafael Espindola	ae0d866f56	Refactor a duplicated predicate. NFC. llvm-svn: 273826	2016-06-26 22:13:55 +00:00
Craig Topper	8f577fd5b5	[X86] Rewrite lowerVectorShuffleWithPSHUFB to not require a ZeroableMask to be created. We can do everything with the starting mask and zeroable bit vector. This removes the last usage of isSingleInputShuffleMask. NFC llvm-svn: 273804	2016-06-26 05:10:56 +00:00
Craig Topper	8bba749a48	[X86] Replace calls to isSingleInputShuffleMask with just checking if V2 is UNDEF. Canonicalization and creation of shuffle vector ensures this is equivalent. llvm-svn: 273803	2016-06-26 05:10:53 +00:00
Craig Topper	9a2e979b3d	[X86] Convert ==/!= comparisons with -1 for checking undef in shuffle lowering to comparisons of <0 or >=0. While there do the same for other kinds of index checks that can just check for greater than 0. No functional change intended. llvm-svn: 273788	2016-06-25 19:05:29 +00:00
Craig Topper	53a39d1a63	[X86] Pull similar bitcasts on different paths to earlier shared point. NFC llvm-svn: 273787	2016-06-25 19:05:23 +00:00
Ahmed Bougacha	0851ecd1b0	[X86] Remove dead ISD opcodes. NFC. llvm-svn: 273716	2016-06-24 20:37:55 +00:00
David Majnemer	d770877328	Switch more loops to be range-based This makes the code a little more concise, no functional change is intended. llvm-svn: 273644	2016-06-24 04:05:21 +00:00
Craig Topper	024402dcdf	[X86] Combine two nearby calls to isSingleInputShuffleVector. NFC llvm-svn: 273643	2016-06-24 03:06:11 +00:00
Kyle Butt	991df7889b	Codegen: [X86] preservere memory refs for folded umul_lohi Memory references were not being propagated for this folded load. This prevented optimizations like LICM from hoisting the load. Added test to verify that this allows LICM to proceed. llvm-svn: 273617	2016-06-23 21:40:35 +00:00
Michael Kuperstein	0194d30e09	[X86] Extract HiPE prologue constants into metadata X86FrameLowering::adjustForHiPEPrologue() contains a hard-coded offset into an Erlang Runtime System-internal data structure (the PCB). As the layout of this data structure is prone to change, this poses problems for maintaining compatibility. To address this problem, the compiler can produce this information as module-level named metadata. For example (where P_NSP_LIMIT is the offending offset): !hipe.literals = !{ !2, !3, !4 } !2 = !{ !"P_NSP_LIMIT", i32 152 } !3 = !{ !"X86_LEAF_WORDS", i32 24 } !4 = !{ !"AMD64_LEAF_WORDS", i32 24 } Patch by Magnus Lang Differential Revision: http://reviews.llvm.org/D20363 llvm-svn: 273593	2016-06-23 18:17:25 +00:00
Craig Topper	597aa42fec	[AVX512] Remove masked unpack intrinsics and autoupgrade to vectorshuffle and selects. llvm-svn: 273543	2016-06-23 07:37:33 +00:00
Craig Topper	8f8bd37dd3	[X86] Add assert to ensure only 128-bit vector types are used. 256 or 512-bit would require lane handling which is missing. llvm-svn: 273542	2016-06-23 07:37:26 +00:00
Reid Kleckner	5340b279ae	[codeview] Add EFLAGS to the list of CodeView register numbers llvm-svn: 273516	2016-06-22 23:50:19 +00:00
Krzysztof Parzyszek	e116d500a7	[SDAG] Remove FixedArgs parameter from CallLoweringInfo::setCallee The setCallee function will set the number of fixed arguments based on the size of the argument list. The FixedArgs parameter was often explicitly set to 0, leading to a lack of consistent value for non- vararg functions. Differential Revision: http://reviews.llvm.org/D20376 llvm-svn: 273403	2016-06-22 12:54:25 +00:00
Etienne Bergeron	f6be62f2c8	[StackProtector] Fix computation of GSCookieOffset and EHCookieOffset with SEH4 Summary: Fix the computation of the offsets present in the scopetable when using the SEH (__except_handler4). This patch added an intrinsic to track the position of the allocation on the stack of the EHGuard. This position is needed when producing the ScopeTable. ``` struct _EH4_SCOPETABLE { DWORD GSCookieOffset; DWORD GSCookieXOROffset; DWORD EHCookieOffset; DWORD EHCookieXOROffset; _EH4_SCOPETABLE_RECORD ScopeRecord[1]; }; struct _EH4_SCOPETABLE_RECORD { DWORD EnclosingLevel; long (FilterFunc)(); union { void (HandlerAddress)(); void (*FinallyFunc)(); }; }; ``` The code to generate the EHCookie is added in `X86WinEHState.cpp`. Which is adding these instructions when using SEH4. ``` Lfunc_begin0: # BB#0: # %entry pushl %ebp movl %esp, %ebp pushl %ebx pushl %edi pushl %esi subl $28, %esp movl %ebp, %eax <<-- Loading FramePtr movl %esp, -36(%ebp) movl $-2, -16(%ebp) movl $L__ehtable$use_except_handler4_ssp, %ecx xorl ___security_cookie, %ecx movl %ecx, -20(%ebp) xorl ___security_cookie, %eax <<-- XOR FramePtr and Cookie movl %eax, -40(%ebp) <<-- Storing EHGuard leal -28(%ebp), %eax movl $__except_handler4, -24(%ebp) movl %fs:0, %ecx movl %ecx, -28(%ebp) movl %eax, %fs:0 movl $0, -16(%ebp) calll _may_throw_or_crash LBB1_1: # %cont movl -28(%ebp), %eax movl %eax, %fs:0 addl $28, %esp popl %esi popl %edi popl %ebx popl %ebp retl ``` And the corresponding offset is computed: ``` Luse_except_handler4_ssp$parent_frame_offset = -36 .p2align 2 L__ehtable$use_except_handler4_ssp: .long -2 # GSCookieOffset .long 0 # GSCookieXOROffset .long -40 # EHCookieOffset <<---- .long 0 # EHCookieXOROffset .long -2 # ToState .long _catchall_filt # FilterFunction .long LBB1_2 # ExceptionHandler ``` Clang is not yet producing function using SEH4, but it's a work in progress. This patch is a step toward having a valid implementation of SEH4. Unfortunately, it is not yet fully working. The EH registration block is not allocated at the right offset on the stack. Reviewers: rnk, majnemer Subscribers: llvm-commits, chrisha Differential Revision: http://reviews.llvm.org/D21231 llvm-svn: 273281	2016-06-21 15:58:55 +00:00
Craig Topper	283418fbb6	[AVX512] Add patterns for any-extending a mask that use the def of KMOVW/KMOVB without going through an EXTRACT_SUBREG and a MOVZX. llvm-svn: 273253	2016-06-21 07:37:32 +00:00
Craig Topper	0a0fb0fda1	[AVX512] Remove the masked vpcmpeq/vcmpgt intrinsics and autoupgrade them to native icmps. llvm-svn: 273240	2016-06-21 03:53:24 +00:00
Craig Topper	e4cf09ad07	[X86] Pre-allocate SmallVector instead of using push_back in a loop. NFC llvm-svn: 273234	2016-06-21 03:05:40 +00:00
Rafael Espindola	0d34826218	Simplify PICStyles. The main difference is that StubDynamicNoPIC is gone. The dynamic-no-pic mode as the name implies is simply not pic. It is just conservative about what it assumes to be dso local. llvm-svn: 273222	2016-06-20 23:41:56 +00:00
Simon Pilgrim	356e823b51	[X86][SSE] Add cost model for BSWAP of vectors The BSWAP of vector types is quite efficiently implemented using vector shuffles on SSE/AVX targets, we should reflect the typical cost of this to encourage vectorization. Differential Revision: http://reviews.llvm.org/D21521 llvm-svn: 273217	2016-06-20 23:08:21 +00:00
Simon Pilgrim	225b2e37a0	[X86][X87] Fix issue with sitofp i64 -> fp128 on 32-bit targets Fix for PR27726 - sitofp i64 to fp128 was loading the merged load i64 to a x87 register preventing legalization for conversion to fp128. Added 32-bit tests for fp128 cast/conversions. llvm-svn: 273210	2016-06-20 22:41:17 +00:00
Rafael Espindola	94eb31a7a9	Delete dead code. NFC. llvm-svn: 273206	2016-06-20 22:08:35 +00:00
Igor Breger	e59165ca63	[AVX512] [AVX512/AVX][Intrinsics] Fix Variable Bit Shift Right Arithmetic intrinsic lowering. Differential Revision: http://reviews.llvm.org/D20897 llvm-svn: 273138	2016-06-20 07:05:43 +00:00
Craig Topper	4296c025c0	[X86] Pass the SDLoc and Mask ArrayRef down from lowerVectorShuffle through all of the other routines instead of recreating them in the handlers for each type. NFC llvm-svn: 273137	2016-06-20 04:00:55 +00:00
Craig Topper	ddf5d2a4a5	[X86] Use existing ArrayRef variable instead of calling SVOp->getMask() repeatedly. Remove nearby else after return as well. NFC llvm-svn: 273136	2016-06-20 04:00:53 +00:00
Craig Topper	01ef65dd79	[X86] Avoid making a copy of a shuffle mask until we're sure we really need to. And just use a SmallVector to do the copy because its easy. llvm-svn: 273135	2016-06-20 04:00:50 +00:00
Simon Pilgrim	0887d5b02e	[X86][AVX512] Added 512-bit BITREVERSE tests and enabled AVX512BW lowering support llvm-svn: 273125	2016-06-19 20:59:19 +00:00
Simon Pilgrim	0c62bc0324	Strip trailing whitespace. NFCI. llvm-svn: 273124	2016-06-19 20:22:43 +00:00
Simon Pilgrim	2b007189b0	Fixed signed/unsigned warning. llvm-svn: 273120	2016-06-19 18:20:44 +00:00
Simon Pilgrim	3d881a0230	[X86][SSE] Allow target shuffle combining to match masks with SM_Sentinel values We currently only allow exact matches of shuffle mask patterns during target shuffle combining. This patch relaxes this to permit SM_SentinelUndef in the combined shuffle to always be accepted as well as allowing exact matching of the SM_SentinelZero value. I've adjusted some tests that were requiring exact shuffle masks to now include undef values. Differential Revision: http://reviews.llvm.org/D21495 llvm-svn: 273119	2016-06-19 18:03:52 +00:00
Craig Topper	bbb9a8d255	[X86] Add an assert to ensure that a routine is only used with 128-bit vectors. Reduce SmallVector size accordingly. llvm-svn: 273117	2016-06-19 15:37:39 +00:00

... 3 4 5 6 7 ...

13781 Commits