llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	6b30172372	[X86][SSE] Refactored shuffle BLEND combining to make future 16i16 support easier. NFCI. Call the matchVectorShuffleAsBlend test as early as possible. llvm-svn: 298925	2017-03-28 15:50:23 +00:00
Simon Pilgrim	aa675ca77d	Fix signed/unsigned comparison warning llvm-svn: 298917	2017-03-28 13:40:09 +00:00
Simon Pilgrim	d48f47e25c	[X86][SSE] Begin merging vector shuffle to BLEND for lowering and combining. Split off matchVectorShuffleAsBlend from lowerVectorShuffleAsBlend for reuse in combining. llvm-svn: 298914	2017-03-28 13:05:48 +00:00
Simon Pilgrim	61437ebaf4	Wdocumentation fix llvm-svn: 298911	2017-03-28 12:29:09 +00:00
Simon Pilgrim	6afe0e2833	[X86][SSE] Set second operand to undef instead of first operand in unary shuffle combines. Copy isn't necessary after the matchVectorShuffleWithUNPCK refactor and undef value will make some future undef/zero handling easier. llvm-svn: 298910	2017-03-28 12:16:42 +00:00
Simon Pilgrim	defee5683c	Strip trailing whitespace llvm-svn: 298909	2017-03-28 11:15:17 +00:00
Igor Breger	f580fce2c3	[GlobalISel][X86] support G_FRAME_INDEX instruction selection. Summary: G_LOAD/G_STORE, add alternative RegisterBank mapping. For G_LOAD, Fast and Greedy mode choose the same RegisterBank mapping (GprRegBank) for the G_LOAD + G_FADD, can't get rid of cross register bank copy GprRegBank->VecRegBank. Reviewers: zvi, rovka, qcolombet, ab Reviewed By: zvi Subscribers: llvm-commits, dberris, kristof.beyls, eladcohen, guyblank Differential Revision: https://reviews.llvm.org/D30979 llvm-svn: 298907	2017-03-28 09:35:06 +00:00
Gadi Haber	89d5f9391a	[X86][AVX2] bugzilla bug 21281 Performance regression in vector interleave in AVX2 This is a patch for an on-going bugzilla bug 21281 on the generated X86 code for a matrix transpose8x8 subroutine which requires vector interleaving. The generated code in AVX2 is currently non-optimal and requires 60 instructions as opposed to only 40 instructions generated for AVX1. The patch includes a fix for the AVX2 case where vector unpack instructions use fewer operations than the vector blend operations available in AVX2. In this case using vector unpack instructions is more efficient. Reviewers: zvi delena igorb craig.topper guyblank eladcohen m_zuckerman aymanmus RKSimon llvm-svn: 298840	2017-03-27 12:13:37 +00:00
Simon Pilgrim	92925ea701	[X86][SSE] Add computeKnownBitsForTargetNode support for (V)PSLL/(V)PSRL instructions llvm-svn: 298806	2017-03-26 13:17:55 +00:00
Simon Pilgrim	049d9c921f	[X86][AVX512F] Fix reg class for VMOVSSZrr/VMOVSSZrrk and VMOVSDZrr/VMOVSDZrrk Fixed -verify-machineinstrs errors in fast-isel-select-sse.ll (one of many in PR27481) The VMOVSSZrr/VMOVSSZrrk and VMOVSDZrr/VMOVSDZrrk instructions were assuming both source registers were V128X when the second is actually supposed to be FR32X/FR64X Differential Revision: https://reviews.llvm.org/D31200 llvm-svn: 298805	2017-03-26 12:52:28 +00:00
Igor Breger	531a203a06	[GlobalISel][X86] support G_FRAME_INDEX instruction selection. Summary: Support G_FRAME_INDEX instruction selection. Reviewers: zvi, rovka, ab, qcolombet Reviewed By: ab Subscribers: llvm-commits, dberris, kristof.beyls, eladcohen, guyblank Differential Revision: https://reviews.llvm.org/D30980 llvm-svn: 298800	2017-03-26 08:11:12 +00:00
Simon Pilgrim	bec234c970	[X86] Pull out repeated ScalarValueSizeInBits code. NFCI. llvm-svn: 298783	2017-03-25 21:22:12 +00:00
Simon Pilgrim	c0720a4052	[X86][SSE] Combine (VSRLI (VSRAI X, Y), (NumSignBits-1)) -> (VSRLI X, (NumSignBits-1)) Part 3 of 3. Differential Revision: https://reviews.llvm.org/D31347 llvm-svn: 298782	2017-03-25 20:43:01 +00:00
Simon Pilgrim	6397963c81	[X86][SSE] Added ComputeNumSignBitsForTargetNode support for (V)PSRAI Part 2 of 3. Differential Revision: https://reviews.llvm.org/D31347 llvm-svn: 298780	2017-03-25 19:58:36 +00:00
Simon Pilgrim	5400a4d0af	[X86][SSE] Generalised CMP+AND1 combine to ZERO/ALLBITS+MASK Patch to generalize combinePCMPAnd1 (for handling SETCC + ZEXT cases) to work for any input that has zero/all bits set masked with an 'all low bits' mask. Replaced the implicit assumption of shift availability with a call to SupportedVectorShiftWithImm. Part 1 of 3. Differential Revision: https://reviews.llvm.org/D31347 llvm-svn: 298779	2017-03-25 19:50:14 +00:00
Sanjay Patel	9ebb68843e	[x86] use PMOVMSK to replace memcmp libcalls for 16-byte equality This is the payoff for D31156 - if a target has efficient comparison instructions for vector-sized equality, we can replace memcmp calls with inline code that is both smaller and faster. Differential Revision: https://reviews.llvm.org/D31290 llvm-svn: 298775	2017-03-25 16:05:33 +00:00
Simon Pilgrim	6aac646308	[X86][SSE] Generalised lowerTruncate by PACKSS to work with any 'zero/all bits' result, not just comparisons. Added vector compare opcodes to X86TargetLowering::ComputeNumSignBitsForTargetNode Covered by existing tests added for D22814. llvm-svn: 298704	2017-03-24 16:12:31 +00:00
Nirav Dave	9ebefeb9b1	[X86] Fix Stale SDNode use in X86ISelDAGtoDAG Summary: Fixes pr32329. Reviewers: spatel, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31286 llvm-svn: 298633	2017-03-23 18:25:17 +00:00
Eric Christopher	cff8492492	Remove the subtarget argument from LowerFP_TO_INT since there's one stored on X86TargetLowering. llvm-svn: 298628	2017-03-23 17:35:08 +00:00
Eric Christopher	a19a14b42f	Remove unused X86Subtarget argument from getOnesVector. llvm-svn: 298627	2017-03-23 17:35:06 +00:00
Simon Pilgrim	1c048ab6ba	[X86][SSE] Extract elements from narrower shuffle masks. Add support for widening narrow shuffle masks so we can directly extract from the relevant input vector of the shuffle. llvm-svn: 298616	2017-03-23 16:09:34 +00:00
Igor Breger	a8ba572dcf	[GlobalISel][X86] Support G_STORE/G_LOAD operation Summary: 1. Support pointer type as function argument and return value 2. G_STORE/G_LOAD - set legal action for i8/i16/i32/i64/f32/f64/vec128 3. RegisterBank - support typeless operations like G_STORE/G_LOAD, for scalar use GPR bank. 4. Support instruction selection for G_LOAD/G_STORE Reviewers: zvi, rovka, ab, qcolombet Reviewed By: rovka Subscribers: llvm-commits, dberris, kristof.beyls, eladcohen, guyblank Differential Revision: https://reviews.llvm.org/D30973 llvm-svn: 298609	2017-03-23 15:25:57 +00:00
Zvi Rackover	db4b032205	X86FixupBWInsts: Minor cleanup. NFC Summary: Cleanup some remnants of code from when the X86FixupBWInsts pass did both forward liveness analysis and backward liveness analysis. Reviewers: MatzeB, myatsina, DavidKreitzer Reviewed By: MatzeB Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31264 llvm-svn: 298599	2017-03-23 14:08:26 +00:00
Simon Pilgrim	8a18299f20	[X86][SSE] Tidyup canWidenShuffleElements. NFCI. Pull out mask elements at the start, allowing us to make the widening pattern matching more readable. llvm-svn: 298594	2017-03-23 13:33:03 +00:00
Igor Breger	8a924bea78	[GlobalISel][X86] clang-format. NFC llvm-svn: 298590	2017-03-23 12:13:29 +00:00
Michael Zuckerman	85436ece89	[X86][TD][vpmovm2] New TD pattern for the vpmovm2 instruction Up until now, vpmovm2 instruction described its destination operand size by the source operand size. This patch adds a new pattern for the vpmovm2 instruction. The node describes new expansion of the destination (from {128|256} to 512). Differential Revision: https://reviews.llvm.org/D30654 llvm-svn: 298586	2017-03-23 09:57:01 +00:00
Eric Christopher	fd8510cfec	Clean up some Subtarget uses and casts in the X86 backend, removing unnecessary work or calls. llvm-svn: 298555	2017-03-22 22:44:52 +00:00
Simon Pilgrim	b19a507a88	[X86] Remove unnecessary duplicate code (PR30649). NFCI. llvm-svn: 298495	2017-03-22 11:23:49 +00:00
Craig Topper	3eb6ff9d09	[X86] Remove an unused function from release builds. Reported by gcc's unused function warning. llvm-svn: 298485	2017-03-22 06:07:58 +00:00
Coby Tayree	07a8974c48	[X86][MS-compatibility][llvm] allow MS TYPE/SIZE/LENGTH operators as a part of a compound expression This patch introduces X86AsmParser with the ability to handle the aforementioned ops within compound "MS" arithmetical expressions. Currently only supported as a standalone operand, e.g. "TYPE X"; now allowed: "4 + TYPE X * 128" Clang side: https://reviews.llvm.org/D31174 Differential Revision: https://reviews.llvm.org/D31173 llvm-svn: 298425	2017-03-21 19:31:55 +00:00
Davide Italiano	200e5e184a	[X86] Remove extra semicolon to placate GCC. NFCI. llvm-svn: 298423	2017-03-21 19:17:23 +00:00
Reid Kleckner	b518054b87	Rename AttributeSet to AttributeList Summary: This class is a list of AttributeSetNodes corresponding to the function prototype of a call or function declaration. This class used to be called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is typically accessed by parameter and return value index, so "AttributeList" seems like a more intuitive name. Rename AttributeSetImpl to AttributeListImpl to follow suit. It's useful to rename this class so that we can rename AttributeSetNode to AttributeSet later. AttributeSet is the set of attributes that apply to a single function, argument, or return value. Reviewers: sanjoy, javed.absar, chandlerc, pete Reviewed By: pete Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits Differential Revision: https://reviews.llvm.org/D31102 llvm-svn: 298393	2017-03-21 16:57:19 +00:00
Sanjay Patel	79379cae15	[x86] use PMOVMSK for vector-sized equality comparisons We could do better by splitting any oversized type into whatever vector size the target supports, but I left that for future work if it ever comes up. The motivating case is memcmp() calls on 16-byte structs, so I think we can wire that up with a TLI hook that feeds into this. Differential Revision: https://reviews.llvm.org/D31156 llvm-svn: 298376	2017-03-21 13:50:33 +00:00
Andrea Di Biagio	7937be7dd3	[DebugInfo][X86] Teach Optimize LEAs pass to handle debug values This patch fixes an issue in the Optimize LEAs pass where redundant LEAs were not removed because they were being used by debug values. The debug values are now ignored when determining whether LEAs are redundant. For now the debug values for the redundant LEAs are marked as undefined, effectively lost. The intention is for a follow up patch which will attempt to preserve the debug values where possible. Patch by Andrew Ng. Differential Revision: https://reviews.llvm.org/D30835 llvm-svn: 298360	2017-03-21 11:36:21 +00:00
Evgeniy Stepanov	e829eecc05	[Fuchsia] Use %gs for ABI slots under -mcmodel=kernel Make x86_64-fuchsia targets under -mcmodel=kernel use %gs rather than %fs to access ABI slots for stack-protector and safe-stack Patch by Roland McGrath. Differential Revision: https://reviews.llvm.org/D30870 llvm-svn: 298302	2017-03-20 20:35:37 +00:00
Craig Topper	5992c8d1dc	[AVX-512] Handle kor/kand/kandn/kxor/kxnor/knot intrinsics at lowering time instead of isel Summary: Currently we handle these intrinsics at isel with special patterns. But as they just map to normal logic operations, we should just handle them at lowering. This will expose them to DAG combine optimizations. Right now the kor-sequence test generates a bunch of regclass copies between GR16 and VK16 that the peephole optimizer and/or register coalescing are removing to keep everything in the mask domain. By handling the logic op intrinsics earlier, these copies become bitcasts in the DAG and get removed by DAG combine which seems more robust. This should help enable my plan to stop copying between K registers and GR8/GR16. The peephole optimizer can't remove a chain of copies between K and GR32 with insert_subreg/extract_subreg present in the chain so the kor-sequence test breaks. But this patch should dodge the problem entirely. Reviewers: zvi, delena, RKSimon, igorb Reviewed By: igorb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31056 llvm-svn: 298228	2017-03-19 17:11:09 +00:00
Oren Ben Simhon	0ef61ec32a	[MIR] Support Custom Register Mask and CSRs The MIR printer dumps a string that describes the register mask of a function. A static predefined list of register masks matches a static list of strings. However when the register mask is not from the static predefined list, there is no descriptor string and the printer fails. This patch adds support for custom register mask printing and dumping. Also the list of callee saved registers (describing the registers that must be preserved for the caller) might be dynamic. As such this data needs to be dumped and parsed back to the Machine Register Info. Differential Revision: https://reviews.llvm.org/D30971 llvm-svn: 298207	2017-03-19 08:14:18 +00:00
Matthias Braun	e6ff30b696	ExecutionDepsFix: Let targets specialize the pass; NFC Let targets specialize the pass with the register class so we can get a parameterless default constructor and can put the pass into the pass registry to enable testing with -run-pass=. llvm-svn: 298184	2017-03-18 05:08:58 +00:00
Matthias Braun	e9f8209e87	ExecutionDepsFix: Normalize names; NFC Normalize ExeDepsFix, execution-fix, ExecutionDependencyFix and ExecutionDepsFix to the last one. llvm-svn: 298183	2017-03-18 05:05:40 +00:00
Nirav Dave	ac6081cb67	Make library calls sensitive to regparm module flag (Fixes PR3997). Reviewers: mkuper, rnk Subscribers: mehdi_amini, jyknight, aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D27050 llvm-svn: 298179	2017-03-18 00:44:07 +00:00
Nirav Dave	6de2c77944	Capitalize ArgListEntry fields. NFC. llvm-svn: 298178	2017-03-18 00:43:57 +00:00
Sanjay Patel	455703a0c6	[x86] clean up setcc with negated operand transform and add missing test; NFCI llvm-svn: 298118	2017-03-17 20:29:40 +00:00
Reid Kleckner	edf1cbb580	[X86] Emit fewer instructions to allocate >16GB stack frames Summary: Use this code pattern when RAX is live, instead of emitting up to 2 billion adjustments: pushq %rax movabsq +-$Offset+-8, %rax addq %rsp, %rax xchg %rax, (%rsp) movq (%rsp), %rsp Try to clean this code up a bit while I'm here. In particular, hoist the logic that handles the entire adjustment with `movabsq $imm, %rax` out of the loop. This negates the offset in the prologue and uses ADD because X86 only has a two operand subtract which always subtracts from the destination register, which can no longer be RSP. Fixes PR31962 Reviewers: majnemer, sdardis Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30052 llvm-svn: 298116	2017-03-17 20:25:49 +00:00
Sanjay Patel	25bd713d33	[x86] avoid adc/sbb assert when both sides of add are zexted (PR32316) As noted in the comment, we might want to account for this case, but I didn't look at what that would mean for the asm. I'm also not sure why this only reproduces with avx512, but I'm putting a conservative fix in for now to avoid the crash. Also, if both sides of an add are zexted, shouldn't we shrink that add? https://bugs.llvm.org/show_bug.cgi?id=32316 llvm-svn: 298107	2017-03-17 17:27:31 +00:00
Craig Topper	a8d4097445	[AVX-512] Make VEX encoded FMA instructions available when AVX512 is enabled regardless of whether +fma was added on the command line. We weren't able to handle isel of the 128/256-bit FMA instructions when AVX512F was enabled but VLX and FMA weren't. I didn't make FeatureAVX512 imply FeatureFMA as I wasn't sure I wanted disabling FMA to also disable AVX512. Instead we just can't prevent FMA instructions if AVX512 is enabled. Another option would be to promote 128/256-bit to 512-bit, do the operation and extract it. But that requires a lot of extra isel patterns. Since no CPUs exist that support AVX512 but not FMA, just using the VEX instructions seems better. llvm-svn: 298051	2017-03-17 07:37:31 +00:00
Craig Topper	02cd0bfa46	[X86] Remove unused predicate. NFC llvm-svn: 298050	2017-03-17 07:37:27 +00:00
Craig Topper	6a1290a0fd	[AVX-512] Give priority to EVEX encoded scalar FMA instructions when we have FMA, AVX512 and no VLX. We were giving priority if VLX was enabled. llvm-svn: 298046	2017-03-17 06:10:37 +00:00
Craig Topper	e4d5aa7efc	[X86] Cleanup the AddedComplexity values on move immediate instructions. NFC This makes the values a little more consistent between similar instruction and reduces the values some. This results in better grouping in the isel table saving a few bytes. llvm-svn: 298043	2017-03-17 05:59:54 +00:00
Reid Kleckner	45707d4d5a	Remove getArgumentList() in favor of arg_begin(), args(), etc Users often call getArgumentList().size(), which is a linear way to get the number of function arguments. arg_size(), on the other hand, is constant time. In general, the fact that arguments are stored in an iplist is an implementation detail, so I've removed it from the Function interface and moved all other users to the argument container APIs (arg_begin(), arg_end(), args(), arg_size()). Reviewed By: chandlerc Differential Revision: https://reviews.llvm.org/D31052 llvm-svn: 298010	2017-03-16 22:59:15 +00:00
Matthias Braun	e959544517	TargetInstrInfo: Provide default implementation of isTailCall(). In fact this default implementation should be the only implementation; keep it virtual for now to accommodate targets that don't model flags correctly. Differential Revision: https://reviews.llvm.org/D30747 llvm-svn: 297980	2017-03-16 20:02:30 +00:00
Simon Pilgrim	06c70adcf0	[X86] Add missing BITREVERSE costs for SSE2 vectors and i8/i16/i32/i64 scalars Prep work for PR31810 llvm-svn: 297876	2017-03-15 19:34:55 +00:00
Simon Pilgrim	493f4462bf	[X86][SSE] Fixed shuffle MOVSS/MOVSD combining of all zeroable inputs Turns out it can happen, so the assertion was too harsh Found during fuzz testing llvm-svn: 297833	2017-03-15 13:16:46 +00:00
Simon Pilgrim	a0b0b74b9a	Align cost model columns. NFCI. llvm-svn: 297824	2017-03-15 11:57:42 +00:00
Daniel Sanders	a228df75c0	[globalisel] LLVM_BUILD_GLOBAL_ISEL=OFF should prevent GlobalISel instruction selector from being declared. llvm-svn: 297786	2017-03-14 22:09:29 +00:00
Daniel Sanders	8a4bae9993	[globalisel][tblgen] Add support for ComplexPatterns Summary: Adds a new kind of MachineOperand: MO_Placeholder. This operand must not appear in the MIR and only exists as a way of creating an 'uninitialized' operand until a matcher function overwrites it. Depends on D30046, D29712 Reviewers: t.p.northover, ab, rovka, aditya_nandakumar, javed.absar, qcolombet Reviewed By: qcolombet Subscribers: dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D30089 llvm-svn: 297782	2017-03-14 21:32:08 +00:00
Simon Pilgrim	cf2da96c82	[SelectionDAG] Add a signed integer absolute ISD node Reduced version of D26357 - based on the discussion on llvm-dev about canonicalization of UMIN/UMAX/SMIN/SMAX as well as ABS I've reduced that patch to just the ABS ISD node (with x86/sse support) to improve basic combines and lowering. ARM/AArch64, Hexagon, PowerPC and NVPTX all have similar instructions allowing us to make this a generic opcode and move away from the hard coded tablegen patterns which makes it tricky to match more complex patterns. At the moment this patch doesn't attempt legalization as we only create an ABS node if its legal/custom. Differential Revision: https://reviews.llvm.org/D29639 llvm-svn: 297780	2017-03-14 21:26:58 +00:00
Oren Ben Simhon	fe34c5e429	Disable Callee Saved Registers Each Calling convention (CC) defines a static list of registers that should be preserved by a callee function. All other registers should be saved by the caller. Some CCs use an additional condition: If the register is used for passing/returning arguments - the caller needs to save it - even if it is part of the Callee Saved Registers (CSR) list. The current LLVM implementation doesn’t support it. It will save a register if it is part of the static CSR list and will not care if the register is passed/returned by the callee. The solution is to dynamically allocate the CSR lists (Only for these CCs). The lists will be updated with actual registers that should be saved by the callee. Since we need the allocated lists to live as long as the function exists, the list should reside inside the Machine Register Info (MRI) which is a property of the Machine Function and managed by it (and has the same life span). The lists should be saved in the MRI and populated upon LowerCall and LowerFormalArguments. The patch will also help implement the future no_caller_saved_registers attribute intended for interrupt handler CC. Differential Revision: https://reviews.llvm.org/D28566 llvm-svn: 297715	2017-03-14 09:09:26 +00:00
Craig Topper	7a5ee1c5ed	[AVX-512] Use iPTR instead of i64 in patterns for extract_subvector/insert_subvector index. llvm-svn: 297707	2017-03-14 06:40:04 +00:00
Jonas Paulsson	a48ea231c0	[TargetTransformInfo] getIntrinsicInstrCost() scalarization estimation improved getIntrinsicInstrCost() used to only compute scalarization cost based on types. This patch improves this so that the actual arguments are checked when they are available, in order to handle only unique non-constant operands. Tests updates: Analysis/CostModel/X86/arith-fp.ll Transforms/LoopVectorize/AArch64/interleaved_cost.ll Transforms/LoopVectorize/ARM/interleaved_cost.ll The improvement in getOperandsScalarizationOverhead() to differentiate on constants made it necessary to update the interleaved_cost.ll tests even though they do not relate to intrinsics. Review: Hal Finkel https://reviews.llvm.org/D29540 llvm-svn: 297705	2017-03-14 06:35:36 +00:00
Craig Topper	9d50e187cd	[AVX-512] Pre-emptively fix more places in fastisel where we might copy a VK1 register into a AH/BH/CH/DH register. llvm-svn: 297704	2017-03-14 04:18:25 +00:00
Craig Topper	784f241b59	[AVX-512] Fix another case where we are copying from a mask register using AH/BH/CH/DH with fastisel. Fixes PR32256. Still planning to do an audit for other possible cases. llvm-svn: 297678	2017-03-13 21:58:54 +00:00
Simon Pilgrim	9df7d08cb2	[X86][MMX] Fix folding of shift value loads to cover whole 64-bits rL230225 made the assumption that only the lower 32-bits of an MMX register load is used as a shift value, when in fact the whole 64-bits are reloaded and treated as a i64 to determine the shift value. This patch reverts rL230225 to ensure that the whole 64-bits of memory are folded and ensures that the upper 32-bit are zero'd for cases where the shift value has come from a scalar source. Found during fuzz testing. Differential Revision: https://reviews.llvm.org/D30833 llvm-svn: 297667	2017-03-13 21:23:29 +00:00
Andrew Kaylor	a11d020699	Revert r295004 (Add MXCSR) due to errors reported by MachineVerifier I am leaving the code in clang which filters mxcsr from the clobber list because that is still technically correct and will be useful again when the MXCSR register is reintroduced. llvm-svn: 297664	2017-03-13 20:35:10 +00:00
Jessica Paquette	c984e21394	[Outliner] Add tail call support This commit adds tail call support to the MachineOutliner pass. This allows the outliner to insert jumps rather than calls in areas where tail calling is possible. Outlined tail calls include the return or terminator of the basic block being outlined from. Tail call support allows the outliner to take returns and terminators into consideration while finding candidates to outline. It also allows the outliner to save more instructions. For example, in the X86-64 outliner, a tail called outlined function saves one instruction since no return has to be inserted. llvm-svn: 297653	2017-03-13 18:39:33 +00:00
Craig Topper	616641632e	[X86] Lower AVX2 gather intrinsics similar to AVX-512. Apply the same input source optimizations to break execution dependencies. For AVX-512 we force the input to zero if the input is undef or the mask is all ones to break an execution dependency. This patch brings the same behavior to AVX2. llvm-svn: 297652	2017-03-13 18:34:46 +00:00
Craig Topper	eb7ea28bdd	[AVX-512] If gather mask is all ones, force the input to a zero vector. We were already forcing undef inputs to become a zero vector, this now catches an all ones mask too. Ideally we'd use undef and let execution dep fix handle picking the best register/clearance for the undef, but I don't think it can handle the early clobber today. llvm-svn: 297651	2017-03-13 18:17:46 +00:00
Craig Topper	48ba1e2d66	[AVX-512] Add VEX_WIG to VEX vcvtsd2ss/vcvtss2sd intrinsic instructions so they can be correctly matched by EVEX2VEX table generation. llvm-svn: 297601	2017-03-13 05:14:47 +00:00
Craig Topper	08b413acf2	[AVX-512] Use sse_loadf32/f64 for vcvtss2sd and vcvtsd2ss intrinsic patterns. llvm-svn: 297600	2017-03-13 05:14:44 +00:00
Craig Topper	5a63ca2ad2	[AVX-512] Use sse_load_f64/f32 in VCVTSS2SI/VCVTSD2SI patterns. llvm-svn: 297599	2017-03-13 03:59:06 +00:00
Craig Topper	111b2d6997	[X86] Remove unused SDTypeProfile. NFC llvm-svn: 297594	2017-03-12 23:05:03 +00:00
Craig Topper	2b92542908	[X86] Lower SSE/AVX cmpps/pd intrinsics directly to X86ISD::CMPP SDNodes. This allows us to remove a duplicate set of patterns. llvm-svn: 297593	2017-03-12 23:05:00 +00:00
Craig Topper	7d56c8315b	[AVX-512] Fix the valid immediates for the scatter/gather prefetch intrinsics. The immediate should be 1 or 2, not 0 or 1. This was found while adding bounds checking to clang. In fact the existing clang builtin test failed if we ran it all the way to assembly. llvm-svn: 297591	2017-03-12 22:29:12 +00:00
Sanjay Patel	f06b963a2b	[x86] don't blindly transform SETB into SBB I noticed unnecessary 'sbb' instructions in D30472 and while looking at 'ptest' codegen recently. This happens because we were transforming any 'setb' - even when we only wanted a single-bit result. This patch moves those transforms under visitAdd/visitSub, so we're only creating sbb/adc when it is a win. I don't know why we need a SETCC_CARRY node type, but I'm not proposing to change that existing behavior in this patch. Also, I'm skeptical that sbb/adc are a win for all micro-arches, so I added comments to the test files where this transform still fires. The test changes here are all cases where we no longer produce sbb/adc. Avoiding partial register stalls (generating an xor to clear a register) is not handled in some cases, but that's a separate issue. Differential Revision: https://reviews.llvm.org/D30611 llvm-svn: 297586	2017-03-12 18:28:48 +00:00
Craig Topper	58647b16e5	[AVX-512] Fix a bad use of a high GR8 register after copying from a mask register during fast isel. This ends up extracting from bits 15:8 instead of the lower bits of the mask. I'm pretty sure there are more problems lurking here. But I think this fixes PR32241. I've added the test case from that bug and added asserts that will fail if we ever try to copy between high registers and mask registers again. llvm-svn: 297574	2017-03-12 03:37:37 +00:00
Craig Topper	6ab5edfa73	[AVX-512] Remove unused field in X86VectorVTInfo tablegen class. llvm-svn: 297572	2017-03-12 03:37:32 +00:00
Simon Pilgrim	18debfa5b4	[X86][SSE] Improve extraction of elements from v16i8 (pre-SSE41) Without SSE41 (pextrb) we currently extract byte elements from a vector by spilling to stack and reloading the byte. This patch is an initial attempt at using MOVD/PEXTRW to extract the relevant DWORD/WORD from the vector and then shift+truncate to collect the correct byte. Extraction of multiple bytes this way would result in code bloat, but as explained in the patch we could probably afford to be more aggressive with the supported extractions before again falling back on spilling - possibly through counting the number of extracts and which DWORD/WORD they originate? Differential Revision: https://reviews.llvm.org/D29841 llvm-svn: 297568	2017-03-11 20:42:31 +00:00
Simon Pilgrim	9ff5732c92	Remove unnecessary whitespace. llvm-svn: 297567	2017-03-11 20:23:59 +00:00
Craig Topper	02b463270c	[X86] Remove unnecessary commented out code. NFC llvm-svn: 297563	2017-03-11 18:25:56 +00:00
Simon Pilgrim	128a10a41d	[X86][SSE] Fix load folding for (V)CVTDQ2PD This only requires a 64-bit memory source, not the whole 128-bits. But the 128-bit case is still supported via X86InstrInfo::foldMemoryOperandImpl llvm-svn: 297523	2017-03-10 22:35:07 +00:00
Simon Pilgrim	bfe263352a	[X86] Fix Wunused-lambda-capture warning llvm-svn: 297521	2017-03-10 22:10:34 +00:00
Eric Christopher	f025a89b3c	Sink accessing TII to fix release Werror builds. llvm-svn: 297507	2017-03-10 21:20:17 +00:00
Evandro Menezes	8f70e249a7	[AArch64, X86] Additional debug information for MacroFusion In order to make it easier to parse information about the performance of MacroFusion, this patch adds the function and the instruction names to the debug output of this pass. llvm-svn: 297504	2017-03-10 20:20:04 +00:00
Simon Pilgrim	b02667c469	[APInt] Add APInt::insertBits() method to insert an APInt into a larger APInt We currently have to insert bits via a temporary variable of the same size as the target with various shift/mask stages, resulting in further temporary variables, all of which require the allocation of memory for large APInts (MaskSizeInBits > 64). This is another of the compile time issues identified in PR32037 (see also D30265). This patch adds the APInt::insertBits() helper method which avoids the temporary memory allocation and masks/inserts the raw bits directly into the target. Differential Revision: https://reviews.llvm.org/D30780 llvm-svn: 297458	2017-03-10 13:44:32 +00:00
Simon Pilgrim	e86b7e2256	[X86][SSE] Speed up constant pool shuffle mask decoding with direct copy (PR32037). If the constants are already the correct size, we can copy them directly into the shuffle mask. llvm-svn: 297381	2017-03-09 14:06:39 +00:00
Simon Pilgrim	836bcc689f	[X86][SSE] combineX86ShufflesRecursively can handle shuffle masks up to 64 elements wide By defining the mask types as SmallVector<int, 16> we were causing a lot of unnecessary heap usage. llvm-svn: 297267	2017-03-08 09:36:39 +00:00
Daniel Sanders	52b4ce727a	Recommit: [globalisel] Change LLT constructor string into an LLT-based object that knows how to generate it. Summary: This will allow future patches to inspect the details of the LLT. The implementation is now split between the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns. Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem. The problem with the previous commit appears to have been that TableGen was including CodeGen/LowLevelType.h instead of Support/LowLevelTypeImpl.h. Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30046 llvm-svn: 297241	2017-03-07 23:20:35 +00:00
Daniel Sanders	8ebec37d26	Revert r297177: Change LLT constructor string into an LLT-based object ... More module problems. This time it only showed up in the stage 2 compile of clang-x86_64-linux-selfhost-modules-2 but not the stage 1 compile. Somehow, this change causes the build to need Attributes.gen before it's been generated. llvm-svn: 297188	2017-03-07 19:21:23 +00:00
Sanjoy Das	c08a79fbf2	[X86] Add option to specify preferable loop alignment Summary: Loop alignment can cause a significant change in performance for short loops. To be able to evaluate the impact of loop alignment this change introduces the new option x86-experimental-pref-loop-alignment. The alignment will be 2^Value bytes, the default value is 4. Patch by Serguei Katkov! Reviewers: craig.topper Reviewed By: craig.topper Subscribers: sanjoy, llvm-commits Differential Revision: https://reviews.llvm.org/D30391 llvm-svn: 297178	2017-03-07 18:47:22 +00:00
Daniel Sanders	8612326a08	[globalisel] Change LLT constructor string into an LLT-based object that knows how to generate it. Summary: This will allow future patches to inspect the details of the LLT. The implementation is now split between the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns. Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem. Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30046 llvm-svn: 297177	2017-03-07 18:32:25 +00:00
Ayman Musa	850fc977c8	[X86][AVX512] Adding new LLVM TableGen backend which generates the EVEX2VEX compressing tables. X86EvexToVex machine instruction pass compresses EVEX encoded instructions by replacing them with their identical VEX encoded instructions when possible. It uses 2 large manually supported tables that map the EVEX instructions to their VEX identicals. This TableGen backend replaces the tables by automatically generating them. Differential Revision: https://reviews.llvm.org/D30451 llvm-svn: 297127	2017-03-07 08:11:19 +00:00
Ayman Musa	ac5a2c43af	[X86][AVX512] Add missing entries to EVEX2VEX tables evex2vex pass defines 2 tables which maps EVEX instructions to their VEX identical when possible. Adding all missing entries. Differential Revision: https://reviews.llvm.org/D30501 llvm-svn: 297126	2017-03-07 08:05:53 +00:00
Tim Northover	c2c545b8f7	GlobalISel: restrict G_EXTRACT instruction to just one operand. A bit more painful than G_INSERT because it was more widely used, but this should simplify the handling of extract operations in most locations. llvm-svn: 297100	2017-03-06 23:50:28 +00:00
Jessica Paquette	596f483a5e	[Outliner] Fixed Asan bot failure in r296418 Fixed the asan bot failure which led to the last commit of the outliner being reverted. The change is in lib/CodeGen/MachineOutliner.cpp in the SuffixTree's constructor. LeafVector is no longer initialized using reserve but just a standard constructor. llvm-svn: 297081	2017-03-06 21:31:18 +00:00
Reid Kleckner	812191584f	[X86] Fix arg copy elision for illegal types Use the store size of the argument type, which will be a byte-sized quantity, rather than dividing the size in bits by 8. Fixes PR32136 and re-enables copy elision from i64 arguments. Reverts the workaround from r296950. llvm-svn: 297045	2017-03-06 18:39:39 +00:00
Benjamin Kramer	bb635e034c	[X86] Silence GCC enum compare warning. X86ISelLowering.cpp:26506:36: error: enumeral mismatch in conditional expression: 'llvm::X86ISD::NodeType' vs 'llvm::ISD::NodeType' [-Werror=enum-compare] llvm-svn: 296986	2017-03-05 12:53:20 +00:00
Simon Pilgrim	9f5c251d57	[X86][SSE] Lower 128-bit vectors to SIGN/ZERO_EXTEND_VECTOR_IN_REG ops As described on PR31712, we miss a variety of legalization combines because we lower these to X86ISD::VSEXT/VZEXT despite them having the same functionality. This patch makes 128-bit (SSE41) SIGN/ZERO_EXTEND_VECTOR_IN_REG ops legal, adds the necessary tablegen plumbing and uses a helper 'getExtendInVec' to decide when to use SIGN/ZERO_EXTEND_VECTOR_IN_REG or VSEXT/VZEXT. We're missing a couple of shuffle combines that will be added in a future patch for review. Later patches can then support the AVX2 cases as a mixture of SIGN/ZERO_EXTEND and SIGN/ZERO_EXTEND_VECTOR_IN_REG, and then finally deal with the AVX512 cases. Differential Revision: https://reviews.llvm.org/D30549 llvm-svn: 296985	2017-03-05 09:57:20 +00:00
Sanjay Patel	b974be5ef4	[x86] don't require a zext when forming ADC/SBB The larger goal is to move the ADC/SBB transforms currently in combineX86SetCC() to combineAddOrSubToADCOrSBB() because we're creating ADC/SBB in lots of places where we shouldn't. This was intended to be an NFC change, but avx-512 has something strange going on. It doesn't seem like any of the affected tests should really be using SET+TEST or ADC; a simple ADD could replace several instructions. But that's another bug... llvm-svn: 296978	2017-03-04 20:35:19 +00:00
Sanjay Patel	066f3208bf	[DAGCombiner] allow transforming (select Cond, C +/- 1, C) to (add(ext Cond), C) select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook. This is part of the ongoing process to obsolete D24480. The motivation is to canonicalize to select IR in InstCombine whenever possible, so we need to have a way to undo that easily in codegen. PowerPC is an obvious winner for this kind of transform because it has fast and complete bit-twiddling abilities but generally lousy conditional execution perf (although this might have changed in recent implementations). x86 also sees some wins, but the effect is limited because these transforms already mostly exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86 changes just shows that that code is a mess of special-case holes. We may be able to remove some of that logic now. My guess is that other targets will want to enable this hook for most cases. The likely follow-ups would be to add value type and/or the constants themselves as parameters for the hook. As the tests in select_const.ll show, we can transform any select-of-constants to math/logic, but the general transform for any 2 constants needs one more instruction (multiply or 'and'). ARM is one target that I think may not want this for most cases. I see infinite loops there because it wants to use selects to enable conditionally executed instructions. Differential Revision: https://reviews.llvm.org/D30537 llvm-svn: 296977	2017-03-04 19:18:09 +00:00
Simon Pilgrim	40a0e66b37	[X86][SSE] Enable post-legalize vXi64 shuffle combining on 32-bit targets Long ago (2010 according to svn blame), combineShuffle probably needed to prevent the accidental creation of illegal i64 types but there doesn't appear to be any combines that can cause this any more as they all have their own legality checks. Differential Revision: https://reviews.llvm.org/D30213 llvm-svn: 296966	2017-03-04 12:50:47 +00:00
Matthias Braun	21f340fd25	X86ISelLowering: Only perform copy elision on legal types. This fixes cases where i1 types were not properly legalized yet and led to the creation of 0-sized stack slots. This fixes http://llvm.org/PR32136 llvm-svn: 296950	2017-03-04 01:40:40 +00:00
Sanjay Patel	a84fd041c6	[x86] check for commuted add pattern to find ADC/SBB llvm-svn: 296933	2017-03-04 00:18:31 +00:00
Sanjay Patel	7ee83b41e0	[x86] refactor combineAddOrSubToADCOrSBB(); NFCI The comments were wrong, and this is not an obvious transform. This hopefully makes it clearer that we're missing the commuted patterns for adds. It's less clear that this is actually a good transform for all micro-arch. This is prep work for trying to clean up the current adc/sbb codegen because it's definitely not happening optimally. llvm-svn: 296918	2017-03-03 22:35:11 +00:00
Sanjay Patel	58e241896d	[x86] clean up materializeSBB(); NFCI This is producing SBB where it is obviously not necessary, so it needs to be limited. llvm-svn: 296894	2017-03-03 17:58:39 +00:00
Sanjay Patel	e8674825fe	[x86] fix formatting; NFC llvm-svn: 296875	2017-03-03 15:17:41 +00:00
Simon Pilgrim	c37a32d2b9	Use APInt::getHighBitsSet instead of APInt::getBitsSet for upper bit mask creation llvm-svn: 296874	2017-03-03 14:37:57 +00:00
Amjad Aboud	4f97751798	[X86] Generate VZEROUPPER for Skylake-avx512. VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be. Differential Revision: https://reviews.llvm.org/D29874 llvm-svn: 296859	2017-03-03 09:03:24 +00:00
Igor Breger	321cf3c650	[GlobalISel][X86] Support float/double and vector types. Summary: [GlobalISel][X86] Add support for f32/f64 and vector types in RegisterBank and InstructionSelector. Reviewers: delena, zvi Reviewed By: zvi Subscribers: dberris, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30533 llvm-svn: 296856	2017-03-03 08:06:46 +00:00
Simon Pilgrim	b3067dc374	[X86][MMX] Fixed i32 extraction on 32-bit targets MMX extraction often ends up as extract_i32(bitcast_v2i32(extract_i64(bitcast_v1i64(x86mmx v), 0)), 0) which fails to simplify on 32-bit targets as i64 isn't legal llvm-svn: 296782	2017-03-02 18:56:06 +00:00
Reid Kleckner	f7c0980c10	Elide argument copies during instruction selection Summary: Avoids tons of prologue boilerplate when arguments are passed in memory and left in memory. This can happen in a debug build or in a release build when an argument alloca is escaped. This will dramatically affect the code size of x86 debug builds, because X86 fast isel doesn't handle arguments passed in memory at all. It only handles the x86_64 case of up to 6 basic register parameters. This is implemented by analyzing the entry block before ISel to identify copy elision candidates. A copy elision candidate is an argument that is used to fully initialize an alloca before any other possibly escaping uses of that alloca. If an argument is a copy elision candidate, we set a flag on the InputArg. If the target generates loads from a fixed stack object that matches the size and alignment requirements of the alloca, the SelectionDAG builder will delete the stack object created for the alloca and replace it with the fixed stack object. The load is left behind to satisfy any remaining uses of the argument value. The store is now dead and is therefore elided. The fixed stack object is also marked as mutable, as it may now be modified by the user, and it would be invalid to rematerialize the initial load from it. Supersedes D28388 Fixes PR26328 Reviewers: chandlerc, MatzeB, qcolombet, inglorion, hans Subscribers: igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D29668 llvm-svn: 296683	2017-03-01 21:42:00 +00:00
Ayman Musa	9b802e4650	[X86] Fix creating vreg def after use. llvm-svn: 296601	2017-03-01 10:20:48 +00:00
Daniel Sanders	983c9b98e9	Revert r296474 - [globalisel] Change LLT constructor string into an LLT subclass that knows how to generate it. There's a circular dependency that's only revealed when LLVM_ENABLE_MODULES=1. llvm-svn: 296478	2017-02-28 15:00:27 +00:00
Daniel Sanders	a5afdefec6	[globalisel] Change LLT constructor string into an LLT subclass that knows how to generate it. Summary: This will allow future patches to inspect the details of the LLT. The implementation is now split between the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns. Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem. Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30046 llvm-svn: 296474	2017-02-28 14:21:31 +00:00
Matthias Braun	81f68ec3a9	Revert "Add MIR-level outlining pass" Revert Machine Outliner for now, as it breaks the asan bot. This reverts commit r296418. llvm-svn: 296426	2017-02-28 02:24:30 +00:00
Matthias Braun	d36410945f	Add MIR-level outlining pass This is a patch for the outliner described in the RFC at: http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html The outliner is a code-size reduction pass which works by finding repeated sequences of instructions in a program, and replacing them with calls to functions. This is useful to people working in low-memory environments, where sacrificing performance for space is acceptable. This adds an interprocedural outliner directly before printing assembly. For reference on how this would work, this patch also includes X86 target hooks and an X86 test. The outliner is run like so: clang -mno-red-zone -mllvm -enable-machine-outliner file.c Patch by Jessica Paquette<jpaquette@apple.com>! rdar://29166825 Differential Revision: https://reviews.llvm.org/D26872 llvm-svn: 296418	2017-02-28 00:33:32 +00:00
Simon Pilgrim	5c4efcdddf	[X86][SSE] Attempt to extract vector elements through target shuffles DAGCombiner already supports peeking through shuffles to improve vector element extraction, but legalization often leaves us in situations where we need to extract vector elements after shuffles have already been lowered. This patch adds support for VECTOR_EXTRACT_ELEMENT/PEXTRW/PEXTRB instructions to attempt to handle target shuffles as well. I've covered some basic scenarios including handling shuffle mask scaling and the implicit zero-extension of PEXTRW/PEXTRB, there is more that could be done here (that I've mentioned in TODOs) but I haven't found many cases where it's worth it. Differential Revision: https://reviews.llvm.org/D30176 llvm-svn: 296381	2017-02-27 21:01:57 +00:00
Craig Topper	7502119ce8	[X86] Use APInt instead of SmallBitVector tracking undef elements from getTargetConstantBitsFromNode and getConstVector. Summary: SmallBitVector uses a malloc for more than 58 bits on a 64-bit target and more than 27 bits on a 32-bit target. Some of the vector types we deal with here use more than those number of elements and therefore cause a malloc. APInt on the other hand supports up to 64 bits without a malloc. That's the maximum number of bits we need here so we can avoid a malloc for all cases by using APInt. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30392 llvm-svn: 296355	2017-02-27 16:15:32 +00:00
Craig Topper	3917ca2af4	[X86] Use APInt instead of SmallBitVector for tracking Zeroable elements in shuffle lowering Summary: SmallBitVector uses a malloc for more than 58 bits on a 64-bit target and more than 27 bits on a 32-bit target. Some of the vector types we deal with here use more than those number of elements and therefore cause a malloc. APInt on the other hand supports up to 64 bits without a malloc. That's the maximum number of bits we need here so we can avoid a malloc for all cases by using APInt. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30390 llvm-svn: 296354	2017-02-27 16:15:30 +00:00
Craig Topper	e1be95c3d0	[X86] Fix SmallVector sizes in constant pool shuffle decoding to avoid heap allocation Some of the vectors are under sized to avoid heap allocation. In one case the vector was oversized. Differential Revision: https://reviews.llvm.org/D30387 llvm-svn: 296353	2017-02-27 16:15:27 +00:00
Craig Topper	53e5a38da9	[X86] Use APInt instead of SmallBitVector for tracking undef elements in constant pool shuffle decoding Summary: SmallBitVector uses a malloc for more than 58 bits on a 64-bit target and more than 27 bits on a 32-bit target. Some of the vector types we deal with here use more than those number of elements and therefore cause a malloc. APInt on the other hand supports up to 64 bits without a malloc. That's the maximum number of bits we need here so we can avoid a malloc for all cases by using APInt. This will incur a minor increase in stack usage due to APInt storing the bit count separately from the data bits unlike SmallBitVector, but that should be ok. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30386 llvm-svn: 296352	2017-02-27 16:15:25 +00:00
Craig Topper	ed0101a0b9	[X86] Check for less than 0 rather than explicit compare with -1. NFC llvm-svn: 296321	2017-02-27 06:05:30 +00:00
Craig Topper	6028584d8c	[X86] Fix execution domain for cmpss/sd instructions. llvm-svn: 296293	2017-02-26 06:45:59 +00:00
Craig Topper	036693302b	[AVX-512] Fix execution domain for scalar commutable min/max instructions. llvm-svn: 296292	2017-02-26 06:45:56 +00:00
Craig Topper	e70231be51	[AVX-512] Fix execution domain for vmovhpd/lpd/hps/lps. llvm-svn: 296291	2017-02-26 06:45:54 +00:00
Craig Topper	fe25988c68	[AVX-512] Fix the execution domain for AVX-512 integer broadcasts. llvm-svn: 296290	2017-02-26 06:45:51 +00:00
Craig Topper	49ba3f5406	[AVX-512] Disable the redundant patterns in the VPBROADCASTBr_Alt and VPBROADCASTWr_Alt instructions. NFC llvm-svn: 296289	2017-02-26 06:45:48 +00:00
Craig Topper	6bf9b809ce	[AVX-512] Fix execution domain for VPMADD52 instructions. llvm-svn: 296288	2017-02-26 06:45:45 +00:00
Craig Topper	aa8e903150	[AVX-512] Fix the execution domain for VSCALEF instructions. llvm-svn: 296286	2017-02-26 06:45:40 +00:00
Craig Topper	cac5d698df	[AVX-512] Fix execution domain of scalar VRANGE/REDUCE/GETMANT with sae. llvm-svn: 296285	2017-02-26 06:45:37 +00:00
Craig Topper	ed64904c74	[X86] Fix the execution domain for scalar SQRT intrinsic instruction. llvm-svn: 296284	2017-02-26 06:45:35 +00:00
Simon Pilgrim	0f5fb5f549	[APInt] Add APInt::extractBits() method to extract APInt subrange (reapplied) The current pattern for extract bits in range is typically: Mask.lshr(BitOffset).trunc(SubSizeInBits); Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable. This is another of the compile time issues identified in PR32037 (see also D30265). This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation. Differential Revision: https://reviews.llvm.org/D30336 llvm-svn: 296272	2017-02-25 20:01:58 +00:00
Craig Topper	2caa97c891	[AVX-512] Fix the execution domain for scalar FMA instructions. llvm-svn: 296271	2017-02-25 19:36:28 +00:00
Craig Topper	176f3310b6	[AVX-512] Fix the execution domain on some instructions. llvm-svn: 296270	2017-02-25 19:18:11 +00:00
Craig Topper	d2011e3612	[AVX-512] Remove unnecessary masked versions of VCVTSS2SD and VCVTSD2SS using the scalar register class. We only have patterns for the masked intrinsics. llvm-svn: 296264	2017-02-25 18:43:42 +00:00
Simon Pilgrim	cdf2bd656a	Revert: r296141 [APInt] Add APInt::extractBits() method to extract APInt subrange The current pattern for extract bits in range is typically: Mask.lshr(BitOffset).trunc(SubSizeInBits); Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable. This is another of the compile time issues identified in PR32037 (see also D30265). This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation. Differential Revision: https://reviews.llvm.org/D30336 llvm-svn: 296147	2017-02-24 18:31:04 +00:00
Simon Pilgrim	bd9fb2ae95	[APInt] Add APInt::extractBits() method to extract APInt subrange The current pattern for extract bits in range is typically: Mask.lshr(BitOffset).trunc(SubSizeInBits); Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable. This is another of the compile time issues identified in PR32037 (see also D30265). This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation. Differential Revision: https://reviews.llvm.org/D30336 llvm-svn: 296141	2017-02-24 17:46:18 +00:00
Simon Pilgrim	7f6a7c97a7	[X86][SSE] Target shuffle combine can try to combine up to 16 vectors Noticed while profiling PR32037, the target shuffle ops were being stored in SmallVector<*,8> types but the combiner could store as many as 16 ops at maximum depth (2 per depth). llvm-svn: 296130	2017-02-24 15:35:52 +00:00
Sanjay Patel	9f0fa52aa2	[x86] use DAG.getAllOnesConstant(); NFCI llvm-svn: 296128	2017-02-24 15:09:59 +00:00
Simon Pilgrim	aed352273e	[APInt] Add APInt::setBits() method to set all bits in range The current pattern for setting bits in range is typically: Mask \|= APInt::getBitsSet(MaskSizeInBits, LoPos, HiPos); Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable. This is one of the key compile time issues identified in PR32037. This patch adds the APInt::setBits() helper method which avoids the temporary memory allocation completely, this first implementation uses setBit() internally instead but already significantly reduces the regression in PR32037 (~10% drop). Additional optimization may be possible. I investigated whether there is need for APInt::clearBits() and APInt::flipBits() equivalents but haven't seen these patterns to be particularly common, but reusing the code would be trivial. Differential Revision: https://reviews.llvm.org/D30265 llvm-svn: 296102	2017-02-24 10:15:29 +00:00
Craig Topper	8783bbb598	[AVX-512] Separate the fadd/fsub/fmul/fdiv/fmax/fmin with rounding mode ISD opcodes into separate packed and scalar opcodes. This is more consistent with the rest of the ISD opcodes. NFC llvm-svn: 296094	2017-02-24 07:21:10 +00:00
Craig Topper	f2529c188b	[AVX-512] Remove lzcnt intrinsics and autoupgrade them to generic ctlz intrinsics with select. Clang has been emitting ctlz intrinsics for a while now. llvm-svn: 296091	2017-02-24 05:35:04 +00:00
Petr Hosek	a7d5916308	[Fuchsia] Use thread-pointer ABI slots for stack-protector and safe-stack The Fuchsia ABI defines slots from the thread pointer where the stack-guard value for stack-protector, and the unsafe stack pointer for safe-stack, are stored. This parallels the Android ABI support. Patch by Roland McGrath Differential Revision: https://reviews.llvm.org/D30237 llvm-svn: 296081	2017-02-24 03:10:10 +00:00
Evgeniy Stepanov	ee2d77f6d6	Disable TLS for stack protector on Android API<17. The TLS slot did not exist back then. llvm-svn: 296014	2017-02-23 21:06:35 +00:00
Ayman Musa	4b2c968c43	[X86][AVX] Disable VCVTSS2SD & VCVTSD2SS memory folding and fix the register class of their first input when creating node in fast-isel. (Quick fix to buildbot failure after rL295940 commit). llvm-svn: 295970	2017-02-23 13:15:44 +00:00
Ayman Musa	524dbdaa2b	[X86][AVX512] Remove VCVTSS2SDZ & VCVTSD2SSZ from memory folding tables as they introduce new read dependency when folding. (Quick fix to buildbot fail). llvm-svn: 295946	2017-02-23 08:13:36 +00:00
Ayman Musa	6e670cf44f	[X86][AVX512] Change VCVTSS2SD and VCVTSD2SS node types to keep consistency between VEX/EVEX versions. AVX versions of the converts work on f32/f64 types, while AVX512 version work on vectors. Differential Revision: https://reviews.llvm.org/D29988 llvm-svn: 295940	2017-02-23 07:24:21 +00:00
Simon Pilgrim	13cdd57964	[X86][SSE] getTargetConstantBitsFromNode - insert constant bits directly into masks. Minor optimization, don't create temporary mask APInts that are just going to be OR'd into the accumulate masks - insert directly instead. llvm-svn: 295848	2017-02-22 15:38:13 +00:00
Simon Pilgrim	3a895c4873	[X86][SSE] Use APInt::getBitsSet() instead of APInt::getLowBitsSet().shl() separately. NFCI. llvm-svn: 295845	2017-02-22 15:04:55 +00:00
Benjamin Kramer	5a7e0f8357	[GlobalISel] Fix compiler warnings and make assert assert something. llvm-svn: 295827	2017-02-22 12:59:47 +00:00
Igor Breger	f7359d893a	[X86][GlobalISel] Initial implementation , select G_ADD gpr, gpr Summary: Initial implementation for X86InstructionSelector. Handle selection COPY and G_ADD/G_SUB gpr, gpr . Reviewers: qcolombet, rovka, zvi, ab Reviewed By: rovka Subscribers: mgorny, dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D29816 llvm-svn: 295824	2017-02-22 12:25:09 +00:00
Ayman Musa	ceea56c705	[X86] Fix memory operands definition for some instructions. Change integer memory operands to FP memory operands to some FP instructions. Differential Revision: https://reviews.llvm.org/D30201 llvm-svn: 295813	2017-02-22 08:06:29 +00:00
Craig Topper	56d4022997	[AVX-512] Allow legacy scalar min/max intrinsics to select EVEX instructions when available This patch introduces new X86ISD::FMAXS and X86ISD::FMINS opcodes. The legacy intrinsics now lower to this node. As do the AVX-512 masked intrinsics when the rounding mode is CUR_DIRECTION. I've merged a copy of the tablegen multiclass avx512_fp_scalar into avx512_fp_scalar_sae. avx512_fp_scalar still needs to support CUR_DIRECTION appearing as a rounding mode for X86ISD::FADD_ROUND and others. Differential revision: https://reviews.llvm.org/D30186 llvm-svn: 295810	2017-02-22 06:54:18 +00:00
Evandro Menezes	a8d3301ee1	[AArch64, X86] Add statistics for the MacroFusion pass llvm-svn: 295777	2017-02-21 22:16:13 +00:00
Evandro Menezes	b9b7f4b8d3	[AArch64, X86] Guard against both instrs being wild cards If both instrs are wild cards, the result can be a crash. llvm-svn: 295776	2017-02-21 22:16:11 +00:00
Geoff Berry	5d534b6a11	[CodeGenPrepare] Sink and duplicate more 'and' instructions. Summary: Rework the code that was sinking/duplicating (icmp and, 0) sequences into blocks where they were being used by conditional branches to form more tbz instructions on AArch64. The new code is more general in that it just looks for 'and's that have all icmp 0's as users, with a target hook used to select which subset of 'and' instructions to consider. This change also enables 'and' sinking for X86, where it is more widely beneficial than on AArch64. The 'and' sinking/duplicating code is moved into the optimizeInst phase of CodeGenPrepare, where it can take advantage of the fact the OptimizeCmpExpression has already sunk/duplicated any icmps into the blocks where they are used. One minor complication from this change is that optimizeLoadExt needed to be updated to always mark 'and's it has determined should be in the same block as their feeding load in the InsertedInsts set to avoid an infinite loop of hoisting and sinking the same 'and'. This change fixes a regression on X86 in the tsan runtime caused by moving GVNHoist to a later place in the optimization pipeline (see PR31382). Reviewers: t.p.northover, qcolombet, MatzeB Subscribers: aemerson, mcrosier, sebpop, llvm-commits Differential Revision: https://reviews.llvm.org/D28813 llvm-svn: 295746	2017-02-21 18:53:14 +00:00
Simon Pilgrim	8eb515d8c4	[X86] EltsFromConsecutiveLoads SDLoc argument should be const&. There appears never to have been a time that the reference was updated. llvm-svn: 295739	2017-02-21 17:42:28 +00:00
Simon Pilgrim	791955819c	[X86][AVX2] Fix VPBROADCASTQ folding on 32-bit targets. As i64 isn't a value type on 32-bit targets, we need to fold the VZEXT_LOAD into VPBROADCASTQ. llvm-svn: 295733	2017-02-21 16:41:44 +00:00
Simon Pilgrim	3546156122	[X86][SSE] Prefer to combine shuffles to VZEXT over VZEXT_MOVL. This matches what is already done during shuffle lowering and helps prevent the need for a zero-vector in cases where shuffles match both patterns. llvm-svn: 295723	2017-02-21 15:09:00 +00:00
Igor Breger	812f319794	[AVX512] Fix EXTRACT_VECTOR_ELT for v2i1/v4i1/v32i1/v64i1 with variable index. Differential Revision: https://reviews.llvm.org/D30189 llvm-svn: 295718	2017-02-21 14:01:25 +00:00
Craig Topper	d88389aa7e	[X86] Use SHLD with both inputs from the same register to implement rotate on Sandy Bridge and later Intel CPUs Summary: Sandy Bridge and later CPUs have better throughput using a SHLD to implement rotate versus the normal rotate instructions. Additionally it saves one uop and avoids a partial flag update dependency. This patch implements this change on any Sandy Bridge or later processor without BMI2 instructions. With BMI2 we will use RORX as we currently do. Reviewers: zvi Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30181 llvm-svn: 295697	2017-02-21 06:39:13 +00:00
Craig Topper	16d9730b86	[X86] Fix formatting. NFC llvm-svn: 295695	2017-02-21 06:27:13 +00:00
Craig Topper	d9fe664868	[AVX-512] Use sse_load_f32/f64 in place of scalar_to_vector and scalar load in some patterns. llvm-svn: 295693	2017-02-21 04:26:10 +00:00
Craig Topper	d890db6952	[AVX-512] Fix the ExeDomain for vcmpss/vcmpsd. llvm-svn: 295691	2017-02-21 04:26:04 +00:00
Sanjoy Das	90208720e3	Add a wrapper around copy_if in STLExtras; NFC I will add one more use for this in a later change. llvm-svn: 295685	2017-02-21 00:38:44 +00:00
Craig Topper	2012dda9a0	[AVX-512] Add a few more patterns for selecting masked vpternlog with broadcast loads where the passthru operand is not operand 0. llvm-svn: 295673	2017-02-20 17:44:09 +00:00
Simon Pilgrim	2967ed1c7e	[X86] Tidyup combineExtractVectorElt. NFCI. Pull out repeated code for extraction index operand and source vector value type. Use isNullConstant helper to check for zero extraction index. llvm-svn: 295670	2017-02-20 16:09:45 +00:00
Igor Breger	fda32d266a	[X86] Fix EXTRACT_VECTOR_ELT with variable index from v32i16 and v64i8 vector. It's more profitable to go through memory (1 cycle throughput) than using VMOVD + VPERMV/PSHUFB sequence (2/3 cycles throughput) to implement EXTRACT_VECTOR_ELT with variable index. IACA tool was used to get performance estimation (https://software.intel.com/en-us/articles/intel-architecture-code-analyzer) For example for var_shuffle_v16i8_v16i8_xxxxxxxxxxxxxxxx_i8 test from vector-shuffle-variable-128.ll I get 26 cycles vs 79 cycles. Removing the VINSERT node, we don't need it any more. Differential Revision: https://reviews.llvm.org/D29690 llvm-svn: 295660	2017-02-20 14:16:29 +00:00
Simon Pilgrim	5910ebe720	[X86][AVX512] Add support for ASHR v2i64/v4i64 without VLX Use v8i64 ASHR instructions if we don't have VLX. Differential Revision: https://reviews.llvm.org/D28537 llvm-svn: 295656	2017-02-20 12:16:38 +00:00
Ayman Musa	51ffeab8c8	[X86][AVX] Extend hasVEX_WPrefix bit to accept WIG value (W Ignore) + update all AVX instructions with the new value. Add WIG value to all of AVX instructions which ignore the W-bit in their encoding, instead of giving them the default value of 0. This patch is needed for a follow up work on EVEX2VEX pass (replacing EVEX encoded instructions with their corresponding VEX version when possible). Differential Revision: https://reviews.llvm.org/D29876 llvm-svn: 295643	2017-02-20 08:27:54 +00:00
Craig Topper	c6c68f5958	[AVX-512] Add more patterns to fold masked VPTERNLOG with load when the passthru isn't operand 0. llvm-svn: 295640	2017-02-20 07:00:40 +00:00
Craig Topper	a5fa2e40f9	[AVX-512] Fix mistake in the immediate swizzle for some of the VPTERNLOG patterns. llvm-svn: 295638	2017-02-20 07:00:34 +00:00
Craig Topper	5b4e36aafa	[AVX-512] Add more VPTERNLOG patterns to enable folding of broadcast loads that aren't in operand 2. llvm-svn: 295634	2017-02-20 02:47:42 +00:00
Craig Topper	c184b671d9	[X86] Use memory form of shift right by 1 when the rotl immediate is one less than the operation size. An earlier commit already did this for the register form. llvm-svn: 295626	2017-02-20 00:37:23 +00:00
Craig Topper	63801df251	[AVX-512] Remove AddedComplexity from masked operations. The size of the patterns already increases their priority. llvm-svn: 295619	2017-02-19 21:44:35 +00:00
Simon Pilgrim	14a7eee0b4	[X86] Use peekThroughOneUseBitcasts helper. NFCI. llvm-svn: 295618	2017-02-19 21:40:51 +00:00
Davide Italiano	16b476ffcc	[X86] Prefer static_cast<> to C-style cast. NFCI. llvm-svn: 295617	2017-02-19 21:35:41 +00:00
Craig Topper	489057715e	[AVX-512] Disable peephole optimizations on the VPTERNLOG commute test. Add new patterns to enable isel to fold the loads on its own. llvm-svn: 295616	2017-02-19 21:32:15 +00:00
Simon Pilgrim	d590de2998	[X86][SSE] Use getTargetConstantBitsFromNode to find zeroable shuffle elements. Replaces existing approach that could only search BUILD_VECTOR nodes. Requires getTargetConstantBitsFromNode to discriminate cases with all/partial UNDEF bits in each element - this should also be useful when we get around to supporting getTargetShuffleMaskIndices with UNDEF elements. llvm-svn: 295613	2017-02-19 19:40:31 +00:00
Craig Topper	4e794c71a6	[AVX-512] Add patterns to recognize masked vpternlog when the passthrough operand is not operand 0. This uses a SDNodeXForm to swizzle the appropriate immediate bits to allow this to be matched. llvm-svn: 295612	2017-02-19 19:36:58 +00:00
Simon Pilgrim	4271186f9c	[X86][SSE] Enable initial support for domain crossing at high shuffle combine depths. As discussed on D27692, this permits another domain to be used to combine a shuffle at high depths. We currently set the required depth at 4 or more combined shuffles, this is probably too high for most targets but is a good starting point and already helps avoid a number of costly variable shuffles. llvm-svn: 295608	2017-02-19 17:19:38 +00:00
Simon Pilgrim	6d07d514de	[X86][SSE] Generalize INSERTPS/SHUFPS/SHUFPD combines across domains. Relax the INSERTPS/SHUFPS/SHUFPD combines to support integer inputs if permitted. llvm-svn: 295606	2017-02-19 15:15:40 +00:00
Simon Pilgrim	b4460cf5a9	[X86][SSE] Add domain crossing support for target shuffle combines. Add the infrastructure to flag whether float and/or int domains are permissible. A future patch will enable domain crossing based off shuffle depth and the value types of the source vectors. llvm-svn: 295604	2017-02-19 14:12:25 +00:00
Craig Topper	218d1a020e	[AVX-512] Add broadcast VPTERNLOG instructions to special case commuting switch. The instructions are marked commutable, but without special handling we don't get the immediate correct. While here also remove the masked memory forms that aren't commutable. llvm-svn: 295602	2017-02-19 08:03:26 +00:00
Craig Topper	007c93b2b9	[X86] Remove patterns for MOVSD with v4i32 types. We don't appear to really need them and if we do we should just use a bitcast to a 64-bit element type. llvm-svn: 295589	2017-02-19 02:08:48 +00:00
Craig Topper	06ae5e821c	[X86] Tighten up some of the SDNode type constraints. llvm-svn: 295588	2017-02-19 01:54:47 +00:00
Simon Pilgrim	599b872ca2	[X86] Fix enumeral/non-enumeral conditional expression warning. gcc only allows you to mix enums / ints if they have the same signedness. llvm-svn: 295586	2017-02-19 00:04:30 +00:00
Simon Pilgrim	2f2d8dc630	Fix signed/unsigned comparison warning. llvm-svn: 295580	2017-02-18 22:56:17 +00:00
Craig Topper	811756b4dc	[X86][XOP] Reduce the size of a multiclass by moving more stuff to parameters instead of doing 128-bit and 256-bit simultaneously. This requires some instructions to be renamed to move the Y earlier in the instruction name. The new names are more consistent with other instructions. llvm-svn: 295579	2017-02-18 22:53:43 +00:00
Simon Pilgrim	7a87eebcad	[X86] Fix enumeral/non-enumeral comparison warning. gcc only allows you to mix enums / ints if they have the same signedness. llvm-svn: 295576	2017-02-18 22:40:58 +00:00
Simon Pilgrim	2e78c94ea5	[X86][SSE] Avoid repeated calls to SDValue::getValueType. Added assertion to check input type of X86ISD::VZEXT during target known bits calculation. llvm-svn: 295575	2017-02-18 22:25:27 +00:00
Craig Topper	de10312bea	Recommit "[X86] Remove XOP VPCMOV intrinsics and autoupgrade them to native IR." Clang has now been fixed to not use these intrinsics. llvm-svn: 295571	2017-02-18 21:50:58 +00:00
Sanjay Patel	12c2093e1e	[x86] fold sext (xor Bool, -1) --> sub (zext Bool), 1 This is the same transform that is currently used for: select Bool, 0, -1 llvm-svn: 295568	2017-02-18 21:03:28 +00:00
Craig Topper	ba2a726cc6	Revert "[X86] Remove XOP VPCMOV intrinsics and autoupgrade them to native IR." This reverts r295564. I missed that clang was still using the intrinsics despite our half implemented autoupgrade support. llvm-svn: 295565	2017-02-18 20:14:20 +00:00
Craig Topper	884db3f85d	[X86] Remove XOP VPCMOV intrinsics and autoupgrade them to native IR. It seems we were already upgrading 128-bit VPCMOV, but the intrinsic was still defined and being used in isel patterns. While I was here I also simplified the tablegen multiclasses. llvm-svn: 295564	2017-02-18 19:51:25 +00:00
Craig Topper	a505169ca5	[AVX-512] Remove 128/256-bit masked fp max/min intrinsics. Upgrade them to legacy unmasked intrinsics and select instructions. llvm-svn: 295543	2017-02-18 07:07:50 +00:00
Simon Pilgrim	7db8f42fe3	[X86] Simplify by pulling out valuetype. NFCI. llvm-svn: 295502	2017-02-17 22:10:10 +00:00
Simon Pilgrim	a4c350ff17	[X86][SSE] Add (V)MOVD folding pattern with zextloadi64i32 load node. Fixes PR31309 llvm-svn: 295492	2017-02-17 20:43:32 +00:00
Hans Wennborg	35905d6a67	Re-apply r282920 "X86: Allow conditional tail calls in Win64 "leaf" functions (PR26302)" The original commit was reverted in r283329 due to a miscompile in Chromium. That turned out to be the same issue as PR31257, which was fixed in r295262. llvm-svn: 295357	2017-02-16 19:04:42 +00:00
Andrea Di Biagio	42f7712e23	x86 interrupt calling convention: only save xmm registers if the target supports SSE The existing code always saves the xmm registers for 64-bit targets even if the target doesn't support SSE (which is common for kernels). Thus, the compiler inserts movaps instructions which lead to CPU exceptions when an interrupt handler is invoked. This commit fixes this bug by returning a register set without xmm registers from getCalleeSavedRegs and getCallPreservedMask for such targets. Patch by Philipp Oppermann. Differential Revision: https://reviews.llvm.org/D29959 llvm-svn: 295347	2017-02-16 18:25:37 +00:00
Simon Pilgrim	2fe568c95e	[X86] Remove local areOnlyUsersOf helper and use SDNode::areOnlyUsersOf instead. llvm-svn: 295326	2017-02-16 15:11:49 +00:00
Craig Topper	715873ead3	[AVX-512] Remove masked packss/packus intrinsics and autoupgrade to unmasked intrinsics with select instructions. For 512-bit add new unmasked intrinsics. The new 512-bit unmasked intrinsics will make it easy to handle these with the SSE/AVX intrinsics in InstCombine where we currently have a TODO. llvm-svn: 295290	2017-02-16 06:31:54 +00:00
Hans Wennborg	a468601e0e	[X86] Re-enable conditional tail calls and fix PR31257. This reverts r294348, which removed support for conditional tail calls due to the PR above. It fixes the PR by marking live registers as implicitly used and defined by the now predicated tailcall. This is similar to how IfConversion predicates instructions. Differential Revision: https://reviews.llvm.org/D29856 llvm-svn: 295262	2017-02-16 00:04:05 +00:00
Simon Pilgrim	5b4c30fb32	[X86][SSE] Don't call EltsFromConsecutiveLoads if any element is missing. Minor performance speedup - if any call to getShuffleScalarElt fails to get a result, don't bother calling for the remaining elements as EltsFromConsecutiveLoads will fail anyhow. llvm-svn: 295235	2017-02-15 21:09:00 +00:00
Simon Pilgrim	da25d5c7b6	[X86][SSE] Propagate undef upper elements from scalar_to_vector during shuffle combining Only do this for integer types currently - float types (in particular insertps) load folding often fails with this. llvm-svn: 295208	2017-02-15 17:41:33 +00:00
Simon Pilgrim	0f0e5bd3c6	[X86][SSE] Allow matchVectorShuffleWithUNPCK to recognise ZERO inputs Add support for specifying an UNPCK input as ZERO, particularly improves ZEXT cases with non-zero offsets llvm-svn: 295169	2017-02-15 11:46:15 +00:00
Ayman Musa	b8a4f255dd	[X86][AVX] Remove REX_W from AVX instructions. There is no meaning for REX_W in VEX encoded AVX instruction. Differential Revision: https://reviews.llvm.org/D29894 llvm-svn: 295157	2017-02-15 08:12:16 +00:00
Craig Topper	fbc7805e25	[X86] Don't create VBROADCAST nodes with 256-bit or 512-bit input types Summary: We don't seem to have great rules on what a valid VBROADCAST node looks like. And as a consequence we end up with a lot of patterns to try to catch everything. We have patterns with scalar inputs, 128-bit vector inputs, 256-bit vector inputs, and 512-bit vector inputs. As you can see from the things improved here we are currently missing patterns for 128-bit loads being extended to 256-bit before the vbroadcast. I'd like to propose that VBROADCAST should always take a 128-bit vector type as input. As a first step towards that this patch adds an EXTRACT_SUBVECTOR in front of VBROADCAST when the input is 256 or 512-bits. In the future I would like to add scalar_to_vector around all the scalar operations. And maybe we should consider adding a VBROADCAST+load node to avoid separating loads from the broadcasting operation when the load itself isn't foldable. This requires an additional change in target shuffle combining to look for the extract subvector and look through it to find the original operand. I'm sure this change isn't perfect but was enough to fix a few test failures that were being caused. Another interesting thing I noticed is that the changes in masked_gather_scatter.ll show cases where we don't remove a useless insert into element 1 before broadcasting element 0. Reviewers: delena, RKSimon, zvi Reviewed By: zvi Subscribers: igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D28747 llvm-svn: 295155	2017-02-15 06:58:47 +00:00
Craig Topper	ec5df5f4aa	[AVX-512] Add PACKSS/PACKUS instructions to load folding tables. llvm-svn: 295154	2017-02-15 06:51:39 +00:00
Diego Novillo	8adfc8ef3a	Remove unused variable. llvm-svn: 295065	2017-02-14 16:39:54 +00:00
Simon Pilgrim	6f732e026d	[X86][SSE] Allow matchVectorShuffleWithUNPCK to recognise UNDEF inputs Add support for specifying an UNPCK input as UNDEF llvm-svn: 295061	2017-02-14 16:22:04 +00:00
Simon Pilgrim	a0878dea9e	[X86][SSE] Move unary inputs handling inside matchVectorShuffleWithUNPCK. llvm-svn: 295053	2017-02-14 13:47:17 +00:00
Simon Pilgrim	3efdffcb27	[X86][SSE] Tidyup matchVectorShuffleWithUNPCK helper function call. Don't bother setting the V1/V2 operands again for unary shuffles. Don't bother legalizing the value type unless the match succeeds. llvm-svn: 295051	2017-02-14 12:54:39 +00:00
Craig Topper	d2d50cba2a	[AVX-512] Add PAVGB/PAVGW to load folding tables. llvm-svn: 295035	2017-02-14 06:54:57 +00:00
Andrew Kaylor	709f1c2a9b	[X86] Add MXCSR register This adds MXCSR to the set of recognized registers for X86 targets and updates the instructions that read or write it. I do not intend for all of the various floating point instructions that implicitly use the control bits or update the status bits of this register to ever have that usage modeled by default. However, when constrained floating point modes (such as strict FP exception status modeling or dynamic rounding modes) are enabled, implicit use/def information for MXCSR will be added to those instructions. Until those additional updates are made this should cause (almost?) no functional changes. Theoretically, this will prevent instructions like LDMXCSR and STMXCSR from being moved past one another, but that should be prevented anyway and I haven't found a case where it is happening now. Differential Revision: https://reviews.llvm.org/D29903 llvm-svn: 295004	2017-02-13 23:38:52 +00:00
Simon Pilgrim	fd6a84fbaa	Fix indentation. NFCI. llvm-svn: 294959	2017-02-13 15:31:08 +00:00
Simon Pilgrim	828dee1f70	[X86][SSE] Create matchVectorShuffleWithUNPCK helper function. Currently only used by target shuffle combining - will use it for lowering as well in a future patch. llvm-svn: 294943	2017-02-13 11:52:58 +00:00
Ayman Musa	f77219e035	[X86][AVX512] Fix operand classes for some AVX512 instructions to keep consistency between VEX/EVEX versions of the same instruction. Differential Revision: https://reviews.llvm.org/D29873 llvm-svn: 294937	2017-02-13 09:55:48 +00:00
Craig Topper	680c73e7ab	[X86] Genericize the handling of INSERT_SUBVECTOR from an EXTRACT_SUBVECTOR to support 512-bit vectors with 128-bit or 256-bit subvectors. We now detect that both the extract and insert indices are non-zero and convert to a shuffle. This will be lowered as a blend for 256-bit vectors or as a vshuf operations for 512-bit vectors. llvm-svn: 294931	2017-02-13 04:53:29 +00:00
Craig Topper	53eafa8ea4	[X86] Don't let LowerEXTRACT_SUBVECTOR call getNode for EXTRACT_SUBVECTOR. This results in the simplifications inside of getNode running while we're legalizing nodes popped off the worklist during the final DAG combine. This basically makes a DAG combine like operation occur during this legalize step, but we don't handle something quite the same way. I think we don't recursively add the removed nodes to the DAG combiner worklist. llvm-svn: 294929	2017-02-12 23:49:46 +00:00
Simon Pilgrim	cc9242bd1c	[X86] Fix typo in function name. NFCI. convertBitVectorToUnsiged - convertBitVectorToUnsigned llvm-svn: 294914	2017-02-12 20:53:44 +00:00
Craig Topper	cfe8ce3a58	[AVX-512] Add various EVEX move instructions to load folding tables using the VEX equivalents as a guide. llvm-svn: 294908	2017-02-12 18:47:46 +00:00
Craig Topper	5971b5488e	[AVX-512] Add VMOV64toSDZrm CodeGenOnly instruction based on the same instruction from AVX/SSE. I can't prove that we can select this instruction or the AVX/SSE version, but I'm adding it for consistency for now so I can continue matching the load folding tables. llvm-svn: 294907	2017-02-12 18:47:44 +00:00
Craig Topper	ec26801483	[X86] Fix a couple instruction names to use 'mr' instead of 'rm' to indicate they are stores. AVX-512 version was already named with 'mr'. llvm-svn: 294906	2017-02-12 18:47:40 +00:00
Craig Topper	6eca3170a8	[AVX-512] Add VPEXTRD/Q to load folding tables. llvm-svn: 294905	2017-02-12 18:47:37 +00:00
Simon Pilgrim	04ec0f2b2a	[X86][SSE] Update argument names to match function name. NFCI. The target shuffle match function arguments were using the term 'Ops' but the function names referred to them as 'Inputs' - use 'Inputs' consistently. llvm-svn: 294900	2017-02-12 16:46:41 +00:00
Simon Pilgrim	4cd841757a	[X86][AVX2] Add support for combining target shuffles to VPMOVZX Initial 256-bit vector support - 512-bit support requires extra checks for AVX512BW support (PMOVZXBW) that will be handled in a future patch. llvm-svn: 294896	2017-02-12 14:31:23 +00:00
Elena Demikhovsky	5d91ab46c0	AVX-512: Fixed DWARF register numbers for XMM16-31 The reference is here: https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf llvm-svn: 294890	2017-02-12 07:56:50 +00:00
Craig Topper	1c37e991e6	[X86] Move code for using blendi for insert_subvector out to an isel pattern. This gives the DAG combiner more opportunity to optimize without needing to dig through the blend. llvm-svn: 294876	2017-02-11 22:57:12 +00:00
Simon Pilgrim	755d9127f5	[X86][SSE] Use VSEXT/VZEXT constant folding for SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG Preparatory step for PR31712 llvm-svn: 294874	2017-02-11 22:47:06 +00:00
Simon Pilgrim	437d64c49e	[X86][SSE] Improve VSEXT/VZEXT constant folding. Generalize VSEXT/VZEXT constant folding to work with any target constant bits source not just BUILD_VECTOR . llvm-svn: 294873	2017-02-11 21:55:24 +00:00
Simon Pilgrim	4ef9672f0f	[X86][SSE] Add early-out when trying to match blend shuffle. NFCI. llvm-svn: 294864	2017-02-11 18:06:24 +00:00
Amaury Sechet	58ce15aba1	Fix indentation in X86ISelLowering. NFC llvm-svn: 294859	2017-02-11 17:48:48 +00:00
Craig Topper	255343483d	[AVX-512] Add VPMINS/MINU/MAXS/MAXU instructions to load folding tables. llvm-svn: 294858	2017-02-11 17:35:28 +00:00
Craig Topper	b2fa216dd5	[X86] Improve alphabetizing of load folding tables. NFC llvm-svn: 294857	2017-02-11 17:35:25 +00:00
Simon Pilgrim	0e6945e48a	[X86][SSE] Convert getTargetShuffleMaskIndices to use getTargetConstantBitsFromNode. Removes duplicate constant extraction code in getTargetShuffleMaskIndices. getTargetConstantBitsFromNode - adds support for VZEXT_MOVL(SCALAR_TO_VECTOR) and fail if the caller doesn't support undef bits. llvm-svn: 294856	2017-02-11 17:27:21 +00:00
Simon Pilgrim	d59fa0e38a	[X86] Merge repeated getScalarValueSizeInBits calls. NFCI. llvm-svn: 294852	2017-02-11 16:42:07 +00:00
Simon Pilgrim	6411a0ebed	[X86][3DNow!] Enable PFSUB<->PFSUBR commutation llvm-svn: 294847	2017-02-11 13:51:14 +00:00
Simon Pilgrim	4ead1d4aa9	[X86][3DNow!] Enable commutation for PFADD/PFMUL/PFCMPEQ/PAVGUSB/PMULHRW All commutations confirmed to give identical results - note PFMAX/PFMIN do not. PFSUB<->PFSUBR should be commutable as well llvm-svn: 294846	2017-02-11 13:32:55 +00:00
Benjamin Kramer	efcf06f5f2	Move symbols from the global namespace into (anonymous) namespaces. NFC. llvm-svn: 294837	2017-02-11 11:06:55 +00:00
Craig Topper	1f6153bab4	[AVX-512] Add VPINSRB/W/D/Q instructions to load folding tables. llvm-svn: 294830	2017-02-11 07:01:40 +00:00
Craig Topper	a9818aadab	[AVX-512] Fix apparent typo in instruction name VMOVSSDrr_REV->VMOVSDZrr_REV. llvm-svn: 294829	2017-02-11 07:01:38 +00:00
Craig Topper	3afa777f10	[AVX-512] Add VPSADBW instructions to load folding tables. llvm-svn: 294827	2017-02-11 06:24:03 +00:00
Craig Topper	464b8cb244	[X86] Don't base domain decisions on VEXTRACTF128/VINSERTF128 if only AVX1 is available. Seems the execution dependency pass likes to use FP instructions when most of the consuming code is integer if a vextractf128 instruction produced the register. Without AVX2 we don't have the corresponding integer instruction available. This patch suppresses the domain on these instructions to GenericDomain if AVX2 is not supported so that they are ignored by domain fixing. If AVX2 is supported we'll report the correct domain and allow them to switch between integer and fp. Overall I think this produces better results in the modified test cases. llvm-svn: 294824	2017-02-11 05:32:57 +00:00
Ahmed Bougacha	2e275e272f	[X86] Bitcast subvector before broadcasting it. Since r274013, we've been looking through bitcasts on broadcast inputs. In the scalar-folding case (from a load, build_vector, or sc2vec), the input type didn't matter, as we'd simply bitcast the resulting scalar back. However, when broadcasting a 128-bit-lane-aligned element, we create an EXTRACT_SUBVECTOR. Use proper types, by creating an extract_subvector of the original input type. llvm-svn: 294774	2017-02-10 19:51:47 +00:00
Simon Pilgrim	8c8b10389d	[X86][SSE] Use SDValue::getConstantOperandVal helper. NFCI. Also reordered an if statement to test low cost comparisons first llvm-svn: 294748	2017-02-10 14:27:59 +00:00
Simon Pilgrim	c371159aac	[X86][SSE] Add support for extracting target constants from BUILD_VECTOR In some cases we call getTargetConstantBitsFromNode for nodes that haven't been lowered from BUILD_VECTOR yet Note: We're getting very close to being able to move most of the constant extraction code from getTargetShuffleMaskIndices into getTargetConstantBitsFromNode llvm-svn: 294746	2017-02-10 14:04:11 +00:00
Simon Pilgrim	1140281413	[X86][SSE] Add missing comment describing combining to SHUFPS. NFCI llvm-svn: 294745	2017-02-10 13:16:01 +00:00
Igor Breger	6677999e17	add #ifdef, fix compilation error in case LLVM_BUILD_GLOBAL_ISEL=OFF llvm-svn: 294726	2017-02-10 07:33:14 +00:00
Igor Breger	b4442f34cd	[X86][GlobalISel] Add general-purpose Register Bank Summary: [X86][GlobalISel] Add general-purpose Register Bank. Add trivial handling of G_ADD legalization. Add Register Bank selection for COPY and G_ADD instructions Reviewers: rovka, zvi, ab, t.p.northover, qcolombet Reviewed By: qcolombet Subscribers: qcolombet, mgorny, dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D29771 llvm-svn: 294723	2017-02-10 07:05:56 +00:00
Eric Christopher	0824096cc0	Temporarily revert "For X86-64 linux and PPC64 linux align int128 to 16 bytes." until we can get better TargetMachine::isCompatibleDataLayout to compare - otherwise we can't code generate existing bitcode without a string equality data layout. This reverts commit r294702. llvm-svn: 294709	2017-02-10 04:35:32 +00:00
Eric Christopher	42b9248803	For X86-64 linux and PPC64 linux align int128 to 16 bytes. For other platforms we should find out what they need and likely make the same change, however, a smaller additional change is easier for platforms we know have it specified in the ABI. As part of this rewrite some of the handling in the backends for data layout and update a bunch of testcases. Based on a patch by Simonas Kazlauskas! llvm-svn: 294702	2017-02-10 03:32:21 +00:00
Simon Pilgrim	7f0d7e08b2	[X86] Remove duplicate call to getValueType. NFCI. llvm-svn: 294640	2017-02-09 22:35:59 +00:00
Peter Collingbourne	ef089bdb4b	X86: Introduce relocImm-based patterns for cmp. Differential Revision: https://reviews.llvm.org/D28690 llvm-svn: 294636	2017-02-09 22:02:28 +00:00
Peter Collingbourne	d7dd65ad7c	X86: Teach X86InstrInfo::analyzeCompare to recognize compares of symbols. This requires that we communicate to X86InstrInfo::optimizeCompareInstr that the second operand is neither a register nor an immediate. The way we do that is by setting CmpMask to zero. Note that there were already instructions where the second operand was not a register nor an immediate, namely X86::SUB*rm, so also set CmpMask to zero for those instructions. This seems like a latent bug, but I was unable to trigger it. Differential Revision: https://reviews.llvm.org/D28621 llvm-svn: 294634	2017-02-09 21:58:24 +00:00
Simon Pilgrim	e0b5c2acbd	Convert to for-range loop. NFCI. llvm-svn: 294610	2017-02-09 18:52:24 +00:00
Simon Pilgrim	6bf1bd3ed6	[X86][MMX] Remove the (long time) unused MMX_PINSRW ISD opcode. llvm-svn: 294596	2017-02-09 17:08:47 +00:00
Pierre Gousseau	6953b32475	[X86][btver2] PR31902: Fix a crash in combineOrCmpEqZeroToCtlzSrl under fast math. In combineOrCmpEqZeroToCtlzSrl, replace "getConstantOperand == 0" by "isNullConstant" to account for floating point constants. Differential Revision: https://reviews.llvm.org/D29756 llvm-svn: 294588	2017-02-09 14:43:58 +00:00
Simon Pilgrim	563e23e66e	[X86][SSE] Attempt to break register dependencies during lowerBuildVector LowerBuildVectorv16i8/LowerBuildVectorv8i16 insert values into a UNDEF vector if the build vector doesn't contain any zero elements, resulting in register dependencies with a previous use of the register. This patch attempts to break the register dependency by either always zeroing the vector beforehand or (if we're inserting to the 0'th element) by using VZEXT_MOVL(SCALAR_TO_VECTOR(i32 AEXT(Elt))) which lowers to (V)MOVD and performs a similar function. Additionally (V)MOVD is a shorter instruction than PINSRB/PINSRW. We already do something similar for SSE41 PINSRD. On pre-SSE41 LowerBuildVectorv16i8 we go a little further and use VZEXT_MOVL(SCALAR_TO_VECTOR(i32 ZEXT(Elt))) if the build vector contains zeros to avoid the vector zeroing at the cost of a scalar zero extension, which can probably be brought over to the other cases in a future patch in some cases (load folding etc.) Differential Revision: https://reviews.llvm.org/D29720 llvm-svn: 294581	2017-02-09 11:50:19 +00:00
Craig Topper	3cac763532	[X86] Remove the HLE feature flag. We only implemented it for one of the 3 HLE instructions and that instruction is also under the RTM flag. Clang only implements the RTM flag from its command line. llvm-svn: 294562	2017-02-09 06:51:02 +00:00
Craig Topper	86576bd921	[X86] Remove INVPCID and SMAP feature flags. They aren't currently used by any instructions and not tested. If we implement intrinsics for their instructions in the future, the feature flags can be added back with proper testing. llvm-svn: 294561	2017-02-09 06:50:59 +00:00
Craig Topper	50f3d1452c	[X86] Clzero intrinsic and its addition under znver1 This patch does the following. 1. Adds an Intrinsic int_x86_clzero which works with __builtin_ia32_clzero 2. Identifies clzero feature using cpuid info. (Function:8000_0008, Checks if EBX[0]=1) 3. Adds the clzero feature under znver1 architecture. 4. The custom inserter is added in Lowering. 5. A testcase is added to check the intrinsic. 6. The clzero instruction is added to assembler test. Patch by Ganesh Gopalasubramanian with a couple formatting tweaks, a disassembler test, and using update_llc_test.py from me. Differential revision: https://reviews.llvm.org/D29385 llvm-svn: 294558	2017-02-09 04:27:34 +00:00
Simon Pilgrim	dcd10344a3	[X86][SSE] Tidyup LowerBuildVectorv16i8 and LowerBuildVectorv8i16. NFCI. Run clang-format and standardized variable names between functions. llvm-svn: 294456	2017-02-08 14:44:45 +00:00
Craig Topper	3fd463a15a	[X86] Add test for clflushopt intrinsic and only enable it to be selected if the feature flag is set. llvm-svn: 294407	2017-02-08 05:45:46 +00:00
Craig Topper	6c05192018	[X86] Remove the VMFUNC feature flag. It was only partially implemented and we have no support for codegening vmfunc instructions today. If that support ever gets added, the full feature flag support should come along with it. llvm-svn: 294406	2017-02-08 05:45:42 +00:00
Craig Topper	e0ac7f3beb	[X86] Remove PCOMMIT instruction support since Intel has deprecated this instruction with no plans to release products with it. Intel's documentation for the deprecation https://software.intel.com/en-us/blogs/2016/09/12/deprecate-pcommit-instruction llvm-svn: 294405	2017-02-08 05:45:39 +00:00
Hans Wennborg	819e3e02a9	[X86] Disable conditional tail calls (PR31257) They are currently modelled incorrectly (as calls, which clobber registers, confusing e.g. Machine Copy Propagation). Reverting until we figure out the proper solution. llvm-svn: 294348	2017-02-07 20:37:45 +00:00
Sanjay Patel	b0cee9b273	[x86] improve comments for SHRUNKBLEND node creation; NFC llvm-svn: 294344	2017-02-07 19:54:16 +00:00
Sanjoy Das	2f63cbcc0c	[ImplicitNullCheck] Extend Implicit Null Check scope by using stores Summary: This change allows usage of store instruction for implicit null check. Memory Aliasing Analysis is not used and change conservatively supposes that any store and load may access the same memory. As a result re-ordering of store-store, store-load and load-store is prohibited. Patch by Serguei Katkov! Reviewers: reames, sanjoy Reviewed By: sanjoy Subscribers: atrick, llvm-commits Differential Revision: https://reviews.llvm.org/D29400 llvm-svn: 294338	2017-02-07 19:19:49 +00:00
Sanjay Patel	ef6d573f67	[x86] use range-for loops; NFCI llvm-svn: 294337	2017-02-07 19:18:25 +00:00
Sanjay Patel	633ecbf3c4	[x86] use getSignBit() for clarity; NFCI llvm-svn: 294333	2017-02-07 19:01:35 +00:00
Simon Pilgrim	8c0f62d293	[X86][SSE] Ensure that vector shift-by-immediate inputs are correctly bitcast to the result type vXi8/vXi64 vector shifts are often shifted as vYi16/vYi32 types but we weren't always remembering to bitcast the input. Tested with a new assert as we don't currently manipulate these shifts enough for test cases to catch them. llvm-svn: 294308	2017-02-07 14:22:25 +00:00
Craig Topper	9191c3324a	[AVX-512] Add masked and unmasked shift by immediate instructions to load folding tables. llvm-svn: 294287	2017-02-07 07:31:00 +00:00
Craig Topper	62304d80e3	[AVX-512] Add masked shift instructions to load folding tables. This adds the masked versions of everything, but the shift by immediate instructions. llvm-svn: 294286	2017-02-07 07:30:57 +00:00
Craig Topper	45d9ddc687	[AVX-512] Add some of the shift instructions to the load folding tables. This includes unmasked forms of variable shift and shifting by the lower element of a register. Still need to do shift by immediate which was not foldable prior to avx512 and all the masked forms. llvm-svn: 294285	2017-02-07 07:30:54 +00:00
Craig Topper	39d86bb688	[X86] Change the Defs list for VZEROALL/VZEROUPPER back to not including YMM16-31. llvm-svn: 294277	2017-02-07 04:10:57 +00:00
Eugene Zelenko	90562dfb50	[X86] Fix some Include What You Use warnings; other minor fixes (NFC). This is preparation to reduce MCExpr.h dependencies. llvm-svn: 294246	2017-02-06 21:55:43 +00:00
Simon Pilgrim	bfd4495512	[X86][SSE] Combine shuffle nodes with multiple uses if all the users are being combined. Currently we only combine shuffle nodes if they have a single user to prevent us from causing code bloat by splitting the shuffles into several different combines. We don't take into account that in some cases we will already have combined all the users during recursively calling up the shuffle tree. This patch keeps a list of all the shuffle nodes that have been combined so far and permits combining of further shuffle nodes if all its users are in that list. Differential Revision: https://reviews.llvm.org/D29399 llvm-svn: 294183	2017-02-06 13:44:45 +00:00
Igor Breger	5c31a4c9a3	[X86][GlobalISel] Add limited ret lowering support to the IRTranslator. Summary: Support return lowering for i8/i16/i32/i64/float/double, vector type supported for 64bit platform only. Support argument lowering for float/double types. Reviewers: t.p.northover, zvi, ab, rovka Reviewed By: zvi Subscribers: dberris, kristof.beyls, delena, llvm-commits Differential Revision: https://reviews.llvm.org/D29261 llvm-svn: 294173	2017-02-06 08:37:41 +00:00
Craig Topper	5d9ecd23e8	[AVX-512] Add VPSLLDQ/VPSRLDQ to load folding tables. llvm-svn: 294170	2017-02-06 05:12:14 +00:00
Craig Topper	f0eb60a6f3	[AVX-512] Add VPABSB/D/Q/W to load folding tables. llvm-svn: 294169	2017-02-06 03:18:01 +00:00
Craig Topper	864b1a5376	[AVX-512] Add VSHUFPS/PD to load folding tables. llvm-svn: 294168	2017-02-06 03:17:58 +00:00
Craig Topper	75218fb6b1	[AVX-512] Add VPMULLD/Q/W instructions to load folding tables. llvm-svn: 294164	2017-02-06 01:19:26 +00:00
Craig Topper	452a7770e6	[AVX-512] Add all masked and unmasked versions of VPMULDQ and VPMULUDQ to load folding tables. llvm-svn: 294163	2017-02-05 23:31:48 +00:00
Simon Pilgrim	380ce75687	[X86][SSE] Replace insert_vector_elt(vec, -1, idx) with shuffle Similar to what we already do for zero elt insertion, we can quickly rematerialize 'allbits' vectors so to avoid an unnecessary gpr value and insertion into a vector llvm-svn: 294162	2017-02-05 22:50:29 +00:00
Craig Topper	8eb1f315ac	[AVX-512] Add scalar masked max/min intrinsic instructions to the load folding tables. llvm-svn: 294153	2017-02-05 22:25:46 +00:00
Craig Topper	cb4bc8be5b	[AVX-512] Add scalar masked add/sub/mul/div intrinsic instructions to the load folding tables. llvm-svn: 294152	2017-02-05 22:25:42 +00:00
Craig Topper	59af67206d	[AVX-512] Add masked scalar FMA intrinsics to isNonFoldablePartialRegisterLoad to improve load folding of scalar loads. llvm-svn: 294151	2017-02-05 22:25:40 +00:00
Kamil Rytarowski	5d2bd8dd54	Revamp llvm::once_flag to be closer to std::once_flag Summary: Make this interface reusable similarly to std::call_once and std::once_flag interface. This makes porting LLDB to NetBSD easier as there was in the original approach a portable way to specify a non-static once_flag. With this change translating std::once_flag to llvm::once_flag is mechanical. Sponsored by <The NetBSD Foundation> Reviewers: mehdi_amini, labath, joerg Reviewed By: mehdi_amini Subscribers: emaste, clayborg Differential Revision: https://reviews.llvm.org/D29566 llvm-svn: 294143	2017-02-05 21:13:06 +00:00
Craig Topper	cac328f25e	[X86] Fix printing of sha256rnds2 to include the implicit %xmm0 argument. llvm-svn: 294132	2017-02-05 18:33:31 +00:00
Craig Topper	d7ae9ab1fa	[X86] Fix printing of blendvpd/blendvps/pblendvb to include the implicit %xmm0 argument. This makes codegen output more obvious about the %xmm0 usage. llvm-svn: 294131	2017-02-05 18:33:24 +00:00
Craig Topper	6a35a81fc5	[X86] In LowerTRUNCATE, create an ISD::VECTOR_SHUFFLE instead of explicitly creating a PSHUFB. This will be lowered by regular shuffle lowering to a PSHUFB later. Similar was already done for several other shuffles in this function. The test changes are because the old code used explicit zeroing for elements that could have been undef. While I was here I also changed other shuffle vectors in the same function to use the same input twice instead of creating UNDEF nodes. getVectorShuffle can create the UNDEF for us. llvm-svn: 294130	2017-02-05 18:33:14 +00:00
Craig Topper	978fdb75a4	[X86] Add support for folding (insert_subvector vec1, (extract_subvector vec2, idx1), idx1) -> (blendi vec2, vec1). llvm-svn: 294112	2017-02-04 23:26:46 +00:00
Craig Topper	3d95228dbe	[X86] Simplify the code that turns INSERT_SUBVECTOR into BLENDI. NFCI llvm-svn: 294111	2017-02-04 23:26:42 +00:00
Simon Pilgrim	034c1bd32c	[X86][SSE] Add support for combining scalar_to_vector(extract_vector_elt) into a target shuffle. Correctly flagging upper elements as undef. llvm-svn: 294020	2017-02-03 17:59:58 +00:00
Craig Topper	bbb2b95ce5	[X86] Mark 256-bit and 512-bit INSERT_SUBVECTOR operations as legal and remove the custom lowering. llvm-svn: 293969	2017-02-03 00:24:49 +00:00
Eugene Zelenko	fbd13c5c12	[X86] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 293949	2017-02-02 22:55:55 +00:00
Reid Kleckner	3c467e225e	[X86] Avoid sorted order check in release builds Effectively reverts r290248 and fixes the unused function warning with ifndef NDEBUG. llvm-svn: 293945	2017-02-02 22:06:30 +00:00
Craig Topper	c45657375b	[X86] Move turning 256-bit INSERT_SUBVECTORS into BLENDI from legalize to DAG combine. On one test this seems to have given more chance for DAG combine to do other INSERT_SUBVECTOR/EXTRACT_SUBVECTOR combines before the BLENDI was created. Looks like we can still improve more by teaching DAG combine to optimize INSERT_SUBVECTOR/EXTRACT_SUBVECTOR with BLENDI. llvm-svn: 293944	2017-02-02 22:02:57 +00:00
Michael Kuperstein	e6d59fdca5	[X86] Add costs for non-AVX512 single-source permutation integer shuffles Differential Revision: https://reviews.llvm.org/D29416 llvm-svn: 293932	2017-02-02 20:27:13 +00:00
Nirav Dave	e14300e270	[X86,ISEL] Fix X86 increment chain dependence calculation Merging Load-add-store pattern into an increment op previously dropped the load's chain from the instruction's dependence if the store is chained to a TokenFactor. llvm-svn: 293892	2017-02-02 14:39:26 +00:00
Simon Pilgrim	20ab6b875a	[X86][SSE] Use MOVMSK for all_of/any_of reduction patterns This is a first attempt at using the MOVMSK instructions to replace all_of/any_of reduction patterns (i.e. an and/or + shuffle chain). So far this only matches patterns where we are reducing an all/none bits source vector (i.e. a comparison result) but we should be able to expand on this in conjunction with improvements to 'bool vector' handling both in the x86 backend as well as the vectorizers etc. Differential Revision: https://reviews.llvm.org/D28810 llvm-svn: 293880	2017-02-02 11:52:33 +00:00

14969 Commits