llvm-project

Commit Graph

Author	SHA1	Message	Date
Sameer AbuAsal	9b65ffb097	[RISCV] Add machine function pass to merge base + offset Summary: In r333455 we added a peephole to fix the corner cases that result from separating base + offset lowering of global address.The peephole didn't handle some of the cases because it only has a basic block view instead of a function level view. This patch replaces that logic with a machine function pass. In addition to handling the original cases it handles uses of the global address across blocks in function and folding an offset from LW\SW instruction. This pass won't run for OptNone compilation, so there will be a negative impact overall vs the old approach at O0. Reviewers: asb, apazos, mgrang Reviewed By: asb Subscribers: MartinMosbeck, brucehoult, the_o, rogfer01, mgorny, rbar, johnrusso, simoncook, niosHD, kito-cheng, shiva0217, zzheng, llvm-commits, edward-jones Differential Revision: https://reviews.llvm.org/D47857 llvm-svn: 335786	2018-06-27 20:51:42 +00:00
Fangrui Song	b0d57a535b	[X86] Fix unmatched parenthesis in r335768 llvm-svn: 335769	2018-06-27 19:12:07 +00:00
Craig Topper	6bea2c7f9b	[X86] Teach the disassembler to use %eiz/%riz instead of NoRegister when the SIB byte is present, but doesn't encode an index register and there was another shorter encoding that would achieve the same result. The %eiz/%riz are dummy registers that force the encoder to emit a SIB byte when it normally wouldn't. By emitting them in the disassembly output we ensure that assembling the disassembler output would also produce a SIB byte. This should match the behavior of objdump from binutils. llvm-svn: 335768	2018-06-27 19:03:36 +00:00
Daniel Sanders	bdeb880d14	[globalisel][legalizer] Add AtomicOrdering to LegalityQuery and use it in AArch64 Now that we have the ability to legalize based on MMO's. Add support for legalizing based on AtomicOrdering and use it to correct the legalization of the atomic instructions. Also extend all() to be a variadic template as this ruleset now requires 3 and 4 argument versions. llvm-svn: 335767	2018-06-27 19:03:21 +00:00
Jessica Paquette	f472f6159a	[MachineOutliner] Don't outline sequences where x16/x17/nzcv are live across It isn't safe to outline sequences of instructions where x16/x17/nzcv live across the sequence. This teaches the outliner to check whether or not a specific canidate has x16/x17/nzcv live across it and discard the candidate in the case that that is true. https://bugs.llvm.org/show_bug.cgi?id=37573 https://reviews.llvm.org/D47655 llvm-svn: 335758	2018-06-27 17:43:27 +00:00
Craig Topper	812fcb35e7	[X86] Use bts/btr/btc for single bit set/clear/complement of a variable bit position If we are just modifying a single bit at a variable bit position we can use the BT* instructions to make the change instead of shifting a 1(or rotating a -1) and doing a binop. These instruction also ignore the upper bits of their index input so we can also remove an and if one is present on the index. Fixes PR37938. llvm-svn: 335754	2018-06-27 16:47:39 +00:00
Craig Topper	31cbe75b3b	[X86] Rename the autoupgraded of packed fp compare and fpclass intrinsics that don't take a mask as input to exclude '.mask.' from their name. I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking builtin. When we reach that goal, we should have no intrinsics named "avx512.mask". llvm-svn: 335744	2018-06-27 15:57:53 +00:00
Stanislav Mekhanoshin	1a1687f1bb	[AMDGPU] Convert rcp to rcp_iflag If a source of rcp instruction is a result of any conversion from an integer convert it into rcp_iflag instruction. No FP exception can ever happen except division by zero if a single precision rcp argument is a representation of an integral number. Differential Revision: https://reviews.llvm.org/D48569 llvm-svn: 335742	2018-06-27 15:33:33 +00:00
Luke Geeson	316327150b	[AArch64] Reverting FP16 vcvth_n_s64_f16 to fix llvm-svn: 335737	2018-06-27 14:34:40 +00:00
Adhemerval Zanella	cadcfed7aa	[AArch64] Add custom lowering for v4i8 trunc store This patch adds a custom trunc store lowering for v4i8 vector types. Since there is not v.4b register, the v4i8 is promoted to v4i16 (v.4h) and default action for v4i8 is to extract each element and issue 4 byte stores. A better strategy would be to extended the promoted v4i16 to v8i16 (with undef elements) and extract and store the word lane which represents the v4i8 subvectores. The construction: define void @foo(<4 x i16> %x, i8* nocapture %p) { %0 = trunc <4 x i16> %x to <4 x i8> %1 = bitcast i8* %p to <4 x i8>* store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2 ret void } Can be optimized from: umov w8, v0.h[3] umov w9, v0.h[2] umov w10, v0.h[1] umov w11, v0.h[0] strb w8, [x0, #3] strb w9, [x0, #2] strb w10, [x0, #1] strb w11, [x0] ret To: xtn v0.8b, v0.8h str s0, [x0] ret The patch also adjust the memory cost for autovectorization, so the C code: void foo (const int src, int width, unsigned char dst) { for (int i = 0; i < width; i++) dst++ = src++; } can be vectorized to: .LBB0_4: // %vector.body // =>This Inner Loop Header: Depth=1 ldr q0, [x0], #16 subs x12, x12, #4 // =4 xtn v0.4h, v0.4s xtn v0.8b, v0.8h st1 { v0.s }[0], [x2], #4 b.ne .LBB0_4 Instead of byte operations. llvm-svn: 335735	2018-06-27 13:58:46 +00:00
Ivan A. Kosarev	7231598fce	[NEON] Support vldNq intrinsics in AArch32 (LLVM part) This patch adds support for the q versions of the dup (load-to-all-lanes) NEON intrinsics, such as vld2q_dup_f16() for example. Currently, non-q versions of the dup intrinsics are implemented in clang by generating IR that first loads the elements of the structure into the first lane with the lane (to-single-lane) intrinsics, and then propagating it other lanes. There are at least two problems with this approach. First, there are no double-spaced to-single-lane byte-element instructions. For example, there is no such instruction as 'vld2.8 { d0[0], d2[0] }, [r0]'. That means we cannot rely on the to-single-lane intrinsics and instructions to implement the q versions of the dup intrinsics. Note that to-all-lanes instructions do support all sizes of data items, including bytes. The second problem with the current approach is that we need a separate vdup instruction to propagate the structure to each lane. So for vld4q_dup_f16() we would need four vdup instructions in addition to the initial vld instruction. This patch introduces dup LLVM intrinsics and reworks handling of the currently supported (non-q) NEON dup intrinsics to expand them into those LLVM intrinsics, thus eliminating the need for using to-single-lane intrinsics and instructions. Additionally, this patch adds support for u64 and s64 dup NEON intrinsics. These are marked as Arch64-only in the ARM NEON Reference, but it seems there are no reasons to not support them in AArch32 mode. Please correct, if that is wrong. That's what we generate with this patch applied: vld2q_dup_f16: vld2.16 {d0[], d2[]}, [r0] vld2.16 {d1[], d3[]}, [r0] vld3q_dup_f16: vld3.16 {d0[], d2[], d4[]}, [r0] vld3.16 {d1[], d3[], d5[]}, [r0] vld4q_dup_f16: vld4.16 {d0[], d2[], d4[], d6[]}, [r0] vld4.16 {d1[], d3[], d5[], d7[]}, [r0] Differential Revision: https://reviews.llvm.org/D48439 llvm-svn: 335733	2018-06-27 13:57:52 +00:00
Luke Geeson	68cb233c0f	[AArch64] Remove Duplicate FP16 Patterns with same encoding, match on existing patterns llvm-svn: 335715	2018-06-27 09:20:13 +00:00
Konstantin Zhuravlyov	30f03b3bc0	AMDGPU/NFC: Fix typo in comment llvm-svn: 335707	2018-06-27 05:36:03 +00:00
Craig Topper	33aba0eb4c	[X86] Don't store register and memory FMA3 opcodes in the same X86InstrFMA3Group. Nothing was using this relationship. By splitting them we no longer need to worry about register or memory entries being empty in a group. The memory folding tables in X86InstrInfo.cpp can be used to access this relationship if needed. llvm-svn: 335694	2018-06-27 00:42:24 +00:00
Konstantin Zhuravlyov	777477705a	AMDGPU: Silence unused warnings in waitcnt insertion pass in release build Differential Revision: https://reviews.llvm.org/D48607 llvm-svn: 335669	2018-06-26 21:33:38 +00:00
Jessica Paquette	67599c2e1e	[X86][AsmParser] Recommit r335658 Recommit of r335658 so that it does not change the behaviour of any existing error output. llvm-svn: 335668	2018-06-26 21:30:34 +00:00
Jessica Paquette	0a80af0761	Revert "[X86][AsmParser] Emit an error when RIP-relative instructions are used in 32-bit mode" This reverts commit 4850a9aae8b38c7deadc103d634ec7397e6c323b. It caused MC/X86/x86_errors.s to fail. Will fix and recommit shortly. llvm-svn: 335660	2018-06-26 20:57:19 +00:00
Jessica Paquette	0e40d4bfc3	[X86][AsmParser] Emit an error when RIP-relative instructions are used in 32-bit mode Right now, when we use RIP-relative instructions in 32-bit mode, we'll just assert and crash. This adds an error message which tells the user that they can't do that in 32-bit mode, so that we don't crash (and also can see the issue outside of assert builds). llvm-svn: 335658	2018-06-26 20:33:46 +00:00
Stanislav Mekhanoshin	dacda79ee6	[AMDGPU] Add llvm.amdgcn.fmad.ftz intrinsic This intrinsic selects v_mad_f32 regardless of fp32 denorm support. Differential Revision: https://reviews.llvm.org/D48573 llvm-svn: 335654	2018-06-26 20:04:19 +00:00
Matt Arsenault	8c4a35237a	AMDGPU: Add pass to lower kernel arguments to loads This replaces most argument uses with loads, but for now not all. The code in SelectionDAG for calling convention lowering is actively harmful for amdgpu_kernel. It attempts to split the argument types into register legal types, which results in low quality code for arbitary types. Since all kernel arguments are passed in memory, we just want the raw types. I've tried a couple of methods of mitigating this in SelectionDAG, but it's easier to just bypass this problem alltogether. It's possible to hack around the problem in the initial lowering, but the real problem is the DAG then expects to be able to use CopyToReg/CopyFromReg for uses of the arguments outside the block. Exposing the argument loads in the IR also has the advantage that the LoadStoreVectorizer can merge them. I'm not sure the best approach to dealing with the IR argument list is. The patch as-is just leaves the IR arguments in place, so all the existing code will still compute the same kernarg size and pointlessly lowers the arguments. Arguably the frontend should emit kernels with an empty argument list in the first place. Alternatively a dummy array could be inserted as a single argument just to reserve space. This does have some disadvantages. Local pointer kernel arguments can no longer have AssertZext placed on them as the equivalent !range metadata is not valid on pointer typed loads. This is mostly bad for SI which needs to know about the known bits in order to use the DS instruction offset, so in this case this is not done. More importantly, this skips noalias arguments since this pass does not yet convert this to the equivalent !alias.scope and !noalias metadata. Producing this metadata correctly seems to be tricky, although this logically is the same as inlining into a function which doesn't exist. Additionally, exposing these loads to the vectorizer may result in degraded aliasing information if a pointer load is merged with another argument load. I'm also not entirely sure this is preserving the current clover ABI, although I would greatly prefer if it would stop widening arguments and match the HSA ABI. As-is I think it is extending < 4-byte arguments to 4-bytes but doesn't align them to 4-bytes. llvm-svn: 335650	2018-06-26 19:10:00 +00:00
Brendon Cahoon	b7169c435a	[Hexagon] Add a "generic" cpu Add the generic processor for Hexagon so that it can be used with 3rd party programs that create a back-end with the "generic" CPU. This patch also enables the JIT for Hexagon. Differential Revision: https://reviews.llvm.org/D48571 llvm-svn: 335641	2018-06-26 18:44:05 +00:00
Simon Pilgrim	aa2bf2be31	[TargetLowering] isVectorClearMaskLegal - use ArrayRef<int> instead of const SmallVectorImpl<int>& This is more generic and matches isShuffleMaskLegal. Differential Revision: https://reviews.llvm.org/D48591 llvm-svn: 335605	2018-06-26 14:15:31 +00:00
Than McIntosh	3190993a02	[X86,ARM] Retain split-stack prolog check for sibling calls Summary: If a routine with no stack frame makes a sibling call, we need to preserve the stack space check even if the local stack frame is empty, since the call target could be a "no-split" function (in which case the linker needs to be able to fix up the prolog sequence in order to switch to a larger stack). This fixes PR37807. Reviewers: cherry, javed.absar Subscribers: srhines, llvm-commits Differential Revision: https://reviews.llvm.org/D48444 llvm-svn: 335604	2018-06-26 14:11:30 +00:00
Tim Northover	b73efb85ba	ARM: correctly decode VFP instructions following unpredictable t2IT When the condition code for an IT instruction is "AL" we get strange "15" predicates on subsequent instructions. These are dealt with for most instructions by treating them as "ARMCC::AL", but VFP takes a different path which didn't have this code. llvm-svn: 335594	2018-06-26 11:39:20 +00:00
Tim Northover	bf54858115	ARM: diagnose unpredictable IT instructions IT instructions are allowed to have the 'AL' predicate, but it must never result in an 'NV' predicated instruction. Essentially this means that all branches must be 't' rather than 'e' if the predicate is 'AL'. This patch adds a diagnostic for this during assembly (error because parsing hits an assertion if allowed to continue) and an annotation during disassembly. llvm-svn: 335593	2018-06-26 11:38:41 +00:00
Simon Pilgrim	bfaa09220b	[X86] Just use ArrayRef instead of SmallVectorImpl in a few static method arguments. NFCI. llvm-svn: 335590	2018-06-26 10:45:41 +00:00
Craig Topper	08dae1682d	[X86] Don't use getScalarShiftAmountTy to get the immediate type for target specific VSHLDQ/VSRLDQ nodes. These opcodes have a fixed type of i8 for their immediate and shouldn't have anything to do with the scalar shift amount used by target independent shift nodes. llvm-svn: 335578	2018-06-26 04:53:42 +00:00
Dan Gohman	910ba33d0c	[WebAssembly] Fix lowering of varargs functions with non-legal fixed arguments. CallLoweringInfo's NumFixedArgs field gives the number of fixed arguments before legalization. The ISD::OutputArg "Outs" array holds legalized arguments, so when indexing into it to find the non-fixed arguemn, we need to use the number of arguments after legalization. Fixes PR37934. llvm-svn: 335576	2018-06-26 03:18:38 +00:00
Craig Topper	c42ed4e3c4	[X86] Use XOR for SUB (C, X) during isel if will help fold an immediate Summary: Same idea as D48529, but restricted to X86 and done very late to avoid any surprises where subtract might be better for DAG combining. This seems like the safest way to do this trick. And we consider doing it as a DAG combine later. Reviewers: spatel, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D48557 llvm-svn: 335575	2018-06-26 03:11:15 +00:00
Dan Gohman	fd2f7aeb12	[WebAssembly] Fix a typo in a comment. llvm-svn: 335574	2018-06-26 03:03:41 +00:00
Craig Topper	689e363ff2	[X86] Redefine avx512 packed fpclass intrinsics to return a vXi1 mask and implement the mask input argument using an 'and' IR instruction. This recommits r335562 and 335563 as a single commit. The frontend will surround the intrinsic with the appropriate marshalling to/from a scalar type to match the sigature of the builtin that software expects. By exposing the vXi1 type directly in the llvm intrinsic we make it available to optimizers much earlier. This can enable the scalar marshalling code to be optimized away. llvm-svn: 335568	2018-06-26 01:37:02 +00:00
Craig Topper	6f4fdfa9af	Revert r335562 and 335563 "[X86] Redefine avx512 packed fpclass intrinsics to return a vXi1 mask and implement the mask input argument using an 'and' IR instruction." These were supposed to have been squashed to a single commit. llvm-svn: 335566	2018-06-26 01:31:53 +00:00
Craig Topper	9b4322ce31	foo llvm-svn: 335562	2018-06-26 00:43:34 +00:00
Craig Topper	913abc8b58	[X86] Simplify intrinsic table binary search to not require a temporary struct. std::lower_bound doesn't require the thing to search for to be the same type as the table entries. We just need to define an appropriate comparison function that can take an table entry and an intrinsic number. llvm-svn: 335518	2018-06-25 20:27:46 +00:00
Craig Topper	614f192471	[X86] Add comment about the sorting of the memory folding tables added in r335501. llvm-svn: 335517	2018-06-25 20:11:16 +00:00
Lei Huang	5d109ee3d4	[PowerPC] Fix incorrectly encoded wait instruction Encoding for the wait instruction was wrong. Fix according to ISA 3.0. Differential Revision: https://reviews.llvm.org/D48550 llvm-svn: 335514	2018-06-25 19:28:27 +00:00
Reid Kleckner	88fee5fdbc	Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models" The large code model allows code and data segments to exceed 2GB, which means that some symbol references may require a displacement that cannot be encoded as a displacement from RIP. The large PIC model even relaxes the assumption that the GOT itself is within 2GB of all code. Therefore, we need a special code sequence to materialize it: .LtmpN: leaq .LtmpN(%rip), %rbx movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch addq %rax, %rbx # GOT base reg From that, non-local references go through the GOT base register instead of being PC-relative loads. Local references typically use GOTOFF symbols, like this: movq extern_gv@GOT(%rbx), %rax movq local_gv@GOTOFF(%rbx), %rax All calls end up being indirect: movabsq $local_fn@GOTOFF, %rax addq %rbx, %rax callq *%rax The medium code model retains the assumption that the code segment is less than 2GB, so calls are once again direct, and the RIP-relative loads can be used to access the GOT. Materializing the GOT is easy: leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg DSO local data accesses will use it: movq local_gv@GOTOFF(%rbx), %rax Non-local data accesses will use RIP-relative addressing, which means we may not always need to materialize the GOT base: movq extern_gv@GOTPCREL(%rip), %rax Direct calls are basically the same as they are in the small code model: They use direct, PC-relative addressing, and the PLT is used for calls to non-local functions. This patch adds reasonably comprehensive testing of LEA, but there are lots of interesting folding opportunities that are unimplemented. I restricted the MCJIT/eh-lg-pic.ll test to Linux, since the large PIC code model is not implemented for MachO yet. Differential Revision: https://reviews.llvm.org/D47211 llvm-svn: 335508	2018-06-25 18:16:27 +00:00
Craig Topper	3cc6cb1d35	[X86] Sort the static memory folding tables by reg opcode. Remove the reg->mem DenseMaps in favor of binary search. With the static tables sorted we can binary search them directly for reg->mem lookups. This removes 6 DenseMaps that had to be created when X86InstrInfo is constructed. We still have one Mem->Reg DenseMap for the reverse direction. This is created just as before by walking the reg->mem arrays to populate it. Differential Revision: https://reviews.llvm.org/D48527 llvm-svn: 335501	2018-06-25 17:26:56 +00:00
Craig Topper	b9cb88a4b0	[X86] Allow base and index for gather instructions to appear in other order for Intel syntax. llvm-svn: 335500	2018-06-25 17:26:51 +00:00
Alexander Richardson	85e200e934	Add Triple::isMIPS()/isMIPS32()/isMIPS64(). NFC There are quite a few if statements that enumerate all these cases. It gets even worse in our fork of LLVM where we also have a Triple::cheri (which is mips64 + CHERI instructions) and we had to update all if statements that check for Triple::mips64 to also handle Triple::cheri. This patch helps to reduce our diff to upstream and should also make some checks more readable. Reviewed By: atanasyan Differential Revision: https://reviews.llvm.org/D48548 llvm-svn: 335493	2018-06-25 16:49:20 +00:00
Matt Arsenault	b1cc4f52ff	AMDGPU/GlobalISel: Add support for llvm.amdgcn.kernarg.segment.ptr Note a normal select test is not currently possible because this relies on input registers tracked in SIMachineFunctionInfo which are not currently serializable in MIR, but this does work end-to-end from the IR. llvm-svn: 335490	2018-06-25 16:17:48 +00:00
Matt Arsenault	2811a20f77	AMDGPU: Remove commented out code llvm-svn: 335486	2018-06-25 15:42:20 +00:00
Matt Arsenault	b3feccd7fa	AMDGPU/GlobalISel: Fix G_IMPLICIT_DEF for pointers llvm-svn: 335485	2018-06-25 15:42:12 +00:00
Matt Arsenault	73eeb42e50	AMDGPU: Respect align argument parameter This should avoid relying on the pointee type to get the alignment, particularly since pointee types are supposed to be removed at some point. Also fixes not getting the alignment for unsized types. llvm-svn: 335478	2018-06-25 14:29:04 +00:00
Craig Topper	facea6b4a6	[X86] Block commuting operand 1 of FMA*_Int instructions in findThreeSrcCommutedOpIndices. Remove uncommutable returns from getThreeSrcCommuteCase/getFMA3OpcodeToCommuteOperands. We should be blocking the operand while we are in the routine that tries to find commutable operand indices. Doing it later means we might have missed out on another valid set of operands we could have commuted. The intrinsic case was the only case that could really prevent commuting in getFMA3OpcodeToCommuteOperands. All the other cases in getThreeSrcCommuteCase were not reachable conditions as they were protected by findThreeSrcCommutedOpIndices. With that abort case pushed earlier, we can remove all the abort checks and replace with asserts. llvm-svn: 335446	2018-06-25 06:05:37 +00:00
Heejin Ahn	04c4894911	[WebAssembly] Add WebAssemblyException information analysis Summary: A WebAssemblyException object contains BBs that belong to a 'catch' part of the try-catch-end structure. Because CFGSort requires all the BBs within a catch part to be sorted together as it does for loops, this pass calculates the nesting structure of catch part of exceptions in a function. Now this assumes the use of Windows EH instructions. Reviewers: dschuff, majnemer Subscribers: jfb, mgorny, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D44134 llvm-svn: 335439	2018-06-25 01:20:21 +00:00
Heejin Ahn	4934f76b58	[WebAssembly] Add WebAssemblyLateEHPrepare pass Summary: Add WebAssemblyLateEHPrepare pass that does several small jobs for exception handling. This runs before CFGSort, and is different from WasmEHPrepare pass that runs before ISel, even though the names are similar. Reviewers: dschuff, majnemer Subscribers: sbc100, jgravelle-google, sunfish, mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D46803 llvm-svn: 335438	2018-06-25 01:07:11 +00:00
Craig Topper	3b18bdc46d	[X86] Simplify some code by using isOneConstant. NFC llvm-svn: 335437	2018-06-25 01:01:47 +00:00
Craig Topper	4331d6218d	[X86] Remove the changes to combineScalarToVector made in r335037. They appear to be untested other than the test case for p37879.ll and I believe we should be using SimplifyDemandedElts here to handle these cases. llvm-svn: 335436	2018-06-25 00:21:53 +00:00
Craig Topper	ecf7c5b75f	[X86] Reduce the number of patterns needed for masked scalar ceil/floor isel. The scalar to vector on the mask register should not be part of the patterns. llvm-svn: 335435	2018-06-25 00:05:09 +00:00
Brad Smith	df1f50579f	[mips][ias] Enable IAS by default for OpenBSD / FreeBSD mips64/mips64el. Reviewers: atanasyan Differential Review: https://reviews.llvm.org/D31557 llvm-svn: 335434	2018-06-24 15:44:47 +00:00
Craig Topper	03523f6741	[X86] Regroup some isel patterns. NFC For some reason the 64-bit patterns were separated from their 8/16/32-bit friends, but only for add/sub/mul. For and/or/xor they were together. llvm-svn: 335429	2018-06-24 06:56:49 +00:00
Craig Topper	19772c89c7	[X86] Rename VFPCLASSSS and VFPCLASSSD internal instruction names to include a Z to match other EVEX instructions. llvm-svn: 335428	2018-06-24 06:29:50 +00:00
Craig Topper	d8d64a56b5	[X86] Make %eiz usage in 64-bit mode, force a 0x67 address size prefix. Fix some test CHECK lines. llvm-svn: 335414	2018-06-23 06:15:04 +00:00
Craig Topper	2545529034	[X86] Teach disassembler to use %eip instead of %rip when 0x67 prefix is used on a rip-relative address. llvm-svn: 335413	2018-06-23 06:03:48 +00:00
Craig Topper	68d64e3859	[X86][AsmParser] Improve base/index register checks. -Ensure EIP isn't used with an index reigster. -Ensure EIP isn't used as index register. -Ensure base register isn't a vector register. -Ensure eiz/riz usage matches the size of their base register. llvm-svn: 335412	2018-06-23 05:53:00 +00:00
Reid Kleckner	fd7c9ab971	[AMDGPU] Update includes for intrinsic changes :( llvm-svn: 335409	2018-06-23 03:05:39 +00:00
Reid Kleckner	f5890e4e43	[IR] Split Intrinsics.inc into enums and implementations Implements PR34259 Intrinsics.h is a very popular header. Most LLVM TUs care about things like dbg_value, but they don't care how they are implemented. After I split these out, IntrinsicImpl.inc is 1.7 MB, so this saves each LLVM TU from scanning 1.7 MB of source that gets pre-processed away. It also means we can modify intrinsic properties without triggering a full rebuild, but that's probably less of a win. I think the next best thing to do would be to split out the target intrinsics into their own header. Very, very few TUs care about target-specific intrinsics. It's very hard to split up the target independent intrinsics like llvm.expect, assume, and dbg.value, though. llvm-svn: 335407	2018-06-23 02:02:38 +00:00
Craig Topper	abdbb2c67a	[X86][AsmParser] Rework that allows (%dx) to be used in place of %dx with in/out instructions. Previously, to support (%dx) we left a wide open hole in our 16-bit memory address checking. This let this address value be used with any instruction without error in the parser. It would later fail in the encoder with an assertion failure on debug builds and who knows what on release builds. This patch passes the mnemonic down to the memory operand parsing function so we can allow the (%dx) form only on specific instructions. llvm-svn: 335403	2018-06-23 00:03:20 +00:00
Craig Topper	10e2f73793	[X86][AsmParser] Keep track of whether an explicit scale was specified while parsing an address in Intel syntax. Use it for improved error checking. This allows us to check these: -16-bit addressing doesn't support scale so we should error if we find one there. -Multiplying ESP/RSP by a scale even if the scale is 1 should be an error because ESP/RSP can't be an index. llvm-svn: 335398	2018-06-22 22:28:39 +00:00
Craig Topper	1d707539e4	[X86][AsmParser] In Intel syntax make sure we support ESP/RSP being the second register in memory expressions like [EAX+ESP]. By default, the second register gets assigned to the index register slot. But ESP can't be an index register so we need to swap it with the other register. There's still a slight bug that we allow [EAX+ESP*1]. The existence of the multiply even though its with 1 should force ESP to the index register and trigger an error, but it doesn't currently. llvm-svn: 335394	2018-06-22 21:57:24 +00:00
Craig Topper	9bc2c059c3	[X86] Don't accept (%si,%bp) 16-bit address expressions. The second register is the index register and should only be %si or %di if used with a base register. And in that case the base register should be %bp or %bx. This makes us compatible with gas. We do still need to support both orders with Intel syntax which uses [bp+si] and [si+bp] llvm-svn: 335384	2018-06-22 20:20:38 +00:00
Craig Topper	c26c62e0e5	[X86][AsmParser] Allow (%bp,%si) and (%bp,%di) to be encoded without using a zero displacement. (%bp) can't be encoded without a displacement. The encoding is instead used for displacement alone. So a 1 byte displacement of 0 must be used. But if there is an index register we can encode without a displacement. llvm-svn: 335379	2018-06-22 19:42:21 +00:00
Craig Topper	cd18bb523c	[X86][AsmParser] Check for invalid 16-bit base register in Intel syntax. llvm-svn: 335373	2018-06-22 17:50:40 +00:00
Craig Topper	22d1db122a	[X86] Don't allow ESP/RSP to be used as an index register in assembly. Fixes PR37892 llvm-svn: 335370	2018-06-22 17:15:58 +00:00
Sjoerd Meijer	1043dffbd3	Recommit of r335326, with the test fixed that I missed. llvm-svn: 335331	2018-06-22 10:03:03 +00:00
Simon Pilgrim	9c8f9374b5	[CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) was being severely overestimated by the default shuffle expansion. This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174. I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more. Differential Revision: https://reviews.llvm.org/D48172 llvm-svn: 335329	2018-06-22 09:45:31 +00:00
Sjoerd Meijer	7ee5b090de	Reverting r335326 while I look at the test failure llvm-svn: 335328	2018-06-22 09:17:08 +00:00
Sjoerd Meijer	8d2f1565b7	[ARM] ARMv6m and v8m.baseline strict align This sets target feature FeatureStrictAlign for Armv6-m and Armv8-m.baseline, because it has no support for unaligned accesses. It looks like we always pass target feature "+strict-align" from Clang, so this is not a user facing problem, but querying the subtarget (in e.g. llc) for unaligned access support is incorrect. Differential Revision: https://reviews.llvm.org/D48437 llvm-svn: 335326	2018-06-22 08:48:13 +00:00
Matt Arsenault	3f8e7a3dbc	AMDGPU: Add patterns for i32/i64 local atomic load/store Not sure why the 32/64 split is needed in the atomic_load store hierarchies. The regular PatFrags do this, but we don't do it for the existing handling for global. llvm-svn: 335325	2018-06-22 08:39:52 +00:00
Mikhail Dvoretckii	0963562083	[X86] Changing the check for valid inputs in combineScalarToVector Changing the logic of scalar mask folding to check for valid input types rather than against invalid ones, making it more robust and fixing PR37879. Differential Revision: https://reviews.llvm.org/D48366 llvm-svn: 335323	2018-06-22 08:28:05 +00:00
Tom Stellard	6af7307650	AMDGPU/GlobalISel: Default to using TableGen'd instruction selector Summary: We can select all instructions that are marked as legal in a full piglit run, so now is a good time to make the TableGen'd instruction selector default for all opcodes. This is NFC for a full piglit run, which is why there are no tests. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48198 llvm-svn: 335319	2018-06-22 03:04:35 +00:00
Tom Stellard	26fac0f8e1	AMDGPU/GlobalISel: legalize and select 32-bit G_ASHR Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D48196 llvm-svn: 335318	2018-06-22 02:54:57 +00:00
Tom Stellard	9a6535718e	AMDGPU/GlobalISel: legalize and select 32-bit G_SITOFP Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48195 llvm-svn: 335316	2018-06-22 02:34:29 +00:00
Tom Stellard	7712ee8891	AMDGPU/GlobalISel: Implement select() for COPY Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46151 llvm-svn: 335315	2018-06-22 00:44:29 +00:00
Tom Stellard	3f1c6fe156	AMDGPU/GlobalISel: Implement select() for G_IMPLICIT_DEF Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46150 llvm-svn: 335307	2018-06-21 23:38:20 +00:00
Reid Kleckner	3a2fd1c2f3	Revert r335297 "[X86] Implement more of x86-64 large and medium PIC code models" MCJIT can't handle R_X86_64_GOT64 yet. llvm-svn: 335300	2018-06-21 22:19:05 +00:00
Reid Kleckner	3286a6c896	[X86] Commit some comments that weren't in the medium code model patch llvm-svn: 335298	2018-06-21 21:57:44 +00:00
Reid Kleckner	247fe6aeab	[X86] Implement more of x86-64 large and medium PIC code models Summary: The large code model allows code and data segments to exceed 2GB, which means that some symbol references may require a displacement that cannot be encoded as a displacement from RIP. The large PIC model even relaxes the assumption that the GOT itself is within 2GB of all code. Therefore, we need a special code sequence to materialize it: .LtmpN: leaq .LtmpN(%rip), %rbx movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch addq %rax, %rbx # GOT base reg From that, non-local references go through the GOT base register instead of being PC-relative loads. Local references typically use GOTOFF symbols, like this: movq extern_gv@GOT(%rbx), %rax movq local_gv@GOTOFF(%rbx), %rax All calls end up being indirect: movabsq $local_fn@GOTOFF, %rax addq %rbx, %rax callq *%rax The medium code model retains the assumption that the code segment is less than 2GB, so calls are once again direct, and the RIP-relative loads can be used to access the GOT. Materializing the GOT is easy: leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg DSO local data accesses will use it: movq local_gv@GOTOFF(%rbx), %rax Non-local data accesses will use RIP-relative addressing, which means we may not always need to materialize the GOT base: movq extern_gv@GOTPCREL(%rip), %rax Direct calls are basically the same as they are in the small code model: They use direct, PC-relative addressing, and the PLT is used for calls to non-local functions. This patch adds reasonably comprehensive testing of LEA, but there are lots of interesting folding opportunities that are unimplemented. Reviewers: chandlerc, echristo Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D47211 llvm-svn: 335297	2018-06-21 21:55:08 +00:00
Konstantin Zhuravlyov	e004b3d97b	AMDGPU: Remove ability to reserve VGPRs for debugger Differential Revision: https://reviews.llvm.org/D48234 llvm-svn: 335288	2018-06-21 20:28:19 +00:00
Scott Linder	1e8c2c705d	[AMDGPU] Update assembler for HSA Code Object v3 Update AMDGPU assembler syntax behind the code-object-v3 feature: * Replace/rename most AMDGPU assembler directives/symbols and document them. * Provide more diagnostics (e.g. values out of range, missing values, repeated values). * Provide path for backwards compatibility, even with underlying descriptor changes. Differential Revision: https://reviews.llvm.org/D47736 llvm-svn: 335281	2018-06-21 19:38:56 +00:00
Simon Dardis	3505045b42	[mips] Modify comment to test new email address (NFC). llvm-svn: 335269	2018-06-21 18:52:32 +00:00
Scott Linder	5792dd0f39	[AMDGPU] Fix bug with tracking processed blocks in SIInsertWaitcnts BlockWaitcntProcessedSet was not being cleared between calls, so it was producing incorrect counts in cases where MBB addresses happened to coincide across multiple calls. Differential Revision: https://reviews.llvm.org/D48391 llvm-svn: 335268	2018-06-21 18:48:48 +00:00
Konstantin Zhuravlyov	766c77efd7	AMDGPU/AMDHSA: Remove GridWorkGroupCountX/Y/Z and everything that comes with it from implementation and v3 header files. Leave definition in v2 header files for backwards compatibility. Differential Revision: https://reviews.llvm.org/D48191 llvm-svn: 335267	2018-06-21 18:36:04 +00:00
Sirish Pande	b60acb9e48	Revert "[AArch64] Coalesce Copy Zero during instruction selection" This reverts commit d8f57105010cc7e78026e511d5def873fc91e0e7. Original Commit: Author: Haicheng Wu <haicheng@codeaurora.org> Date: Sun Feb 18 13:51:33 2018 +0000 [AArch64] Coalesce Copy Zero during instruction selection Add special case for copy of zero to avoid a double copy. Differential Revision: https://reviews.llvm.org/D36104 Author's intention is to remove a BB that has one mov instruction. In order to do that, d8f571050 pessmizes MachineSinking by introducing a copy, such that mov instruction is NOT moved to the BB. Optimization downstream gets rid of the BB with only mov instruction. This works well if we have only one fall through branch as there is only one "extra" mov instruction. If we have multiple fall throughs, we will have a lot of redundant movs. In such a case, it's better to have this BB which has one mov instruction. This is causing degradation in jpeg, fft and other codebases. I believe if we want to remove a BB with only one branch instruction, we should not pessimize Machine Sinking at all, and find some other solution. llvm-svn: 335251	2018-06-21 16:05:24 +00:00
David Green	21a2973cc4	[ARM] Enable useAA() for the in-order Cortex-R52 This option allows codegen (such as DAGCombine or MI scheduling) to use alias analysis information, which can help with the codegen on in-order cpu's, especially machine scheduling. Here I have done things the same way as AArch64, adding a subtarget feature to enable this for specific cores, and enabled it for the R52 where we have a schedule to make use of it. Differential Revision: https://reviews.llvm.org/D48074 llvm-svn: 335249	2018-06-21 15:48:29 +00:00
Sameer AbuAsal	e01e711c64	[RISCV] Tail calls don't need to save return address Summary: When expanding the PseudoTail in expandFunctionCall() we were using X6 to save the return address. Since this is a tail call the return address is not needed, this patch replaces it with X0 to be ignored. This matches the behaviour listed in the ISA V2.2 document page 110. tail offset -----> jalr x0, x6, offset GCC exhibits the same behavior. Reviewers: apazos, asb, mgrang Reviewed By: asb Subscribers: rbar, johnrusso, simoncook, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, rogfer01 Differential Revision: https://reviews.llvm.org/D48343 llvm-svn: 335239	2018-06-21 14:37:09 +00:00
Mikhail Dvoretckii	22c82af5c8	[x86] Lower some trunc + shuffle patterns to vpmov[q\|d][b\|w] This should help in lowering the following four intrinsics: _mm256_cvtepi32_epi8 _mm256_cvtepi64_epi16 _mm256_cvtepi64_epi8 _mm512_cvtepi64_epi8 Differential Revision: https://reviews.llvm.org/D46957 llvm-svn: 335238	2018-06-21 14:16:45 +00:00
Nicolai Haehnle	15745ba5c1	AMDGPU: Remove redundant MIMG instruction variants Summary: For sample and gather ops, we can accurately determine the set of vaddr-size instruction variants that are required. This reduces the size of instruction tables by ~5%. The number of machine instruction opcodes is reduced from 10002 to 9476. Change-Id: Ie7fc65d3657b762c7816017fe70b2e9bec644a8a Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D48168 llvm-svn: 335232	2018-06-21 13:37:55 +00:00
Nicolai Haehnle	db6911a6f9	AMDGPU: Remove old-style image intrinsics Summary: This also removes the need for atomic pseudo instructions, since we select the correct encoding directly in SITargetLowering::lowerImage for dimension-aware image intrinsics. Mesa uses dimension-aware image intrinsics since commit a9a7993441. Change-Id: I7473d20009476a4ed6d919cae4e6dca9ff42e77a Reviewers: arsenm, rampitec, mareko, tpr, b-sumner Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48167 llvm-svn: 335231	2018-06-21 13:37:45 +00:00
Nicolai Haehnle	7a9c03f484	AMDGPU: Select MIMG instructions manually in SITargetLowering Summary: Having TableGen patterns for image intrinsics is hitting limitations: for D16 we already have to manually pre-lower the packing of data values, and we will have to do the same for A16 eventually. Since there is already some custom C++ code anyway, it is arguably easier to just do everything in C++, now that we can use the beefed-up generic tables backend of TableGen to provide all the required metadata and map intrinsics to corresponding opcodes. With this approach, all image intrinsic lowering happens in SITargetLowering::lowerImage. That code is dense due to all the cases that it handles, but it should still be easier to follow than what we had before, by virtue of it all being done in a single location, and by virtue of not relying on the TableGen pattern magic that very few people really understand. This means that we will have MachineSDNodes with MIMG instructions during DAG combining, but that seems alright: previously we had intrinsic nodes instead, but those are similarly opaque to the generic CodeGen infrastructure, and the final pattern matching just did a 1:1 translation to machine instructions anyway. If anything, the fact that we now merge the address words into a vector before DAG combine should be an advantage. Change-Id: I417f26bd88f54ce9781c1668acc01f3f99774de6 Reviewers: arsenm, rampitec, rtaylor, tstellar Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48017 llvm-svn: 335228	2018-06-21 13:36:57 +00:00
Nicolai Haehnle	0ab200b6c9	AMDGPU: Refactor MIMG instruction TableGen using generic tables Summary: This allows us to access rich information about MIMG opcodes from C++ code. Simplifying the mapping between equivalent opcodes of different data size becomes quite natural. This also flattens the MIMG-related class and multiclass hierarchy a little, and collapses together some of the scaffolding for sample and gather4 opcodes. Change-Id: I1a2549fdc1e881ff100e5393d2d87e73729a0ccd Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48016 llvm-svn: 335227	2018-06-21 13:36:44 +00:00
Nicolai Haehnle	e741d7e0fd	AMDGPU: Use generic tables instead of SearchableTable Summary: Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48014 Change-Id: Ibb43f90d955275571aff17d0c3ecfb5e5b299641 llvm-svn: 335226	2018-06-21 13:36:33 +00:00
Nicolai Haehnle	2367f03565	AMDGPU: Pass AMDGPUSampleVariant to MIMG_{Sampler,Gather}(_WQM) Summary: This will allows us to provide rich metadata about the instructions in tables that are accessible by custom C++ code. Change-Id: Id9305a26304ab6a6cceb6c65c8cd49141cc0101d Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48011 llvm-svn: 335224	2018-06-21 13:36:13 +00:00
Nicolai Haehnle	b3a9b68513	AMDGPU: Add implicit def of SCC to kill and indirect pseudos Summary: Kill instructions sometimes do use SCC in unusual circumstances, when v_cmpx cannot be used due to the operands that are involved. Additionally, even if SCC was never defined by the expansion, kill pseudos could previously occur between an s_cmp and an s_cbranch_scc, which breaks the SCC liveness tracking when the pseudo is expanded to split the basic block. While it would be possible to explicitly mark the SCC as live-in for the successor basic block, it's simpler to just mark the pseudo as using SCC, so that such a sequence is never emitted by instruction selection in the first place. A similar issue affects indirect source/dest pseudos in principle, although I haven't been able to come up with a test case where it actually matters (this affects instruction selection, so a MIR test can't be used). Fixes: dEQP-GLES3.functional.shaders.discard.dynamic_loop_always Change-Id: Ica8d82ecff1a763b892a1112cf1b06c948863a4f Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47761 llvm-svn: 335223	2018-06-21 13:36:08 +00:00
Nicolai Haehnle	f267431901	AMDGPU: Turn D16 for MIMG instructions into a regular operand Summary: This allows us to reduce the number of different machine instruction opcodes, which reduces the table sizes and helps flatten the TableGen multiclass hierarchies. We can do this because for each hardware MIMG opcode, we have a full set of IMAGE_xxx_Vn_Vm machine instructions for all required sizes of vdata and vaddr registers. Instead of having separate D16 machine instructions, a packed D16 instructions loading e.g. 4 components can simply use the same V2 opcode variant that non-D16 instructions use. We still require a TSFlag for D16 buffer instructions, because the D16-ness of buffer instructions is part of the opcode. Renaming the flag should help avoid future confusion. The one non-obvious code change is that for gather4 instructions, the disassembler can no longer automatically decide whether to use a V2 or a V4 variant. The existing logic which choose the correct variant for other MIMG instruction is extended to cover gather4 as well. As a bonus, some of the assembler error messages are now more helpful (e.g., complaining about a wrong data size instead of a non-existing instruction). While we're at it, delete a whole bunch of dead legacy TableGen code. Change-Id: I89b02c2841c06f95e662541433e597f5d4553978 Reviewers: arsenm, rampitec, kzhuravl, artem.tamazov, dp, rtaylor Subscribers: wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47434 llvm-svn: 335222	2018-06-21 13:36:01 +00:00
Simon Pilgrim	2a9cde026c	[X86][AVX] Reduce v4f64/v4i64 shuffle costs (PR37882) These were being over cautious for costs for one/two op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like VSHUFPS does. llvm-svn: 335216	2018-06-21 11:37:13 +00:00
Craig Topper	296526bf46	[X86] Remove masking from 512-bit floating max/min intrinsics. Use select instruction instead. llvm-svn: 335199	2018-06-21 05:00:56 +00:00
Simon Dardis	0f111dd704	[mips] Add microMIPS specific addressing patterns. These are identical but use microMIPS instructions instead of MIPS instructions. Also, flatten the 'let AdditionalPredicates = [InMicroMips]' by using the ISA_MICROMIPS adjective. Add tests for constant materialization. Reviewers: atanasyan, abeserminji, smaksimovic Differential Revision: https://reviews.llvm.org/D48275 llvm-svn: 335185	2018-06-20 22:40:12 +00:00
Craig Topper	c2696d577b	[X86] Use setcc ISD opcode for AVX512 integer comparisons all the way to isel I don't believe there is any real reason to have separate X86 specific opcodes for vector compares. Setcc has the same behavior just uses a different encoding for the condition code. I had to change the CondCodeAction for SETLT and SETLE to prevent some transforms from changing SETGT lowering. Differential Revision: https://reviews.llvm.org/D43608 llvm-svn: 335173	2018-06-20 21:05:02 +00:00
Simon Dardis	6021424c10	[mips] Correct predicates for loads, bit manipulation instructions and some pseudos Additionally, correct the definition of the rdhwr instruction. Reviewers: atanasyan, abeserminji, smaksimovic Differential Revision: https://reviews.llvm.org/D48216 llvm-svn: 335162	2018-06-20 19:59:58 +00:00
Matt Arsenault	5a4ec8127f	AMDGPU: Fix scalar_to_vector for v4i16/v4f16 llvm-svn: 335161	2018-06-20 19:45:48 +00:00
Matt Arsenault	3d06668ad4	AMDGPU: Fix missing C++ mode comment llvm-svn: 335160	2018-06-20 19:45:40 +00:00
Alex Bradbury	fafdebcfcb	[RISCV] Accept fmv.s.x and fmv.x.s as mnemonic aliases for fmv.w.x and fmv.x.w These instructions were renamed in version 2.2 of the user-level ISA spec, but the old name should also be accepted by standard tools. llvm-svn: 335154	2018-06-20 18:42:25 +00:00
Sam Clegg	52564675c4	[WebAssembly] Update know failures for the wasm waterfall Summary: The waterfall no longer builds .s files and no longers uses the wasm-o when it builds object files. Subscribers: dschuff, jgravelle-google, aheejin, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D48371 llvm-svn: 335135	2018-06-20 15:17:12 +00:00
Alex Bradbury	79d2b50ca8	[RISCV] Add InstAlias definitions for fgt.{s\|d}, fge.{s\|d} These are produced by GCC and supported by GAS, but not currently contained in the pseudoinstruction listing in the RISC-V ISA manual. llvm-svn: 335127	2018-06-20 14:03:02 +00:00
Krzysztof Parzyszek	d8b780dcd6	[Hexagon] Remove 'T' from HasVNN predicates, NFC Patch by Sumanth Gundapaneni. llvm-svn: 335124	2018-06-20 13:56:09 +00:00
Simon Dardis	eae99120b0	[mips] Fix the predicates of some DSP instructions from AdditionalPredicates to ASEPredicate Reviewers: atanasyan, abeserminji, smaksimovic Differential Revision: https://reviews.llvm.org/D48166 llvm-svn: 335122	2018-06-20 13:29:57 +00:00
Alex Bradbury	18b9bd7d6c	[RISCV] Add InstAlias definitions for sgt and sgtu These are produced by GCC and supported by GAS, but not currently contained in the pseudoinstruction listing in the RISC-V ISA manual. llvm-svn: 335120	2018-06-20 12:54:02 +00:00
Tim Northover	644a819534	ARM: convert ORR instructions to ADD where possible on Thumb. Thumb has more 16-bit encoding space dedicated to ADD than ORR, allowing both a 3-address encoding and a wider range of immediates. So, particularly when optimizing for code size (but it doesn't make things worse elsewhere) it's beneficial to select an OR operation to an ADD if we know overflow won't occur. This is made even better by LLVM's penchant for putting operations in canonical form by converting the other way. llvm-svn: 335119	2018-06-20 12:09:44 +00:00
Tim Northover	70666e7765	[AArch64] Implement FLT_ROUNDS macro. Very similar to ARM implementation, just maps to an MRS. Should fix PR25191. Patch by Michael Brase. llvm-svn: 335118	2018-06-20 12:09:01 +00:00
Andrea Di Biagio	2145b13fc9	[llvm-mca][X86] Teach how to identify register writes that implicitly clear the upper portion of a super-register. This patch teaches llvm-mca how to identify register writes that implicitly zero the upper portion of a super-register. On X86-64, a general purpose register is implemented in hardware as a 64-bit register. Quoting the Intel 64 Software Developer's Manual: "an update to the lower 32 bits of a 64 bit integer register is architecturally defined to zero extend the upper 32 bits". Also, a write to an XMM register performed by an AVX instruction implicitly zeroes the upper 128 bits of the aliasing YMM register. This patch adds a new method named clearsSuperRegisters to the MCInstrAnalysis interface to help identify instructions that implicitly clear the upper portion of a super-register. The rest of the patch teaches llvm-mca how to use that new method to obtain the information, and update the register dependencies accordingly. I compared the kernels from tests clear-super-register-1.s and clear-super-register-2.s against the output from perf on btver2. Previously there was a large discrepancy between the estimated IPC and the measured IPC. Now the differences are mostly in the noise. Differential Revision: https://reviews.llvm.org/D48225 llvm-svn: 335113	2018-06-20 10:08:11 +00:00
Roman Lebedev	d23b6831de	[X86][Znver1] Specify Register Files, RCU; FP scheduler capacity. Summary: First off: i do not have any access to that processor, so this is purely theoretical, no benchmarks. I have been looking into bdver2 scheduling profile, and while cross-referencing the existing btver2, znver1 profiles, and the reference docs (`Software Optimization Guide for AMD Family {15,16,17}h Processors`), i have noticed that only btver2 scheduling profile specifies these. Also, there is no mca test coverage. Reviewers: RKSimon, craig.topper, courbet, GGanesh, andreadb Reviewed By: GGanesh Subscribers: gbedwell, vprasad, ddibyend, shivaram, Ashutosh, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D47676 llvm-svn: 335099	2018-06-20 07:01:14 +00:00
Clement Courbet	7b9913fb9f	[X86] Add sched class WriteLAHFSAHF and fix values. Summary: I ran llvm-exegesis on SKX, SKL, BDW, HSW, SNB. Atom is from Agner and SLM is a guess. I've left AMD processors alone. Reviewers: RKSimon, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D48079 llvm-svn: 335097	2018-06-20 06:13:39 +00:00
Craig Topper	d22ad8568c	[X86] Use binary search of the EVEX->VEX static tables instead of populating two DenseMaps for lookups Summary: After r335018, the static tables are guaranteed sorted by the EVEX opcode to convert. We can use this to do a binary search and remove the need for any secondary data structures. Right now one table is 736 entries and the other is 482 entries. It might make sense to merge the two tables as a follow up. The effort it takes to select the table is probably similar to the extra binary search step it would require for a larger table. I haven't done any measurements to see if this has any effect on compile time, but I don't imagine that EVEX->VEX conversion is a place we spend a lot of time. Reviewers: RKSimon, spatel, chandlerc Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D48312 llvm-svn: 335092	2018-06-20 04:32:04 +00:00
Vlad Tsyrklevich	98724e582e	Revert r334980 and 334983 This reverts commits r334980 and r334983 because they were causing build timeouts on the x86_64-linux-ubsan bot. llvm-svn: 335085	2018-06-20 00:02:32 +00:00
Jessica Paquette	32de26d432	[MachineOutliner] NFC: Remove insertOutlinerPrologue, rename insertOutlinerEpilogue insertOutlinerPrologue was not used by any target, and prologue-esque code was beginning to appear in insertOutlinerEpilogue. Refactor that into one function, buildOutlinedFrame. This just removes insertOutlinerPrologue and renames insertOutlinerEpilogue. llvm-svn: 335076	2018-06-19 21:14:48 +00:00
Heejin Ahn	891a747266	[WebAssembly] Fix liveness tracking info after drop insertion Summary: This fixes liveness tracking information after `drop` instruction insertion in ExplicitLocals pass. When a drop instruction is inserted to drop a dead register operand, the original operand should be marked not dead anymore because it is now used by the new drop instruction. And the operand to the new drop instruction should be marked killed instead. This bug caused some programs to fail when `llc` is run with `-verify-machineinstrs` option. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D48253 llvm-svn: 335074	2018-06-19 20:30:42 +00:00
Krzysztof Parzyszek	03aa8f3a24	[Hexagon] Fix the value of HexagonII::TypeCVI_FIRST This value is the first vector instruction type in numerical order. The previous value was incorrect, leaving TypeCVI_GATHER outside of the range for vector instructions. This caused vector .new instructions to be incorrectly encoded in the presence of gather. llvm-svn: 335065	2018-06-19 18:09:54 +00:00
Craig Topper	0b7936737b	[X86] Initialize FMA3Info directly in its constructor instead of relying on std::call_once FMA3Info only exists as a managed static. As far as I know the ManagedStatic construction proccess is thread safe. It doesn't look like we ever access the ManagedStatic object without immediately doing a query on it that would require the map to be populated. So I don't think we're ever deferring the calculation of the tables from the construction of the object. So I think we should be able to just populate the FMA3Info map directly in the constructor and get rid of all of the initGroupsOnce stuff. Differential Revision: https://reviews.llvm.org/D48194 llvm-svn: 335064	2018-06-19 18:06:52 +00:00
Craig Topper	7ffa976993	[X86] Don't fold unaligned loads into SSE ROUNDPS/ROUNDPD for ceil/floor/nearbyint/rint/trunc. Incorrect patterns were added in r334460. This changes them to check alignment properly for SSE. llvm-svn: 335062	2018-06-19 17:51:42 +00:00
Krzysztof Parzyszek	5c2944c4f2	[Hexagon] Enforce restrictions on packetizing cache instructions llvm-svn: 335061	2018-06-19 17:26:20 +00:00
Simon Dardis	af38a8fed6	[mips] Mark microMIPS64 as being unsupported. There are no provided instruction definitions for this architecture. Reviewers: smaksimovic, atanasyan, abeserminji Differential Revision: https://reviews.llvm.org/D48320 llvm-svn: 335057	2018-06-19 16:05:44 +00:00
Simon Dardis	9e80c340f7	[mips] Fix the predicates of some aliases Previously, some aliases were marked as not being available for microMIPS32R6, but this was overridden at the top level. Reviewers: atanasyan, abeserminji, smaksimovic Differential Revision: https://reviews.llvm.org/D48321 llvm-svn: 335053	2018-06-19 15:25:01 +00:00
Strahinja Petrovic	bb2b00bb80	[PowerPC] Fix label address calculation for ppc32 This patch fixes calculating address of label on ppc32 (for -fPIC). Differential Revision: https://reviews.llvm.org/D46582 llvm-svn: 335043	2018-06-19 13:07:40 +00:00
Mikhail Dvoretckii	b1ce7765be	[X86] VRNDSCALE* folding from masked and scalar ffloor and fceil patterns This patch handles back-end folding of generic patterns created by lowering the X86 rounding intrinsics to native IR in cases where the instruction isn't a straightforward packed values rounding operation, but a masked operation or a scalar operation. Differential Revision: https://reviews.llvm.org/D45203 llvm-svn: 335037	2018-06-19 10:37:52 +00:00
Mikhail Dvoretckii	bd8ed2dbaa	Test commit. llvm-svn: 335026	2018-06-19 07:55:10 +00:00
QingShan Zhang	9f0fe9a3f8	If the arch is P9, we will select the DFLOADf32/DFLOADf64 pseudo instruction when we are loading a floating, and expand it post RA basing on the register pressure. However, we miss to do the add-imm peephole for these pseudo instruction. Differential Revision: https://reviews.llvm.org/D47568 Reviewed By: Nemanjai llvm-svn: 335024	2018-06-19 06:54:51 +00:00
Craig Topper	c2965214ef	[X86] Add the ability to force an EVEX2VEX mapping table entry from the .td files. Remove remaining manual table entries from the tablegen emitter. This adds an EVEX2VEXOverride string to the X86 instruction class in X86InstrFormats.td. If this field is set it will add manual entry in the EVEX->VEX tables that doesn't check the encoding information. Then use this mechanism to map VMOVDU/A8/16, 128-bit VALIGN, and VPSHUFF/I instructions to VEX instructions. Finally, remove the manual table from the emitter. This has the bonus of fully sorting the autogenerated EVEX->VEX tables by their EVEX instruction enum value. We may be able to use this to do a binary search for the conversion and get rid of the need to create a DenseMap. llvm-svn: 335018	2018-06-19 04:24:44 +00:00
Craig Topper	0a5e90cc2a	[X86] Add a new VEX_WPrefix encoding to tag EVEX instruction that have VEX.W==1, but can be converted to their VEX equivalent that uses VEX.W==0. EVEX makes heavy use of the VEX.W bit to indicate 64-bit element vs 32-bit elements. Many of the VEX instructions were split into 2 versions with different masking granularity. The EVEX->VEX table generate can collapse the two versions if the VEX version uses is tagged as VEX_WIG. But if the VEX version is instead marked VEX.W==0 we can't combine them because we don't know if there is also a VEX version with VEX.W==1. This patch adds a new VEX_W1X tag that indicates the EVEX instruction encodes with VEX.W==1, but is safe to convert to a VEX instruction with VEX.W==0. This allows us to remove a bunch of manual EVEX->VEX table entries. We may want to look into splitting up the VEX_WPrefix field which would simplify the disassembler. llvm-svn: 335017	2018-06-19 04:24:42 +00:00
Craig Topper	46c0b368d6	[X86] Simplify the TSFlags checking code in EvexToVexInstPass. NFCI The code was previously checking the L2 and L flag on 3 separate lines, treating the combination as an encoding. Instead its better to think of the L2 bit as being something that can't be done with VEX and early returning. Then we just need to check the L bit. llvm-svn: 335015	2018-06-19 03:17:46 +00:00
Heejin Ahn	817811caae	[WebAssembly] Add more utility functions Summary: Added more utility functions that will be used in EH-related passes Also changed `LoopBottom` function to `getBottom` and uses templates to be able to handle other classes as well, which will be used in CFGSort later. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D48262 llvm-svn: 335006	2018-06-19 00:32:03 +00:00
Heejin Ahn	6279d71227	[WebAssembly] Make rethrow instruction take a target BB argument Summary: This patch changes the rethrow instruction to take a BB argument in LLVM backend, like `br` and `br_if`s. This BB is a target catch BB the rethrow instruction unwinds to. This BB argument will be converted to an relative depth immediate at the end of CFGStackify pass, as in the same way of branches. RETHROW_TO_CALLER is a codegen-only instruction that should be used when a rethrow instruction does not have an unwind destination BB, i.e., it should rethrow to its caller function. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D48260 llvm-svn: 334998	2018-06-18 23:54:29 +00:00
Craig Topper	a7b7f2f4d8	[X86] Remove ReadAfterLd from avx512_shift_rmbi multiclass. The instructions that use this class don't have another source register. So I think this was just marking one of the address operands as ReadAfterLd? llvm-svn: 334994	2018-06-18 23:20:57 +00:00
Eric Christopher	88bbad24c0	Tidy comment language and explanation. llvm-svn: 334990	2018-06-18 22:21:19 +00:00
Eric Christopher	b1faaf3069	Pull non-lazy stub table emission into a separate function alongside the individual stub creation to increase readability a bit in the non-object file format specific function. llvm-svn: 334989	2018-06-18 22:21:18 +00:00
Eric Christopher	b6d0b99f3b	Add return statements to make it clear that all of these are mutually exclusive conditions. else if would have worked just as well, but this keeps the original readability a bit more clear. llvm-svn: 334988	2018-06-18 22:21:13 +00:00
Wouter van Oortmerssen	48dac3109e	[WebAssembly] Modified tablegen defs to have 2 parallel instuction sets. Summary: One for register based, much like the existing definitions, and one for stack based (suffix _S). This allows us to use registers in most of LLVM (which works better), and stack based in MC (which results in a simpler and more readable assembler / disassembler). Tried to keep this change as small as possible while passing tests, follow-up commit will: - Add reg->stack conversion in MI. - Fix asm/disasm in MC to be stack based. - Fix emitter to be stack based. tests passing: llvm-lit -v `find test -name WebAssembly` test/CodeGen/WebAssembly test/MC/WebAssembly test/MC/Disassembler/WebAssembly test/DebugInfo/WebAssembly test/CodeGen/MIR/WebAssembly test/tools/llvm-objdump/WebAssembly Reviewers: dschuff, sbc100, jgravelle-google, sunfish Subscribers: aheejin, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D48183 llvm-svn: 334985	2018-06-18 21:22:44 +00:00
Sander de Smalen	067eee1c13	[AArch64][SVE] Asm: Fix predicate pattern diagnostics. This patch uses the DiagnosticPredicate for SVE predicate patterns to improve their diagnostics, now giving a 'invalid operand' diagnostic if the type is not an immediate or one of the expected pattern labels. Reviewers: samparker, SjoerdMeijer, javed.absar, fhahn Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D48220 llvm-svn: 334983	2018-06-18 21:03:02 +00:00
Sander de Smalen	7ac9e193ec	[AArch64][SVE] Asm: Support for saturating INC/DEC (32bit scalar) instructions. The variants added by this patch are: - SQINC signed increment, e.g. sqinc x0, w0, all, mul #4 - SQDEC signed decrement, e.g. sqdec x0, w0, all, mul #4 - UQINC unsigned increment, e.g. uqinc w0, all, mul #4 - UQDEC unsigned decrement, e.g. uqdec w0, all, mul #4 This patch includes asmparser changes to parse a GPR64 as a GPR32 in order to satisfy the constraint check: x0 == GPR64(w0) in: sqinc x0, w0, all, mul #4 ^___^ (must match) Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D47716 llvm-svn: 334980	2018-06-18 20:50:33 +00:00
Wouter van Oortmerssen	78c62966c2	[WebAssembly] Cleaned up register accessors in WebAssemblyMachineFunctionInfo.h Tested: llvm-lit -v `find test -name WebAssembly` (This is a commit access "test commit" :) llvm-svn: 334979	2018-06-18 20:45:49 +00:00
Craig Topper	17bd84c12c	[X86] Encode the EVEX2VEX exception list information in .td files instead of the emitter source. Rather than having an exclusion list in tablegen sources, add a flag to the X86 instruction records that can be used to suppress checking for convertibility. llvm-svn: 334971	2018-06-18 18:47:07 +00:00
Sander de Smalen	13684d8400	[AArch64][SVE] Asm: Support for saturating INC/DEC (64bit scalar) instructions. Summary: The variants added by this patch are: - SQINC (signed increment) - UQINC (unsigned increment) - SQDEC (signed decrement) - UQDEC (unsigned decrement) For example: uqincw x0, all, mul #4 Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar Differential Revision: https://reviews.llvm.org/D47715 llvm-svn: 334948	2018-06-18 14:47:52 +00:00
Simon Pilgrim	9173c97ce4	[X86][BtVer2] Flag AVX2+ scheduler classes as unsupported Jaguar only supports up to AVX1 Differential Revision: https://reviews.llvm.org/D48274 llvm-svn: 334947	2018-06-18 14:31:14 +00:00
Sander de Smalen	d521c4353e	[AArch64][SVE] Asm: Support for vector element compares. This patch adds instructions for comparing elements from two vectors, e.g. cmpgt p0.s, p0/z, z0.s, z1.s and also adds support for comparing to a 64-bit wide element vector, e.g. cmpgt p0.s, p0/z, z0.s, z1.d The patch also contains aliases for certain comparisons, e.g.: cmple p0.s, p0/z, z0.s, z1.s => cmpge p0.s, p0/z, z1.s, z0.s cmplo p0.s, p0/z, z0.s, z1.s => cmphi p0.s, p0/z, z1.s, z0.s cmpls p0.s, p0/z, z0.s, z1.s => cmphs p0.s, p0/z, z1.s, z0.s cmplt p0.s, p0/z, z0.s, z1.s => cmpgt p0.s, p0/z, z1.s, z0.s llvm-svn: 334931	2018-06-18 10:59:19 +00:00
Clement Courbet	0d9da88d18	[X86] Fix NOOP sched overrides on BDW/HSW/SKL. Summary: Noop certainly does not use resources. Reviewers: RKSimon, craig.topper, andreadb Subscribers: gbedwell, llvm-commits, gchatelet Differential Revision: https://reviews.llvm.org/D48028 llvm-svn: 334927	2018-06-18 06:48:22 +00:00
Craig Topper	f0ab7bd196	[X86] Create X86InstrFMA3Group objects fully in a static table instead of on the heap. NFCI Previously we heap allocated the X86InstrFMA3Group objects which were created by passing them small register/memory opcode arrays that existed as individual static tables. Rather than a bunch of small static arrays we now have one large static table of X86InstrFMA3Group objects. Rather than storing a pointer to the opcode arrays in the X86InstrFMA3Group object, we now store have a register and memory array as part of the object. If a group doesn't have memory or register opcodes, the array entries will be 0. This greatly simplifies the destruction of the X86InstrFMA3Info object. We no longer need to delete the X86InstrFMA3Group objects as we destruct the DenseMap. And we don't need to keep track of which ones we already deleted. This reduces the llc binary size on my local machine by ~50k. I can only assume that's really due to the fact that we had something like 512 small static arrays that we passed to the init functions either one at a time or in pairs. So there were between 256 and 512 distinct calls to the init functions in the initOnceImpl method. llvm-svn: 334925	2018-06-18 06:32:22 +00:00
Craig Topper	16fdde5e63	[X86] Add '.s' aliases to the assembler for the various redundant move encodings to match gas and our EVEX instructions. We already have these aliases for EVEX enocded instructions, but not for the GPR, MMX, SSE, and VEX versions. Also remove the vpextrw.s EVEX alias. That's not something gas implements. llvm-svn: 334922	2018-06-18 05:00:50 +00:00
Craig Topper	916d0cf649	[X86] Move the 'vmovq.s' and similar assembly strings for EVEX vector moves with reversed operands to InstAliases. The .s assembly strings allow the reversed forms to be targeted from assembly which matches gas behavior. But when printing the instructions we should print them without the .s to match other tooling like objdump. By using InstAliases we can use the normal string in the instruction and just hide it from the assembly parser. Ideally we'd add the .s versions to the legacy SSE and VEX versions as well for full compatibility with gas. Not sure how we got to state where only EVEX was supported. llvm-svn: 334920	2018-06-18 01:28:05 +00:00
Craig Topper	9fe45d846e	[X86] Add all the FMA instructions direclty to the load folding table instead of proxying through X86InstrFMA3Info. These increases the size of the static tables, but is closer to what we would get if used the autogenerated table directly. This reduces the remaining large deltas between what's in the manual table and what's in the autogenerated table. llvm-svn: 334915	2018-06-17 18:00:16 +00:00
Craig Topper	b0e986f88e	[X86] Pass the parent SDNode to X86DAGToDAGISel::selectScalarSSELoad to simplify the hasSingleUseFromRoot handling. Some of the calls to hasSingleUseFromRoot were passing the load itself. If the load's chain result has a user this would count against that. By getting the true parent of the match and ensuring any intermediate between the match and the load have a single use we can avoid this case. isLegalToFold will take care of checking users of the load's data output. This fixed at least fma-scalar-memfold.ll to succed without the peephole pass. llvm-svn: 334908	2018-06-17 16:29:46 +00:00
Sander de Smalen	279b7e74e7	[AArch64][SVE] Asm: Support for bitwise operations on predicate vectors. This patch adds support for instructions performing bitwise operations on predicate vectors, including AND, BIC, EOR, NAND, NOR, ORN, ORR, and their status flag setting variants ANDS, BICS, EORS, NANDS, ORNS, ORRS. This patch also adds several aliases: orr p0.b, p1/z, p1.b, p1.b => mov p0.b, p1.b orrs p0.b, p1/z, p1.b, p1.b => movs p0.b, p1.b and p0.b, p1/z, p2.b, p2.b => mov p0.b, p1/z, p2.b ands p0.b, p1/z, p2.b, p2.b => movs p0.b, p1/z, p2.b eor p0.b, p1/z, p2.b, p1.b => not p0.b, p1/z, p2.b eors p0.b, p1/z, p2.b, p1.b => nots p0.b, p1/z, p2.b llvm-svn: 334906	2018-06-17 10:48:21 +00:00
Sander de Smalen	2c25b4cd36	[AArch64][SVE] Asm: Support for SEL (vector/predicate) instructions. Support for SVE's predicated select instructions to select elements from either vector, both in a data-vector and a predicate-vector variant. llvm-svn: 334905	2018-06-17 10:11:04 +00:00
Jonas Hahnfeld	c7410ed47a	[NVPTX] Ignore target-cpu and -features for inlining We don't want to prevent inlining because of target-cpu and -features attributes that were added to newer versions of LLVM/Clang: There are no incompatible functions in PTX, ptxas will throw errors in such cases. Differential Revision: https://reviews.llvm.org/D47691 llvm-svn: 334904	2018-06-17 09:55:20 +00:00
Heejin Ahn	9786946731	[WebAssembly] Simple comment fix. NFC. llvm-svn: 334899	2018-06-17 00:37:56 +00:00
Craig Topper	29f22d7baa	[X86] More additions to the load folding tables based on the autogenerated tables. Including more additions for NotMemoryFoldable to remove some entries from the autogenerated table. llvm-svn: 334898	2018-06-16 23:25:50 +00:00
Craig Topper	c435632862	[X86] Hide POP16/32/64rmr and PUSH16/32/64rmr instructions from the assembly parser. These all have a short form encoding that the assembler already prefers. Though that preference seems to only be based on order in the .td fie. Hiding the long form saves space in the table and prevents us from breaking the implicit order based priority. llvm-svn: 334897	2018-06-16 23:25:48 +00:00
Craig Topper	74412c7d59	[X86] Fix an inconsistency between AVX512 and AVX/SSE version on a couple instructions. VMOVPQIto64Zmr is not a 64-bit mode only instruction. But I don't know how to test this because VMOVPQIto64mr should always have priority over it in 32-bit mode since its only advantage is XMM16-XMM31 which aren't usable in 32-bit mode. VMOVPQIto64Zrr is a 64-bit mode only instruction, but we don't need to explicitly mark it as such because it uses a GR64 register which won't parse in 32-bit mode. llvm-svn: 334896	2018-06-16 23:25:47 +00:00
Stanislav Mekhanoshin	3b11794dbf	[AMDGPU] setcc (select cc, CT, CF), CF, eq \| ne -> xor cc, -1 \| cc This is the common case in the BE when we serialize condition and then rematerialize it. Use either original or inverted condition. Differential Revision: https://reviews.llvm.org/D48246 llvm-svn: 334882	2018-06-16 03:46:59 +00:00
Daniel Sanders	8ead1290e6	[globalisel][tablegen] Add support for C++ predicates on PatFrags and use it to support BFC on ARM. So far, we've only handled special cases of PatFrag like ImmLeaf. This patch adds support for the remaining cases using similar mechanisms. Like most C++ code from SelectionDAG, GISel and DAGISel expect to operate on different types and representations and as such the code is not compatible between the two. It's therefore necessary to add an alternative implementation in the GISelPredicateCode field. The target test for this feature could easily be done with IntImmLeaf and this would save on a little boilerplate. The reason I've chosen to implement this using PatFrag.GISelPredicateCode and not IntImmLeaf is because I was unable to find a rule that was blocked solely by lack of support for PatFrag predicates. I found that the ones I investigated as being likely candidates for the test were further blocked by other things. llvm-svn: 334871	2018-06-15 23:13:43 +00:00
Craig Topper	d00e375310	[X86] Add more instructions to the hasUndefRegUpdate list. Not sure any of these matter today because I don't think we ever produce them with IMPLICIT_DEF as an input. But by listing them we don't be suprised in the future. llvm-svn: 334867	2018-06-15 22:25:04 +00:00
Sean Fertile	cac28aeb3f	[PowerPC] Add support for high and higha symbol modifiers on tls modifers. Enables using the high and high-adjusted symbol modifiers on thread local storage modifers in powerpc assembly. Needed to be able to support 64 bit thread-pointer and dynamic-thread-pointer access sequences. Differential Revision: https://reviews.llvm.org/D47754 llvm-svn: 334856	2018-06-15 19:47:16 +00:00
Sean Fertile	80b8f82f17	[PPC64] Support "symbol@high" and "symbol@higha" symbol modifers. Add support for the "@high" and "@higha" symbol modifiers in powerpc64 assembly. The modifiers represent accessing the segment consiting of bits 16-31 of a 64-bit address/offset. Differential Revision: https://reviews.llvm.org/D47729 llvm-svn: 334855	2018-06-15 19:47:11 +00:00
Tomasz Krupa	bcaab53d47	[X86] Lowering sqrt intrinsics to native IR Summary: Complementary patch to lowering sqrt intrinsics in Clang. Reviewers: craig.topper, spatel, RKSimon, DavidKreitzer, uriel.k Reviewed By: craig.topper Subscribers: tkrupa, mike.dvoretsky, llvm-commits Differential Revision: https://reviews.llvm.org/D41599 llvm-svn: 334849	2018-06-15 18:05:24 +00:00
Craig Topper	1657b7b8d2	[X86] Prevent folding stack reloads into instructions in hasUndefRegUpdate. An earlier commit prevented folds from the peephole pass by checking for IMPLICIT_DEF. But later in the pipeline IMPLICIT_DEF just becomes and Undef flag on the input register so we need to check for that case too. llvm-svn: 334848	2018-06-15 17:56:17 +00:00
Sander de Smalen	a6edca72ba	[AArch64][SVE] Asm: Support for CPY SIMD/FP and GPR instructions. Predicated splat/copy of SIMD/FP register or general purpose register to SVE vector, along with MOV-aliases. llvm-svn: 334842	2018-06-15 16:39:46 +00:00
Sander de Smalen	18ac8f9f25	[AArch64][SVE] Asm: Support for INC/DEC (scalar) instructions. Increment/decrement scalar register by (scaled) element count given by predicate pattern, e.g. 'incw x0, all, mul #4'. Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D47713 llvm-svn: 334838	2018-06-15 15:47:44 +00:00
Matt Arsenault	63bc0e3cb9	AMDGPU: Add combine for short vector extract_vector_elts Try to access pieces 4 bytes at a time. This helps various hasOneUse extract_vector_elt combines, such as load width reductions. Avoids test regressions in a future commit. llvm-svn: 334836	2018-06-15 15:31:36 +00:00
Matt Arsenault	02dc7e19e2	AMDGPU: Make v4i16/v4f16 legal Some image loads return these, and it's awkward working around them not being legal. llvm-svn: 334835	2018-06-15 15:15:46 +00:00
Sander de Smalen	5eb51d7495	[AArch64][SVE] Asm: Support for FADD, FMUL and FMAX immediate instructions. Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar Reviewed By: javed.absar Differential Revision: https://reviews.llvm.org/D47712 llvm-svn: 334831	2018-06-15 13:57:51 +00:00
Simon Dardis	98b9849d34	[mips] Add licensing information of the microMIPS tablegen files. (NFC) llvm-svn: 334827	2018-06-15 13:29:35 +00:00
Sander de Smalen	3cbf171479	[AArch64][SVE] Asm: Add parsing/printing support for exact FP immediates. Some instructions require of a limited set of FP immediates as operands, for example '#0.5 or #1.0' for SVE's FADD instruction. This patch adds support for parsing and printing such FP immediates as exact values (e.g. #0.499999 is not accepted for #0.5). Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D47711 llvm-svn: 334826	2018-06-15 13:11:49 +00:00
Roman Lebedev	dec562c849	[AMDGPU] Recognize x & ~(-1 << y) pattern. Summary: The same pattern as D48010, but this one is IR-canonical as of D47428. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48012 llvm-svn: 334817	2018-06-15 09:56:45 +00:00
Roman Lebedev	9c17dad8f2	[AMDGPU] Recognize x & ((1 << y) - 1) pattern. Summary: As a followup for D48007. Since we already handle `x << (bitwidth - y) >> (bitwidth - y)` pattern, which does not have ub for both the edge cases (`y == 0`, `y == bitwidth`), i think also handling a pattern that is ub for `y == bitwidth` should be fine. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48010 llvm-svn: 334816	2018-06-15 09:56:39 +00:00
Roman Lebedev	aa8587d1fc	[AMDGPU] Recognize x & (-1 >> (32 - y)) pattern. Summary: D47980 will canonicalize the `x << (32 - y) >> (32 - y)`, which is the pattern the AMDGPU expects to `x & (-1 >> (32 - y))`, which is not recognized by AMDGPU. Thus, it needs to be recognized, too. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48007 llvm-svn: 334815	2018-06-15 09:56:31 +00:00
Craig Topper	c8a763ed84	Revert r334802 "[X86] Prevent folding stack reloads with instructions that have an undefined register update." There's a typo causing the build to fail. llvm-svn: 334803	2018-06-15 06:15:26 +00:00
Craig Topper	5ec210cc27	[X86] Prevent folding stack reloads with instructions that have an undefined register update. We want to keep the load unfolded so we can use the same register for both sources to avoid a false dependency. llvm-svn: 334802	2018-06-15 06:11:36 +00:00
Craig Topper	3c4cc01226	[X86] Add more instructions to the memory folding tables using the autogenerated table as a guide. I think this covers most of the unmasked vector instructions. We're still missing a lot of the masked instructions. There are some test changes here because of the new folding support. I don't think these particular cases should be folded because it creates an undef register dependency. I think the changes introduced in r334175 are not handling stack folding. They're only blocking the peephole pass. llvm-svn: 334800	2018-06-15 05:49:19 +00:00
Craig Topper	f43807dd89	[X86] Add 'Z' to the internal names of various EVEX instructions for overall consistency. llvm-svn: 334785	2018-06-15 04:42:54 +00:00
Sanjay Patel	f85ca6abee	[x86] be more selective about converting 'and' to shuffle (PR37749) isVectorClearMaskLegal() is the TLI hook used by the generic DAGCombiner::XformToShuffleWithZero(). We've grown to accomodate/expect this transform to shuffle (disabling it more generally results in many regressions). So I'm narrowly excluding the 256-bit types that clearly are not worthwhile for AVX1. I think in most cases we are able to recover by converting the shuffle back into 'and' ops, but the cases in: https://bugs.llvm.org/show_bug.cgi?id=37749 ...show that there are cracks. llvm-svn: 334759	2018-06-14 19:55:02 +00:00
Craig Topper	bfa94d5086	[X86] Fix stale comment in folding tables. llvm-svn: 334758	2018-06-14 19:28:31 +00:00
Tom Stellard	a92847359a	AMDGPU/GlobalISel: Implement select() for @llvm.amdgcn.cvt.pkrtz Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45907 llvm-svn: 334757	2018-06-14 19:26:37 +00:00
Craig Topper	3ffeb41f6b	[X86] Add more vector instructions to the memory folding table using the autogenerated table as a guide. The test cahnge is because we now fold stack reload into RNDSCALE and RNDSCALE can be turned into ROUND by EVEX->VEX. llvm-svn: 334728	2018-06-14 15:40:31 +00:00
Craig Topper	82fa048371	[X86] Remove '128' from the internal name of some scalar FP instructions to be consistent with other scalar instructions. llvm-svn: 334727	2018-06-14 15:40:30 +00:00
Craig Topper	b0742bf30d	[X86] Disable load unfolding for a bunch of instruction where unfolding would increase the size of the load. Found by an audit of the manual table vs the autogenerated table. llvm-svn: 334726	2018-06-14 15:40:29 +00:00
Craig Topper	9f829f76e8	[X86] Remove NotMemoryFoldable from some AVX/AVX512 scalar instructions. Some of these instructions are already in the manual folding table so we should have them in the auto table too. llvm-svn: 334725	2018-06-14 15:40:27 +00:00
Simon Dardis	6ad680ab6a	[mips] Correct predicates for MSA pseudo instructions llvm-svn: 334708	2018-06-14 13:03:53 +00:00
Craig Topper	b2552e1e08	[x86] fix mappings of cvttp2si/cvttp2ui x86 intrinsics to x86-specific nodes and isel patterns (PR37551) Summary: The tests in: https://bugs.llvm.org/show_bug.cgi?id=37751 ...show miscompiles because we wrongly mapped and folded x86-specific intrinsics into generic DAG nodes. This patch corrects the mappings in X86IntrinsicsInfo.h and adds isel matching corresponding to the new patterns. The complete tests for the failure cases should be in avx-cvttp2si.ll and sse-cvttp2si.ll and avx512-cvttp2i.ll Reviewers: RKSimon, gbedwell, spatel Reviewed By: spatel Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D47993 llvm-svn: 334685	2018-06-14 03:16:58 +00:00
Tom Stellard	46bbbc33c0	AMDGPU/GlobalISel: Implement select() for 32-bit G_FADD and G_FMUL Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46171 llvm-svn: 334665	2018-06-13 22:30:47 +00:00
Craig Topper	f7f663e0a9	[X86] Move RCPSSr_Int, RSQRTSSr_Int, SQRTSDr_Int, SQRTSSr_Int to the correct load folding table. They were in the operand 1 folding table, but their foldable operand is operand 2. llvm-svn: 334648	2018-06-13 20:03:42 +00:00
Stanislav Mekhanoshin	7bec57300c	[AMDGPU] Corrected computeKnownBits for V_PERM_B32 Differential Revision: https://reviews.llvm.org/D48133 llvm-svn: 334640	2018-06-13 18:52:54 +00:00
Yaxun Liu	fb17bf60dd	[AMDGPU] Change enqueue kernel handle type Currently the handle type is a global pointer which holds 8 bytes. We need a larger type which hold 16 bytes, therefore change it to [i64 x 2]. Differential Revision: https://reviews.llvm.org/D48094 llvm-svn: 334625	2018-06-13 17:31:51 +00:00
Dmitry Preobrazhensky	32c6b5cb70	[AMDGPU][MC] Enabled parsing of relocations on VALU instructions See bug 37566: https://bugs.llvm.org/show_bug.cgi?id=37566 Reviewers: artem.tamazov, arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D47884 llvm-svn: 334622	2018-06-13 17:02:03 +00:00
Dmitry Preobrazhensky	ffbee7acdc	[AMDGPU][MC][GFX8][GFX9] Allow LDS direct reads for BUFFER_LOAD_DWORDX2/X3/X4 See bug 37653: https://bugs.llvm.org/show_bug.cgi?id=37653 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D47885 llvm-svn: 334609	2018-06-13 15:32:46 +00:00
Tom Stellard	264c171f36	AMDGPU: Move isSDNodeSourceOfDivergence() implementation to SITargetLowering Summary: The code that handles ISD:Register and ISD::CopyFromReg assumes the target is amdgcn, so this is broken on r600. We don't need this analysis on r600 anyway so we can safely move it to SITargetLowering. Reviewers: alex-t, arsenm, nhaehnle Reviewed By: arsenm Subscribers: msearles, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46298 llvm-svn: 334607	2018-06-13 15:06:37 +00:00
Zoran Jovanovic	3a7654c15d	[mips][microMIPS] Extending size reduction pass with LWP and SWP Author: milena.vujosevic.janicic Reviewers: sdardis The patch extends size reduction pass for MicroMIPS. It introduces reduction of two instructions into one instruction: Two SW instructions are transformed into one SWP instrucition. Two LW instructions are transformed into one LWP instrucition. Differential Revision: https://reviews.llvm.org/D39115 llvm-svn: 334595	2018-06-13 12:51:37 +00:00
Sanjay Patel	b983ac6fe1	[x86] eliminate even more sign-bit tests with vector select This shortcoming was noted in D47330, and the test diffs show we already had other examples where we failed to fold to a SHRUNKBLEND: /// Dynamic (non-constant condition) vector blend where only the sign bits /// of the condition elements are used. This is used to enforce that the /// condition mask is not valid for generic VSELECT optimizations. This patch implements an idea from D48043 and would obsolete that patch because it catches more cases (notable the AVX1 case that was missed there). All we're doing is allowing the existing transform to fire more often by removing the post-legalize constraint. All of the relevant feature checks and other predicates are left as-is. Differential Revision: https://reviews.llvm.org/D48078 llvm-svn: 334592	2018-06-13 12:28:32 +00:00
Alex Bradbury	96f492d7df	[RISCV] Add codegen support for atomic load/stores with RV32A Fences are inserted according to table A.6 in the current draft of version 2.3 of the RISC-V Instruction Set Manual, which incorporates the memory model changes and definitions contributed by the RISC-V Memory Consistency Model task group. Instruction selection failures will now occur for 8/16/32-bit atomicrmw and cmpxchg operations when targeting RV32IA until lowering for these operations is added in a follow-on patch. Differential Revision: https://reviews.llvm.org/D47589 llvm-svn: 334591	2018-06-13 12:04:51 +00:00
Alex Bradbury	dc790dd5d0	[RISCV] Codegen support for atomic operations on RV32I This patch adds lowering for atomic fences and relies on AtomicExpandPass to lower atomic loads/stores, atomic rmw, and cmpxchg to __atomic_* libcalls. test/CodeGen/RISCV/atomic-* are modelled on the exhaustive test/CodeGen/PPC/atomics-regression.ll, and will prove more useful once RV32A codegen support is introduced. Fence mappings are taken from table A.6 in the current draft of version 2.3 of the RISC-V Instruction Set Manual, which incorporates the memory model changes and definitions contributed by the RISC-V Memory Consistency Model task group. Differential Revision: https://reviews.llvm.org/D47587 llvm-svn: 334590	2018-06-13 11:58:46 +00:00
Clement Courbet	5eeed77f87	[TableGen] Emit a fatal error on inconsistencies in resource units vs cycles. Summary: For targets I'm not familiar with, I've automatically made the "default to 1 for each resource" behaviour explicit in the td files. For more obvious cases, I've ventured a fix. Some notes: - Exynos is especially fishy. - AArch64SchedThunderX2T99.td had some truncated entries. If I understand correctly, the person who wrote that interpreted the ResourceCycle as a range. I made the decision to use the upper/lower bound for consistency with the 'Latency' value. I'm sure there is a better choice. - The change to X86ScheduleBtVer2.td is an NFC, it just makes values more explicit. Also see PR37310. Reviewers: RKSimon, craig.topper, javed.absar Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D46356 llvm-svn: 334586	2018-06-13 09:41:49 +00:00
Hiroshi Inoue	0f7f59f073	[PowerPC] fix trivial typos in comment, NFC llvm-svn: 334583	2018-06-13 08:54:13 +00:00
Hiroshi Inoue	9bffc94cf0	[PowerPC] avoid verification failure due to PowerPC VSX Swap Removal pass This patch fixes a failure in lnt tests with -verify-machineinstrs option. When VSX Swap Removal pass swaps two register operands, it did not maintain kill flags associated with operands. This patch swaps flags as well as register number to avoid inconsistent kill flags information. llvm-svn: 334579	2018-06-13 08:25:14 +00:00
Craig Topper	3829d258ee	[X86] Remove masking from avx512vbmi2 concat and shift by immediate intrinsics. Use select in IR instead. llvm-svn: 334576	2018-06-13 07:19:21 +00:00
Craig Topper	55488731be	[X86] Mark all instructions that have masked store semantics with NotMemoryFoldable. Remove dependency on SchedRW from memory table autogenerator. Previously we were whitelisting in instructions based on their SchedRW value. With the masked store instructions explicitly removed via NotMemoryFoldable, we don't seem to need this check anymore. llvm-svn: 334563	2018-06-13 00:04:08 +00:00
Craig Topper	4f9cac667b	[X86] Remove VPCOMPRESSB/W from the autogenerated load folding table. llvm-svn: 334562	2018-06-13 00:04:04 +00:00
Stanislav Mekhanoshin	8fd3c4e431	[AMDGPU] DAG combine to produce V_PERM_B32 Differential Revision: https://reviews.llvm.org/D48099 llvm-svn: 334559	2018-06-12 23:50:37 +00:00
Krzysztof Parzyszek	82d284c1d2	[DAGCombiner] Recognize more patterns for ABS Differential Revision: https://reviews.llvm.org/D47831 llvm-svn: 334553	2018-06-12 21:51:49 +00:00
Petr Hosek	7250908016	[AArch64] Support reserving x20 register Register x20 is a callee-saved register which may be used for other purposes in certain contexts, for example to hold special variables within the kernel. This change adds support for reserving this register both to frontend and backend to make this register usable for these purposes. Differential Revision: https://reviews.llvm.org/D46552 llvm-svn: 334531	2018-06-12 20:00:50 +00:00
Craig Topper	3a34c3596d	[X86] Remove mayLoad flag from AVX512 truncating store instructions. llvm-svn: 334529	2018-06-12 19:59:08 +00:00
Reid Kleckner	98117a47e6	[MS][ARM64] Hoist __ImageBase handling into TargetLoweringObjectFileCOFF All COFF targets should use @IMGREL32 relocations for symbol differences against __ImageBase. Do the same for getSectionForConstant, so that immediates lowered to globals get merged across TUs. Patch by Chris January Differential Revision: https://reviews.llvm.org/D47783 llvm-svn: 334523	2018-06-12 18:56:05 +00:00
Konstantin Zhuravlyov	ce25bc3e82	AMDHSA/NFC: Code object v3 updates (additional): - Move section selection and alignment to AMDGPUAsmPrinter llvm-svn: 334521	2018-06-12 18:33:51 +00:00
Konstantin Zhuravlyov	00f2cb1116	AMDHSA: Code object v3 updates - Do not emit following assembler directives: - .hsa_code_object_version - .hsa_code_object_isa - .amd_amdgpu_isa - .amd_amdgpu_hsa_metadata - .amd_amdgpu_pal_metadata - Do not emit .note entries - Cleanup and bring in sync kernel descriptor header file - Emit kernel descriptor into .rodata with appropriate relocations and alignments llvm-svn: 334519	2018-06-12 18:02:46 +00:00
Fangrui Song	f72cdb50be	[MC] [X86] Teach leaq _GLOBAL_OFFSET_TABLE(%rip), %r15 to use R_X86_64_GOTPC32 instead of R_X86_64_PC32 Summary: This is similar to D46319 (ARM). x86-64 psABI p40 gives an example: leaq _GLOBAL_OFFSET_TABLE(%rip), %r15 # GOTPC32 reloc GNU as creates R_X86_64_GOTPC32. However, MC currently emits R_X86_64_PC32. Reviewers: javed.absar, echristo Subscribers: kristof.beyls, llvm-commits, peter.smith, grimar Differential Revision: https://reviews.llvm.org/D47507 llvm-svn: 334515	2018-06-12 16:20:44 +00:00
Simon Pilgrim	e39fa6cbbb	[CostModel] Replace ShuffleKind::SK_Alternate with ShuffleKind::SK_Select (PR33744) As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources: e.g. v4f32: <0,5,2,7> or <4,1,6,3> This seems far too restrictive as most SIMD hardware which will implement it using a general blend/bit-select instruction, so replaces it with SK_Select, permitting elements from either source as long as they are inline: e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc. This initial patch just updates the name and cost model shuffle mask analysis, later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns. Differential Revision: https://reviews.llvm.org/D47985 llvm-svn: 334513	2018-06-12 16:12:29 +00:00
Craig Topper	ede97c9548	[X86] Remove TB_ALIGN_16 from VEXTRACTF128/VEXTRACTI128 in the memory folding table. llvm-svn: 334511	2018-06-12 15:48:03 +00:00
Krzysztof Parzyszek	bea23d065e	[Hexagon] Make floating point operations expensive for vectorization llvm-svn: 334508	2018-06-12 15:12:50 +00:00
Sanjay Patel	c3466d2568	[x86] move shrunkblend transform to helper function; NFCI We should be able to obsolete D48043 by easing the constraints on this existing code. llvm-svn: 334504	2018-06-12 14:21:51 +00:00
Krzysztof Parzyszek	3d671248ab	[SelectionDAG] Provide default expansion for rotates Implement default legalization of rotates: either in terms of the rotation in the opposite direction (if legal), or in terms of shifts and ors. Implement generating of rotate instructions for Hexagon. Hexagon only supports rotates by an immediate value, so implement custom lowering of ROTL/ROTR on Hexagon. If a rotate is not legal, use the default expansion. Differential Revision: https://reviews.llvm.org/D47725 llvm-svn: 334497	2018-06-12 12:49:36 +00:00
Simon Dardis	74fb5e6789	[mips] Guard some floating point instructions correctly Reviewers: smaksimovic, atanasyan, abeserminji Differential Revision: https://reviews.llvm.org/D47636 llvm-svn: 334491	2018-06-12 10:28:06 +00:00
Aleksandar Beserminji	8acdc10220	[mips] Extend LONG_BRANCH_LUi/ADDiu with extra parameter Extend LONG_BRANCH_LUi and LONG_BRANCH_ADDiu pseudo instructions with additional flag, so instead of always lowering to lui %hi(...), addiu %lo(...) or addiu %hi(...), now they can lower to either %lo, %hi, %higher or %highest depending on the added flag. Differential Revision: https://reviews.llvm.org/D47941 llvm-svn: 334490	2018-06-12 10:23:49 +00:00
Luke Geeson	dc82aa44e6	[AArch64] Audit on rL333879 to fix FP16 64bit bitpatterns llvm-svn: 334488	2018-06-12 09:35:20 +00:00
Craig Topper	88c230265b	[X86] Add NotMemoryFoldable to the VPCOMPRESS instructions. llvm-svn: 334481	2018-06-12 07:32:19 +00:00
Craig Topper	5799e4df75	[X86] Add NotMemoryFoldable to more instructions. These include PUSH/POP instructions that don't match the manual table. This also includes CMPXCHG which we never emit in non-locked form. llvm-svn: 334479	2018-06-12 07:32:17 +00:00
Craig Topper	66572df76e	[X86] Add NotMemoryFoldable to a bunch of instructions to suppress them from the autogenerated load folding table. Most of these are system instructions or other instructions we don't use in CodeGen. No point wasting space for them in the table. Removing them from the autogenerated table makes it easier to review the manual table. A few are real opcode collisions where the memory and register forms are completely different instructions. llvm-svn: 334474	2018-06-12 04:34:59 +00:00
Craig Topper	957b738432	[X86] Add isel patterns for folding loads when creating ROUND instructions from ffloor/fnearbyint/fceil/frint/ftrunc. We were missing packed isel folding patterns for all of sse41, avx, and avx512. For some reason avx512 had scalar load folding patterns under optsize(due to partial/undef reg update), but we didn't have the equivalent sse41 and avx patterns. Sometimes we would get load folding due to peephole pass anyway, but we're also missing avx512 instructions from the load folding table. I'll try to fix that in another patch. Some of this was spotted in the review for D47993. This patch adds all the folds to isel, adds a few spot tests, and disables the peephole pass on a few tests to ensure we're testing some of these patterns. llvm-svn: 334460	2018-06-12 00:48:57 +00:00
Mark Searles	987f292c56	[AMDGPU] prevent hitting Assertion `isReg() && "Wrong MachineOperand accessor"' The use iterator, used within findMaskOperands(), can return anything which is not a def. isUse() requires a register, so check isReg() before calling isUse(). Differential Revision: https://reviews.llvm.org/D48047 llvm-svn: 334459	2018-06-12 00:41:26 +00:00
George Burgess IV	c72204d5b5	Simplify; NFC Not shown in the diff: AQ is a `vector<SUnit >`, and SU is a `SUnit ` llvm-svn: 334451	2018-06-11 22:58:32 +00:00
Konstantin Zhuravlyov	3e5d66ac66	AMDGPU: Add 64-bit relative variant kind Differential Revision: https://reviews.llvm.org/D47601 llvm-svn: 334443	2018-06-11 21:37:57 +00:00
Craig Topper	3efdb7ce19	[X86] Push some variable declarations down into the individual switch cases that need them. NFC All of the cases are already wrapped in curly braces so declaring a variable there isn't an issue. And the variables aren't assigned or used in the larger scope. llvm-svn: 334436	2018-06-11 20:50:58 +00:00
Craig Topper	ceed99baf0	[X86] Reorder some type constraints to force things to be vectors and integer/fp before forcing them to be the same size. This may be needed by another patch that I'm working on. It should have no effect on any of the generated outputs. llvm-svn: 334430	2018-06-11 19:20:15 +00:00
Krzysztof Parzyszek	dd9415d550	[Hexagon] Late predicate producers cannot be used as dot-new sources llvm-svn: 334426	2018-06-11 18:45:52 +00:00
Simon Pilgrim	14ee66ef37	[X86][AVX512] Tag AVX5124FMAPS/AVX5124VNNIW with missing scheduler classes Necessary for D46276 as even though btver2 doesn't use these instructions, its now flagged as complete so complains if ANY instruction isn't tagged..... UnsupportedFeatures wouldn't help here as these instructions don't appear to have a feature predicate (like a lot of AVX512). llvm-svn: 334423	2018-06-11 17:28:00 +00:00
Stanislav Mekhanoshin	7ba3fc730c	[AMDGPU] Do not consider indirect acces through phi for wave limiter Rational: if there is indirect access that is usually an issue because load is not ready by the use. However, if use is inside a loop and load is outside that is potentially an issue for a first iteration only. Differential Revision: https://reviews.llvm.org/D47740 llvm-svn: 334420	2018-06-11 16:50:49 +00:00
Aleksandar Beserminji	62cf9d21ab	[mips] Fix spill slot for mips3, n64 abi When program is compiled for mips3 with n64 abi, wrong register class is used for creating an emergency spill slot. This patch fixes the correct register class to be chosen. This patch resolves PR35859. Thanks to John Baldwin for reporting the issue! Differential Revision: https://reviews.llvm.org/D47938 llvm-svn: 334419	2018-06-11 16:50:28 +00:00
Dylan McKay	d011869c82	[AVR] Set trackLivenessAfterRegAlloc This sets trackLivenessAfterRegAlloc on AVRRegisterInfo. Most existing targets set this flag. Without it, specific IR inputs cause LLVM to fail with: Assertion failed: (getParent()->getProperties().hasProperty( MachineFunctionProperties::Property::TracksLiveness) && "Liveness information is accurate"), function livein_begin file MachineBasicBlock.cpp, line 1354. With this commit, this no longer happens. Patch by Peter Nimmervoll. llvm-svn: 334409	2018-06-11 14:46:48 +00:00
Clement Courbet	7db69cc08a	[X86] Fix skylake server scheduling info. Summary: This fixes most of the scheduling info for SKX vector operations. I had to split a lot of the YMM/ZMM classes into separate classes for YMM and ZMM. The before/after llvm-exegesis analysis are in the phabricator diff. Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47721 llvm-svn: 334407	2018-06-11 14:37:53 +00:00
Clement Courbet	f4f6899cdf	[ExynosM1][Sched] Fix resource usage in scheduling model. This is part of https://reviews.llvm.org/D46356. llvm-svn: 334391	2018-06-11 07:33:08 +00:00
Clement Courbet	c48435bfe5	[X86] Explicitly mark unsupported classes in scheduling models. Summary: In preparation for D47721. HSW and SNB still define unsupported classes as they are used by KNL and generic models respectively. Reviewers: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47763 llvm-svn: 334389	2018-06-11 07:00:08 +00:00
Craig Topper	0e25c8239a	[X86] Remove masking from dbpsadbw intrinsics, use select in IR instead. llvm-svn: 334384	2018-06-11 06:18:22 +00:00
Daniel Cederman	33f67a256b	[Sparc] Add support for 13-bit PIC Summary: When compiling with -fpic, in contrast to -fPIC, use only the immediate field to index into the GOT. This saves space if the GOT is known to be small. The linker will warn if the GOT is too large for this method. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: brad, fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D47136 llvm-svn: 334383	2018-06-11 05:50:08 +00:00
Craig Topper	e71ad1f6d0	[X86] Remove and autoupgrade the expandload and compressstore intrinsics. We use the target independent intrinsics now. llvm-svn: 334381	2018-06-11 01:25:22 +00:00
Craig Topper	860562c915	[X86] Miscellaneous fixes to get the load folding table generator to work again. llvm-svn: 334377	2018-06-10 21:48:24 +00:00
Ivan A. Kosarev	847daa11f8	[NEON] Support VST1xN intrinsics in AArch32 mode (LLVM part) We currently support them only in AArch64. The NEON Reference, however, says they are 'ARMv7, ARMv8' intrinsics. Differential Revision: https://reviews.llvm.org/D47447 llvm-svn: 334361	2018-06-10 09:27:27 +00:00
Craig Topper	98a79934af	[X86] Remove masking from the 512-bit masked floating point add/sub/mul/div intrinsics. Use a select in IR instead. llvm-svn: 334358	2018-06-10 06:01:36 +00:00
Gabor Buella	5aa26980c4	[X86] NFC Use member initialization in X86Subtarget The separate initializeEnvironment function was sort of useless since r217071. ARM did this move already with r273556. llvm-svn: 334345	2018-06-09 09:19:40 +00:00
Eli Friedman	864df22307	[ARM] Allow CMPZ transforms even if the input has multiple uses. It looks like this got left in by accident in r289794; I can't think of any reason this check would be necessary. (Maybe it was meant to be a check that the AND has one use? But we check that a few lines earlier.) Differential Revision: https://reviews.llvm.org/D47921 llvm-svn: 334322	2018-06-08 21:16:56 +00:00
Simon Pilgrim	5c32989c91	[X86][SSE] Support v8i16/v16i16 rotations Extension to D46954 (PR37426), this patch adds support for v8i16/v16i16 rotations in a similar manner - the conversion of the shift/rotate amount to a multiplication factor and the use of PMULLW to shift left and PMULHUW (ISD::MULHU) to shift the wrapped bits back around to be ORd together. Differential Revision: https://reviews.llvm.org/D47822 llvm-svn: 334309	2018-06-08 17:58:42 +00:00
Simon Pilgrim	89deac6694	[X86][BtVer2] Add support for all SUB/XOR 32/64 scalar instructions that should match the dependency-breaking 'zero-idiom' As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions), these instructions are dependency breaking and fast-path zero the destination register (and appropriate EFLAGS bits). llvm-svn: 334303	2018-06-08 17:00:45 +00:00
Daniil Fukalov	c9a098b314	[AMDGPU] Inline asm - added i16, half and i128 types support AMDGPU inline assembler support i16, half and i128 typed variables in constraints, but they were reported as error. Needed to fix https://github.com/RadeonOpenCompute/ROCm/issues/341, e.g. to be able to load with global_load_dwordx4 to a 128bit integer variable Differential Revision: https://reviews.llvm.org/D44920 llvm-svn: 334301	2018-06-08 16:29:04 +00:00
Simon Pilgrim	a6afa310c9	[X86][SSE] Simplify combineVectorTruncationWithPACKUS to reduce code duplication Simplify combineVectorTruncationWithPACKUS to mask the upper bits followed by calling truncateVectorWithPACK instead of duplicating with similar code. This results in the codegen using (V)PACKUSDW on SSE41+ targets for vXi64/vXi32 inputs where before it always used PACKUSWB (along with a lot more bitcasting). I've raised PR37749 as until we avoid unnecessary concats back to 256-bit for bitwise ops, we can't avoid splitting the input value into 128-bit subvectors for masking. llvm-svn: 334289	2018-06-08 13:59:11 +00:00

... 3 4 5 6 7 ...

48293 Commits