llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	34279db355	[X86][SSE] Combine shuffles to ANY_EXTEND/ANY_EXTEND_VECTOR_INREG. We already do this for ZERO_EXTEND/ZERO_EXTEND_VECTOR_INREG - this just extends the pattern matcher to recognize cases where we don't need the zeros in the extension. llvm-svn: 363841	2019-06-19 17:21:15 +00:00
Simon Pilgrim	cdc0236e3a	[X86] getExtendInVec - take a ISD::*_EXTEND opcode instead of a IsSigned bool flag. NFCI. Prep work to support ANY_EXTEND/ANY_EXTEND_VECTOR_INREG without needing another flag. llvm-svn: 363818	2019-06-19 15:18:24 +00:00
Simon Pilgrim	d4754cac89	[X86] Add _EXTEND -> _EXTEND_VECTOR_INREG opcode conversion helper. NFCI. Given a _EXTEND or _EXTEND_VECTOR_INREG opcode, convert it to *_EXTEND_VECTOR_INREG. llvm-svn: 363812	2019-06-19 14:54:02 +00:00
Simon Pilgrim	2b309027ed	[X86] Merge extract_subvector(_EXTEND) and extract_subvector(_EXTEND_VECTOR_INREG) handling. NFCI. llvm-svn: 363808	2019-06-19 14:25:27 +00:00
Clement Courbet	4ef7c2868a	[X86] Add missing properties on llvm.x86.sse.{st,ld}mxcsr Summary: llvm.x86.sse.stmxcsr only writes to memory. llvm.x86.sse.ldmxcsr only reads from memory, and might generate an FPE. Reviewers: craig.topper, RKSimon Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62896 llvm-svn: 363773	2019-06-19 08:44:31 +00:00
Matt Arsenault	9cac4e6d14	Rename ExpandISelPseudo->FinalizeISel, delay register reservation This allows targets to make more decisions about reserved registers after isel. For example, now it should be certain there are calls or stack objects in the frame or not, which could have been introduced by legalization. Patch by Matthias Braun llvm-svn: 363757	2019-06-19 00:25:39 +00:00
Craig Topper	10e6128c62	[X86] Remove unnecessary line that makes v4f32 FP_ROUND Legal. NFC FP_ROUND defaults to Legal for all MVT types and nothing changes the v4f32 entry way from this default. If we needed this line we'd also need one for v8f32 with AVX512 which we don't have. llvm-svn: 363719	2019-06-18 19:04:03 +00:00
Simon Pilgrim	9c8593934a	[X86][AVX] extract_subvector(any_extend(x)) -> any_extend_vector_inreg(x) Part of fixing the X86 regression noted in D63281 - I've split this into X86 and generic parts - the generic commit will be coming shortly and will fix the vector-reduce-mul-widen.ll regression introduced here. llvm-svn: 363693	2019-06-18 15:30:50 +00:00
Simon Pilgrim	7dd529e54d	[X86] Replace any_extend* vector extensions with zero_extend* equivalents First step toward addressing the vector-reduce-mul-widen.ll regression in D63281 - we should replace ANY_EXTEND/ANY_EXTEND_VECTOR_INREG in X86ISelDAGToDAG to avoid having to add duplicate patterns when treating any extensions as legal. In future patches this will also allow us to keep any extension nodes around a lot longer in the DAG, which should mean that we can keep better track of undef elements that otherwise become zeros that we think we have to keep...... Differential Revision: https://reviews.llvm.org/D63326 llvm-svn: 363655	2019-06-18 09:50:13 +00:00
Craig Topper	f4284f8a9d	[X86] Move code that shrinks immediates for ((x << C1) op C2) into a helper function. NFCI Preliminary step for D59909 llvm-svn: 363645	2019-06-18 04:23:58 +00:00
Craig Topper	587427716c	[X86] Remove MOVDI2SSrm/MOV64toSDrm/MOVSS2DImr/MOVSDto64mr CodeGenOnly instructions. The isel patterns for these use a bitcast and load/store, but DAG combine should have canonicalized those away. For the purposes of the memory folding table these opcodes can be replaced by the MOVSSrm_alt/MOVSDrm_alt and MOVSSmr/MOVSDmr opcodes. llvm-svn: 363644	2019-06-18 03:23:15 +00:00
Craig Topper	8582ecd8d9	[X86] Introduce new MOVSSrm/MOVSDrm opcodes that use VR128 register class. Rename the old versions that use FR32/FR64 to MOVSSrm_alt/MOVSDrm_alt. Use the new versions in patterns that previously used a COPY_TO_REGCLASS to VR128. These patterns expect the upper bits to be zero. The current set up appears to work, but I'm not sure we should be enforcing upper bits being zero through a COPY_TO_REGCLASS. I wanted to flip the arrangement and use a COPY_TO_REGCLASS to FR32/FR64 for the patterns that need an f32/f64 result, but that complicated fastisel and globalisel. I've been doing some experiments with reducing some isel patterns and ended up in a situation where I had a (SUBREG_TO_REG (COPY_TO_RECLASS (VMOVSSrm), VR128)) and our post-isel peephole was unable to avoid using an instruction for the SUBREG_TO_REG due to the COPY_TO_REGCLASS. Having a VR128 instruction removes the COPY_TO_REGCLASS that was breaking this. llvm-svn: 363643	2019-06-18 03:23:11 +00:00
Craig Topper	971ad74ba2	Use VR128X instead of FR32X/FR64X for the register class in VMOVSSZmrk/VMOVSDZmrk. Removes COPY_TO_REGCLASS from some patterns. llvm-svn: 363630	2019-06-17 23:08:29 +00:00
Craig Topper	0e18300802	[X86] Make an assert in LowerSCALAR_TO_VECTOR stricter to make it clear what types are allowed here. NFC Make it clear that only integer type with i32 or smaller elements shoudl get to this part of the code. llvm-svn: 363629	2019-06-17 23:08:09 +00:00
Craig Topper	f3f968adcd	[X86] Add TB_NO_REVERSE to some memory folding table entries where the register form requires 64-bit mode, but the memory form does not. We don't know if its safe to unfold if we're in 32-bit mode. This is simlar to what was done to some load opcodes in r363523. I think its pretty unlikely we will try to unfold these anyway so I don't think this is testable. llvm-svn: 363595	2019-06-17 18:38:07 +00:00
Simon Pilgrim	835999e48a	[X86][SSE] Scalarize under-aligned XMM vector nt-stores (PR42026) If a XMM non-temporal store has less than natural alignment, scalarize the vector - with SSE4A we can stay on the vector and use MOVNTSD(f64), else we must move to GPRs and use MOVNTI(i32/i64). llvm-svn: 363592	2019-06-17 18:20:04 +00:00
Simon Pilgrim	bb9adfdb4e	[X86][AVX] Split under-aligned vector nt-stores. If a YMM/ZMM non-temporal store has less than natural alignment, split the vector - either they will be satisfactorily aligned or will continue to be split until they are XMMs - at which point the legalizer will scalarize it. llvm-svn: 363582	2019-06-17 17:22:38 +00:00
Warren Ristow	6452bdd29b	[LV] Suppress vectorization in some nontemporal cases When considering a loop containing nontemporal stores or loads for vectorization, suppress the vectorization if the corresponding vectorized store or load with the aligment of the original scaler memory op is not supported with the nontemporal hint on the target. This adds two new functions: bool isLegalNTStore(Type DataType, unsigned Alignment) const; bool isLegalNTLoad(Type DataType, unsigned Alignment) const; to TTI, leaving the target independent default implementation as returning true, but with overriding implementations for X86 that check the legality based on available Subtarget features. This fixes https://llvm.org/PR40759 Differential Revision: https://reviews.llvm.org/D61764 llvm-svn: 363581	2019-06-17 17:20:08 +00:00
Simon Pilgrim	12cb792d7f	[X86] combineLoad - begun making the load split code more generic. NFCI. This is currently only used for ymm->xmm splitting but we shouldn't hardcode the offsets/alignment. This is necessary for an upcoming patch to split under-aligned non-temporal vector loads. llvm-svn: 363570	2019-06-17 15:54:36 +00:00
Simon Pilgrim	454e6b9010	[X86][SSE] Prevent misaligned non-temporal vector load/store combines For loads, pre-SSE41 we can't perform NT loads at all, and after that we can only perform vector aligned loads, so if the alignment is less than for a xmm we'll just end up using the regular unaligned vector loads anyway. First step towards fixing PR42026 - the next step for stores will be to use SSE4A movntsd where possible and to avoid the stack spill on SSE2 targets. Differential Revision: https://reviews.llvm.org/D63246 llvm-svn: 363564	2019-06-17 14:26:10 +00:00
Craig Topper	9f2f127009	[X86] Add TB_NO_REVERSE to some folding table entries where the register from uses the REX prefix, but the memory form does not. It would not be safe to unfold the memory form the register form without checking that we are compiling for 64-bit mode. This probaby isn't a real functional issue since we are unlikely to unfold any of these instructions since they don't have any tied registers, aren't commutable, and don't have any inputs other than the address. llvm-svn: 363523	2019-06-16 22:33:09 +00:00
Sanjay Patel	d14389c0a5	[x86] split 256-bit vector selects if operands are vector concats This is similar logic/motivation to the select splitting in D62969. In D63233, the pattern changes so that we no longer have an extract_subvector of vselect, but the operands of the select are still being concatenated. The closest case is represented in either the first or last test diffs here - we have an extra instruction, but we converted 3-4 ymm instructions into 4-5 xmm instructions. I think that's the right trade-off for most AVX1 targets. In the example based on PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 ...this makes the loop about 30% faster (tested on Haswell by compiling with -mavx). Differential Revision: https://reviews.llvm.org/D63364 llvm-svn: 363508	2019-06-16 14:04:49 +00:00
Simon Pilgrim	fcffc2facc	[X86] CombineShuffleWithExtract - handle cases with different vector extract sources Insert the shorter vector source into an undef vector of the longer vector source's type. llvm-svn: 363507	2019-06-16 08:00:41 +00:00
Simon Pilgrim	456ca5d7f7	[X86] CombineShuffleWithExtract - assert all src ops types are multiples of rootsize. NFCI. llvm-svn: 363501	2019-06-15 19:12:44 +00:00
Simon Pilgrim	90e87af303	[X86][AVX] Handle lane-crossing shuffle(extract_subvector(x,c1),extract_subvector(y,c2),m1) shuffles Pull out the existing (non)lane-crossing fold into a helper lambda and use for lane-crossing unary shuffles as well. Fixes PR34380 llvm-svn: 363500	2019-06-15 18:30:43 +00:00
Simon Pilgrim	990f3ceb67	[X86][AVX] Decode constant bits from insert_subvector(c1, c2, c3) This mostly happens due to SimplifyDemandedVectorElts reducing a vector to insert_subvector(undef, c1, 0) llvm-svn: 363499	2019-06-15 17:05:24 +00:00
Kevin P. Neal	fece7c6c83	[FPEnv] Lower STRICT_FP_EXTEND and STRICT_FP_ROUND nodes in preprocess phase of ISelLowering to mirror non-strict nodes on x86. I recently discovered a bug on the x86 platform: The fp80 type was not handled well by x86 for constrained floating point nodes, as their regular counterparts are replaced by extending loads and truncating stores during the preprocess phase. Normally, platforms don't have this issue, as they don't typically attempt to perform such legalizations during instruction selection preprocessing. Before this change, strict_fp nodes survived until they were mutated to normal nodes, which happened shortly after preprocessing on other platforms. This modification lowers these nodes at the same phase while properly utilizing the chain.5 Submitted by: Drew Wock <drew.wock@sas.com> Reviewed by: Craig Topper, Kevin P. Neal Approved by: Craig Topper Differential Revision: https://reviews.llvm.org/D63271 llvm-svn: 363417	2019-06-14 16:28:55 +00:00
Eric Christopher	5e83d8fff4	Move commentary on opcode translation for code16 mov instructions to segment registers closer to the segment register check for when we add further optimizations. llvm-svn: 363355	2019-06-14 04:51:55 +00:00
Craig Topper	cf34a2bd5d	[X86Disassembler] Unify the EVEX and VEX code in emitContextTable. Merge the ATTR_VEXL/ATTR_EVEXL bits. NFCI Merging the two bits shrinks the context table from 16384 bytes to 8192 bytes. Remove the ATTRIBUTE_BITS macro and just create an enum directly. Then fix the ATTR_max define to be 8192 to reflect the table size so we stop hardcoding it separately. llvm-svn: 363330	2019-06-13 22:15:25 +00:00
Simon Pilgrim	757a2f13fd	[X86] Use fresh MemOps when emitting VAARG64 Previously it copied over MachineMemOperands verbatim which caused MOV32rm to have store flags set, and MOV32mr to have load flags set. This fixes some assertions being thrown with EXPENSIVE_CHECKS on. Committed on behalf of @luke (Luke Lau) Differential Revision: https://reviews.llvm.org/D62726 llvm-svn: 363268	2019-06-13 14:05:37 +00:00
Simon Pilgrim	6b56ad164c	[CodeGen] Add getMachineMemOperand + MachineMemOperand::Flags allocator helper wrapper. NFCI. Pre-commit for D62726 on behalf of @luke (Luke Lau) llvm-svn: 363257	2019-06-13 12:58:55 +00:00
Simon Pilgrim	0baf136a4d	[X86][SSE] Avoid assert for broadcast(horiz-op()) cases for non-f64 cases. Based on fuzz test from @craig.topper llvm-svn: 363251	2019-06-13 11:26:21 +00:00
Tom Stellard	f335672218	X86: Clean up pass initialization Summary: - Remove redundant initializations from pass constructors that were already being initialized by LLVMInitializeX86Target(). - Add initialization function for the FPS pass. Reviewers: craig.topper Reviewed By: craig.topper Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63218 llvm-svn: 363221	2019-06-13 02:09:32 +00:00
Simon Pilgrim	4e0648a541	[TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests (PR42123) As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space. This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them. If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores. Differential Revision: https://reviews.llvm.org/D63075 llvm-svn: 363179	2019-06-12 17:14:03 +00:00
Simon Pilgrim	5b0e0dd709	[X86][AVX] Fold concat(vpermilps(x,c),vpermilps(y,c)) -> vpermilps(concat(x,y),c) Handles PSHUFD/PSHUFLW/PSHUFHW (AVX2) + VPERMILPS (AVX1). An extra AVX1 PSHUFD->VPERMILPS combine will be added in a future commit. llvm-svn: 363178	2019-06-12 16:38:20 +00:00
Craig Topper	ed4cd44870	[X86] Add VCMPSSZrr_Intk and VCMPSDZrr_Intk to isNonFoldablePartialRegisterLoad. The non-masked versions are already in there. I'm having some trouble coming up with a way to test this right now. Most load folding should happen during isel so I'm not sure how to get peephole pass to do it. llvm-svn: 363125	2019-06-12 06:29:53 +00:00
Simon Pilgrim	266f43964e	[TargetLowering] Add allowsMemoryAccess(MachineMemOperand) helper wrapper. NFCI. As suggested by @arsenm on D63075 - this adds a TargetLowering::allowsMemoryAccess wrapper that takes a Load/Store node's MachineMemOperand to handle the AddressSpace/Alignment arguments and will also implicitly handle the MachineMemOperand::Flags change in D63075. llvm-svn: 363048	2019-06-11 11:00:23 +00:00
Craig Topper	627d8168e7	[X86] Add load folding isel patterns to scalar_math_patterns and AVX512_scalar_math_fp_patterns. Also add a FIXME for the peephole pass not being able to handle this. llvm-svn: 363032	2019-06-11 04:30:53 +00:00
Tom Stellard	4b0b26199b	Revert CMake: Make most target symbols hidden by default This reverts r362990 (git commit `374571301d`) This was causing linker warnings on Darwin: ld: warning: direct access in function 'llvm::initializeEvexToVexInstPassPass(llvm::PassRegistry&)' from file '../../lib/libLLVMX86CodeGen.a(X86EvexToVex.cpp.o)' to global weak symbol 'void std::__1::__call_once_proxy<std::__1::tuple<void* (&)(llvm::PassRegistry&), std::__1::reference_wrapper<llvm::PassRegistry>&&> >(void*)' from file '../../lib/libLLVMCore.a(Verifier.cpp.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings. llvm-svn: 363028	2019-06-11 03:21:13 +00:00
Tom Stellard	374571301d	CMake: Make most target symbols hidden by default Summary: For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF this change makes all symbols in the target specific libraries hidden by default. A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these libraries public, which is mainly needed for the definitions of the LLVMInitialize* functions. This patch reduces the number of public symbols in libLLVM.so by about 25%. This should improve load times for the dynamic library and also make abi checker tools, like abidiff require less memory when analyzing libLLVM.so One side-effect of this change is that for builds with LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that access symbols that are no longer public will need to be statically linked. Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1): nm before/libLLVM-9svn.so \| grep ' [A-Zuvw] ' \| wc -l 36221 nm after/libLLVM-9svn.so \| grep ' [A-Zuvw] ' \| wc -l 26278 Reviewers: chandlerc, beanz, mgorny, rnk, hans Reviewed By: rnk, hans Subscribers: Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54439 llvm-svn: 362990	2019-06-10 22:12:56 +00:00
Craig Topper	9000a72a4b	[X86] When promoting i16 compare with immediate to i32, try to use sign_extend for eq/ne if the input is truncated from a type with enough sign its. Summary: Our default behavior is to use sign_extend for signed comparisons and zero_extend for everything else. But for equality we have the freedom to use either extension. If we can prove the input has been truncated from something with enough sign bits, we can use sign_extend instead and let DAG combine optimize it out. A similar rule is used by type legalization in LegalizeIntegerTypes. This gets rid of the movzx in PR42189. The immediate will still take 4 bytes instead of the 2 bytes plus 0x66 prefix a cmp di, 32767 would get, but it avoids a length changing prefix. Reviewers: RKSimon, spatel, xbolva00 Reviewed By: xbolva00 Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63032 llvm-svn: 362920	2019-06-10 04:50:12 +00:00
Craig Topper	ceb807bbbc	[X86] Disable f32->f64 extload when sse2 is enabled Summary: We can only use the memory form of cvtss2sd under optsize due to a partial register update. So previously we were emitting 2 instructions for extload when optimizing for speed. Also due to a late optimization in preprocessiseldag we had to handle (fpextend (loadf32)) under optsize. This patch forces extload to expand so that it will always be in the (fpextend (loadf32)) form during isel. And when optimizing for speed we can just let each of those pieces select an instruction independently. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62710 llvm-svn: 362919	2019-06-10 04:37:16 +00:00
Craig Topper	dd10099d5c	[X86] Use EVEX instructions for f128 FAND/FOR/FXOR when avx512vl is enabled. llvm-svn: 362915	2019-06-10 01:18:55 +00:00
Craig Topper	f7ba8b808a	[X86] Convert f32/f64 FANDN/FAND/FOR/FXOR to vector logic ops and scalar_to_vector/extract_vector_elts to reduce isel patterns. Previously we did the equivalent operation in isel patterns with COPY_TO_REGCLASS operations to transition. By inserting scalar_to_vetors and extract_vector_elts before isel we can allow each piece to be selected individually and accomplish the same final result. I ideally we'd use vector operations earlier in lowering/combine, but that looks to be more difficult. The scalar-fp-to-i64.ll changes are because we have a pattern for using movlpd for store+extract_vector_elt. While an f64 store uses movsd. The encoding sizes are the same. llvm-svn: 362914	2019-06-10 00:41:07 +00:00
Jatin Bhateja	2a30aeb010	[X86] NFCI : Comment updation for EVEX to VEX translation. Reviewers: llvm-commits, jbhateja Reviewed By: jbhateja Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63055 llvm-svn: 362898	2019-06-09 09:59:26 +00:00
Craig Topper	2ba0e2518b	[X86] Remove (store (f32 (extractelt (v4f32))) isel patterns which is redundant. We emit a MOVSSmr and a COPY_TO_REGCLASS, but that's what we would get from selecting the store and extractelt independently. llvm-svn: 362895	2019-06-09 03:21:33 +00:00
Craig Topper	7d8494c41c	[X86] Mutate scalar fceil/ffloor/ftrunc/fnearbyint/frint into X86ISD::RNDSCALE during PreProcessIselDAG to cut down on number of isel patterns. Similar was done for vectors in r362535. Removes about 1200 bytes from the isel table. llvm-svn: 362894	2019-06-08 23:53:31 +00:00
Jonas Paulsson	fdc4ea34e3	[SystemZ, RegAlloc] Favor 3-address instructions during instruction selection. This patch aims to reduce spilling and register moves by using the 3-address versions of instructions per default instead of the 2-address equivalent ones. It seems that both spilling and register moves are improved noticeably generally. Regalloc hints are passed to increase conversions to 2-address instructions which are done in SystemZShortenInst.cpp (after regalloc). Since the SystemZ reg/mem instructions are 2-address (dst and lhs regs are the same), foldMemoryOperandImpl() can no longer trivially fold a spilled source register since the reg/reg instruction is now 3-address. In order to remedy this, new 3-address pseudo memory instructions are used to perform the folding only when the dst and lhs virtual registers are known to be allocated to the same physreg. In order to not let MachineCopyPropagation run and change registers on these transformed instructions (making it 3-address), a new target pass called SystemZPostRewrite.cpp is run just after VirtRegRewriter, that immediately lowers the pseudo to a target instruction. If it would have been possibe to insert a COPY instruction and change a register operand (convert to 2-address) in foldMemoryOperandImpl() while trusting that the caller (e.g. InlineSpiller) would update/repair the involved LiveIntervals, the solution involving pseudo instructions would not have been needed. This is perhaps a potential improvement (see Phabricator post). Common code changes: * A new hook TargetPassConfig::addPostRewrite() is utilized to be able to run a target pass immediately before MachineCopyPropagation. * VirtRegMap is passed as an argument to foldMemoryOperand(). Review: Ulrich Weigand, Quentin Colombet https://reviews.llvm.org/D60888 llvm-svn: 362868	2019-06-08 06:19:15 +00:00
Craig Topper	bd03230cb0	[X86] Remove unnecessary new line escape from the end of a macro. NFC llvm-svn: 362837	2019-06-07 20:30:40 +00:00
Sanjay Patel	6880bceda2	[x86] narrow extract subvector of vector select This is a potentially large perf win for AVX1 targets because of the way we auto-vectorize to 256-bit but then expect the backend to legalize/optimize for the half-implemented AVX1 ISA. On the motivating example from PR37428 (even though this patch doesn't solve the vector shift issue): https://bugs.llvm.org/show_bug.cgi?id=37428 ...there's a 16% speedup when compiling with "-mavx" (perf tested on Haswell) because we eliminate the remaining 256-bit vblendv ops. I added comments on a couple of tests that require further work. If we have 256-bit logic ops separating the vselect and extract, we should probably narrow everything to 128-bit, but that requires a larger pattern match. Differential Revision: https://reviews.llvm.org/D62969 llvm-svn: 362797	2019-06-07 13:17:46 +00:00
Pengfei Wang	f8b28931a7	[X86] -march=cooperlake (llvm) Support intel -march=cooperlake in llvm Patch by Shengchen Kan (skan) Differential Revision: https://reviews.llvm.org/D62836 llvm-svn: 362776	2019-06-07 08:31:35 +00:00
Craig Topper	f320f26716	[X86] Make a bunch of merge masked binops commutable for loading folding. This primarily affects add/fadd/mul/fmul/and/or/xor/pmuludq/pmuldq/max/min/fmaxc/fminc/pmaddwd/pavg. We already commuted the unmasked and zero masked versions. I've added 512-bit stack folding tests for most of the instructions affected. I've tested needing commuting and not commuting across unmasked, merged masked, and zero masked. The 128/256 bit instructions should behave similarly. llvm-svn: 362746	2019-06-06 21:00:04 +00:00
Craig Topper	6b67dfa54c	[X86] Make masked floating point equality/ordered compares commutable for load folding purposes. Same as what is supported for the unmasked form. llvm-svn: 362717	2019-06-06 16:39:04 +00:00
Craig Topper	9226ba6b37	[X86] Don't turn avx masked.load with constant mask into masked.load+vselect when passthru value is all zeroes. This is intended to enable the use of an immediate blend or more optimal instruction. But if the passthru is zero we don't need any additional instructions. llvm-svn: 362675	2019-06-06 05:41:27 +00:00
Craig Topper	3975b15dba	[X86] Fix mistake that marked VADDSSrrb_Int/VADDSDrrb_Int/VMULSSrrb_Int/VMULSDrrb_Int as commutable. One of the sources controls the pass through value for the upper bits of the result so we can't really commute it. In practice this problem isn't a functional issue because we would only try to commute this instruction in order to fold a load. But we can't do embedded rounding and fold a load at the same time. So the load fold would never succeed so I don't think we would ever commute or at least keep the version after commuting. llvm-svn: 362647	2019-06-05 21:00:31 +00:00
Craig Topper	d0fff89b81	[X86] Add the vector integer min/max instructions to isAssociativeAndCommutative. As far as I know these should be freely reassociatable just like the floating point MAXC/MINC instructions. The reduce test changes are largely regressions and caused by the "generic" CPU we default to not having a scheduler model. The machine-combiner-int-vec.ll test shows the positive benefits of this change. Differential Revision: https://reviews.llvm.org/D62787 llvm-svn: 362629	2019-06-05 18:25:09 +00:00
Sanjay Patel	2bf82879bd	[x86] split more 256-bit stores of concatenated vectors As suggested in D62498 - collectConcatOps() matches both concat_vectors and insert_subvector patterns, and we see more test improvements by using the more general match. llvm-svn: 362620	2019-06-05 16:40:57 +00:00
Simon Pilgrim	de586bd1fd	[X86][AVX] Generalize split256BitStore to splitVectorStore. NFCI. Enables us to use this to split 512-bit vectors in future patches. llvm-svn: 362617	2019-06-05 16:14:14 +00:00
Simon Pilgrim	886a55eaa0	[X86][AVX] combineX86ShuffleChain - combine shuffle(extractsubvector(x),extractsubvector(y)) We already handle the case where we combine shuffle(extractsubvector(x),extractsubvector(x)), this relaxes the requirement to permit different sources as long as they have the same value type. This causes a couple of cases where the VPERMV3 binary shuffles occur at a wider width than before, which I intend to improve in future commits - but as only the subvector's mask indices are defined, these will broadcast so we don't see any increase in constant size. llvm-svn: 362599	2019-06-05 12:56:53 +00:00
Craig Topper	78fdce25a1	[X86] Cleanup convertIntLogicToFPLogic a little. NFCI -Use early returns to reduce indentation -Replace multipe ifs with a switch. -Replace an assert with an llvm_unreachable default in the switch. -Check that the FP type we're going to use for the X86ISD::FAND/FOR/FXOR is legal rather than checking that the integer type matches the width of a legal scalar fp type. This all runs after legalization so it shouldn't really matter, but making sure we're using a valid type in the X86ISD node is really whats important. llvm-svn: 362565	2019-06-05 01:00:34 +00:00
Craig Topper	137de38009	[X86] Mutate fceil/ffloor/ftrunc/fnearbyint/frint into X86ISD::RNDSCALE during PreProcessIselDAG to cut down on pattern permutations We already need to have patterns for X86ISD::RNDSCALE to support software intrinsics. But we currently have 5 sets of patterns for the 5 rounding operations. For of these 6 patterns we have to support 3 vectors widths, 2 element sizes, sse/vex/evex encodings, load folding, and broadcast load folding. This results in a fair amount of bytes in the isel table. This patch adds code to PreProcessIselDAG to morph the fceil/ffloor/ftrunc/fnearbyint/frint to X86ISD::RNDSCALE. This way we can remove everything, but the intrinsic pattern while still allowing the operations to be considered Legal for DAGCombine and Legalization. This shrinks the DAGISel by somewhere between 9K and 10K. There is one complication to this, the STRICT versions of these nodes are currently mutated to their none strict equivalents at isel time when the node is visited. This won't be true in the future since that loses the chain ordering information. For now I've also added support for the non-STRICT nodes to Select so we can change the STRICT versions there after they've been mutated to their non-STRICT versions. We'll probably need a STRICT version of RNDSCALE or something to handle this in the future. Which will take us back to needing 2 sets of patterns for strict and non-strict, but that's still better than the 11 or 12 sets of patterns we'd need. We can probably do something similar for scalar, but I haven't looked at it yet. Differential Revision: https://reviews.llvm.org/D62757 llvm-svn: 362535	2019-06-04 18:03:07 +00:00
Benjamin Kramer	03ff1b3c30	[X86] Fold single-use variable into assert. NFC. Avoids an unused variable warning in Release builds. llvm-svn: 362534	2019-06-04 18:01:07 +00:00
Sanjay Patel	606eb2367f	[x86] split 256-bit store of concatenated vectors This shows up as a side issue to the main problem for the AVX target example from PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3 But as we can see in the pile of existing test diffs, it's actually a widespread problem that affects any AVX or later target. Apart from a couple of oddballs, I think these are all improvements for the reasons stated in the code comment: we do not want to enable YMM unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit stores anyway. We could say that MergeConsecutiveStores() is going overboard on some of these examples, but that won't solve the problem completely. But that is a reason I'm proposing this as a lowering rather than a combine: we will infinite loop fighting the merge code if we try this earlier. Differential Revision: https://reviews.llvm.org/D62498 llvm-svn: 362524	2019-06-04 16:40:04 +00:00
Sanjay Patel	1e63dd0b44	[SelectionDAG][x86] limit post-legalization store merging by type The proposal in D62498 showed that x86 would benefit from vector store splitting, but that may conflict with the generic DAG combiner's store merging transforms. Add memory type to the existing TLI hook that enables the merging transforms, so we can limit those changes to scalars only for x86. llvm-svn: 362507	2019-06-04 15:15:59 +00:00
Simon Pilgrim	a6e289e9f8	[X86][SSE] Pulled out (sub (xor X, M), M) 'ConditionalNegate' out pattern match code. NFCI. As discussed on D62777 - we should be able to use this in more SSE41+ cases as well but that requires us to separate it from the OR(AND(),ANDN()) matcher. llvm-svn: 362504	2019-06-04 15:02:33 +00:00
Craig Topper	dcf865f0ca	[X86] Fix the pattern for merge masked vcvtps2pd. r362199 fixed it for zero masking, but not zero masking. The load folding in the peephole pass hid the bug. This patch turns off the peephole pass on the relevant test to ensure coverage. llvm-svn: 362440	2019-06-03 19:29:14 +00:00
Simon Pilgrim	8a32ca381d	[CostModel][X86] Improve masked load/store AVX1/AVX2 costs A mixture of internal tests and review of the scheduler models indicates we're overestimating the cost of a masked load, which we're estimating at 4x regular memory ops - more realistic values indicates that its closer to 2x. Masked stores costs are a lot more diverse but 8x is roughly in the middle of the range. e.g. SandyBridge defm : X86WriteRes<WriteFMaskedLoad, [SBPort23,SBPort05], 8, [1,2], 3>; defm : X86WriteRes<WriteFMaskedLoadY, [SBPort23,SBPort05], 9, [1,2], 3>; defm : X86WriteRes<WriteFMaskedStore, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>; defm : X86WriteRes<WriteFMaskedStoreY, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>; e.g. Btver2 defm : X86WriteRes<WriteFMaskedLoad, [JLAGU, JFPU01, JFPX], 6, [1, 2, 2], 1>; defm : X86WriteRes<WriteFMaskedLoadY, [JLAGU, JFPU01, JFPX], 6, [2, 4, 4], 2>; defm : X86WriteRes<WriteFMaskedStore, [JSAGU, JFPU01, JFPX], 6, [1, 1, 4], 1>; defm : X86WriteRes<WriteFMaskedStoreY, [JSAGU, JFPU01, JFPX], 6, [2, 2, 4], 2>; Differential Revision: https://reviews.llvm.org/D61257 llvm-svn: 362338	2019-06-02 20:37:02 +00:00
Simon Pilgrim	59a8db628b	[TTI][X86] Cleanup getMaskedMemoryOpCost. NFCI. Prep work before resurrecting D61257. llvm-svn: 362335	2019-06-02 18:06:42 +00:00
Simon Pilgrim	71a39bcf68	[X86] isHorizontalBinOp - add extract_subvector(shuffle(x)) handling (PR39921) Let's us match horizontal op patterns on fast-variable-shuffle targets (Haswell etc.) llvm-svn: 362327	2019-06-02 15:47:49 +00:00
Simon Pilgrim	7a869e7036	[DAGCombine] Fold insert_subvector(bitcast(x),bitcast(y),c1) -> bitcast(insert_subvector(x,y),c2) Move this combine from x86 into generic DAGCombine, which currently only manages cases where the bitcast is between types of the same scalarsize. Differential Revision: https://reviews.llvm.org/D59188 llvm-svn: 362324	2019-06-02 14:42:11 +00:00
Craig Topper	396a915c26	[X86] Add the SSE versions of PMULLW and PMULLD to isAssociativeAndCommutative. llvm-svn: 362309	2019-06-02 00:42:58 +00:00
Craig Topper	c288a19bb7	[X86] Add AVX512BF16 and AVX512VP2INTERSECT instructions to the loading folding tables. llvm-svn: 362288	2019-06-01 06:20:59 +00:00
Craig Topper	48fdb61766	[X86] Make the X86FoldTablesEmitter functional again. Fix the spacing in the output to make it easier to diff. Fix a few other formatting issues in the manual table. And remove some old FIXMEs. llvm-svn: 362287	2019-06-01 06:20:55 +00:00
Craig Topper	31d00d80a2	[X86] Remove patterns for X86VSintToFP/X86VUintToFP+loadv4f32 to v2f64. These patterns can incorrectly narrow a volatile load from 128-bits to 64-bits. Similar to PR42079. Switch to using (v4i32 (bitcast (v2i64 (scalar_to_vector (loadi64))))) as the load pattern used in the instructions. This probably still has issues in 32-bit mode where loadi64 isn't legal. Maybe we should use VZMOVL for widened loads even when we don't need the upper bits as zeroes? llvm-svn: 362203	2019-05-31 07:38:26 +00:00
Craig Topper	b79cc5f802	[X86] Remove avx512 isel patterns for fpextend+load. Prefer to only match fp extloads instead. DAG combine will usually fold fpextend+load to an fp extload anyway. So the 256 and 512 patterns were probably unnecessary. The 128 bit pattern was special in that it looked for a v4f32 load, but then used it in an instruction that only loads 64-bits. This is bad if the load happens to be volatile. We could probably make the patterns volatile aware, but that's more work for something that's probably rare. The peephole pass might kick in and save us anyway. We might also be able to fix this with some additional DAG combines. This also adds patterns for vselect+extload to enabled masked vcvtps2pd to be used. Previously we looked for the unlikely vselect+fpextend+load. llvm-svn: 362199	2019-05-31 06:21:53 +00:00
Craig Topper	23066033a1	[X86] Correct the ins operand order for MASKPAIR16STORE to match other store instructions. This makes the 5 address operands come first. And the data operand comes last. This matches the operand order the instruction is created with. It's also the expected order in X86MCInstLower. So everything appeared to work, but the operands didn't match their declared type. Fixes a -verify-machineinstrs failure. Also remove the isel patterns from these instructions since they should only be used for stack spills and reloads. I'm not even sure what types the patterns were looking for to match. llvm-svn: 362193	2019-05-31 05:20:27 +00:00
Pengfei Wang	2e67d0c842	[X86] Add VP2INTERSECT instructions Support Intel AVX512 VP2INTERSECT instructions in llvm Patch by Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D62366 llvm-svn: 362188	2019-05-31 02:50:41 +00:00
Craig Topper	70dc2200a2	[X86] Remove result type constraints from the extloadv2f32/extloadv4f32/extloadv8f32 PatFrags. NFC The result types aren't mentioned in the pattern name so really shouldn't be in the PatFrags. The users of these either have their own type constraint or rely on the type constranit system to realize the only legal extend would be to f64. llvm-svn: 362175	2019-05-30 23:35:24 +00:00
Craig Topper	d6b74cc859	[X86] Remove code that unnecessarily sets EXTLOAD with src type of v2f32/v4f32/v8f32 as Legal for SSE2/AVX/AVX512 respectively. NFC The LoadExt table defaults to all combinations being Legal. For vector types, only src VTs with an i1 element type were ever changed. So we don't need to mark them legal manually. llvm-svn: 362170	2019-05-30 22:29:06 +00:00
Simon Pilgrim	32aac1727a	[X86][SSE] Improve bool vector extload (PR26091) We already have good codegen for (vXiY *ext(vXi1 bitcast(iX))) cases, this patch uses it for loads of vXi1 types as well - changing the load into a iX integer load, and bitcasting so that combineToExtendBoolVectorInReg can then use it. Differential Revision: https://reviews.llvm.org/D62449 llvm-svn: 362081	2019-05-30 10:25:20 +00:00
Pengfei Wang	1f67d94279	[X86] Add ENQCMD instructions For more details about these instructions, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference. Patch by Tianqing Wang (tianqing) Differential Revision: https://reviews.llvm.org/D62281 llvm-svn: 362053	2019-05-30 03:59:16 +00:00
Pengfei Wang	72e3f9662b	Revert "[X86] Use 'llvm_unreachable' instead of nullptr in unreachable code to" This reverts commit c1b3716614bc0a107e6f41a7d3d503baefad8a5b. llvm-svn: 361918	2019-05-29 02:49:59 +00:00
Pengfei Wang	818c652643	[X86] Use 'llvm_unreachable' instead of nullptr in unreachable code to avoid static check fail RegClassOrBank is an object of RegClassOrRegBank, which is defined as using llvm::RegClassOrRegBank = typedef PointerUnion<const TargetRegisterClass , const RegisterBank > so control flow can not get here. Use ""llvm_unreachable" here to avoid "null pointer" confusion. Patch by Shengchen Kan (skan) Differential Revision: https://reviews.llvm.org/D62006 Signed-off-by: pengfei <pengfei.wang@intel.com> llvm-svn: 361912	2019-05-29 02:20:37 +00:00
Fangrui Song	656afe370d	[X86] Fix x86-64 call foo@tlsdesc(%rax) and support R_386_TLSGOTDESC R_386_TLS_DESC_CALL D18885 emitted 5 bytes for call foo@tlsdesc(%rax). It should use the 2-byte form instead and let R_X86_64_TLSDESC_CALL apply to the beginning of the call instruction. The 2-byte form was deliberately chosen to make ->LE and ->IE relaxation work: 0: 48 8d 05 00 00 00 00 lea 0x0(%rip),%rax # 7 <.text+0x7> 3: R_X86_64_GOTPC32_TLSDESC a-0x4 7: ff 10 callq *(%rax) 7: R_X86_64_TLSDESC_CALL a => 0: 48 c7 c0 fc ff ff ff mov $0xfffffffffffffffc,%rax 7: 66 90 xchg %ax,%ax Also change the symbol type to STT_TLS when VK_TLSCALL or VK_TLSDESC is seen. Reviewed By: compnerd Differential Revision: https://reviews.llvm.org/D62512 llvm-svn: 361910	2019-05-29 02:02:59 +00:00
Adhemerval Zanella	6d7bf5e8df	[CodeGen] Add lrint/llrint builtins This patch add the ISD::LRINT and ISD::LLRINT along with new intrinsics. The changes are straightforward as for other floating-point rounding functions, with just some adjustments required to handle the return value being an interger. The idea is to optimize lrint/llrint generation for AArch64 in a subsequent patch. Current semantic is just route it to libm symbol. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D62017 llvm-svn: 361875	2019-05-28 20:47:44 +00:00
Sanjay Patel	f7980e727f	Revert "[x86] split 256-bit store of concatenated vectors" This reverts commit `d5a8637072`. Most likely suspect for this bot failure: http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/9684 llvm-svn: 361850	2019-05-28 17:37:58 +00:00
David Greene	561fcc0d63	[X86-64] Fix 256-bit SET0 lowering for non-VLX targets If we don't have VLX then 256-bit SET0 should be lowered to VPXOR with ZMM registers. This restores functionality accidentally removed by r309926. Differential Revision: https://reviews.llvm.org/D62415 llvm-svn: 361843	2019-05-28 15:37:01 +00:00
Sanjay Patel	d5a8637072	[x86] split 256-bit store of concatenated vectors This shows up as a side issue to the main problem for the AVX target example from PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3 But as we can see in the pile of existing test diffs, it's actually a widespread problem that affects any AVX or later target. Apart from a couple of oddballs, I think these are all improvements for the reasons stated in the code comment: we do not want to enable YMM unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit stores anyway. We could say that MergeConsecutiveStores() is going overboard on some of these examples, but that won't solve the problem completely. But that is the reason I'm proposing this as a lowering rather than a combine: we will infinite loop fighting the merge code if we try this earlier. Differential Revision: https://reviews.llvm.org/D62498 llvm-svn: 361822	2019-05-28 13:54:17 +00:00
Sanjay Patel	6bf4ca9d2e	[x86] fix 256-bit vector store splitting to honor 'volatile' Forking this out of the discussion in D62498 (and assuming that will be committed later, so adding the helper function here). The LangRef says: "the backend should never split or merge target-legal volatile load/store instructions." Differential Revision: https://reviews.llvm.org/D62506 llvm-svn: 361815	2019-05-28 12:58:07 +00:00
Benjamin Kramer	57e267a2e9	[X86] Custom lower CONCAT_VECTORS of v2i1 The generic legalizer cannot handle this. Add an assert instead of silently miscompiling vectors with elements smaller than 8 bits. llvm-svn: 361814	2019-05-28 12:52:57 +00:00
Simon Pilgrim	4b48aa0e30	[X86] X86CmovConverterPass::collectCmovCandidates - fix uninitialized variable warnings. NFCI. llvm-svn: 361804	2019-05-28 10:53:23 +00:00
Simon Pilgrim	a044410f37	[X86][SSE] Add shuffle combining support for ISD::ANY_EXTEND_VECTOR_INREG Reuses what we already have in place for ISD::ZERO_EXTEND_VECTOR_INREG just with a different sentinel llvm-svn: 361734	2019-05-26 16:00:35 +00:00
Simon Pilgrim	58a8541dcc	[X86][AVX] combineBitcastvxi1 - peek through bitops to determine size of original vector We were only testing for direct SETCC results - this allows us to peek through AND/OR/XOR combinations of the comparison results as well. There's a missing SEXT(PACKSS) fold that I need to investigate for v8i1 cases before I can enable it there as well. llvm-svn: 361716	2019-05-26 10:54:23 +00:00
Simon Pilgrim	40fa52b174	[X86] lowerBuildVectorToBitOp - support build_vector(shift()) -> shift(build_vector(),C) Commonly occurs in sign-extension cases llvm-svn: 361706	2019-05-25 18:02:17 +00:00
Nikita Popov	d87eceda0e	[X86] Combine fminnum/fmaxnum with non-nan operand to fmin/fmax If we have a known non-nan operand, place it in the second operand of fmin/fmax that is returned if either operand is nan. Differential Revision: https://reviews.llvm.org/D62448 llvm-svn: 361704	2019-05-25 16:44:29 +00:00
Craig Topper	46e5052b8e	[X86FixupLEAs] Turn optIncDec into a generic two address LEA optimizer. Support LEA64_32r properly. INC/DEC is really a special case of a more generic issue. We should also turn leas into add reg/reg or add reg/imm regardless of the slow lea flags. This also supports LEA64_32 which has 64 bit input registers and 32 bit output registers. So we need to convert the 64 bit inputs to their 32 bit equivalents to check if they are equal to base reg. One thing to note, the original code preserved the kill flags by adding operands to the new instruction instead of using addReg. But I think tied operands aren't supposed to have the kill flag set. I dropped the kill flags, but I could probably try to preserve it in the add reg/reg case if we think its important. Not sure which operand its supposed to go on for the LEA64_32r instruction due to the super reg implicit uses. Though I'm also not sure those are needed since they were probably just created by an INSERT_SUBREG from a 32-bit input. Differential Revision: https://reviews.llvm.org/D61472 llvm-svn: 361691	2019-05-25 06:17:47 +00:00
Craig Topper	4b08fcdeb1	[X86] Add zero idioms to the haswell, broadwell, and skylake schedule models. Add 256-bit fp xor to sandybridge zero idioms This copies the Sandy Bridge zero idiom support to later CPUs. Adding the AVX2 and AVX512F/VL instructions as appropriate. Differential Revision: https://reviews.llvm.org/D62360 llvm-svn: 361690	2019-05-25 04:47:49 +00:00
Simon Pilgrim	95b8d9bbf8	[SelectionDAG] computeKnownBits - support constant pool values from target This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets. computeKnownBits then uses this function to improve codegen, notably vector code after legalization. A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit. This required a couple of fixes: * SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR). * Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets. Differential Revision: https://reviews.llvm.org/D61887 llvm-svn: 361620	2019-05-24 10:03:11 +00:00
Robert Lougher	170dfeb2ff	Resubmit r360436 "[X86] Avoid SFB - Fix inconsistent codegen with/without debug info" Fixes https://bugs.llvm.org/show_bug.cgi?id=40969 The functions findPotentiallyBlockedCopies and buildCopy are currently not accounting for the presence of debug instructions. In the former this results in the optimization not being trigerred, and in the latter results in inconsistent codegen. This patch enables the optimization to be performed in a debug build and ensures the codegen is consistent with non-debug builds. Patch by Chris Dawson. Differential Revision: https://reviews.llvm.org/D61680 llvm-svn: 361527	2019-05-23 18:15:12 +00:00
Fangrui Song	86c9ca48c3	[X86] Support -fno-plt __tls_get_addr calls In general dynamic/local dynamic TLS models, with -fno-plt, * x86: emit `calll ___tls_get_addr@GOT(%ebx)` instead of `calll ___tls_get_addr@PLT` Note, on x86, if we can get rid of %ebx as the PIC register, it may be better to use a register not preserved across function calls. x86_64: emit `callq *__tls_get_addr@GOTPCREL(%rip)` instead of `callq __tls_get_addr@PLT` Reorganize the code by separating 32-bit and 64-bit. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D62106 llvm-svn: 361453	2019-05-23 01:05:13 +00:00

1 2 3 4 5 ...

19003 Commits