Commit Graph

22308 Commits

Author SHA1 Message Date
Simon Pilgrim d7be2bff16 [X86] combineShiftRightArithmetic - break if-else chain as they all return (style). NFC. 2022-02-07 09:54:34 +00:00
Kazu Hirata 3a3cb929ab [llvm] Use = default (NFC) 2022-02-06 22:18:35 -08:00
Simon Pilgrim 74b98ab1db [X86] Fold ZERO_EXTEND_VECTOR_INREG(BUILD_VECTOR(X,Y,?,?)) -> BUILD_VECTOR(X,0,Y,0)
Helps avoid some unnecessary shift-by-splat-amount extensions before shuffle combining gets limited by one-use checks
2022-02-06 12:53:11 +00:00
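
A minimal, self-contained C++ sketch of the byte-level identity behind this fold (illustrative only, not the actual DAG combine; the little-endian v4i32/v2i64 layout and the lane values are assumptions for the example):

```cpp
// Zero-extending the low two i32 lanes of {X, Y, ?, ?} to two i64 lanes
// produces the same little-endian bytes as the i32 vector {X, 0, Y, 0}.
#include <cassert>
#include <cstdint>
#include <cstring>

int main() {
  uint32_t X = 0xDEADBEEFu, Y = 0x12345678u;

  // ZERO_EXTEND_VECTOR_INREG(v4i32 {X, Y, ?, ?}) -> v2i64 {X, Y}
  uint64_t Extended[2] = {X, Y};

  // Proposed replacement: BUILD_VECTOR giving v4i32 {X, 0, Y, 0}
  uint32_t Rebuilt[4] = {X, 0, Y, 0};

  // On a little-endian target both have identical in-register bytes.
  assert(std::memcmp(Extended, Rebuilt, sizeof(Extended)) == 0);
  return 0;
}
```
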
Phoebe Wang 0b7669f333 [X86] Introduce more common modern tunings into `generic`
GCC has updated its generic `-mtune` to haswell. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
Update it to match GCC.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D118534
2022-02-05 10:31:30 +08:00
Sanjay Patel fff3e1dbaa [x86] enable fast sqrtss/sqrtps tuning for AMD Zen cores
As discussed in D118534, all of the recent AMD CPUs have
relatively fast (<14 cycle latency) "sqrtss" and "sqrtps"
instructions:
https://uops.info/table.html?search=sqrtps&cb_lat=on&cb_tp=on&cb_SNB=on&cb_SKL=on&cb_ZENp=on&cb_ZEN2=on&cb_ZEN3=on&cb_measurements=on&cb_avx=on&cb_sse=on

So we should set this tuning flag to alter codegen of plain
"sqrt(X)" expansion (as opposed to reciprocal-sqrt - there
is other test coverage for that pattern). The expansion is
both slower and less accurate than the hardware instruction.

Differential Revision: https://reviews.llvm.org/D119001
2022-02-04 13:59:20 -05:00
Sanjay Patel 7b03725097 Revert "[x86] try harder to scalarize a vector load with extracted integer op uses"
This reverts commit b4b97ec813.

As discussed in post-commit feedback at:
https://reviews.llvm.org/D118376
...there's a stage 2 failure on a Mac running a clang-refactor tool test.
2022-02-04 07:45:57 -05:00
Simon Pilgrim ea7a3e6a6a [X86] simplifyX86varShift - use KnownBits.getMaxValue().ult() to check for out of bounds shift amounts
This is easier to grok than MaskedValueIsZero for high bits.
2022-02-03 16:02:45 +00:00
Fangrui Song de88c1aba2 [asan][X86] Change some std::string variables to StringRef. NFC 2022-02-02 16:34:35 -08:00
Sanjay Patel f523e83b20 [x86] make helper function to create sbb with zero operands; NFC
As noted in D116804, we want to effectively invert that patch
for CPUs (Intel) that don't break the false dependency on
sbb %eax, %eax

So we will likely want to create that here in the
X86DAGToDAGISel::Select() case for X86::SETCC_CARRY.
2022-02-02 16:56:10 -05:00
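
For context, a small standalone C++ model of what `sbb %eax, %eax` computes (EAX - EAX - CF); the helper name and test values are made up for illustration:

```cpp
// sbb with matching operands materializes the carry flag:
// 0 when CF is clear, 0xFFFFFFFF when CF is set.
#include <cassert>
#include <cstdint>

uint32_t sbb_same_reg(uint32_t eax, bool carry) {
  return eax - eax - static_cast<uint32_t>(carry); // wraps to all-ones if carry
}

int main() {
  assert(sbb_same_reg(42, false) == 0u);
  assert(sbb_same_reg(42, true) == 0xFFFFFFFFu);
  return 0;
}
```
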
Sanjay Patel 6592bcecd4 [x86] invert a vector select IR canonicalization with a binop identity constant
This is an intentionally limited/different form of D90113.
That patch bravely tries to generalize folds where we pull
a binop into the arms of a select:
N0 + (Cond ? 0 : FVal) --> Cond ? N0 : (N0 + FVal)
...but it is not universally profitable.

This is the inverse of IR canonicalization as discussed in
D113442.

We know that this transform is not entirely profitable even
within x86, so we only handle x86 vector fadd/fsub as a 1st
step. The intent is to prevent AVX512 regressions as mentioned
in D113442.

The plan is to port this to DAGCombiner (so it will eventually
look more like D90113) and add more types/cases in pieces with
many more tests to verify that we are seeing improvements.

Differential Revision: https://reviews.llvm.org/D118644
2022-02-02 08:17:53 -05:00
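
A small standalone C++ model of the per-element identity this patch relies on, shown for fadd with its 0.0 identity constant (a sketch only; signed-zero/NaN corner cases are ignored and the test values are arbitrary):

```cpp
// N0 + (Cond ? 0.0 : FVal)  ==  Cond ? N0 : (N0 + FVal)
#include <cassert>

float before(bool Cond, float N0, float FVal) { return N0 + (Cond ? 0.0f : FVal); }
float after(bool Cond, float N0, float FVal)  { return Cond ? N0 : (N0 + FVal); }

int main() {
  for (bool Cond : {false, true})
    assert(before(Cond, 1.5f, 2.25f) == after(Cond, 1.5f, 2.25f));
  return 0;
}
```
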
Simon Pilgrim 5aa2acc86b [DAG] SimplifyDemandedVectorElts - remove KnownZero/KnownUndef from DCI helper wrapper
None of the external users actually touch these (they're purely used internally down the recursive call) - it's trivial to add another wrapper if anything ever does want to track known elements.
2022-02-02 12:04:49 +00:00
serge-sans-paille e188aae406 Cleanup header dependencies in LLVMCore
Based on the output of include-what-you-use.

This is a big chunk of changes. It is very likely to break downstream code
unless they took a lot of care in avoiding hidden header dependencies, something
the LLVM codebase doesn't do that well :-/

I've tried to summarize the biggest changes below:

- llvm/include/llvm-c/Core.h: no longer includes llvm-c/ErrorHandling.h
- llvm/IR/DIBuilder.h no longer includes llvm/IR/DebugInfo.h
- llvm/IR/IRBuilder.h no longer includes llvm/IR/IntrinsicInst.h
- llvm/IR/LLVMRemarkStreamer.h no longer includes llvm/Support/ToolOutputFile.h
- llvm/IR/LegacyPassManager.h no longer include llvm/Pass.h
- llvm/IR/Type.h no longer includes llvm/ADT/SmallPtrSet.h
- llvm/IR/PassManager.h no longer includes llvm/Pass.h nor llvm/Support/Debug.h

And the usual count of preprocessed lines:
$ clang++ -E  -Iinclude -I../llvm/include ../llvm/lib/IR/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
before: 6400831
after:  6189948

200k fewer lines to process is not that bad ;-)

Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup

Differential Revision: https://reviews.llvm.org/D118652
2022-02-02 06:54:20 +01:00
Simon Pilgrim 7ec8fc2932 [X86] combineAnd() - per-element simplification - call SimplifyDemandedBits using mask demanded bits if SimplifyDemandedVectorElts fails
We already call SimplifyDemandedVectorElts based on whether each vector mask element is zero/nonzero; this extends that to also try SimplifyDemandedBits using the demanded bits mask generated from the nonzero elements.

This also requires an additional TargetLowering::SimplifyDemandedBits DemandedBits/DemandedElts wrapper.
2022-01-31 13:58:00 +00:00
Simon Pilgrim 156f83adc2 [X86] combineVectorTruncation - use PACKUSDW(BLENDW(X,0),BLENDW(Y,0)) for v8i32->v8i16 truncation
Limit this to SSE41 - AVX1 targets to avoid UNPCKL(PSHUFB,PSHUFB), pre-SSE41 we don't have PACKUSDW/BLENDW and with AVX2 we can perform this as PERMQ(PSHUFB()).
2022-01-30 20:07:04 +00:00
Simon Pilgrim b7e04ccd99 [X86][AVX] matchUnaryShuffle - avoid creation of on-the-fly nodes (PR45974)
Don't extract the ANY/ZERO_EXTEND_VECTOR_INREG subvector source until we're definitely combining to a new node.
2022-01-30 17:59:14 +00:00
Simon Pilgrim 2cdbaca394 [X86] Attempt to fold MOVMSK(CMPEQ(AND(X,C1),0)) -> MOVMSK(NOT(SHL(X,C2)))
Allows pow2 mask tests to avoid an unnecessary constant load.

Noticed while investigating how to extend MatchVectorAllZeroTest to support more allof/anyof patterns.
2022-01-30 15:53:21 +00:00
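
A standalone C++ model of the per-lane identity behind this fold, assuming i32 lanes, a power-of-two mask C1 = 1 << K and C2 = 31 - K (helper names and test values are illustrative, not the actual combine code):

```cpp
// The all-ones/all-zeros result of CMPEQ(AND(X, 1 << K), 0) has the same
// sign bit as NOT(SHL(X, 31 - K)), and the sign bit is all MOVMSK reads.
#include <cassert>
#include <cstdint>

bool movmsk_bit_via_cmpeq(uint32_t X, unsigned K) {
  return (X & (1u << K)) == 0;            // lane is all-ones iff bit K is clear
}

bool movmsk_bit_via_shift(uint32_t X, unsigned K) {
  return (~(X << (31 - K)) >> 31) & 1u;   // sign bit after NOT(SHL(X, 31 - K))
}

int main() {
  for (unsigned K = 0; K < 32; ++K)
    for (uint32_t X : {0u, 1u, 0x80000000u, 0xDEADBEEFu, ~0u})
      assert(movmsk_bit_via_cmpeq(X, K) == movmsk_bit_via_shift(X, K));
  return 0;
}
```
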
Simon Pilgrim ee9eeed773 [X86] LowerFunnelShift - enable v8i16 lowering 2022-01-29 16:20:36 +00:00
Simon Pilgrim 6777289dd9 [X86] lowerShuffleAsBlend - pull out repeated getVectorNumElements() calls. NFC. 2022-01-29 16:16:29 +00:00
Simon Pilgrim f1305f2369 [X86] combinePredicateReduction - always use PMOVMSKB(PCMPEQB()) for allof(icmp_eq()) reductions
This greatly simplifies the codegen for recognising PTEST patterns and matches the codegen from the very similar LowerVectorAllZero
2022-01-29 15:16:59 +00:00
Simon Pilgrim 67a399fd57 [X86] SimplifyDemandedBits - add X86ISD::BLENDV SimplifyMultipleUseDemandedBits handling
Lets us see through multiple use operands
2022-01-29 14:26:41 +00:00
Simon Pilgrim 7e849fd97b [X86] LowerFunnelShift - allow non-constant vXi8 unpack(y,x) << zext(z) lowering pre-AVX512
Without AVX512 (which can efficiently extend/truncate to vXi16/vXi32), unpacking/packing to vXi16 is more efficient than relying on the (uops-heavy) PBLENDV shift expansion
2022-01-29 13:58:30 +00:00
Luo, Yuanke be44177ede [X86][avx512fp16] Promote fp16 to fp32 for frem.
Promote fp16 to fp32 for frem.

Differential Revision: https://reviews.llvm.org/D118470
2022-01-29 11:41:27 +08:00
Sanjay Patel b4b97ec813 [x86] try harder to scalarize a vector load with extracted integer op uses
extract_vec_elt (load X), C --> scalar load (X+C)

As noted in the comment, DAGCombiner has this fold -- and the code in this
patch is adapted from DAGCombiner::scalarizeExtractedVectorLoad() -- but
x86 should benefit even if the loaded vector has other uses as long as we
apply some other x86-specific conditions. The motivating example from #50310
is shown in vec_int_to_fp.ll.

Fixes #50310

Differential Revision: https://reviews.llvm.org/D118376
2022-01-28 10:22:52 -05:00
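
A standalone C++ sketch of the equivalence being exploited (illustrative only; the element type, index, and memory contents are made-up examples, and the real patch works on SelectionDAG nodes):

```cpp
// Extracting lane C from a loaded vector is equivalent to a scalar load
// from the element address X + C.
#include <cassert>
#include <cstdint>
#include <cstring>

int main() {
  alignas(16) int32_t Mem[4] = {10, 20, 30, 40};
  unsigned C = 2;

  // extract_vec_elt (load X), C
  int32_t Vec[4];
  std::memcpy(Vec, Mem, sizeof(Vec));
  int32_t FromVector = Vec[C];

  // scalar load (X + C)
  int32_t FromScalarLoad;
  std::memcpy(&FromScalarLoad, Mem + C, sizeof(FromScalarLoad));

  assert(FromVector == FromScalarLoad);
  return 0;
}
```
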
Simon Pilgrim c7bb3665a1 [X86] SimplifyDemandedBitsForTargetNode - fold MOVMSK(YMM) -> MOVMSK(XMM)
If we don't demand the upper elements of the 256-bit vector, then just perform the MOVMSK as a 128-bit vector op
2022-01-28 14:42:53 +00:00
Simon Pilgrim 2a13beaa70 [X86] combineSetCCMOVMSK - don't fold MOVMSK(BITCAST(PCMPEQ(X,0))) -> PTESTZ(X,X) if we're not testing every element comparison 2022-01-28 13:22:37 +00:00
Simon Pilgrim cce6490eca [X86] combineSetCCMOVMSK - match all_of patterns with X86ISD::CMP as well as X86ISD::SUB
Previous folds by combineSetCCMOVMSK might have converted these to CMP when changing the bitwidth, and the CMP->SUB fold might not have happened yet (or may only happen later)
2022-01-28 11:43:10 +00:00
Simon Pilgrim 93c9b39d25 [X86] Fix MOVMSK(CONCAT(X,Y)) -> MOVMSK(AND/OR(X,Y)) fold for float types and demanded elements
rG9103b73fe052 was assuming that we could OR/AND with the source vector, but that will fail on float/double vectors without bitcasting - it also missed the case that any_of checks might be testing fewer than all of the source elements
2022-01-28 11:01:47 +00:00
Simon Pilgrim 9103b73fe0 [X86] Fold MOVMSK(CONCAT(X,Y)) -> MOVMSK(AND/OR(X,Y)) for all_of/any_of patterns
Makes it easier for later folds and avoids unnecessary 256-bit ops (especially on AVX1-only targets where we miss a lot of integer instructions)
2022-01-27 18:28:09 +00:00
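
A standalone C++ model of why this fold is safe for these reduction patterns: per lane, the sign bit of AND/OR is the AND/OR of the sign bits, so all_of over the concatenation matches all_of over AND(X,Y) and any_of matches any_of over OR(X,Y) (the movmsk4 helper and lane values are illustrative assumptions):

```cpp
#include <cassert>
#include <cstdint>

uint32_t movmsk4(const int32_t V[4]) {
  uint32_t M = 0;
  for (int I = 0; I < 4; ++I)
    M |= (static_cast<uint32_t>(V[I]) >> 31) << I; // collect sign bits
  return M;
}

int main() {
  int32_t X[4] = {-1, -1, 7, -1}, Y[4] = {-1, 0, -1, -1};
  int32_t AndV[4], OrV[4];
  for (int I = 0; I < 4; ++I) {
    AndV[I] = X[I] & Y[I];
    OrV[I] = X[I] | Y[I];
  }

  bool AllOfConcat = movmsk4(X) == 0xF && movmsk4(Y) == 0xF;
  bool AnyOfConcat = movmsk4(X) != 0 || movmsk4(Y) != 0;
  assert(AllOfConcat == (movmsk4(AndV) == 0xF));
  assert(AnyOfConcat == (movmsk4(OrV) != 0));
  return 0;
}
```
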
Simon Pilgrim ccda0f2226 [X86][SSE] Add combineBitOpWithShift for BITOP(SHIFT(X,Z),SHIFT(Y,Z)) -> SHIFT(BITOP(X,Y),Z) vector folds
InstCombine performs this more generally with SimplifyUsingDistributiveLaws, but we don't need anything that complex here - this is mainly to fix up cases where logic ops get created late on during lowering, often in conjunction with sext/zext ops for type legalization.

https://alive2.llvm.org/ce/z/gGpY5v
2022-01-27 14:54:41 +00:00
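
A standalone C++ check of the scalar distribution law this vector fold relies on (test values are arbitrary; the real code operates on DAG nodes and vector lanes):

```cpp
// Applying the same shift amount to both operands of a bitwise op
// distributes: BITOP(SHIFT(X,Z), SHIFT(Y,Z)) == SHIFT(BITOP(X,Y), Z).
#include <cassert>
#include <cstdint>

int main() {
  uint32_t X = 0xDEADBEEFu, Y = 0x0F0F1234u;
  for (unsigned Z = 0; Z < 32; ++Z) {
    assert(((X << Z) & (Y << Z)) == ((X & Y) << Z));
    assert(((X >> Z) | (Y >> Z)) == ((X | Y) >> Z));
    assert(((X << Z) ^ (Y << Z)) == ((X ^ Y) << Z));
  }
  return 0;
}
```
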
Simon Pilgrim 389ae775e4 [X86] Fold TESTZ(OR(LO(X),HI(X)),OR(LO(Y),HI(Y))) -> TESTZ(X,Y)
Helps fix a number of poor codegen cases for allof(cmp()) with 256-bit vectors on AVX1
2022-01-27 13:20:36 +00:00
Benjamin Kramer f15014ff54 Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17"
This reverts commit ef82063207.

- It conflicts with the existing llvm::size in STLExtras, which will now
  never be called.
- Calling it without llvm:: breaks C++17 compat
2022-01-26 16:55:53 +01:00
serge-sans-paille ef82063207 Rename llvm::array_lengthof into llvm::size to match std::size from C++17
As a consequence, move llvm::array_lengthof from STLExtras.h to
STLForwardCompat.h (which is included by STLExtras.h so no build
breakage expected).
2022-01-26 16:17:45 +01:00
Simon Pilgrim 99ae5c13f6 [X86] Add 'getSplitVectorSrc' helper to determine if subvectors all come from the same source
Helps determine if the subvector ops come from the same larger vector and match the lower/upper extractions
2022-01-26 15:17:21 +00:00
Simon Pilgrim 157f9b68a3 [X86] combineVectorSignBitsTruncation - fix indentation. NFC. 2022-01-25 11:54:22 +00:00
Martin Storsjö 70cb8daed4 [X86] [CodeView] Add codeview mappings for registers ST0-ST7
These can end up being needed after https://reviews.llvm.org/D116821.

Suggested by Alexandre Ganea.

Differential Revision: https://reviews.llvm.org/D118072
2022-01-25 10:09:06 +02:00
Simon Pilgrim 902184e6cc [X86] combinePredicateReduction - generalize allof(cmpeq(x,0)) handling to allof(cmpeq(x,y))
There's no further reason to limit this to cmpeq-with-zero; the outstanding regressions with lowering to PTEST have now been addressed

Improves codegen for Issue #53379
2022-01-25 00:24:06 +00:00
Simon Pilgrim 11bb4a1111 [X86] combinePredicateReduction - split vXi16 allof(cmpeq()) to vXi8 allof(cmpeq())
vXi16 allof(cmp()) reduction patterns will have to pack the comparison results to vXi8 to use PMOVMSKB.

If we're reducing cmpeq(), then we can compare the vXi8 halves directly - similar to what we already do for vXi64 -> vXi32 for cases without PCMPEQQ.
2022-01-24 22:43:29 +00:00
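
A standalone C++ model of the element-splitting argument above: a 16-bit lane compares equal exactly when both of its 8-bit halves do, so the all-of reduction can run over the i8 halves and feed PMOVMSKB directly (helper names and test values are illustrative):

```cpp
#include <cassert>
#include <cstdint>

bool eq16(uint16_t A, uint16_t B) { return A == B; }

bool eq8_halves(uint16_t A, uint16_t B) {
  return (A & 0xFF) == (B & 0xFF) && (A >> 8) == (B >> 8);
}

int main() {
  for (uint16_t A : {uint16_t(0), uint16_t(0x1234), uint16_t(0xFF00)})
    for (uint16_t B : {uint16_t(0), uint16_t(0x1234), uint16_t(0x12FF)})
      assert(eq16(A, B) == eq8_halves(A, B));
  return 0;
}
```
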
Simon Pilgrim 8d298355ca [X86] combineSetCCMOVMSK - detect and(pcmpeq(),pcmpeq()) ptest pattern.
Handle cases where we've split an allof(cmpeq()) pattern to a legal vector type
2022-01-24 21:42:03 +00:00
Simon Pilgrim 6997f4d07f [X86] combineSetCCMOVMSK - fold allof(cmpeq(x,y)) -> ptest(sub(x,y)) (PR53379)
As suggested on PR53379, for all-of icmp-eq patterns, we can use ptest(sub(x,y)) on SSE41+ targets

This is a generalization of the existing allof(cmpeq(x,0)) -> ptest(x) pattern

We can probably extend this further, in particular to handle 256-bit cases on pre-AVX2 targets, but this part of the generalization is pretty trivial

Fixes Issue #53379
2022-01-24 16:44:37 +00:00
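
A standalone C++ model of the equivalence behind the PTEST fold: all lanes compare equal exactly when the lane-wise subtraction ORs to zero bits, which is what PTEST's ZF reports (the lane count and values are illustrative assumptions):

```cpp
#include <cassert>
#include <cstdint>

bool allof_cmpeq(const uint32_t X[4], const uint32_t Y[4]) {
  bool AllEqual = true;
  for (int I = 0; I < 4; ++I)
    AllEqual &= (X[I] == Y[I]);
  return AllEqual;
}

bool ptest_sub_is_zero(const uint32_t X[4], const uint32_t Y[4]) {
  uint32_t OrOfDiffs = 0;
  for (int I = 0; I < 4; ++I)
    OrOfDiffs |= (X[I] - Y[I]); // wrapping sub, zero iff the lanes match
  return OrOfDiffs == 0;
}

int main() {
  uint32_t X[4] = {1, 2, 3, 4};
  uint32_t Y[4] = {1, 2, 3, 4};
  assert(allof_cmpeq(X, Y) == ptest_sub_is_zero(X, Y));

  Y[2] = 99; // perturb one lane and re-check
  assert(allof_cmpeq(X, Y) == ptest_sub_is_zero(X, Y));
  return 0;
}
```
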
Simon Pilgrim 577a6dc9a1 [X86] getVectorMaskingNode - fix indentation. NFC.
clang-format
2022-01-24 11:08:41 +00:00
Kazu Hirata bf039a8620 [Target] Use range-based for loops (NFC) 2022-01-23 22:53:15 -08:00
Simon Pilgrim 4762c077e7 [X86] LowerFunnelShift - always lower vXi8 fshl by constant amounts as unpack(y,x) << zext(z)
This can always be lowered as PMULLW+PSRLWI+PACKUSWB
2022-01-23 21:35:05 +00:00
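
A standalone C++ model of this lowering strategy for a single i8 lane: widen to 16 bits as (x:y), shift left, and keep the high byte (the helper names are made up; the real lowering does this across all vector lanes with PMULLW/PSRLW/PACKUSWB):

```cpp
#include <cassert>
#include <cstdint>

uint8_t fshl8_reference(uint8_t X, uint8_t Y, unsigned Z) {
  Z &= 7;
  return Z ? static_cast<uint8_t>((X << Z) | (Y >> (8 - Z))) : X;
}

uint8_t fshl8_via_unpack(uint8_t X, uint8_t Y, unsigned Z) {
  uint16_t Wide = static_cast<uint16_t>(X) << 8 | Y; // unpack(y, x)
  return static_cast<uint8_t>((Wide << (Z & 7)) >> 8); // shift, take high byte
}

int main() {
  for (unsigned Z = 0; Z < 8; ++Z)
    for (unsigned X = 0; X < 256; X += 17)
      for (unsigned Y = 0; Y < 256; Y += 23)
        assert(fshl8_reference(X, Y, Z) == fshl8_via_unpack(X, Y, Z));
  return 0;
}
```
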
Simon Pilgrim 32dc14f876 [X86] LowerFunnelShift - use supportedVectorShiftWithBaseAmnt to check for supported scalar shifts
Allows us to reuse the ISD shift opcode instead of a mixture of ISD/X86ISD variants
2022-01-23 21:13:58 +00:00
Phoebe Wang 37d1d02200 [X86][MS] Change the alignment of f80 to 16 bytes on Windows 32bits to match with ICC
MSVC currently doesn't support 80-bit long double. ICC supports it when
the option `/Qlong-double` is specified. Change the alignment of f80
to 16 bytes so that we can be compatible with ICC's option.

Reviewed By: rnk, craig.topper

Differential Revision: https://reviews.llvm.org/D115942
2022-01-23 09:58:46 +08:00
David Green b27e5459d5 [DAG] Convert truncstore(extend(x)) back to store(x)
Pulled out of D106237, this folds truncstore(extend(x)) back to store(x)
if the original store was legal. This can come up due to the order we
fold nodes. A fold from X86 needs to be adjusted to prevent infinite
loops, to have it pick the operand of a trunc more directly.

Differential Revision: https://reviews.llvm.org/D117901
2022-01-22 13:20:36 +00:00
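
A trivial standalone C++ illustration of the equivalence, using scalars as a stand-in for the DAG nodes (types and values are arbitrary):

```cpp
// Extending a value and then truncating it back down for the store writes
// exactly the original bytes, so the extend can be dropped when the narrow
// store is legal.
#include <cassert>
#include <cstdint>

int main() {
  int16_t X = -12345;

  int32_t Extended = X;                                  // extend(x)
  int16_t TruncStored = static_cast<int16_t>(Extended);  // truncstore(...)

  assert(TruncStored == X);                              // same as store(x)
  return 0;
}
```
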
Joao Moreira 82af95029e [X86] Enable ibt-seal optimization when LTO is used in Kernel
Intel's CET/IBT requires every indirect branch target to be an ENDBR instruction. Because of that, the compiler needs to correctly emit these instructions in function prologues. Because this is a security feature, it is desirable that only functions which are actually indirect branch targets are emitted with ENDBRs. While it is possible to identify address-taken functions through LTO, minimizing these ENDBR instructions remains a hard task for user-space binaries because exported functions may end up being reachable through PLT entries, which use an indirect branch to reach them. Because this cannot be determined at compile time, the compiler currently emits ENDBRs for every non-local-linkage function.

Despite the challenge presented for user-space, the kernel landscape is different, as no PLTs are used. With the intent of providing the best-fitting ENDBR emission for the kernel, kernel developers proposed an optimization named "ibt-seal" which replaces the ENDBRs with NOPs directly in the binary. The discussion of this feature can be seen in [1].

This diff enables the flag -mibt-seal, which in combination with LTO enforces a different policy for ENDBR placement when the code model is set to "kernel". In this scenario, the compiler will only emit ENDBRs for address-taken functions, ignoring non-address-taken functions even when they don't have local linkage.

A comparison between LTO-compiled kernel binaries without and with the -mibt-seal feature enabled shows that when -mibt-seal was used, the number of ENDBRs in the vmlinux.o binary patched by objtool decreased from 44383 to 33192, and the number of superfluous ENDBR instructions nopped out decreased from 11730 to 540.

The 540 remaining superfluous ENDBRs need to be investigated further, but hypotheses are: assembly code not handled by the compiler, kernel exported-symbol mechanisms creating bogus address-taken situations, or ENDBRs being removed by other binary optimizations like the kernel's static_calls. For now, I assume that the large drop in the number of ENDBR instructions already justifies merging the feature.

[1] - https://lkml.org/lkml/2021/11/22/591

Reviewed By: xiangzhangllvm

Differential Revision: https://reviews.llvm.org/D116070
2022-01-21 10:55:34 +08:00
Simon Pilgrim 866311e71c [X86] lowerToAddSubOrFMAddSub - lower 512-bit ADDSUB patterns to blend(fsub,fadd)
AVX512 doesn't provide an ADDSUB instruction, but if we've built this from a build vector of scalar fsub/fadd elements we can still lower it to blend(fsub,fadd)
2022-01-20 15:16:05 +00:00
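
A standalone C++ model of the blend(fsub,fadd) lowering for the alternating ADDSUB pattern, assuming the even-lane-subtract/odd-lane-add convention (lane count and values are illustrative):

```cpp
#include <cassert>
#include <cstddef>

int main() {
  const std::size_t N = 8;
  float A[N] = {1, 2, 3, 4, 5, 6, 7, 8};
  float B[N] = {8, 7, 6, 5, 4, 3, 2, 1};

  float AddSub[N], Blend[N];
  for (std::size_t I = 0; I < N; ++I) {
    AddSub[I] = (I % 2 == 0) ? A[I] - B[I] : A[I] + B[I]; // ADDSUB pattern
    float Sub = A[I] - B[I], Add = A[I] + B[I];           // full fsub and fadd
    Blend[I] = (I % 2 == 0) ? Sub : Add;                  // blend by even/odd mask
  }
  for (std::size_t I = 0; I < N; ++I)
    assert(AddSub[I] == Blend[I]);
  return 0;
}
```
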
Simon Pilgrim 304cfc706a [X86] combineConcatVectorOps - remove superfluous Subtarget.hasAVX() check
This function only ever gets called by AVX targets, and we already assert for this at the top of the function
2022-01-20 12:56:09 +00:00
Simon Pilgrim c4f5fd76da [X86] combineConcatVectorOps - add handling for X86ISD::VSHL/VSRL/VSRA
These can be handled the same as the vector shift by immediate variants that are already handled.
2022-01-20 12:56:08 +00:00
Luo, Yuanke 5dea7a865e Combine to vpdpbusd when operand is constant and small enough.
Differential Revision: https://reviews.llvm.org/D116363
2022-01-20 11:10:49 +08:00