llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	f368040c14	[DAGCombiner] try to move splat after binop with splat constant binop (splat X), (splat C) --> splat (binop X, C) binop (splat C), (splat X) --> splat (binop C, X) We do this in IR, and there's a similar fold for the case with 2 non-constant operands just above the code diff in this patch. This was discussed in D79718, and the extra shuffle in the test (llvm/test/CodeGen/X86/vector-fshl-128.ll::sink_splatvar) where it was noticed disappears because demanded elements analysis is no longer blocked. The large majority of the test diffs seem to be benign code scheduling changes, but I do see another type of win: moving the splat later allows binop narrowing in some cases. Regressions were avoided on x86 and ARM with the INSERT_VECTOR_ELT restriction. Differential Revision: https://reviews.llvm.org/D79886	2020-05-26 08:12:46 -04:00
Yi Kong	c1c9eb0ab7	[Transforms] Check validity of profile reader before invoking it Although an invalid sampling profile would fail the compilation anyway, this avoids crashing the compiler.	2020-05-26 20:11:24 +08:00
Sam Parker	bd9dce8f9a	[CostModel] getUserCost for intrinsic throughput Last part of recommitting 'Unify Intrinsic Costs' `259eb619ff`. This patch now uses getUserCost from getInstructionThroughput. Differential Revision: https://reviews.llvm.org/D80012	2020-05-26 12:23:37 +01:00
Sam Parker	8aaabadece	[CostModel] Unify getCastInstrCost Add the remaining cast instruction opcodes to the base implementation of getUserCost and directly return the result. This allows getInstructionThroughput to return getUserCost for the casts. This required changes to PPC and SystemZ because they implement getUserCost and/or getCastInstrCost with adjustments for vector operations. Adjustments have also been made in the remaining backends that implement the method so that they still produce a cost of zero or one for cost kinds other than throughput. Differential Revision: https://reviews.llvm.org/D79848	2020-05-26 11:29:57 +01:00
hsmahesha	09f7dcb64e	[AMDGPU/MemOpsCluster] Code clean-up around mem ops clustering logic Summary: Clean-up code around mem ops clustering logic. This patch cleans up code within the function clusterNeighboringMemOps(). It is WIP, and this patch is a first cut. Reviewers: foad, rampitec, arsenm, vpykhtin, javedabsar Reviewed By: foad Subscribers: MatzeB, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80119	2020-05-26 15:49:21 +05:30
Simon Pilgrim	6f802ec433	[X86] Fix fshr comment copy+paste typo. NFC. Noticed by @foad on D80466.	2020-05-26 10:55:57 +01:00
Georgii Rymar	2e365ca2f7	[DebugInfo/llvm-objdump] - Print "ZERO terminator" for terminator entries when dumping .eh_frame. A CIE with the Length == 0 is a terminator: https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/ehframechpt.html GNU objdump recognizes such entries and prints the following for them: "00000000 ZERO terminator" This patch teaches llvm-objdump to do the same. I had to update tests to use "CHECK-NEXT" too. (Note: it seems not quite right that printing is done inside the DebugInfo library; I'd expect to see the change somewhere in llvm-objdump's code instead, but that is how it is done at the moment.) Differential revision: https://reviews.llvm.org/D80476	2020-05-26 12:52:42 +03:00
Xing GUO	2c04b8aacd	[ObjectYAML][DWARF] Make variable names consistent.	2020-05-26 17:45:10 +08:00
Xing GUO	590f3a72c2	[ObjectYAML][DWARF] Use .empty() to indicate if the DWARF sections are empty.	2020-05-26 17:45:10 +08:00
Georgii Rymar	3d4c873a14	[yaml2obj] - Map section names to chunks for each ELFYAML::ProgramHeader early. NFCI. Each `ELFYAML::ProgramHeader` currently contains a list of section names included. We are trying to map them to Fill/Sections very late, though we can create such mapping early, in `initProgramHeaders`. The benefit is that with such change it is possible to access mapped chunks earlier (for example during writing section content) and have simpler code. Differential revision: https://reviews.llvm.org/D80520	2020-05-26 12:32:10 +03:00
vpykhtin	92f3828dc5	[AMDGPU] Fix wait counts in the presence of 16bit subregisters Differential Revision: https://reviews.llvm.org/D80033	2020-05-26 12:19:27 +03:00
Georgii Rymar	2569787e44	[DebugInfo] - Fix multiple issues in DWARFDebugFrame::parse(). I've noticed an issue with the "Data.getRelocatedValue(...)" call: it might silently ignore an error when the content is truncated. That leads to an infinite loop in the code (e.g. llvm-readobj hangs). After fixing the issue I found that we actually always tried to read past the end of a section, even when the content was valid. It happened because the terminator CIE (a CIE with the length == 0) was never handled. At first I tried just to stop adding the terminator entry (and return), but that does not seem to be correct, because tools like llvm-objdump might want to print something for such entries (see comments in the code and test cases). This patch fixes the issues mentioned, provides new test cases for both llvm-readobj and lib/DebugInfo and adds FIXMEs to the related existing test cases. Differential revision: https://reviews.llvm.org/D80299	2020-05-26 12:13:13 +03:00
Sam Parker	871556a494	[CostModel] Unify Intrinsic Costs. Recommitting most of the remaining changes from `259eb619ff`, but excluding the call to getUserCost from getInstructionThroughput. Though there are still no test changes, I doubt that this is an NFC... With the two getIntrinsicInstrCosts folded into one, now fold in the scalar/code-size orientated getIntrinsicCost. The remaining scalar intrinsics were memcpy, cttz and ctlz which now have special handling in the BasicTTI implementation. This required a change in the AMDGPU backend for fabs as it should always be 'free'. I've also changed the X86 backend to return the BaseT implementation when the CostKind isn't RecipThroughput. Differential Revision: https://reviews.llvm.org/D80012	2020-05-26 09:48:26 +01:00
Craig Topper	80cc43b420	[AArch64] Set i32 ISD::MULHU/S to Expand instead of Legal. Looks like there are no isel patterns for these. A DAG combine turns it into i64 multiply and a shift which hides this. Extracted from D80485	2020-05-26 00:41:09 -07:00
Fangrui Song	872c5fb143	[AsmPrinter] Don't generate .Lfoo$local for -fno-PIC and -fPIE -fno-PIC and -fPIE code generally cannot be linked in -shared mode and there is no benefit accessing via local aliases. Actually, a .Lfoo$local reference will be converted to a STT_SECTION (if no section relaxation) reference which will cause the section symbol (sizeof(Elf64_Sym)=24) to be generated.	2020-05-25 23:35:49 -07:00
Fangrui Song	9d55e4ee13	Make explicit -fno-semantic-interposition (in -fpic mode) infer dso_local -fno-semantic-interposition is currently the CC1 default. (The opposite disables some interprocedural optimizations.) However, it does not infer dso_local: on most targets accesses to ExternalLinkage functions/variables defined in the current module still need PLT/GOT. This patch makes explicit -fno-semantic-interposition infer dso_local, so that PLT/GOT can be eliminated if targets implement local aliases for AsmPrinter::getSymbolPreferLocal (currently only x86). Currently we check whether the module flag "SemanticInterposition" is 0. If yes, infer dso_local. In the future, we can infer dso_local unless "SemanticInterposition" is 1: frontends other than clang will also benefit from the optimization if they don't bother setting the flag. (There will be risks if they do want ELF interposition: they need to set "SemanticInterposition" to 1.)	2020-05-25 20:48:18 -07:00
Nemanja Ivanovic	793cc518b9	[PowerPC] Prevent legalization loop from promoting SELECT_CC from v4i32 to v4i32 As reported in https://bugs.llvm.org/show_bug.cgi?id=45709 we can hit an infinite loop in legalization since we set the legalization action for ISD::SELECT_CC for all fixed length vector types to Promote. Without some different legalization action for the type being promoted to, the legalizer simply loops. Since we don't have patterns to match the node, the right legalization action should be Expand. Differential revision: https://reviews.llvm.org/D79854	2020-05-25 20:09:07 -05:00
Kazu Hirata	cec20db588	[Inlining] Set inline-deferral-scale to 2. Summary: This patch sets inline-deferral-scale to 2. Both internal and SPEC benchmarking show that 2 is the best number among -1, 2, 3, and 4. SPECint2006 score by inline-deferral-scale: -1: 38.0 (the default without this patch); 2: 38.5; 3: 38.1; 4: 38.1. With the new default number, shouldBeDeferred returns true if: TotalCost < IC.getCost() * 2 where TotalCost is TotalSecondaryCost + IC.getCost() * NumCallerUsers. If TotalSecondaryCost >= 0 and NumCallerUsers >= 2, then TotalCost >= IC.getCost() * 2, so shouldBeDeferred returns true only when NumCallerUsers is 1. Now, if TotalSecondaryCost < 0, which can happen if InlineConstants::LastCallToStaticBonus, a huge number, has been subtracted from TotalSecondaryCost, then TotalCost may be negative. In this case, shouldBeDeferred may return true even when NumCallerUsers >= 2. Reviewers: davidxl, nikic Reviewed By: davidxl Subscribers: xbolva00, hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80229	2020-05-25 15:44:20 -07:00
Florian Hahn	179c80117c	[LoopUnroll] Remove dead NextBlocks argument (NFC).	2020-05-25 22:09:11 +01:00
Marek Kurdej	bc93c2d72e	[Transforms] Fix typos. NFC	2020-05-25 22:34:08 +02:00
Craig Topper	51a276c759	[X86] Teach combineTruncatedArithmetic to push truncate through subtracts where only one of the inputs is free to truncate. Fix combineSubToSubus to handle the new DAG to avoid a regression. There are still regressions in test14/test15/test16, where it looks like we were trying to set up cases we could match to umin+trunc+subus but the handling was never finished. The regression here isn't unique to sub. It's a lost opportunity for taking an AND with two truncated inputs and producing a larger AND with a single truncate. The same thing could happen with any other node we handle in combineTruncatedArithmetic since we are moving the truncate up the DAG. Differential Revision: https://reviews.llvm.org/D80483	2020-05-25 11:42:42 -07:00
Dmitry Preobrazhensky	77aec3b4c0	[AMDGPU][MC][GFX8+] Enabled clamp for v_add_u16, v_sub_u16 and v_subrev_u16 See https://bugs.llvm.org/show_bug.cgi?id=45926 Reviewers: arsenm, rampitec, vpykhtin Differential Revision: https://reviews.llvm.org/D80430	2020-05-25 19:55:38 +03:00
serge-sans-paille	356bf5ea5d	Stack clash: update live-ins This fixes http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/7150	2020-05-25 15:57:58 +02:00
Whitney Tsang	5d6c5b463c	[LoopUtils] Use llvm::find Summary: Fixes this build error: llvm/lib/Transforms/Utils/LoopUtils.cpp:679:26: error: no matching function for call to 'find' Loop::iterator I = find(ParentLoop->begin(), ParentLoop->end(), L); ^~~~ Authored By: orivej Reviewer: Whitney Reviewed By: Whitney Subscribers: hiraditya, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D80473	2020-05-25 13:34:56 +00:00
Benjamin Kramer	82bee922af	Make FEATURE_AVX512VP2INTERSECT match between compiler-rt and LLVM compiler-rt also doesn't support bits >= 64 as far as I know.	2020-05-25 15:18:04 +02:00
serge-sans-paille	8eae32188b	Improve stack-clash implementation on x86 - test both 32 and 64 bit version - probe the tail in dynamic-alloca - generate more concise code Differential Revision: https://reviews.llvm.org/D79482	2020-05-25 14:48:14 +02:00
Simon Pilgrim	8b4ecafee6	InstructionSimplify.h - remove unnecessary includes. NFC. Remove unused User.h include. Replace SetVector.h with forward declaration. Sort the forward declarations + remove FastMathFlags (defined in Operator.h). Fix implicit SetVector.h dependency in LowerConstantIntrinsics.cpp.	2020-05-25 13:45:03 +01:00
Sanjay Patel	fa038e0350	[x86] favor vector constant load to avoid GPR to XMM transfer, part 2 This replaces the build_vector lowering code that was just added in D80013 and matches the pattern later from the x86-specific "vzext_movl". That seems to result in the same or better improvements and gets rid of the 'TODO' items from that patch. AFAICT, we always shrink wider constant vectors to 128-bit on these patterns, so we still get the implicit zero-extension to ymm/zmm without wasting space on larger vector constants. There's a trade-off there because that means we miss potential load-folding. Similarly, we could load scalar constants here with implicit zero-extension even to 128-bit. That saves constant space, but it means we forego load-folding, and so it increases register pressure. This seems like a good middle-ground between those 2 options. Differential Revision: https://reviews.llvm.org/D80131	2020-05-25 08:01:48 -04:00
Simon Pilgrim	8f48814879	FunctionLoweringInfo.h - move APInt.h dependency to FunctionLoweringInfo.cpp. NFC.	2020-05-25 12:58:35 +01:00
Stefan Pintilie	5a4bcec8db	[PowerPC][NFC] Split PPCELFStreamer::emitInstruction Split off PPCELFStreamer::emitPrefixedInstruction from PPCELFStreamer::emitInstruction. Differential Revision: https://reviews.llvm.org/D79626	2020-05-25 06:48:58 -05:00
Simon Pilgrim	9fa58d1bf2	[DAG] Add SimplifyDemandedVectorElts binop SimplifyMultipleUseDemandedBits handling For the supported binops (basic arithmetic, logicals + shifts), if we fail to simplify the demanded vector elts, then call SimplifyMultipleUseDemandedBits and try to peek through ops to remove unnecessary dependencies. This helps with PR40502. Differential Revision: https://reviews.llvm.org/D79003	2020-05-25 12:41:22 +01:00
Simon Pilgrim	0e83e67cd3	SystemZInstrBuilder.h - remove unnecessary PseudoSourceValue.h include. NFC.	2020-05-25 12:41:22 +01:00
Dmitry Preobrazhensky	b087b91c91	[AMDGPU][CODEGEN] Added 'A' constraint for inline assembler Summary: 'A' constraint requires an immediate int or fp constant that can be inlined in an instruction encoding. Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D78494	2020-05-25 14:23:34 +03:00
Ayal Zaks	840450549c	[LV] Clamp MaxVF to power of 2. If a loop has a constant trip count known to be a multiple of MaxVF (times user UF), LV infers that no tail will be generated for any chosen VF. This relies on the chosen VF's being powers of 2 bound by MaxVF, and assumes MaxVF is a power of 2. Make sure the latter holds, in particular when MaxVF is set by a memory dependence distance which may not be a power of 2. Differential Revision: https://reviews.llvm.org/D80491	2020-05-25 11:24:33 +03:00
Fangrui Song	1b79509f97	[MCDwarf] Delete unneeded DW_AT_unspecified_parameters	2020-05-24 22:36:57 -07:00
Fangrui Song	20e9fc55fe	[MCDwarf] Delete unneeded DW_AT_prototyped for DW_TAG_label	2020-05-24 22:24:24 -07:00
Orivej Desh	838d12207b	[TargetLoweringObjectFileImpl] Use llvm::transform Fixes a build issue with libc++ configured with _LIBCPP_RAW_ITERATORS (ADL not effective) ``` llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp:1602:3: error: no matching function for call to 'transform' transform(HexString.begin(), HexString.end(), HexString.begin(), tolower); ^~~~~~~~~ ``` Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D80475	2020-05-24 20:59:24 -07:00
Craig Topper	51dec88c5d	[X86] Remove isCommutable flag from MULX instructions. The fixed register constraint on EDX/RDX as an input makes this not really commutable.	2020-05-24 15:02:36 -07:00
Simon Pilgrim	8a5aea7b50	[X86][AVX] Fold extract_subvector(subv_broadcast(x),c) -> (x) If we're extracting a subvector from a broadcasted subvector of the same type then we can use the source vector directly.	2020-05-24 18:49:39 +01:00
Simon Pilgrim	e508d643cf	[X86][AVX] Fold extract_subvector(broadcast(x),c) -> extract_subvector(broadcast(x),0) iff c != 0 If we're extracting an upper subvector from a broadcast we're better off extracting the lowest subvector instead as it avoids an actual extract instruction and might help SimplifyDemandedVectorElts further simplify the code.	2020-05-24 18:05:54 +01:00
Sanjay Patel	57bb4787d7	[Pass Manager] remove EarlyCSE as clean-up for VectorCombine EarlyCSE was added with D75145, but the motivating test is not regressed by removing the extra pass now. That might be because VectorCombine altered the way it processes instructions, or it might be from (re)moving VectorCombine in the pipeline. The extra round of EarlyCSE appears to cost approximately 0.26% in compile-time as discussed in D80236, so we need some evidence to justify its inclusion here, but we do not have that (yet). I suspect that between SLP and VectorCombine, we are creating patterns that InstCombine and/or codegen are not prepared for, but we will need to reduce those examples and include them as PhaseOrdering and/or test-suite benchmarks.	2020-05-24 12:36:21 -04:00
Florian Hahn	0deab8a54f	[LV] Either get invariant condition OR vector condition. Currently we unconditionally get the first lane of the condition operand, even if we later use the full vector condition. This can result in some unnecessary instructions being generated. Suggested as follow-up in D80219.	2020-05-24 17:16:42 +01:00
Simon Pilgrim	1e7865d946	[X86] SimplifyMultipleUseDemandedBitsForTargetNode - add initial X86ISD::VSRAI handling. This initial version only peeks through cases where we just demand the sign bit of an ashr shift, but we could generalize this further depending on how many sign bits we already have. The pr18014.ll case is a minor annoyance - we've failed to move the psrad/paddd after the blendvps which would have avoided the extra move, but we have still increased the ILP.	2020-05-24 16:07:46 +01:00
Simon Pilgrim	71bed8206b	AMDGPU.h - reduce TargetMachine.h include. NFC. Replace TargetMachine.h include with forward declaration and CodeGen.h include in AMDGPU.h. Exposes a couple of implicit dependencies that require additional forward declarations/includes.	2020-05-24 15:27:41 +01:00
Kang Zhang	86e3abc9e6	[PowerPC] Add some InstAlias definitions Summary: This patch adds the InstAlias definitions for the instructions below. ADDI ADDIS ADDI8 ADDIS8 RLWINM8 ISEL ISEL8 OR OR_rec ORI ORI8 XORI8 CNTLZW8 CNTLZW8_rec TEND TSR RFEBB NOR NOR_rec MTCRF SUBF SUBF_rec SUBFC SUBFC_rec RLDICL_32_64 TW Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D77559	2020-05-24 14:05:28 +00:00
Sanjay Patel	c048a02b5b	[InstCombine] fold FP trunc into exact itofp Similar to D79116 and rGbfd512160fe0 - if the 1st cast is exact, then we can go directly to the destination type because there is no double-rounding.	2020-05-24 09:30:19 -04:00
Sanjay Patel	7eed772a27	[PatternMatch] abbreviate vector inst matchers; NFC Readability is not reduced with these opcodes/match lines, so reduce odds of awkward wrapping from 80-col limit.	2020-05-24 09:19:47 -04:00
Simon Pilgrim	b05b69e056	AMDGPUInstPrinter.cpp - add CommandLine.h include. NFC. Fixes implicit dependency that will be exposed by a future patch.	2020-05-24 14:17:04 +01:00
Florian Hahn	15224408f0	[VPlan] Use VPUser for VPWidenSelectRecipe operands (NFC). VPWidenSelectRecipe already contains a VPUser, but it is not used. This patch updates the code related to VPWidenSelectRecipe to use VPUser for its operands. Reviewers: Ayal, gilr, rengolin Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D80219	2020-05-24 13:58:08 +01:00
Simon Pilgrim	725b3463c5	AMDGPUTargetObjectFile.h - remove unnecessary includes. NFC. As we're inheriting from TargetLoweringObjectFileELF, TargetLoweringObjectFileImpl.h already declares all types we require in the overrides.	2020-05-24 13:57:02 +01:00
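
The deferral condition spelled out in commit cec20db588 ([Inlining] Set inline-deferral-scale to 2) can be checked with a little arithmetic. The following is a minimal Python sketch of only the comparison quoted in that commit message, not LLVM's actual shouldBeDeferred, which is C++ and may apply further conditions. The function and parameter names are hypothetical, and the bonus value in the usage note is a stand-in for "a huge number", not the real InlineConstants::LastCallToStaticBonus.

```python
def total_cost(total_secondary_cost: int, inline_cost: int, num_caller_users: int) -> int:
    """TotalCost = TotalSecondaryCost + IC.getCost() * NumCallerUsers (per the commit message)."""
    return total_secondary_cost + inline_cost * num_caller_users


def should_be_deferred(total_secondary_cost: int, inline_cost: int,
                       num_caller_users: int, scale: int = 2) -> bool:
    """Model of the quoted test: TotalCost < IC.getCost() * inline-deferral-scale.

    `scale` stands in for inline-deferral-scale; 2 is the default set by the commit.
    Only the comparison is modeled here; everything else is an assumption.
    """
    cost = total_cost(total_secondary_cost, inline_cost, num_caller_users)
    return cost < inline_cost * scale
```

With a non-negative secondary cost and inline cost, the test can only pass for a single caller user: `should_be_deferred(0, 100, 1)` compares 100 < 200, while `should_be_deferred(0, 100, 2)` compares 200 < 200 and fails. If a large bonus has been subtracted, for example a hypothetical `total_secondary_cost` of -15000, the total becomes negative and the test passes even with three caller users, which is the case the commit message calls out.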


134746 Commits