llvm-project

Commit Graph

Author	SHA1	Message	Date
Nikita Popov	ab2284a643	[LowerConstantIntrinsics] Make TLI a required dependency The way the pass is actually used in the optimization pipeline, TLI will be available, but this is not the case when running just -lower-constant-intrinsics in tests, which ends up being quite confusing. Require TLI unconditionally, as we usually do.	2022-03-18 14:59:18 +01:00
Tomas Matheson	62c481542e	Revert "[ARM][AArch64] generate subtarget feature flags" This reverts commit `dd8b0fecb9`.	2022-03-18 11:58:20 +00:00
David Green	fe6057a293	[AArch64] Custom lower concat(v4i8 load, ...) We already have custom lowering for v4i8 load, which loads as a f32, converts to a vector and bitcasts and extends the result to a v4i16. This adds some custom lowering of concat(v4i8 load, ...) to keep the result as an f32 and create a buildvector of the resulting f32 loads. This helps not create all the extends and bitcasts, which are often difficult to fully clean up. Differential Revision: https://reviews.llvm.org/D121400	2022-03-18 11:58:02 +00:00
Tomas Matheson	dd8b0fecb9	[ARM][AArch64] generate subtarget feature flags This patch aims to reduce a lot of the boilerplate around adding new subtarget features. From the SubtargetFeatures tablegen definitions, a series of calls to the macro GET_SUBTARGETINFO_MACRO are generated in ARM/AArch64GenSubtargetInfo.inc. ARMSubtarget/AArch64Subtarget can then use this macro to define bool members and the corresponding getter methods. Some naming inconsistencies have been fixed to allow this, and one unused member removed. This implementation only applies to boolean members; in future both BitVector and enum members could also be generated. Differential Revision: https://reviews.llvm.org/D120906	2022-03-18 11:48:20 +00:00
Nikita Popov	6ffb3ad631	[SCEV] Use constant ranges when determining reachable blocks (PR54434) This avoids false positive verification failures if the condition is not literally true/false, but SCEV still makes use of the fact that a loop is not reachable through more complex reasoning. Fixes https://github.com/llvm/llvm-project/issues/54434.	2022-03-18 12:04:35 +01:00
Mohammed Nurul Hoque	7afa44f5f5	[RISCV] Add more sign-extending ops to MIR sext.w pass. This patch adds single-bit and bit-counting ops to list of sign-extending ops. A single-bit write propagates sign-extendedness if it's not in the sign-bits. Bit extraction and bit counting always outputs a small number, so sign-extended. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D121152	2022-03-18 18:21:17 +08:00
Nikita Popov	fc8946fae7	[InstCombine] Remove integer SPF of SPF folds (NFCI) Now that we canonicalize to intrinsics, these folds should no longer be needed. Only one fold that also applies to floating-point min/max is retained.	2022-03-18 10:20:48 +01:00
Nikita Popov	f96428e16d	[MemorySSA] Don't optimize uses during construction This changes MemorySSA to be constructed in unoptimized form. MemorySSA::ensureOptimizedUses() can be called to optimize all uses (once). This should be done by passes where having optimized uses is beneficial, either because we're going to query all uses anyway, or because we're doing def-use walks. This should help reduce the compile-time impact of MemorySSA for some use cases (the reason why I started looking into this is D117926), which can avoid optimizing all uses upfront, and instead only optimize those that are actually queried. Actually, we have an existing use-case for this, which is EarlyCSE. Disabling eager use optimization there gives a significant compile-time improvement, because EarlyCSE will generally only query clobbers for a subset of all uses (this change is not included in this patch). Differential Revision: https://reviews.llvm.org/D121381	2022-03-18 09:56:16 +01:00
Florian Hahn	4a699ae9c6	[LoopSimplifyCFG] Check predecessors of exits before marking them dead. LoopSimplifyCFG may process loops that are not in loop-simplify/canonical form. For loops not in canonical form, exit blocks may be reachable from non-loop blocks and we cannot consider them as dead if they only are not reachable from the loop itself. Unfortunately the smallest test I could come up with requires running multiple passes: -passes='loop-mssa(loop-instsimplify,loop-simplifycfg,simple-loop-unswitch)' The reason is that loops are canonicalized at the beginning of loop pipelines, so a later transform has to break canonical form in a way that breaks LoopSimplifyCFG's dead-exit analysis. Alternatively we could try to require all loop passes to maintain canonical form. That in turn would also require additional verification. Fixes #54023, #49931. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D121925	2022-03-18 08:54:44 +00:00
Amir Ayupov	a954ade8ed	[X86][NFC] Generate mnemonic tables Produce mnemonic tables, adding the functions to llvm::X86 namespace. Reviewed By: MaskRay, skan Differential Revision: https://reviews.llvm.org/D121572	2022-03-18 01:46:48 -07:00
Shengchen Kan	920c2e5763	[X86][NFC] Rename target feature hasCMov->hasCMOV This is a follow-up patch for D121975.	2022-03-18 14:05:52 +08:00
Kai Luo	31906a6090	[AtomicExpand][PowerPC] Fix all-one mask value When generating a all-one mask value whose bitwidth is larger than 64, signed extension should be used rather then zero extension. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D120865	2022-03-18 13:35:54 +08:00
Craig Topper	6cfe41dcc8	[X86] Rename more target feature related things consistency. NFC -Rename ModeBit to IsBit to match X86Subtarget. -Rename FeatureLAHFSAHF to FeatureLAFHSAFH64 to match X86Subtarget. -Use consistent capitalization Reviewed By: skan Differential Revision: https://reviews.llvm.org/D121975	2022-03-17 22:27:17 -07:00
Shengchen Kan	1a70febf82	[X86] Set Int_MemBarrier as a meta-instruction Compiler only emits a comment for `Int_MemBarrier`, so it should be marked as a meta-instruction, which can help improve accuracy of debug location. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D121879	2022-03-18 13:12:28 +08:00
Andrew Wei	0af3e6a22d	[InstCombine] Sink instructions with multiple users in a successor block. This patch tries to sink instructions when they are only used in a successor block. This is a further enhancement patch based on Anna's commit: D109700, which allows sinking an instruction having multiple uses in a single user. In this patch, sink instructions with multiple users in a single successor block will be supported. It could fix a known issue from rust: https://github.com/rust-lang/rust/issues/51346#issuecomment-394443610 Reviewed By: nikic, reames Differential Revision: https://reviews.llvm.org/D121585	2022-03-18 11:53:45 +08:00
Abinav Puthan Purayil	aee3684995	[AMDGPU] Use COPY_TO_REGCLASS for buffer_atomic_cmpswap selection GlobalISel was selecting the av_* regclass for some cases. Differential Revision: https://reviews.llvm.org/D121933	2022-03-18 08:56:23 +05:30
Jessica Clarke	63ea7797dd	[RISCV] Fix buildbot breakage by explicitly instantiating templates RISCVISelDAGToDAG's selectImm uses RISCVTargetLowering::getAddr (specifically the ConstantPoolSDNode) as of `41454ab256` ("[RISCV] Use constant pool for large integers"), but nothing explicitly instantiates any of the templates, the only reason they exist is because of the various lowering methods in RISCVISelLowering.cpp that themselves use the methods. However, with inlining, those can end up not existing as real functions and thus not be exported, leading to link errors. Up until now this hasn't happened, but for whatever reason D121654 has triggered this on the sanitizer-ppc64be-linux buildbot, giving: ../../../../lib/libLLVMRISCVCodeGen.a(RISCVISelDAGToDAG.cpp.o): In function `selectImm(llvm::SelectionDAG, llvm::SDLoc const&, llvm::MVT, long, llvm::RISCVSubtarget const&)': RISCVISelDAGToDAG.cpp:(.text._ZL9selectImmPN4llvm12SelectionDAGERKNS_5SDLocENS_3MVTElRKNS_14RISCVSubtargetE+0x3d8): undefined reference to `llvm::SDValue llvm::RISCVTargetLowering::getAddr<llvm::ConstantPoolSDNode>(llvm::ConstantPoolSDNode, llvm::SelectionDAG&, bool) const' collect2: error: ld returned 1 exit status Fix this by explicitly instantiating getAddr in its four different forms so separate translation units can reliably use it. Fixes: `41454ab256` ("[RISCV] Use constant pool for large integers")	2022-03-18 02:22:17 +00:00
Weining Lu	b75d2ec124	[LoongArch] Add some blank lines to make .td more tidy. NFC	2022-03-18 09:49:16 +08:00
Vasileios Porpodas	9136145eb0	Revert "[SLP] Fix lookahead operand reordering for splat loads." due to build failures This reverts commit `5efa78985b`.	2022-03-17 18:22:04 -07:00
Vasileios Porpodas	5efa78985b	[SLP] Fix lookahead operand reordering for splat loads. Splat loads are inexpensive in X86. For a 2-lane vector we need just one instruction: `movddup (%reg), xmm0`. Using the standard Splat score leads to worse code. This patch adds a new score dedicated for splat loads. Please note that a splat is usually three IR instructions: - It is usually a load and 2 inserts: %ld = load double, double* %gep %ins1 = insertelement <2 x double> poison, double %ld, i32 0 %ins2 = insertelement <2 x double> %ins1, double %ld, i32 1 - But it can also be a load, an insert and a shuffle: %ld = load double, double* %gep %ins = insertelement <2 x double> poison, double %ld, i32 0 %shf = shufflevector <2 x double> %ins, <2 x double> poison, <2 x i32> zeroinitializer Because of this some of the lit tests contain more IR instructions. Differential Revision: https://reviews.llvm.org/D121354	2022-03-17 18:05:54 -07:00
Paul Kirth	964398ccb1	Revert "Revert "Revert "[misexpect] Re-implement MisExpect Diagnostics""" This reverts commit `6cf560d69a`.	2022-03-18 00:21:33 +00:00
Paul Kirth	6cf560d69a	Revert "Revert "[misexpect] Re-implement MisExpect Diagnostics"" I mistakenly reverted my commit, so I'm relanding it. This reverts commit `10866a1df4`.	2022-03-18 00:04:22 +00:00
Paul Kirth	10866a1df4	Revert "[misexpect] Re-implement MisExpect Diagnostics" This reverts commit `e7749d4713`.	2022-03-17 23:54:26 +00:00
Paul Kirth	e7749d4713	[misexpect] Re-implement MisExpect Diagnostics Reimplements MisExpect diagnostics from D66324 to reconstruct its original checking methodology only using MD_prof branch_weights metadata. New checks rely on 2 invariants: 1) For frontend instrumentation, MD_prof branch_weights will always be populated before llvm.expect intrinsics are lowered. 2) for IR and sample profiling, llvm.expect intrinsics will always be lowered before branch_weights are populated from the IR profiles. These invariants allow the checking to assume how the existing branch weights are populated depending on the profiling method used, and emit the correct diagnostics. If these invariants are ever invalidated, the MisExpect related checks would need to be updated, potentially by re-introducing MD_misexpect metadata, and ensuring it always will be transformed the same way as branch_weights in other optimization passes. Frontend based profiling is now enabled without using LLVM Args, by introducing a new CodeGen option, and checking if the -Wmisexpect flag has been passed on the command line. Differential Revision: https://reviews.llvm.org/D115907	2022-03-17 23:46:23 +00:00
Yonghong Song	2e94d8e67a	[BPF] handle unsigned icmp ops in BPFAdjustOpt pass When investigating an issue with bcc tool inject.py, I found a verifier failure with latest clang. The portion of code can be illustrated as below: struct pid_struct { u64 curr_call; u64 conds_met; u64 stack[2]; }; struct pid_struct bpf_map_lookup_elem(); int foo() { struct pid_struct p = bpf_map_lookup_elem(); if (!p) return 0; p->curr_call--; if (p->conds_met < 1 \|\| p->conds_met >= 3) return 0; if (p->stack[p->conds_met - 1] == p->curr_call) p->conds_met--; ... } The verifier failure looks like: ... 8: (79) r1 = (u64 )(r0 +0) R0_w=map_value(id=0,off=0,ks=4,vs=32,imm=0) R10=fp0 fp-8=mmmm???? 9: (07) r1 += -1 10: (7b) (u64 )(r0 +0) = r1 R0_w=map_value(id=0,off=0,ks=4,vs=32,imm=0) R1_w=inv(id=0) R10=fp0 fp-8=mmmm???? 11: (79) r2 = (u64 )(r0 +8) R0_w=map_value(id=0,off=0,ks=4,vs=32,imm=0) R1_w=inv(id=0) R10=fp0 fp-8=mmmm???? 12: (bf) r3 = r2 13: (07) r3 += -3 14: (b7) r4 = -2 15: (2d) if r4 > r3 goto pc+13 R0=map_value(id=0,off=0,ks=4,vs=32,imm=0) R1=inv(id=0) R2=inv(id=2) R3=inv(id=0,umin_value=18446744073709551614,var_off=(0xffffffff00000000; 0xffffffff)) R4=inv-2 R10=fp0 fp-8=mmmm???? 16: (07) r2 += -1 17: (bf) r3 = r2 18: (67) r3 <<= 3 19: (bf) r4 = r0 20: (0f) r4 += r3 math between map_value pointer and register with unbounded min value is not allowed Here the compiler optimized "p->conds_met < 1 \|\| p->conds_met >= 3" to r2 = p->conds_met r3 = r2 r3 += -3 r4 = -2 if (r3 < r4) return 0 r2 += -1 r3 = r2 ... In the above, r3 is initially equal to r2, but is modified used by the comparison. But later on r2 is used again. This caused verification failure. BPF backend has a pass, AdjustOpt, to prevent such transformation, but only focused on signed integers since typical bpf helper returns signed integers. To fix this case, let us handle unsigned integers as well. Differential Revision: https://reviews.llvm.org/D121937	2022-03-17 16:24:39 -07:00
Johannes Doerfert	4308fdf83b	[Attributor] Remove more non-deterministic behavior and debug output	2022-03-17 17:42:32 -05:00
Johannes Doerfert	59a6b668ab	[OpenMP][FIX] Initialize member to avoid undefined value in debug output	2022-03-17 17:42:32 -05:00
Johannes Doerfert	88ea86c369	[Attributor][FIX] Remove reference into map that might dangle The reference was taken and the map was modified after. This can (and did) lead to dangling pointers and all sorts of problems afterwards.	2022-03-17 17:42:32 -05:00
Ellis Hoag	f6b5142ac2	[AlwaysInliner] Emit inline remark only when successful Failures in `InlineFunction()` are caught after D121722, but `emitInlinedIntoBasedOnCost()` should only be called when inlining is successful. This also removes an unnecessary call to `shouldInline()` which always returned `InlineCost::getAlways()`. Reviewed By: kyulee, nikic Differential Revision: https://reviews.llvm.org/D121946	2022-03-17 15:40:24 -07:00
Kyungwoo Lee	ddb85f34f5	[ObjCARC] Fix non-determinism We often failed in the assertion, non-deterministically with a large IR: ``` Assertion `notDifferentParent(LocA.Ptr, LocB.Ptr) && "BasicAliasAnalysis doesn't support interprocedural queries." ``` Looking at the comment in https://reviews.llvm.org/D87806, it appears it's actually a module pass for new PM while the legacy PM still works as a function pass. The fix is to align the same behavior in between new PM and old PM, which initializes ObjCARCContract for each function. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D121949	2022-03-17 15:01:09 -07:00
Changpeng Fang	dd5895cc39	AMDGPU: Use the implicit kernargs for code object version 5 Summary: Specifically, for trap handling, for targets that do not support getDoorbellID, we load the queue_ptr from the implicit kernarg, and move queue_ptr to s[0:1]. To get aperture bases when targets do not have aperture registers, we load private_base or shared_base directly from the implicit kernarg. In clang, we use implicitarg_ptr + offsets to implement __builtin_amdgcn_workgroup_size_{xyz}. Reviewers: arsenm, sameerds, yaxunl Differential Revision: https://reviews.llvm.org/D120265	2022-03-17 14:12:36 -07:00
Andrew Litteken	f7d90ad57b	[IROutliner] Make sure that loop debug info is stripped. As pointed out in https://github.com/llvm/llvm-project/issues/54155#issuecomment-1057465479, there was a crash when loop info was being outlined. It was not being properly stripped and adjusted, so would point to the wrong location. This uses similar logic found in the CodeExtractor to adjust the loop debug info. Reviewer: fhahn, paquette Differential Revision: https://reviews.llvm.org/D120869	2022-03-17 14:41:53 -06:00
Stanislav Mekhanoshin	d9ac55fab2	[AMDGPU] New MFMA names for existing instructions Old names are supported as aliases. _1k MFMA got new opcodes. Differential Revision: https://reviews.llvm.org/D121741	2022-03-17 13:05:36 -07:00
Ben Barham	4125524112	[VFS] Add print/dump to the whole FileSystem hierarchy For now most are implemented by printing out the name of the filesystem, but this can be expanded in the future. Only `OverlayFileSystem` and `RedirectingFileSystem` are properly implemented in this patch. - `OverlayFileSystem`: Prints each filesystem in the order that any operations are actually run on them. Optionally prints recursively. - `RedirectingFileSystem`: Prints out all mappings, as well as the `ExternalFS`. Most of this was already implemented other than the handling for the `DirectoryRemap` case and to actually print out the mapping. Each FS should implement `printImpl` rather than `print`, where the latter just fowards to the former. This is to avoid spreading the default arguments through to the subclasses (where we may miss updating in the future). Differential Revision: https://reviews.llvm.org/D121421	2022-03-17 13:02:40 -07:00
Stanislav Mekhanoshin	522b259976	[AMDGPU] Allow v_accvgpr_write to use SGPR src on gfx940 Differential Revision: https://reviews.llvm.org/D121843	2022-03-17 12:12:06 -07:00
Vang Thao	27e1931508	[AMDGPU] Fix PreRARematerialize scheduler pass sinking subreg defs When collecting trivially rematerializable defs, skip any subreg defs. We do not want to sink these. Differential Revision: https://reviews.llvm.org/D121874	2022-03-17 11:38:53 -07:00
Sterling Augustine	bd38234d76	Reland "Use a stable-sort when combining bases" Differential Revision: https://reviews.llvm.org/D121922	2022-03-17 11:32:16 -07:00
Alexey Bataev	d65cc85977	[SLP]Do not schedule instructions with constants/argument/phi operands and external users. No need to schedule entry nodes where all instructions are not memory read/write instructions and their operands are either constants, or arguments, or phis, or instructions from others blocks, or their users are phis or from the other blocks. The resulting vector instructions can be placed at the beginning of the basic block without scheduling (if operands does not need to be scheduled) or at the end of the block (if users are outside of the block). It may save some compile time and scheduling resources. Differential Revision: https://reviews.llvm.org/D121121	2022-03-17 11:03:45 -07:00
Julian Lettner	22570bac69	Lower `@llvm.global_dtors` using `__cxa_atexit` on MachO For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with `__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`. Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this. Enable fallback to the old behavior via Clang driver flag (`-fregister-global-dtors-with-atexit`) or llc / code generation flag (`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be removed in the future. Differential Revision: https://reviews.llvm.org/D121736	2022-03-17 10:47:13 -07:00
Archibald Elliott	f496330f97	[ARM] Fix Decode of tsb csync There is a crash in the ARM backend when attempting to decode a "tsb csync" instruction using `llvm-objdump --triple=armv8.4a -d`. The crash was in `ARMMCInstrAnalysis::evaluateBranch` where the number of operands in the decoded instruction (0) did not match the number of operands in the instruction description (1). This is becuase `tsb csync` looks like it has an operand during assembly, but there is only one valid operand (csync), so there is no encoding space in the instruction for the operand, so the decoder never has a field to decode that represents `csync`. The fix is to add a custom decode method, which ensures that this instruction does have the right number of operands after decoding. This method merely adds the only available operand value, `ARM_TSB::CSYNC`. Reviewed By: tmatheson Differential Revision: https://reviews.llvm.org/D121479	2022-03-17 17:29:31 +00:00
Augusto Noronha	9b3af5e7b7	[dsymutil] Apply relocations present in Swift reflection sections The strippable Swift reflection sections contain subtractor relocations that need to be applied. There are two situations we need to support. 1) Both symbols used in the relocation come from the .o file (for example, one symbol lives in __swift5_fieldmd and the second in __swift5_reflstr). 2) One symbol comes from th .o file and the second from the main binary (for example, __swift5_fieldmd and __swift5_typeref). Differential Revision: https://reviews.llvm.org/D120574	2022-03-17 14:23:20 -03:00
Ellis Hoag	84c6689b15	[AlwaysInliner] Check inliner errors even without assserts When we build clang without asserts we should still check the result of `InlineFunction()` to be sure there wasn't an error. Otherwise we could incorrectly merge attributes in the next line. This also removes a redundent call to `getCaller()`. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D121722	2022-03-17 10:16:23 -07:00
Matt Arsenault	8d66603a48	Revert "RegAllocGreedy: Fix last chance recolor assert in impossible case" This reverts commit `c46aab01c0`. This evidently blocks compiling in some cases that used to work before. I'm also not fully convinced this is the correct place to fix this problem.	2022-03-17 13:12:01 -04:00
Craig Topper	bbd2ecf9f0	[RISCV] Add +experimental-zvfh extension to cover half types in vectors. Currently we allow half types in vectors if the scalar Zfh extension is enabled. This behavior is not inline with the vector spec. For f32 and f64 types, the Zve32f, Zve64f, Zve64d, and V explicitly control the availablity of floating point types in vectors. In order to make our compiler compliant, we either need to remove all support for half in vectors or we need an extension to control it. Draft spec here https://github.com/riscv/riscv-v-spec/pull/780 Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D121345	2022-03-17 10:04:02 -07:00
Sterling Augustine	84810e1f74	Revert "Use a stable-sort when combining bases" This reverts commit `81417261a1`.	2022-03-17 09:54:13 -07:00
Craig Topper	7e15303062	[RISCV] Simplify scalable vector case in lowerVectorMaskExt. Since we have SPLAT_VECTOR_PARTS these days, I don't think we need to go through extra lengths to avoid introducing an illegal scalar type. We can just call getConstant using the scalable vector type and let it create either a SPLAT_VECTOR or a SPLAT_VECTOR_PARTS. Reviewed By: frasercrmck, rogfer01 Differential Revision: https://reviews.llvm.org/D121645	2022-03-17 09:43:13 -07:00
Yonghong Song	d2b4a675a8	[BPF] Fix a bug in BPFAdjustOpt pass for icmp transformation When checking a bcc issue related to bcc tool inject.py, I found a bug in BPFAdjustOpt pass for icmp transformation, caused by typo's. For the following condition: Cond2Op != ICmpInst::ICMP_SLT && Cond1Op != ICmpInst::ICMP_SLE it should be Cond2Op != ICmpInst::ICMP_SLT && Cond2Op != ICmpInst::ICMP_SLE This patch fixed the problem and a test case is added. Differential Revision: https://reviews.llvm.org/D121883	2022-03-17 09:25:18 -07:00
Matt Devereau	a9e08bc7c1	[AArch64][SVE] InstCombine llvm.aarch64.sve.sel to select InstCombine llvm.aarch64.sve.sel to select. This allows an existing instCombine added in `20b0fa91c9` to fire. Differential Revision: https://reviews.llvm.org/D121792	2022-03-17 16:20:48 +00:00
Sterling Augustine	81417261a1	Use a stable-sort when combining bases While experimenting with different algorithms for std::sort I discovered that combine-vmovdrr.ll fails if this sort is not stable. I suspect that the test is too stringent in its check--the resultant code looks functionally identical to me under both stable and unstable sorting, but a generic fix is quite a bit more difficult to implement. Thanks to scw@google.com for finding the proper fix. Differential Revision: https://reviews.llvm.org/D121870	2022-03-17 09:02:13 -07:00
Marco Elver	b09439e20b	[AtomicExpandPass][NFC] Reformat with clang-format NFCI.	2022-03-17 16:58:16 +01:00
Fraser Cormack	fe74183564	[Coroutines][NFC] Format line to 80 cols	2022-03-17 15:34:24 +00:00
Arthur Eubanks	295172ef51	[OpaquePtr][LLParser] Automatically detect opaque pointers in .ll files This allows us to not have to specify -opaque-pointers when updating IR tests from typed pointers to opaque pointers. We detect opaque pointers in .ll files by looking for relevant tokens, either "ptr" or "*". Reviewed By: #opaque-pointers, nikic Differential Revision: https://reviews.llvm.org/D119482	2022-03-17 08:37:18 -07:00
Jay Foad	313f306b26	[AMDGPU] Stop using getMinimalPhysRegClass in LowerFormalArguments NFCI. The motivation for this is avoid problems in future if we add new classes containing only a subset of all VGPRs, or a subset of all SGPRs. getMinimalPhysRegClass would favour these smaller classes, which is not what we want here. Differential Revision: https://reviews.llvm.org/D121914	2022-03-17 15:19:17 +00:00
Jeremy Morse	12a2f7494e	[DebugInfo][InstrRef] Prefer stack locations for variables This patch adjusts what location is picked for a known variable value -- preferring to leave locations on the stack, even when a value is re-loaded into a register. The benefit is reduced location list entropy, on a clang-3.4 build I found that .debug_loclists reduces in size by 6%, from 29Mb down to 27Mb. Testing: a few tests need the stack slot to be written to explicitly, to force LiveDebugValues into restoring the variable location to a register. I've added an explicit test for the desired behaviour in livedebugvalues_recover_clobbers.mir . Differential Revision: https://reviews.llvm.org/D120732	2022-03-17 14:26:15 +00:00
Marco Elver	cbe1e67ead	[Instruction] Introduce getAtomicSyncScopeID() An analysis may just be interested in checking if an instruction is atomic but system scoped or single-thread scoped, like ThreadSanitizer's isAtomic(). Unfortunately Instruction::isAtomic() can only answer the "atomic" part of the question, but to also check scope becomes rather verbose. To simplify and reduce redundancy, introduce a common helper getAtomicSyncScopeID() which returns the scope of an atomic operation. Start using it in ThreadSanitizer. NFCI. Reviewed By: dvyukov Differential Revision: https://reviews.llvm.org/D121910	2022-03-17 14:59:37 +01:00
Florian Hahn	151c144350	[LV] Use usesScalars in widenPHIInstruction. This uses the existing VPlan helpers to check whether there are scalar uses of a phi recipe. It remove one of the few remaining dependencies on the cost model from VPlan code generation. Depends on D121612. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D121613	2022-03-17 13:16:32 +00:00
Sanjay Patel	67e9151096	[x86] try harder to use shift instead of test if it can save some immediate bytes We favor 'and' and 'test' in earlier phases of optimization, and that's usually the better option, but we can save a few instruction bytes by converting a mask constant to a shift here. Differential Revision: https://reviews.llvm.org/D121147	2022-03-17 09:10:57 -04:00
Florian Hahn	a6e70e4056	[VPlan] VPInterleaveRecipe only requires the first lane of the address. VPInterleaveRecipe only uses the first lane of the address. Add onlyFirstLaneUsed implementation. This is needed for a follow-up patch. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D121612	2022-03-17 11:56:43 +00:00
Dmitry Preobrazhensky	9c632b61eb	[AMDGPU][MC] A fix for commit `5977dfb` The commit code `5977dfba64` failed to compile with GCC5. This patch addresses the issue. For a related discussion, see https://reviews.llvm.org/D121696	2022-03-17 14:41:21 +03:00
Nikita Popov	1dbeb64493	[SLP] Avoid unnecessary getIncomingValueForBlock() call (NFC) This code just wants to check all incoming values, we don't care care what the incoming block is here.	2022-03-17 12:23:46 +01:00
Jay Foad	a3a4591856	[LegacyPassManager] Move structural hashing into Pass classes. NFC. Move structural hashing into virtual methods on Pass. This will allow MachineFunctionPass to override the method to add hashing of the MachineFunction. Differential Revision: https://reviews.llvm.org/D120123	2022-03-17 09:51:12 +00:00
David Green	0b6df40c52	[AArch64] Combine ISD::AND into AArch64ISD::ANDS If we already have a AArch64ISD::ANDS node with identical operands, we can merge any ISD::AND into it, reducing the instruction count by calculating the value and the flags in a single operation. This code is taken from the X86 backend, and could also handle AArch64ISD::ADDS and AArch64ISD::SUBS, but I couldn't find any test cases where it came up. Differential Revision: https://reviews.llvm.org/D118584	2022-03-17 09:44:11 +00:00
Nikita Popov	4010a7a5d0	Reapply [InstCombine] Support switch in phi to cond fold Reapply with an explicit check for multi-edges, as the expected behavior of multi-edge dominance is unclear (D120811). ----- For conditional branches, we know the value is i1 0 or i1 1 along the outgoing edges. For switches we can apply exactly the same optimization, just with the known values determined by the switch cases.	2022-03-17 10:03:09 +01:00
Lian Wang	214afc7116	[RISCV] Add patterns for vnsrl.wi and vnsra.wi instructions Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D121675	2022-03-17 07:22:32 +00:00
Abinav Puthan Purayil	f59cb41ba1	[AMDGPU] Select buffer_atomic_cmpswap* in tblgen This change replaces the manual selection of buffer_atomic_cmpswap* instructions in SelectionDAG and GlobalISel with a tblgen based selection in BUFInstructions.td. This allows us to select the return and no-return variants in tblgen. Differential Revision: https://reviews.llvm.org/D121770	2022-03-17 10:12:32 +05:30
Heejin Ahn	b8038a916d	[WebAssembly] Disable SimplifyDemandedVectorElts after legalization This fixes a reported bug that caused an infinite loop during the SelectionDAG optimization phase in ISel, by creating an overridable hook in `TargetLowering` that allows us to bail out from running `SimplifyDemandedVectorElts`. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D121869	2022-03-16 20:52:43 -07:00
Heejin Ahn	0ca2132067	[WebAssembly] Improve EH/SjLj error messages This includes a function name and a relevant instruction in error messages when possible, making them more helpful. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D120678	2022-03-16 20:50:34 -07:00
Christudasan Devadasan	6dd21d1db1	[AMDGPU][SIFoldOperands] Consider the alignment constraints Enforced an alignment check while folding the operands.	2022-03-17 08:27:53 +05:30
Christudasan Devadasan	af717d4aca	[AMDGPU][MachineVerifier] Alignment check for fp32 packed math instructions The fp32 packed math instructions are introduced in gfx90a. If their vector register operands are not properly aligned, the verifier should flag them. Currently, the verifier failed to report it and the compiler ended up emitting a broken assembly. This patch fixes that missed case in TII::verifyInstruction. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D121794	2022-03-17 08:21:35 +05:30
Lian Wang	b26abcad81	[RISCV][NFC] Replace redundant code with VLOpFrag Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D121783	2022-03-17 02:05:21 +00:00
Craig Topper	2e10671ec7	[RISCV] Improve detection of when to skip (and (srl x, c2) c1) -> (srli (slli x, c3-c2), c3) isel. We have a special case to skip this transform if c1 is 0xffffffff and x is sext_inreg in order to use sraiw+zext.w. But we were only checking that we have a sext_inreg opcode, not how many bits are being sign extended. This commit adds a check that it is a sext_inreg from i32 so we know for sure that an sraiw can be created.	2022-03-16 14:54:34 -07:00
Arthur Eubanks	2371c5a0e0	[OpaquePtr][ARM] Use elementtype on ldrex/ldaex/stlex/strex Includes verifier changes checking the elementtype, clang codegen changes to emit the elementtype, and ISel changes using the elementtype. Basically the same as D120527. Reviewed By: #opaque-pointers, nikic Differential Revision: https://reviews.llvm.org/D121847	2022-03-16 14:11:53 -07:00
Alexey Bataev	150ea76543	Revert "[SLP]Do not schedule instructions with constants/argument/phi operands and external users." This reverts commit `1eeb2bfe72` to fix a bug reported in https://reviews.llvm.org/D121121	2022-03-16 13:54:59 -07:00
Thomas Lively	7e8913d775	[WebAssembly] Fix names of SIMD instructions containing '_zero' Fix the instruction names to match the WebAssembly spec: - `i32x4.trunc_sat_zero_f64x2_{s,u}` => `i32x4.trunc_sat_f64x2_{s,u}_zero` - `f32x4.demote_zero_f64x2` => `f32x4.demote_f64x2_zero` Also rename related things like intrinsics, builtins, and test functions to match. Reviewed By: aheejin Differential Revision: https://reviews.llvm.org/D121661	2022-03-16 13:34:57 -07:00
Florian Hahn	470a975c84	[ConstraintElimination] Add missing dominance check. When dealing with an unconditional branch, the condition can only added if BB properly dominates the successor.	2022-03-16 20:01:24 +00:00
Jay Foad	fb8d23b8e7	[AMDGPU] Define new feature HasFlatScratchSVSMode. NFC. This is by analogy with HasFlatScratchSTMode and is slightly more informative than using isGFX940Plus. Differential Revision: https://reviews.llvm.org/D121804	2022-03-16 19:54:02 +00:00
Yonghong Song	98e2274458	[BPF] fix a CO-RE bitfield relocation error with >8 record alignment Jussi Maki reported a fatal error like below for a bitfield CO-RE relocation: fatal error: error in backend: Unsupported field expression for llvm.bpf.preserve.field.info, requiring too big alignment The failure is related to kernel struct thread_struct. The following is a simplied example. Suppose we have below structure: struct t2 { int a[8]; } __attribute__((aligned(64))) __attribute__((preserve_access_index)); struct t1 { int f1:1; int f2:2; struct t2 f3; } __attribute__((preserve_access_index)); Note that struct t2 has aligned 64, which is used sometimes in the kernel to enforce cache line alignment. The above struct will be encoded into BTF and the following is what C code looks like and the struct will appear in the file like vmlinux.h. struct t2 { int a[8]; long: 64; long: 64; long: 64; long: 64; } __attribute__((preserve_access_index)); struct t1 { int f1: 1; int f2: 2; long: 61; long: 64; long: 64; long: 64; long: 64; long: 64; long: 64; long: 64; struct t2 f3; } __attribute__((preserve_access_index)); Note that after origin_source -> BTF -> new_source transition, the new source has the same memory layout as the old one but the alignment interpretation inside the compiler could be different. The bpf program will use the later explicitly padded structure as in vmlinux.h. In the above case, the compiler internal ABI alignment for new struct t1 is 16 while it is 4 for old struct t1. I didn't do a thorough investigation why the ABI alignment is 16 and I suspect it is related to anonymous padding in the above. Current BPF bitfield CO-RE handling requires alignment <= 8 so proper bitfield operatin can be performed. Therefore, alignment 16 will cause a compiler fatal error. To fix the ABI alignment >=16, let us check whether the bitfield can be held within a 8-byte-aligned range. If this is the case, we can use alignment 8. Otherwise, a fatal error will be reported. Differential Revision: https://reviews.llvm.org/D121821	2022-03-16 12:16:46 -07:00
Jessica Clarke	659363c0cc	[RISCV] Ensure PseudoLA* can be hoisted Since we mark the pseudos as mayLoad but do not provide any MMOs, isSafeToMove conservatively returns false, stopping MachineLICM from hoisting the instructions. PseudoLA_TLS_GD does not actually expand to a load, so stop marking that as mayLoad to allow it to be hoisted, and for the others make sure to add MMOs during lowering to indicate they're GOT loads and thus can be freely moved. Fixes https://github.com/llvm/llvm-project/issues/54372 Reviewed By: MaskRay, arichardson Differential Revision: https://reviews.llvm.org/D121654	2022-03-16 18:45:36 +00:00
Sanjay Patel	83413bb617	[x86] reduce indentation; NFC We may be able to refine the conditions for these transforms ( D120648 ).	2022-03-16 13:39:02 -04:00
Snehasish Kumar	49c048add4	[memprof] Add a test to verify callstack order. We add to ensure that we are observing the correct callstack order in memprof during symbolization. There was some confusion whether the order of DIFrame objects were reversed but in reality the leaf function is at index 0 so no code changes are required. Differential Revision: https://reviews.llvm.org/D121759	2022-03-16 10:10:57 -07:00
Fangrui Song	30718f3aa6	[llvm-objcopy] --weaken-symbol/--weaken: weaken STB_GNU_UNIQUE symbols STB_GNU_UNIQUE is like STB_GLOBAL with extra semantics: * gold and ld.lld: changed to STB_GLOBAL if --no-gnu-unique is specified * glibc: unique even with dlopen `RTLD_LOCAL`, implies DF_1_NODELETE Therefore, I think it makes sense for --weaken-symbol/--weaken-symbols/--weaken to change STB_GNU_UNIQUE symbols. binutils 2.39 will have the same behavior: https://sourceware.org/bugzilla/show_bug.cgi?id=28926 Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D120638	2022-03-16 09:48:19 -07:00
Marco Elver	555df03012	[SelectionDAG][NFC] Clean up SDCallSiteDbgInfo accessors * Consistent naming: addCallSiteInfo vs. getCallSiteInfo; * Use ternary operator to reduce verbosity; * const'ify getters; * Add comments; NFCI. Differential Revision: https://reviews.llvm.org/D121820	2022-03-16 17:46:06 +01:00
Malhar Jajoo	a36d269658	[VPlan] Avoid collecting scalars for SVE This patch ensures scalars (except for uniforms) are no longer collected (prior to LVP planning phase) for scalable vectorization. This is to avoid the chances of generating scalarized instructions later (during LVP execute phase) as they are not supported for scalable vectorization. Relevant test has also been added. Differential Revision: https://reviews.llvm.org/D121452	2022-03-16 16:33:34 +00:00
Nikita Popov	0958450251	Revert "[FuzzMutate] Don't insert instructions after musttail call" This reverts commit `6a23d27644`. The newly added tests fail on the llvm-clang-x86_64-sie-win buildbot. Not sure why a failure only occurs there, possibly differen PRNG sequence?	2022-03-16 17:29:27 +01:00
Nikita Popov	6a23d27644	[FuzzMutate] Don't insert instructions after musttail call	2022-03-16 16:58:33 +01:00
Amir Ayupov	2c4e38fa6f	[X86] Emit REX prefix immediately before the opcode Fix prefix emission order to emit REX immediately before the opcode (SDM vol2, 2.1, Figure 2-1). According to SDM vol2 2.2.1, "Other placements are ignored". This fix has a side effect of outputting segment override prefix in a different order than previously (benign). Follow-up to https://reviews.llvm.org/D120592 Reviewed By: skan, craig.topper Differential Revision: https://reviews.llvm.org/D120871	2022-03-16 08:30:31 -07:00
Amir Ayupov	1d3719820f	[X86] Preserve redundant Address-Size override prefix Print and emit redundant Address-Size override prefix if it's set on the instruction. Reviewed By: skan Differential Revision: https://reviews.llvm.org/D120592	2022-03-16 08:30:29 -07:00
Joe Nash	fb81f06f63	[AMDGPU] Calculate RegWidth in bits in AsmParser NFC. Switch from calculations based on dwords to bits, to be more flexible. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D121730	2022-03-16 10:52:14 -04:00
Nikita Popov	d7cf7ec05d	[SROA] Handle over-large loads during presplitting When a load extends past the extent of the alloca, SROA will restrict the slice size to extend to the end of the alloca only. However, presplitting was asserting that the load size and the slice size match exactly, which does not hold in this case. Relax the assertion to only require that the load size is greater or equal than the slice size.	2022-03-16 15:41:11 +01:00
Shengchen Kan	ac64d0d230	[NFC][CodeGen] Remove redundant if clause in TargetPassConfig::addPass	2022-03-16 22:14:23 +08:00
Florian Hahn	f473d4aa80	[ConstraintElimination] Support BBs with single successor in CanAdd. If BB has a single successor, conditions can be added safely.	2022-03-16 14:13:52 +00:00
Alexey Bataev	1eeb2bfe72	[SLP]Do not schedule instructions with constants/argument/phi operands and external users. No need to schedule entry nodes where all instructions are not memory read/write instructions and their operands are either constants, or arguments, or phis, or instructions from others blocks, or their users are phis or from the other blocks. The resulting vector instructions can be placed at the beginning of the basic block without scheduling (if operands does not need to be scheduled) or at the end of the block (if users are outside of the block). It may save some compile time and scheduling resources. Differential Revision: https://reviews.llvm.org/D121121	2022-03-16 06:05:43 -07:00
Dmitry Preobrazhensky	5977dfba64	[AMDGPU][MC][NFC] Refactored custom operands handling The original design of custom operands support assumed that most GPUs have the same or very similar operand names end encodings. This is no longer the case. As a result the support code becomes over-complicated and difficult to maintain. This change implements a different design with the following benefits: - support of aliases; - support of operands with overlapped encodings; - identification of defined but unsupported operands. Differential Revision: https://reviews.llvm.org/D121696	2022-03-16 16:04:55 +03:00
Shengchen Kan	37b378386e	[NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments	2022-03-16 20:25:42 +08:00
Simon Pilgrim	e3deb7d88b	[X86] computeKnownBitsForTargetNode - add X86ISD::AND KnownBits handling Fixes #54171	2022-03-16 11:05:36 +00:00
Simon Moll	91fad1167a	[VE] v512\|256 f32\|64 fneg isel and tests fneg instruction isel and tests. We do this also in preparation of fused negatate-multiple-add fp operations. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D121620	2022-03-16 11:31:26 +01:00
Florian Hahn	e5822ded56	[FunctionAttrs] Infer argmemonly . This patch adds initial argmemonly inference, by checking the underlying objects of locations returned by MemoryLocation. I think this should cover most cases, except function calls to other argmemonly functions. I'm not sure if there's a reason why we don't infer those yet. Additional argmemonly can improve codegen in some cases. It also makes it easier to come up with a C reproducer for `7662d1687b` (already fixed, but I'm trying to see if C/C++ fuzzing could help to uncover similar issues.) Compile-time impact: NewPM-O3: +0.01% NewPM-ReleaseThinLTO: +0.03% NewPM-ReleaseLTO+g: +0.05% https://llvm-compile-time-tracker.com/compare.php?from=067c035012fc061ad6378458774ac2df117283c6&to=fe209d4aab5b593bd62d18c0876732ddcca1614d&stat=instructions Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D121415	2022-03-16 10:24:33 +00:00
Nikita Popov	20531b3a6b	[RelLookupTableConverter] Avoid querying TTI for declarations This code queries TTI on a single function, which is considered to be representative. This is a bit odd, but probably fine in practice. However, I think we should at least avoid querying declarations, which e.g. will generally lack target attributes, and for which we don't seem to ever query TTI in other places.	2022-03-16 10:39:28 +01:00
Matthias Gehre	09854f2af3	[SelectionDAG] Emit calls to __divei4 and friends for division/remainder of large integers Emit calls to __divei4 and friends for divison/remainder of large integers. This fixes https://github.com/llvm/llvm-project/issues/44994. The overall RFC is in https://discourse.llvm.org/t/rfc-add-support-for-division-of-large-bitint-builtins-selectiondag-globalisel-clang/60329 The compiler-rt part is in https://reviews.llvm.org/D120327 Differential Revision: https://reviews.llvm.org/D120329	2022-03-16 09:36:28 +00:00
Nikita Popov	f3cbe60aa9	[AAEval] Remove unused function (NFC)	2022-03-16 10:25:45 +01:00

1 2 3 4 5 ...

156499 Commits