llvm-project

Commit Graph

Author	SHA1	Message	Date
Nathan Chancellor	e6b086bef2	Revert "[InstCombine] Fix miscompile on GEP+load to icmp fold (PR45210)" This reverts commit `4f2fd3818b`. The Linux kernel fails to build after this commit. See https://reviews.llvm.org/D99481 for a reproducer. Signed-off-by: Nathan Chancellor <nathan@kernel.org>	2021-05-31 20:21:26 -07:00
Arthur Eubanks	372237487e	[OpaquePtr] Remove some uses of PointerType::getElementType()	2021-05-31 16:11:25 -07:00
Albion Fung	db26cd30b6	[PowerPC] Improve f32 to i32 bitcast code gen The code gen for f32 to i32 bitcast is not currently the most efficient; this patch removes some unneccessary instructions gerneated. Differential revision: https://reviews.llvm.org/D100782	2021-05-31 16:00:58 -05:00
Congzhe Cao	bfefde22b6	[LoopInterhcange] Handle movement of reduction phis appropriately This patch fixes pr43326 and pr48212. Currently when we move reduction phis to the right place, loop interchange assumes the first phi in loop headers is an induction phi, skips the first phi and assumes the rest of phis are candidate reduction phis to move. However, it may not always be the case. This patch loops over all phis in loop headers and considers a phi node as a candidate reduction phi to move only when it is indeed a reduction phi across outer and inner loop. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D102743	2021-05-31 16:27:38 -04:00
Florian Hahn	aa00b1d763	[LV] Try to sink users recursively for first-order recurrences. Update isFirstOrderRecurrence to explore all uses of a recurrence phi and check if we can sink them. If there are multiple users to sink, they are all mapped to the previous instruction. Fixes PR44286 (and another PR or two). Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D84951	2021-05-31 19:55:33 +01:00
Arthur Eubanks	2c3afa3237	[OpaquePtr] Clean up some uses of Type::getPointerElementType() These depend on pointee types.	2021-05-31 09:54:57 -07:00
Andrea Di Biagio	9853d0db1e	[MCA][NFCI] Minor changes to InstrBuilder and Instruction. This is based on the assumption that most simulated instructions don't define more than one or two registers. This is true for example on x86, where most instruction definitions don't declare more than one register write. The default code region size has been increased from 8 to 16. This is based on the assumption that, for small microbenchmarks, the typical code snippet size is often less than 16 instructions. mca::Instruction now uses bitfields to pack flags. No functional change intended.	2021-05-31 17:05:13 +01:00
Arthur Eubanks	8815ce03e8	Remove "Rewrite Symbols" from codegen pipeline It breaks up the function pass manager in the codegen pipeline. With empty parameters, it looks at the -mllvm flag -rewrite-map-file. This is likely not in use. Add a check that we only have one function pass manager in the codegen pipeline. Some tests relied on the fact that we had a module pass somewhere in the codegen pipeline. addr-label.ll crashes on ARM due to this change. This is because a ARMConstantPoolConstant containing a BasicBlock to represent a blockaddress may hold an invalid pointer to a BasicBlock if the blockaddress is invalidated by its BasicBlock getting removed. In that case all referencing blockaddresses are RAUW a constant int. Making ARMConstantPoolConstant::CVal a WeakVH fixes the crash, but I'm not sure that's the right fix. As a workaround, create a barrier right before ISel so that IR optimizations can't happen while a ARMConstantPoolConstant has been created. Reviewed By: rnk, MaskRay, compnerd Differential Revision: https://reviews.llvm.org/D99707	2021-05-31 08:32:36 -07:00
Anirudh Prasad	b8dcd920ec	[AsmParser][SystemZ][z/OS] Introducing HLASM Parser support to AsmParser - Part 2 - This patch is the second (and hopefully final) part of providing HLASM syntax for inline asm statements for z/OS to LLVM (continuing on from https://reviews.llvm.org/D98276) - This second part deals with providing label support - As mentioned in https://reviews.llvm.org/D98276, if the first token is not a space we process the first token as a label, and the remaining tokens as a possible machine instruction - To achieve this, a new `parseAsHLASMLabel` function is introduced. This function processes the first token, validates whether it is an "acceptable" label according to HLASM standards, and then emits it - After handling and emitting the label, call the `parseAsMachineInstruction` instruction to process the remaining tokens as a machine instruction. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D103320	2021-05-31 11:27:02 -04:00
Daniil Fukalov	e853d3b274	[NFC] MemoryDependenceAnalysis cleanup. 1. Removed redundant includes, 2. Removed never defined and used `releaseMemory()`. 3. Fixed member functions names first letter case. 4. Renamed duplicate (in nested struct `NonLocalPointerInfo`) name `NonLocalDeps` to `NonLocalDepsMap`. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D102358	2021-05-31 18:07:55 +03:00
Sanjay Patel	63fe4cb082	[SDAG] add check to sext-of-setcc fold to bypass changing a legal op I accidentaly pushed a draft of D103280 that was discussed during the review, but it was not supposed to be the final version. Rather than revert and recommit, I'm updating the existing code. This way we have a record of the codegen diff that would result if we decide to remove this predicate in the future.	2021-05-31 08:58:11 -04:00
Roman Lebedev	f7c95c3322	[NFC] ScalarEvolution: apply SSO to the ExprValueMap value ExprValueMap is a map from SCEV * to a set-vector of (Value , ConstantInt ) pair, and while the map itself will likely be big-ish (have many keys), it is a reasonable assumption that each key will refer to a small-ish number of pairs. In particular looking at n=512 case from https://bugs.llvm.org/show_bug.cgi?id=50384, the small-size of 4 appears to be the sweet spot, it results in the least allocations while minimizing memory footprint. ``` $ for i in $(ls heaptrack.opt.*.gz); do echo $i; heaptrack_print $i \| tail -n 6; echo ""; done heaptrack.opt.0-orig.gz total runtime: 14.32s. calls to allocation functions: 8222442 (574192/s) temporary memory allocations: `2419000` (168924/s) peak heap memory consumption: 190.98MB peak RSS (including heaptrack overhead): 239.65MB total memory leaked: 67.58KB heaptrack.opt.1-n1.gz total runtime: 13.72s. calls to allocation functions: 7184188 (523705/s) temporary memory allocations: 2419017 (176338/s) peak heap memory consumption: 191.38MB peak RSS (including heaptrack overhead): 239.64MB total memory leaked: 67.58KB heaptrack.opt.2-n2.gz total runtime: 12.24s. calls to allocation functions: 6146827 (502355/s) temporary memory allocations: 2418997 (197695/s) peak heap memory consumption: 163.31MB peak RSS (including heaptrack overhead): 211.01MB total memory leaked: 67.58KB heaptrack.opt.3-n4.gz total runtime: 12.28s. calls to allocation functions: 6068532 (494260/s) temporary memory allocations: 2418985 (197017/s) peak heap memory consumption: 155.43MB peak RSS (including heaptrack overhead): 201.77MB total memory leaked: 67.58KB heaptrack.opt.4-n8.gz total runtime: 12.06s. calls to allocation functions: 6068042 (503321/s) temporary memory allocations: 2418992 (200646/s) peak heap memory consumption: 166.03MB peak RSS (including heaptrack overhead): 213.55MB total memory leaked: 67.58KB heaptrack.opt.5-n16.gz total runtime: 12.14s. calls to allocation functions: 6067993 (499958/s) temporary memory allocations: 2418999 (199307/s) peak heap memory consumption: 187.24MB peak RSS (including heaptrack overhead): 233.69MB total memory leaked: 67.58KB ``` While that test may be an edge worst-case scenario, https://llvm-compile-time-tracker.com/compare.php?from=dee85d47d9f15fc268f7b18f279dac2774836615&to=98a57e31b1947d5bcdf4a5605ac2ab32b4bd5f63&stat=instructions agrees that this also results in improvements in the usual situations.	2021-05-31 15:34:03 +03:00
Sanjay Patel	434c8e013a	[SDAG] try harder to fold casts into vector compare sext (vsetcc X, Y) --> vsetcc (zext X), (zext Y) -- (when the zexts are free and a bunch of other conditions) We have a couple of similar folds to this already for vector selects, but this pattern slips through because it is only a setcc. The tests are based on the motivating case from: https://llvm.org/PR50055 ...but we need extra logic to get that example, so I've left that as a TODO for now. Differential Revision: https://reviews.llvm.org/D103280	2021-05-31 07:14:01 -04:00
Djordje Todorovic	dee85d47d9	[LiveDebugVariables] Stop trimming locations of non-inlined vars The D35953, D62650 and D73691 introduced trimming of variables locations in LiveDebugVariables pass, since there are some cases where after the virtregrewrite we have exploded number of DBG_VALUEs created for some inlined variables. As it looks, all problematic cases were regarding inlined variables, so it seems reasonable to stop trimming the location ranges for non-inlined variables. It has very good impact on the llvm-locstats report. Differential Revision: https://reviews.llvm.org/D102917	2021-05-31 02:59:19 -07:00
Fraser Cormack	2b37c405cc	[RISCV] Scale scalably-typed split argument offsets by VSCALE This patch fixes a bug in lowering scalable-vector types in RISC-V's main calling convention. When scalable-vector types are split and passed indirectly, the target is responsible for scaling the offset -- initially set to the known-minimum store size -- by the scalable factor. Before this we were issuing overlapping loads or stores to the different parts, leading to incorrect codegen. Credit to @HsiangKai for spotting this. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D103262	2021-05-31 10:43:13 +01:00
Juneyoung Lee	7161bb87c9	[InsCombine] Fix a few remaining vec transforms to use poison instead of undef This is a patch that replaces shufflevector and insertelement's placeholder value with poison. Underlying motivation is to fix the semantics of shufflevector with undef mask to return poison instead (D93818) The consensus has been made in the late 2020 via mailing list as well as the thread in https://bugs.llvm.org/show_bug.cgi?id=44185 . This patch is a simple syntactic change to the existing code, hence directly pushed as a commit.	2021-05-31 18:47:09 +09:00
David Green	222aeb4d51	[DSE] Remove stores in the same loop iteration DSE will currently only remove stores in the same block unless they can be guaranteed to be loop invariant. This expands that to any stores that are in the same Loop, at the same loop level. This should still account for where AA/MSSA will not handle aliasing between loops, but allow the dead stores to be removed where they overlap in the same loop iteration. It requires adding loop info to DSE, but that looks fairly harmless. The test case this helps is from code like this, which can come up in certain matrix operations: for(i=..) dst[i] = 0; for(j=..) dst[i] += src[in+j]; After LICM, this becomes: for(i=..) dst[i] = 0; sum = 0; for(j=..) sum += src[in+j]; dst[i] = sum; The first store is dead, and with this patch is now removed. Differntial Revision: https://reviews.llvm.org/D100464	2021-05-31 10:22:37 +01:00
Fraser Cormack	eb23936591	[RISCV] Support vector conversions between fp and i1 This patch custom lowers FP_TO_[US]INT and [US]INT_TO_FP conversions between floating-point and boolean vectors. As the default action is scalarization, this patch both supports scalable-vector conversions and improves the code generation for fixed-length vectors. The lowering for these conversions can piggy-back on the existing lowering, which lowers the operations to a supported narrowing/widening conversion and then either an extension or truncation. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103312	2021-05-31 09:55:39 +01:00
Andy Wingo	bc1ad6e3c4	Revert "[WebAssembly][CodeGen] IR support for WebAssembly local variables" This reverts commit `bf35f4af51`. There was an error in a shared-library build.	2021-05-31 10:55:15 +02:00
Andy Wingo	bf35f4af51	[WebAssembly][CodeGen] IR support for WebAssembly local variables This patch adds TargetStackID::WasmLocal. This stack holds locations of values that are only addressable by name -- not via a pointer to memory. For the WebAssembly target, these objects are lowered to WebAssembly local variables, which are managed by the WebAssembly run-time and are not addressable by linear memory. For the WebAssembly target IR indicates that an AllocaInst should be put on TargetStackID::WasmLocal by putting it in the non-integral address space WASM_ADDRESS_SPACE_WASM_VAR, with value 1. SROA will mostly lift these allocations to SSA locals, but any alloca that reaches instruction selection (usually in non-optimized builds) will be assigned the new TargetStackID there. Loads and stores to those values are transformed to new WebAssemblyISD::LOCAL_GET / WebAssemblyISD::LOCAL_SET nodes, which then lower to the type-specific LOCAL_GET_I32 etc instructions via tablegen patterns. Differential Revision: https://reviews.llvm.org/D101140	2021-05-31 10:40:38 +02:00
Hyeongyu Kim	4f2fd3818b	[InstCombine] Fix miscompile on GEP+load to icmp fold (PR45210) As noted in PR45210: https://bugs.llvm.org/show_bug.cgi?id=45210 ...the bug is triggered as Eli say when sext(idx) * ElementSize overflows. ``` // assume that GV is an array of 4-byte elements GEP = gep GV, 0, Idx // this is accessing Idx * 4 L = load GEP ICI = icmp eq L, value => ICI = icmp eq Idx, NewIdx ``` The foldCmpLoadFromIndexedGlobal function simplifies GEP+load operation to icmp. And there is a problem because Idx * ElementSize can overflow. Let's assume that the wanted value is at offset 0. Then, there are actually four possible values for Idx to match offset 0: 0x00..00, 0x40..00, 0x80..00, 0xC0..00. We should return true for all these values, but currently, the new icmp only returns true for 0x00..00. This problem can be solved by masking off (trailing zeros of ElementSize) bits from Idx. ``` ... => Idx' = and Idx, 0x3F..FF ICI = icmp eq Idx', NewIdx ``` Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D99481	2021-05-31 14:08:20 +09:00
Ben Shi	22668c6e1f	[AVR][NFC] Refactor 8-bit & 16-bit shifts Reviewed By: dylanmckay Differential Revision: https://reviews.llvm.org/D98335	2021-05-31 10:30:46 +08:00
David Green	2176be556b	[ARM] Guard against loop variant gather ptr operands This ensures that the operands of any gather/scatter instructions that we attempt to push out of the loop are invariant, preventing invalid IR from being generated.	2021-05-30 18:02:14 +01:00
Ben Shi	86812faa5f	[AVR] Improve inline assembly Reviewed By: dylanmckay Differential Revision: https://reviews.llvm.org/D96394	2021-05-30 23:44:43 +08:00
Sanjay Patel	7bb8bfa062	[InstCombine] fix miscompile from vector select substitution This is similar to the fix in `c590a9880d` ( PR49832 ), but we missed handling the pattern for select of bools (no compare inst). We can't substitute a vector value because the equality condition replacement that we are attempting requires that the condition is true/false for the entire value. Vector select can be partly true/false. I added an assert for vector types, so we shouldn't hit this again. Fixed formatting while auditing the callers. https://llvm.org/PR50500	2021-05-30 07:11:58 -04:00
Florian Hahn	126f90b252	[DAGCombine] Poison-prove scalarizeExtractedVectorLoad. extractelement is poison if the index is out-of-bounds, so just scalarizing the load may introduce an out-of-bounds load, which is UB. To avoid introducing new UB, we can mask the index so it only contains valid indices. Fixes PR50382. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D103077	2021-05-30 11:40:55 +01:00
Mindong Chen	71acce68da	[NFCI] Move DEBUG_TYPE definition below #includes When you try to define a new DEBUG_TYPE in a header file, DEBUG_TYPE definition defined around the #includes in files include it could result in redefinition warnings even compile errors. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D102594	2021-05-30 17:31:01 +08:00
Pengxuan Zheng	056733d019	[SafeStack] Use proper API to get stack guard Using the proper API automatically sets `__stack_chk_guard` to `dso_local` if `Reloc::Static`. This wasn't strictly necessary until recently when dso_local was no longer implied by `TargetMachine::shouldAssumeDSOLocal` for `__stack_chk_guard`. By using the proper API, we can avoid generating unnecessary GOT relocations. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D102646	2021-05-30 00:52:48 -07:00
Arthur Eubanks	71cca4f728	Revert "[TargetLowering] Only inspect attributes in the arguments for ArgListEntry" This reverts commit `1c7f32334d`. Some code still needs to properly set parameter ABI attributes, see D101806.	2021-05-29 23:08:15 -07:00
Arthur Eubanks	3a6f12f915	Revert "[NFC] Use ArgListEntry indirect types more in ISel lowering" This reverts commit `bc7d15c61d`. Dependent change is to be reverted.	2021-05-29 22:40:33 -07:00
David Green	65831422a9	[ARM] Guard against WhileLoopStart kill flags If the operand of the WhileLoopStart is flagged as killed, that currently gets propogated to both the t2CMPri as the instruction is reverted, and the newly created t2DoLoopStart. Only the second should remain as killing the operand, the first dropping the flags.	2021-05-29 21:04:26 +01:00
Jessica Clarke	00dfd4f870	Revert "[RISCV] Remove -riscv-no-aliases in favour of new -M no-aliases" The replacement doesn't work for llc, but it is needed by patchable-function-entry.ll. This reverts commit `aa9a30b83a`.	2021-05-29 15:11:37 +01:00
Jessica Clarke	762f707c00	[Support] Fix getMainExecutable on FreeBSD when called via an absolute path On FreeBSD, absolute paths are passed unmodified in AT_EXECPATH, but relative paths are resolved to absolute paths, and any symlinks will be followed in the process. This means that the resource dir calculation will be wrong if Clang is invoked as an absolute path to a symlink, and this currently causes clang/test/Driver/rocm-detect.hip to fail on FreeBSD. Thus, make sure to call realpath on the result, just like is done on macOS. Whilst here, clean up the old fallback auxargs loop to use the actual type for auxargs rather than using lots of hacky casts that rely on addresses and pointers being the same (which is not the case on CHERI, and thus Arm's prototype Morello, although for little-endian systems it happens to work still as the word-sized integer will be padded to a full pointer, and it's someone academic given dereferencing past the end of environ will give a bounds fault, but CheriBSD is new enough that the elf_aux_info path will be used). This also makes the code easier to follow, and removes the confusing double-increment of p. Reviewed By: dim, arichardson Differential Revision: https://reviews.llvm.org/D103346	2021-05-29 14:59:46 +01:00
Jessica Clarke	aa9a30b83a	[RISCV] Remove -riscv-no-aliases in favour of new -M no-aliases Whilst here, also remove a couple of unnecessary -o - instances. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D103201	2021-05-29 14:58:28 +01:00
Sanjay Patel	c7da0c383a	[InstCombine] fold zext of masked bit set/clear This does not solve PR17101, but it is one of the underlying diffs noted here: https://bugs.llvm.org/show_bug.cgi?id=17101#c8 We could ease the one-use checks for the 'clear' (no 'not' op) half of the transform, but I do not know if that asymmetry would make things better or worse. Proofs: https://rise4fun.com/Alive/uVB Name: masked bit set %sh1 = shl i32 1, %y %and = and i32 %sh1, %x %cmp = icmp ne i32 %and, 0 %r = zext i1 %cmp to i32 => %s = lshr i32 %x, %y %r = and i32 %s, 1 Name: masked bit clear %sh1 = shl i32 1, %y %and = and i32 %sh1, %x %cmp = icmp eq i32 %and, 0 %r = zext i1 %cmp to i32 => %xn = xor i32 %x, -1 %s = lshr i32 %xn, %y %r = and i32 %s, 1 Note: this is a re-post of a patch that I committed at: rGa041c4ec6f7a The commit was reverted because it exposed another bug: rGb212eb7159b40 But that has since been corrected with: rG8a156d1c2795189 ( D101191 ) Differential Revision: https://reviews.llvm.org/D72396	2021-05-29 08:52:26 -04:00
Sanjay Patel	52f2970036	[InstCombine] reduce code duplication; NFC	2021-05-29 08:33:25 -04:00
Ulrich Weigand	c123c178b2	[SystemZ] Set getExtendForAtomicOps to ISD::ANY_EXTEND The implementation of subword atomics does not actually guarantee the result is zero-extended, which now caused build bot failures after https://reviews.llvm.org/D101342 was landed.	2021-05-29 12:15:18 +02:00
Nikita Popov	625920dabf	[LoopUnroll] Make DomTree explicitly required (NFC) Some of the code was already assuming that DT is non-null, so make that requirement more explicit and remove unnecessary null checks.	2021-05-29 09:37:32 +02:00
LemonBoy	b577ec4956	[AtomicExpandPass][AArch64] Promote xchg with floating-point types to integer ones Follow the same strategy used for atomic loads/stores by converting the operands to equally-sized integer types. This change prevents the atomic expansion pass from generating illegal LL/SC pairs when targeting AArch64: `expand-atomicrmw-xchg-fp.ll` would previously instantiate intrinsics such as `llvm.aarch64.ldaxr.p0f32` that cannot be lowered. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D103232	2021-05-29 08:57:27 +02:00
Luke	c4c3869554	[RISCV] Enable interleaved vectorization for RVV Enable interleaved vectorization for RVV. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D101469	2021-05-29 11:03:27 +08:00
Fangrui Song	38dbdde792	[Internalize] Simplify comdat renaming with noduplicates after D103043 I realized that we can use `comdat noduplicates` which is available on ELF. Add a special case for wasm which doesn't support the feature.	2021-05-28 16:58:38 -07:00
Amara Emerson	018a9641ff	[AArch64][GlobalISel] Fix a crash during selection of a G_ZEXT(s8 = G_LOAD) We have special handling for a zext of a load <32b because the load does a zext for free. In that case, we just select the G_ZEXT as if it were a copy but this triggered the copy checking code to balk at the mismatched size. This was being hidden because normally these get combined into G_ZEXTLOAD but for atomics this doesn't happen. The test case here just uses a normal load because the particular atomic isn't supported yet anyway.	2021-05-28 16:35:24 -07:00
Nikita Popov	90310dfff8	[LoopUnroll] Use changeToUnreachable() (NFC) When fulling unrolling with a non-latch exit, the latch block is folded to unreachable. Replace this folding with the existing changeToUnreachable() helper, rather than performing it manually. This also moves the fold to happen after the manual DT update for exit blocks. I believe this is correct in that the conversion of an unconditional backedge into unreachable should not affect the DT at all. Differential Revision: https://reviews.llvm.org/D103340	2021-05-29 00:11:21 +02:00
Craig Topper	bc6799f2f7	[RISCV] Add separate MxList tablegen classes for widening/narrowing and sext.zext.vf2/4/8. NFC This is cleaner than slicing the MxList to remove elements from the beginning or end since that requires hardcoding the size. I don't expect the size of the list to change, but we shouldn't repeat it in multiple places.	2021-05-28 14:06:19 -07:00
Nikita Popov	f765445a69	[LoopUnroll] Clean up exit folding (NFC) This does some non-functional cleanup of exit folding during unrolling. The two main changes are: * First rewrite latch->header edges, which is unrelated to exit folding. * Combine folding for latch and non-latch exits. After the previous change, the only difference in their logic is that for non-latch exits we currently only fold "known non-exit" cases, but not "known exit" cases. I think this helps a lot to clarify this code and prepare it for future changes. Differential Revision: https://reviews.llvm.org/D103333	2021-05-28 22:31:13 +02:00
Bardia Mahjour	06eaffa858	[NFC] Remove confusing info about MainLoop VF/UF from debug message	2021-05-28 16:10:04 -04:00
Eli Friedman	0b3b0a727a	[AArch64][RISCV] Make sure isel correctly honors failure orderings. If a cmpxchg specifies acquire or seq_cst on failure, make sure we generate code consistent with that ordering even if the success ordering is not acquire/seq_cst. At one point, it was ambiguous whether this sort of construct was valid, but the C++ standad and LLVM now accept arbitrary combinations of success/failure orderings. This doesn't address the corresponding issue in AtomicExpand. (This was reported as https://bugs.llvm.org/show_bug.cgi?id=33332 .) Fixes https://bugs.llvm.org/show_bug.cgi?id=50512. Differential Revision: https://reviews.llvm.org/D103284	2021-05-28 12:47:40 -07:00
Craig Topper	58cb649212	[RISCV] Add octuple to LMULInfo tablegen class, remove octuple_from_str. NFCI octuple_from_str was always used with the MX field from an LMULInfo. Might as well just precompute it and put it in the class.	2021-05-28 11:53:05 -07:00
Craig Topper	2830d924b0	[VP] Make getMaskParamPos/getVectorLengthParamPos return unsigned. Lowercase function names. Parameter positions seem like they should be unsigned. While there, make function names lowercase per coding standards. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D103224	2021-05-28 11:28:47 -07:00
Craig Topper	d24d2447cd	[SelectionDAG] Fix typo in assert. NFC	2021-05-28 10:37:11 -07:00
Florian Hahn	007f268c35	[VectorCombine] Check indices for all extracts we scalarize. We need to make sure that the indices of all extracts we scalarize are valid.	2021-05-28 18:35:29 +01:00
Stefan Pintilie	0159652058	Revert "Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration" (try 2)" This reverts commit `be1a23203b`.	2021-05-28 12:21:22 -05:00
Stefan Pintilie	24bd657202	Revert "[NFCI][LoopDeletion] Only query SCEV about loop successor if another successor is also in loop" This reverts commit `b0b2bf3b5d`.	2021-05-28 12:21:22 -05:00
Stefan Pintilie	fd55331203	Revert "[NFC] Formatting fix" This reverts commit `59d938e649`.	2021-05-28 12:21:22 -05:00
Stefan Pintilie	807fc7cdc9	Revert "[NFC] Reuse existing variables instead of re-requesting successors" This reverts commit `c467585682`.	2021-05-28 12:21:22 -05:00
Stefan Pintilie	dd226803c2	Revert "[NFCI][LoopDeletion] Do not call complex analysis for known non-zero BTC" This reverts commit `7d418dadf6`.	2021-05-28 12:21:21 -05:00
Sanjay Patel	403cfe5d70	[PassManager] unify late simplifycfg options between regular and LTO pipelines This is split off from D102002, and I think it is clear that the difference in behavior was not intended. Options were added to SimplifyCFG over time, but different chunks of the pass pipelines were not kept in sync.	2021-05-28 13:06:49 -04:00
Nemanja Ivanovic	e0c8265437	Revert "Fix "enumerator 'llvm::TargetStackID::WasmLocal' in switch of enum 'llvm::TargetStackID::Value' is not handled" MSVC warnings. NFCI." Since `ca5f07f8c4` already reverted the cause for this warning, this commit now causes warnings about a default label in a switch that covers the enum. This reverts commit `cf2eeb114c`.	2021-05-28 10:53:49 -05:00
eopXD	fa488ea864	[LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass This patch changes LoopFlattenPass from FunctionPass to LoopNestPass. Utilize LoopNest and let function 'Flatten' generate information from it. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D102904	2021-05-28 15:43:12 +00:00
David Stenberg	b6e1fb7e32	[IR] Make TypeFinder aware of DIArgList values TypeFinder did not find types under DIArgList. This resulted in a case of invalid IR after GlobalOpt removed a global that was the only non-DIArgList use of a struct type. error: use of undefined type named 'struct.S' call void @llvm.dbg.value( metadata !DIArgList([1 x %struct.S]* undef, i64 %idxprom), metadata !24, metadata !DIExpression([...])) Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D103306	2021-05-28 17:09:45 +02:00
Simon Pilgrim	cf2eeb114c	Fix "enumerator 'llvm::TargetStackID::WasmLocal' in switch of enum 'llvm::TargetStackID::Value' is not handled" MSVC warnings. NFCI.	2021-05-28 12:47:22 +01:00
Andy Wingo	ca5f07f8c4	Revert "[WebAssembly][CodeGen] IR support for WebAssembly local variables" This reverts commit `00ecf18979`, as it broke the AMDGPU build. Will reland later with a fix.	2021-05-28 12:42:12 +02:00
Tim Northover	9ff2eb1ea5	SwiftTailCC: teach verifier musttail rules applicable to this CC. SwiftTailCC has a different set of requirements than the C calling convention for a tail call. The exact argument sequence doesn't have to match, but fewer ABI-affecting attributes are allowed. Also make sure the musttail diagnostic triggers if a musttail call isn't actually a tail call.	2021-05-28 11:12:00 +01:00
Tim Northover	d88f96dff3	ARM: support mandatory tail calls for tailcc & swifttailcc This adds support for callee-pop conventions to the ARM backend so that it can ensure a call marked "tail" is actually a tail call.	2021-05-28 11:10:51 +01:00
dongAxis	66ff1cbd71	[NFC][Transforms][Utils] remove useless variable in CloneBasicBlock	2021-05-28 17:50:38 +08:00
Florian Hahn	ec1f6f7e3f	Revert "[LAA] Support pointer phis in loop by analyzing each incoming pointer." This reverts commit `1ed7f8ede5`. This change can cause loop-distribute to crash in some cases. Revert until I have more time to wrap up a fix. See PR50296, PR5028 and D102266.	2021-05-28 10:33:52 +01:00
Sebastian Neubauer	690f5b7a01	[AMDGPU] Fix function calls with flat scratch When flat scratch is used, the stack pointer needs to be added when writing arguments to the stack. For buffer instructions, this is done in SelectMUBUFScratchOffen and SelectMUBUFScratchOffset. Move that to call argument lowering, like it is done in GlobalISel. Differential Revision: https://reviews.llvm.org/D103166	2021-05-28 11:22:13 +02:00
Andy Wingo	00ecf18979	[WebAssembly][CodeGen] IR support for WebAssembly local variables This patch adds TargetStackID::WasmLocal. This stack holds locations of values that are only addressable by name -- not via a pointer to memory. For the WebAssembly target, these objects are lowered to WebAssembly local variables, which are managed by the WebAssembly run-time and are not addressable by linear memory. For the WebAssembly target IR indicates that an AllocaInst should be put on TargetStackID::WasmLocal by putting it in the non-integral address space WASM_ADDRESS_SPACE_WASM_VAR, with value 1. SROA will mostly lift these allocations to SSA locals, but any alloca that reaches instruction selection (usually in non-optimized builds) will be assigned the new TargetStackID there. Loads and stores to those values are transformed to new WebAssemblyISD::LOCAL_GET / WebAssemblyISD::LOCAL_SET nodes, which then lower to the type-specific LOCAL_GET_I32 etc instructions via tablegen patterns. Differential Revision: https://reviews.llvm.org/D101140	2021-05-28 11:07:41 +02:00
Yang Fan	f2264ebb08	[ConstantFolding] Fix -Wunused-variable warning (NFC) GCC warning: ``` /llvm-project/llvm/lib/Analysis/ConstantFolding.cpp: In function ‘llvm::Constant* llvm::ConstantFoldLoadFromConstPtr(llvm::Constant, llvm::Type, const llvm::DataLayout&)’: /llvm-project/llvm/lib/Analysis/ConstantFolding.cpp:713:19: warning: unused variable ‘SimplifiedGEP’ [-Wunused-variable] 713 \| if (auto *SimplifiedGEP = dyn_cast<GEPOperator>(Simplified)) { \| ^~~~~~~~~~~~~ ```	2021-05-28 16:17:12 +08:00
eopXD	e96d6f4821	Revert "[LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass" This reverts commit `7952ddb21f`. Differential Revision: https://reviews.llvm.org/D103302	2021-05-28 07:58:06 +00:00
eopXD	7e06cf8f1b	Revert "[LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass" This reverts commit `ffc4d3e068`.	2021-05-28 07:48:04 +00:00
eopXD	ffc4d3e068	[LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass This patch changes LoopFlattenPass from FunctionPass to LoopNestPass. Utilize LoopNest and let function 'Flatten' generate information from it. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D102904	2021-05-28 07:25:53 +00:00
eopXD	7952ddb21f	[LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass This patch changes LoopFlattenPass from FunctionPass to LoopNestPass. Utilize LoopNest and let function 'Flatten' generate information from it. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D102904	2021-05-28 07:11:26 +00:00
Amara Emerson	59a4ee9728	[AArch64][GlobalISel] Legalize oversize G_EXTRACT_VECTOR_ELT sources. Also changes the fewerElements helper to use the lookthrough constant helper instead of m_ICst, since m_ICst doesn't look through extends. Differential Revision: https://reviews.llvm.org/D103227	2021-05-27 23:52:24 -07:00
Max Kazantsev	6a2af607ad	Revert "[NFCI] Lazily evaluate SCEVs of PHIs" This reverts commit `51d334a845`. Reported failures, need to analyze.	2021-05-28 11:05:30 +07:00
Jinsong Ji	b2581196eb	[AIX] Enable stackprotect feature AIX use `__ssp_canary_word` instead of `__stack_chk_guard`. This patch update the target hook to use correct symbol, so that the basic stackprotect feature can work. The traceback will be handled in follow up patch. Reviewed By: #powerpc, shchenz Differential Revision: https://reviews.llvm.org/D103100	2021-05-28 02:18:15 +00:00
Jianzhou Zhao	fc1d39849e	[dfsan] Add a flag about whether to propagate offset labels at gep DFSan has flags to control flows between pointers and objects referred by pointers. For example, a = p; L(a) = L(p) when -dfsan-combine-pointer-labels-on-load = false L(a) = L(p) + L(p) when -dfsan-combine-pointer-labels-on-load = true p = b; L(p) = L(b) when -dfsan-combine-pointer-labels-on-store = false L(p) = L(b) + L(p) when -dfsan-combine-pointer-labels-on-store = true The question is what to do with p += c. In practice we found many confusing flows if we propagate labels from c to p. So a new flag works like this p += c; L(p) = L(p) when -dfsan-propagate-via-pointer-arithmetic = false L(p) = L(p) + L(c) when -dfsan-propagate-via-pointer-arithmetic = true Reviewed-by: gbalats Differential Revision: https://reviews.llvm.org/D103176	2021-05-28 00:06:19 +00:00
Jordan Rupprecht	e41aaea262	[NFC][libObject] clang-format Archive{.h,.cpp} In preparation for D100651	2021-05-27 16:48:40 -07:00
Andrea Di Biagio	57646d38d5	[MCA] Minor changes to the InOrderIssueStage. NFC The constructor of InOrderIssueStage no longer takes as input a reference to the target scheduling model. The stage can always query the subtarget to obtain a reference to the scheduling model. The ResourceManager is no longer stored internally as a unique_ptr. Moved a couple of method definitions to the .cpp file.	2021-05-28 00:33:59 +01:00
Arthur Eubanks	8086f9d87e	[ConstFold] Simplify a load's GEP operand through local aliases MSVC-style RTTI produces loads through a GEP of a local alias which itself is a GEP. Currently we aren't able to devirtualize any virtual calls when MSVC RTTI is enabled. This patch attempts to simplify a load's GEP operand by calling SymbolicallyEvaluateGEP() with an option to look through local aliases. Differential Revision: https://reviews.llvm.org/D101100	2021-05-27 16:04:19 -07:00
Craig Topper	0fa5aac292	[RISCV] Teach VSETVLI insertion to look through PHIs to prove we don't need to insert a vsetvli. If an instruction's AVL operand is a PHI node in the same block, we may be able to peek through the PHI to find vsetvli instructions that produce the AVL in other basic blocks. If we can prove those vsetvli instructions have the same VTYPE and were the last vsetvli in their respective blocks, then we don't need to insert a vsetvli for this pseudo instruction. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D103277	2021-05-27 15:34:08 -07:00
Arthur Eubanks	2d2a902078	[SanCov] Properly set ABI parameter attributes Arguments need to have the proper ABI parameter attributes set. Followup to D101806. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D103288	2021-05-27 15:27:21 -07:00
Andrea Di Biagio	50770d8de5	[MCA] Refactor the InOrderIssueStage stage. NFCI Moved the logic that checks for RAW hazards from the InOrderIssueStage to the RegisterFile. Changed how the InOrderIssueStage keeps track of backend stalls. Stall events are now generated from method notifyStallEvent(). No functional change intended.	2021-05-27 22:28:04 +01:00
Quinn Pham	62b5df7fe2	[PowerPC] Added multiple PowerPC builtins This is the first in a series of patches to provide builtins for compatibility with the XL compiler. Most of the builtins already had intrinsics and only needed to be implemented in the front end. Intrinsics were created for the three iospace builtins, eieio, and icbt. Pseudo instructions were created for eieio and iospace_eieio to ensure that nops were inserted before the eieio instruction. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D102443	2021-05-27 16:23:03 -05:00
Craig Topper	020df692d8	[RISCV] Fix typo, use addImm instead of addReg.	2021-05-27 14:04:51 -07:00
Adrian Prantl	f3869a5c32	Support stripping indirectly referenced DILocations from !llvm.loop metadata in stripDebugInfo(). This patch fixes an oversight in https://reviews.llvm.org/D96181 and also takes into account loop metadata pointing to other MDNodes that point into the debug info. rdar://78487175 Differential Revision: https://reviews.llvm.org/D103220	2021-05-27 13:23:33 -07:00
Simon Pilgrim	90d25808c4	[CostModel][X86] Improve accuracy of sext/zext to 256-bit vector costs on AVX1 targets Determined from llvm-mca analysis (btver2 vs bdver2 vs sandybridge), the split+extends+concat sequence on AVX1 capable targets are cheaper than the #ops that the cost was previously based on.	2021-05-27 18:17:50 +01:00
Craig Topper	527cd01314	[RISCV] Teach vsetvli insertion to use vsetvl x0, x0 form when we can tell that VLMAX and AVL haven't changed. This can help avoid needing a virtual register for the vsetvl output when the AVL is X0. For other register AVLs it can shorter the live range of the AVL register if it isn't needed later. There's probably no advantage when AVL is a 5 bit immediate that can use vsetivli. But do it anyway for consistency. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D103215	2021-05-27 10:11:38 -07:00
Craig Topper	a105d3024e	[X86] Fold (shift undef, X)->0 for vector shifts by immediate. We could previously do this by accident through the later call to getTargetConstantBitsFromNode I think, but that only worked if N0 had a single use. This patch makes it explicit for undef and doesn't have a use count check. I think this is needed to move the (shl X, 1)->(add X, X) fold to isel for PR50468. We need to be sure X won't be IMPLICIT_DEF which might prevent the same vreg from being used for both operands. Differential Revision: https://reviews.llvm.org/D103192	2021-05-27 09:31:47 -07:00
maekawatoshiki	2165360003	[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass. The next patch will utilize LoopNest to effectively handle loop nests. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D99149	2021-05-28 01:17:23 +09:00
Matt Arsenault	e892705d74	GlobalISel: Do not change register types in lowerLoad Adjusting the load register type is a widenScalar type action, not a lowering. lowerLoad should be reserved for operations that change the memory access size, such as unaligned load decomposition. With this trying to adjust the register type, it was hard to avoid infinite loops in the legalizer. Adds a bandaid to avoid regressing a few AArch64 tests, but I'm not sure what the exact condition is and there's probably a cleaner way to do this. For AMDGPU this regresses handling of some cases for unaligned loads, but the way this is currently working is a pretty ugly hack.	2021-05-27 11:49:37 -04:00
Nico Weber	192b4141f0	Revert "Emit correct location lists with basic block sections." Breaks check-llvm on non-linux, see comments on https://reviews.llvm.org/D85085 This reverts commit `caae570978` and follow-up commit `1546c52d97`.	2021-05-27 11:42:04 -04:00
Matt Arsenault	5efc3bfd32	AMDGPU/GlobalISel: Use IncomingValueAssigner for implicit return This makes no real difference since we assign the same register either way.	2021-05-27 11:28:52 -04:00
Simon Pilgrim	fe8d97cbe5	[CostModel][X86] AVX512 truncation ops are slower than cost models indicate. The SkylakeServer model (and later IceLake/TigerLake targets according to Agner) have the PMOV truncations as uops=2, rthroughput=2 instructions. Noticed while trying to reduce the diffs between cost tables and llvm-mca analysis.	2021-05-27 16:07:42 +01:00
Matt Arsenault	808dc6f866	VirtRegMap: Preserve LiveDebugVariables This avoids recomputing it between regalloc runs when allocation is split, and also avoids a debug info test regression.	2021-05-27 10:40:14 -04:00
Fraser Cormack	5a80dc4988	[VP][SelectionDAG] Add a target-configurable EVL operand type This patch adds a way for the target to configure the type it uses for the explicit vector length operands of VP SDNodes. The type must be a legal integer type (there is still no target-independent legalization of this operand) and must currently be at least as big as i32, the type used by the IR intrinsics. An implicit zero-extension takes place on targets which choose a larger type. All VP nodes should be created with this type used for the EVL operand. This allows 64-bit RISC-V to avoid custom legalization of all VP nodes, keeping them in their target-independent form for that bit longer. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D103027	2021-05-27 15:27:36 +01:00
Mats Petersson	9091ecdae0	[OpenMP]Add support for workshare loop modifier in lowering When lowering the dynamic, guided, auto and runtime types of scheduling, there is an optional monotonic or non-monotonic modifier. This patch adds support in the OMP IR Builder to pass this down to the runtime functions. Also implements tests for the variants. Differential Revision: https://reviews.llvm.org/D102008	2021-05-27 15:33:05 +01:00
Jamie Schmeiser	3879fcdb87	Reuse temporary files for print-changed=diff Summary: Make the file name and descriptors static so that they are reused by print-changed=diff. This avoids errors about being unable to create temporary files when doing the later comparisons in a large compile. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: aeubanks (Arthur Eubanks) Differential Revision: https://reviews.llvm.org/D100116	2021-05-27 10:19:13 -04:00
Matt Arsenault	ee35900089	AMDGPU/GlobalISel: Lower constant-32-bit zextload/sextload consistently We were accidentally leaning on code in lowerLoad which expands extending loads which should be removed.	2021-05-27 09:49:13 -04:00
Matt Arsenault	8a203ac6d2	AMDGPU/GlobalISel: Remove redundant parameter from function	2021-05-27 09:49:13 -04:00
Fraser Cormack	b7101e218c	[DAGCombine][RISCV] Don't try to trunc-store combined vector stores DAGCombine's `mergeStoresOfConstantsOrVecElts` optimization is told whether it's to use vector types and also whether it's to issue a truncating store. However, the truncating store code path assumes a scalar integer `ConstantSDNode`, and when using vector types it creates either a `BUILD_VECTOR` or `CONCAT_VECTORS` to store: neither of which is a constant. The `riscv64` target is able to expose a crash here because it switches on both code paths at the same time. The `f32` is stored as `i32` which must be promoted to `i64`, necessitating a truncating store. It also decides later that it prefers a vector store of `v2f32`. While vector truncating stores are legal, this combine is not able to emit them. We also don't have a test case. This patch adds an assert to catch this case more gracefully, and updates one of the caller functions to the function to turn off the use of truncating stores when preferring vectors. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103173	2021-05-27 14:16:32 +01:00
Fraser Cormack	8c73a31c11	[RISCV] Allow passing fixed-length vectors via the stack The vector calling convention dictates that when the vector argument registers are exhaused, GPRs are used to pass the address via the stack. When the GPRs themselves are exhausted, at best we would previously crash with an assertion, and at worst we'd generate incorrect code. This patch addresses this issue by passing fixed-length vectors via the stack with their full fixed-length size and aligned to their element type size. Since the calling convention lowering can't yet handle scalable vector types, this patch adds a fatal error to make it clear that we are lacking in this regard. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D102422	2021-05-27 14:14:07 +01:00
Florian Hahn	38641ddf3e	[VPlan] Do not sink uniform recipes in sinkScalarOperands. For uniform ReplicateRecipes, only the first lane should be used, so sinking them would mean we have to compute the value of the first lane multiple times. Also, at the moment, sinking them causes a crash because the value of the first lane is re-used by all users. Reported post-commit for D100258.	2021-05-27 14:07:48 +01:00
Simon Giesecke	5f2d4b23b4	Add --quiet option to llvm-gsymutil to suppress output of warnings. Differential Revision: https://reviews.llvm.org/D102829	2021-05-27 12:36:34 +00:00
Mats Petersson	86627be233	Revert "[OpenMP]Add support for workshare loop modifier in lowering" This reverts commit `ea4c5fb04c`.	2021-05-27 13:09:47 +01:00
Mats Petersson	ea4c5fb04c	[OpenMP]Add support for workshare loop modifier in lowering When lowering the dynamic, guided, auto and runtime types of scheduling, there is an optional monotonic or non-monotonic modifier. This patch adds support in the OMP IR Builder to pass this down to the runtime functions. Also implements tests for the variants. Differential Revision: https://reviews.llvm.org/D102008	2021-05-27 12:28:27 +01:00
Fraser Cormack	772b58a641	[SelectionDAG][RISCV] Don't unroll 0/1-type bool VSELECTs This patch extends the cases in which the legalizer is able to express VSELECT in terms of XOR/AND/OR. When dealing with a VSELECT between boolean vector types, the mask itself is an all-ones or all-ones value of the operand type, so a 0/1 boolean type behaves identically to a 0/-1 type. This greatly helps RISC-V which relies on expansion for these nodes. It also allows scalable-vector bool VSELECTs to use the default expansion, where before it would crash in SelectionDAG::UnrollVectorOp. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103147	2021-05-27 10:08:57 +01:00
Sebastian Neubauer	0bb60dbe34	[AMDGPU][GlobalISel] Allow amdgpu_gfx calling conv Calling functions from shaders already works with the SelectionDAG. Differential Revision: https://reviews.llvm.org/D103183	2021-05-27 10:41:40 +02:00
Max Kazantsev	7d418dadf6	[NFCI][LoopDeletion] Do not call complex analysis for known non-zero BTC	2021-05-27 15:29:37 +07:00
Max Kazantsev	c467585682	[NFC] Reuse existing variables instead of re-requesting successors	2021-05-27 15:29:37 +07:00
Amara Emerson	9f39ba13b5	[GlobalISel] Implement splitting of G_SHUFFLE_VECTOR. Thhis is a port from the DAG legalization. We're still missing some of the canonicalizations of shuffles but it's a start. Differential Revision: https://reviews.llvm.org/D102828	2021-05-27 00:28:38 -07:00
Max Kazantsev	51d334a845	[NFCI] Lazily evaluate SCEVs of PHIs Eager evaluation has cost of compile time. Only query them if they are required for proving predicates.	2021-05-27 13:35:31 +07:00
Max Kazantsev	59d938e649	[NFC] Formatting fix	2021-05-27 12:50:54 +07:00
Max Kazantsev	b0b2bf3b5d	[NFCI][LoopDeletion] Only query SCEV about loop successor if another successor is also in loop	2021-05-27 12:44:22 +07:00
Hasyimi Bahrudin	8d25762720	Fix non-global-value-max-name-size not considered by LLParser `non-global-value-max-name-size` is used by `Value` to cap the length of local value name. However, this flag is not considered by `LLParser`, which leads to unexpected `use of undefined value error`. The fix is to move the responsibility of capping the length to `ValueSymbolTable`. The test is the one provided by [[ https://bugs.llvm.org/show_bug.cgi?id=45899 \| Mikael in the bug report ]]. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D102707	2021-05-27 04:20:03 +00:00
Yevgeny Rouban	4d26f41f76	[RS4GC] Introduce intrinsics to get base ptr and offset There can be a need for some optimizations to get (base, offset) for any GC pointer. The base can be calculated by generating needed instructions as it is done by the RewriteStatepointsForGC::findBasePointer() function. The offset can be calculated in the same way. Though to not expose the base calculation and to make the offset calculation as simple as ptrtoint(derived_ptr) - ptrtoint(base_ptr), which is illegal outside RS4GC, this patch introduces 2 intrinsics: @llvm.experimental.gc.get.pointer.base(%derived_ptr) @llvm.experimental.gc.get.pointer.offset(%derived_ptr) These intrinsics are inlined by RS4GC along with generation of statepoint sequences. With these new intrinsics the GC parseable lowering for atomic memcpy intrinsics (`6ec2c5e402`) could be implemented as a separate pass. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D100445	2021-05-27 09:14:14 +07:00
Jessica Paquette	324af79dbc	[GlobalISel] Don't emit lost debug location remarks when legalizing tail calls There were a bunch of lost debug location remarks that show up when legalizing tail calls on AArch64. This would happen because we drop the return in the block where we emit the tail call. So, we end up dropping the debug location, which makes the LostDebugLocObserver report a missing debug location. Although it's true that we lose these debug locations, this isn't a particularly useful remark. We expect to drop these debug locations when emitting tail calls. Suppressing remarks in this case is preferable, since the amount of noise could hide actual debug location related bugs. To do this, I just plumbed the LostDebugLocObserver through the relevant LegalizerHelper functions. This is the only case I can think of where we need the LostDebugLocObserver in the LegalizerHelper. So, rather than storing it in the LegalizerHelper proper and mucking around with the constructors, I figured it'd be cleanest to take the simplest path for now. This clears up ~20 noisy lost debug location remarks on CTMark in AArch64 at -Os. Differential Revision: https://reviews.llvm.org/D103128	2021-05-26 17:16:11 -07:00
Sriraman Tallam	caae570978	Emit correct location lists with basic block sections. This patch addresses multiple things: 1) It ensures that const_value is emitted when possible with basic block sections. 2) It emits location lists such that the labels are always within the section boundary. 3) It fixes a bug when the parameter is first used in a non-entry block which is in a different section from the entry block. Differential Revision: https://reviews.llvm.org/D85085	2021-05-26 17:12:31 -07:00
Amara Emerson	74edfb2805	[AArch64][GlobalISel] Legalize non-power-of-2 vector elements for G_STORE. The rules were already there, it just needed re-ordering so the odd case didn't bail out too early.	2021-05-26 17:01:02 -07:00
Krzysztof Parzyszek	002f5e158d	[Hexagon] Restore handling of expanding shuffles Fixed bugs, added testcases. The byte-unpack is actually recognized by the DAG combiner, but the halfword-unpack it not.	2021-05-26 18:04:15 -05:00
Fangrui Song	5852582532	[AArch64] Support llvm-mc/llvm-objdump -M no-aliases This enables the no-aliases forms of many instructions. Depends on D103004 Reviewed By: tmatheson Differential Revision: https://reviews.llvm.org/D103005	2021-05-26 13:35:31 -07:00
Craig Topper	fdf10e6197	[RISCV] Use X0 as destination of inserted vsetvli when possible. We aren't going to connect the result to anything so we might as well avoid allocating a register. Reviewed By: frasercrmck, HsiangKai Differential Revision: https://reviews.llvm.org/D102031	2021-05-26 13:08:51 -07:00
Jeremy Morse	8496fc2ec8	[DebugInstrRef][1/3] Track PHI values through register allocation This patch introduces "DBG_PHI" instructions, a marker of where a PHI instruction used to be, before PHI elimination. Under the instruction referencing model, we want to know where every value in the function is defined -- and a PHI, even if implicit, is such a place. Just like instruction numbers, we can use this to identify a value to be used as a variable value, but we don't need to know what instruction defines that value, for example: bb1: DBG_PHI $rax, 1 [... more insts ... ] bb2: DBG_INSTR_REF 1, 0, !1234, !DIExpression() This specifies that on entry to bb1, whatever value is in $rax is known as value number one -- and the later DBG_INSTR_REF marks the position where variable !1234 should take on value number one. PHI locations are stored in MachineFunction for the duration of the regalloc phase in the DebugPHIPositions map. The map is populated by PHIElimination, and then flushed back into the instruction stream by virtregrewriter. A small amount of maintenence is needed in LiveDebugVariables to account for registers being split, but only for individual positions, not for entire ranges of blocks. Differential Revision: https://reviews.llvm.org/D86812	2021-05-26 20:24:00 +01:00
Philip Reames	ff08c3468f	[SCEV] Compute trip multiple for multiple exit loops This patch implements getSmallConstantTripMultiple(L) correctly for multiple exit loops. The previous implementation was both imprecise, and violated the specified behavior of the method. This was fine in practice, because it turns out the function was both dead in real code, and not tested for the multiple exit case. Differential Revision: https://reviews.llvm.org/D103189	2021-05-26 11:52:25 -07:00
Heejin Ahn	5dd86aadf0	[WebAssembly] Add TargetInstrInfo::getCalleeOperand DwarfDebug unconditionally assumes for all call instructions the 0th operand is the callee operand, which seems to be true for other targets, but not for WebAssembly. This adds `TargetInstrInfo::getCallOperand` method whose default implementation returns `getOperand(0)` and makes WebAssembly overrides it to use its own utility method to get the callee operand. This also fixes an existing bug in `WebAssembly::getCalleeOp`, which was uncovered by this CL. Reviewed By: dschuff, djtodoro Differential Revision: https://reviews.llvm.org/D102978	2021-05-26 11:43:59 -07:00
Heejin Ahn	5bfe06ad35	[SimplifyCFG] Use make_early_inc_range() while deleting instructions We are deleting `phi` nodes within the for loop, so this makes sure we increment the iterator before we delete the instruction pointed by the iterator. This started to break in `a0be081646`. Reviewed By: dschuff, lebedev.ri Differential Revision: https://reviews.llvm.org/D103181	2021-05-26 11:43:11 -07:00
Stanislav Mekhanoshin	5e2facb922	[AMDGPU] Fix kernel LDS lowering for constants There is a trivial but severe bug in the recent code collecting LDS globals used by kernel. It aborts scan on the first constant without scanning further uses. That leads to LDS overallocation with multiple kernels in certain cases. Differential Revision: https://reviews.llvm.org/D103190	2021-05-26 11:34:50 -07:00
Dmitry Preobrazhensky	13c6568c6e	[AMDGPU][MC][GFX90A] Corrected DS_GWS opcodes Corrected DS_GWS opcodes to use even aligned registers. Differential Revision: https://reviews.llvm.org/D103185	2021-05-26 21:31:50 +03:00
Philip Reames	9306bb638f	[SCEV] Generalize getSmallConstantTripCount(L) for multiple exit loops This came up in review for another patch, see https://reviews.llvm.org/D102982#2782407 for full context. I've reviewed the callers to make sure they can handle multiple exit loops w/non-zero returns. There's two cases in target cost models where results might change (Hexagon and PowerPC), but the results looked legal and reasonable. If a target maintainer wishes to back out the effect of the costing change, they should explicitly check for multiple exit loops and handle them as desired. Differential Revision: https://reviews.llvm.org/D103182	2021-05-26 11:18:25 -07:00
Fangrui Song	73a1179535	[llvm-mc] Add -M to replace -riscv-no-aliases and -riscv-arch-reg-names In objdump, many targets support `-M no-aliases`. Instead of having a `-*-no-aliases` for each target when LLVM adds the support, it makes more sense to introduce objdump style `-M`. -riscv-arch-reg-names is removed. -riscv-no-aliases has too many uses and thus is retained for now. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D103004	2021-05-26 10:43:32 -07:00
Philip Reames	921d3f7af0	[SCEV] Add a utility for converting from "exit count" to "trip count" (Mostly as a logical place to put a comment since this is a reoccuring confusion.)	2021-05-26 10:41:49 -07:00
Craig Topper	9065118b64	[RISCV] Optimize SEW=64 shifts by splat on RV32. SEW=64 shifts only uses the log2(64) bits of shift amount. If we're splatting a 64 bit value in 2 parts, we can avoid splatting the upper bits and just let the low bits be sign extended. They won't be read anyway. For the purposes of SelectionDAG semantics of the generic ISD opcodes, if hi was non-zero or bit 31 of the low is 1, the shift was already undefined so it should be ok to replace high with sign extend of low. In order do be able to find the split i64 value before it becomes a stack operation, I added a new ISD opcode that will be expanded to the stack spill in PreprocessISelDAG. This new node is conceptually similar to BuildPairF64, but I expanded earlier so that we could go through regular isel to get the right VLSE opcode for the LMUL. BuildPairF64 is expanded in a CustomInserter. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D102521	2021-05-26 10:23:32 -07:00
Philip Reames	fb14577d0c	[SCEV] Extract out a helper for computing trip multiples	2021-05-26 10:15:03 -07:00
Craig Topper	b2c7ac874f	[RISCV] Don't propagate VL/VTYPE across inline assembly in the Insert VSETVLI pass. It's conceivable someone could put a vsetvli in inline assembly so its safer to consider them as barriers. The alternative would be to trust that the user marks VL and VTYPE registers as clobbers of the inline assembly if they do that, but hat seems error prone. I'm assuming inline assembly in vector code is going to be rare. Reviewed By: frasercrmck, HsiangKai Differential Revision: https://reviews.llvm.org/D103126	2021-05-26 09:56:20 -07:00
Alexey Bataev	27d3528acf	[SLP]Fix vectorization of insertelements with multiple uses. SLP vectorizer should not consider in sertelements with multiple uses as a part of high level build vector, it must be considered as a terminating insertelement in the vector build, otherwise it may produce incorrect code. Differential Revision: https://reviews.llvm.org/D103164	2021-05-26 09:42:18 -07:00
Stephen Tozer	a0bd6105d8	[DebugInfo] Limit the number of values that may be referenced by a dbg.value Following the addition of salvaging dbg.values using DIArgLists to reference multiple values, a case has been found where excessively large DIArgLists are produced as a result of this salvaging, resulting in large enough performance costs to effectively freeze the compiler. This patch introduces an upper bound of 16 to the number of values that may be salvaged into a dbg.value, to limit the impact of these extreme cases to performance. Differential Revision: https://reviews.llvm.org/D103162	2021-05-26 17:34:05 +01:00
Craig Topper	1b47a3de48	[RISCV] Enable cross basic block aware vsetvli insertion This patch extends D102737 to allow VL/VTYPE changes to be taken into account before adding an explicit vsetvli. We do this by using a data flow analysis to propagate VL/VTYPE information from predecessors until we've determined a value for every value in the function. We use this information to determine if a vsetvli needs to be inserted before the first vector instruction the block. Differential Revision: https://reviews.llvm.org/D102739	2021-05-26 09:25:42 -07:00
Sebastian Neubauer	ea91a8cbab	[AMDGPU][NFC] Remove non-existing function header	2021-05-26 18:20:33 +02:00
Philip Reames	9cc2181ec3	[unroll] Use value domain for symbolic execution based cost model The current full unroll cost model does a symbolic evaluation of the loop up to a fixed limit. That symbolic evaluation currently simplifies to constants, but we can generalize to arbitrary Values using the InstructionSimplify infrastructure at very low cost. By itself, this enables some simplifications, but it's mainly useful when combined with the branch simplification over in D102928. Differential Revision: https://reviews.llvm.org/D102934	2021-05-26 08:41:25 -07:00
Jonas Paulsson	d058262b14	[SystemZ] Support i128 inline asm operands. Support virtual, physical and tied i128 register operands in inline assembly. i128 is on SystemZ not really supported and is not a legal type and generally such a value will be split into two i64 parts. There are however some instructions that require a pair of two GPR64 registers contained in the GR128 bit reg class, which is untyped. For inline assmebly operands, it proved to be very cumbersome to first follow the general behavior of splitting an i128 operand into two parts and then later rebuild the INLINEASM MI to have one GR128 register. Instead, some minor common code changes were made to SelectionDAGBUilder to only create one GR128 register part to begin with. In particular: - getNumRegisters() now has an optional parameter "RegisterVT" which is passed by AddInlineAsmOperands() and GetRegistersForValue(). - The bitcasting in GetRegistersForValue is not performed if RegVT is Untyped. - The RC for a tied use in AddInlineAsmOperands() is now computed either from the tied def (virtual register), or by getMinimalPhysRegClass() (physical register). - InstrEmitter.cpp:EmitCopyFromReg() has been fixed so that the register class (DstRC) can also be computed for an illegal type. In the SystemZ backend getNumRegisters(), splitValueIntoRegisterParts() and joinRegisterPartsIntoValue() have been implemented to handle i128 operands. Differential Revision: https://reviews.llvm.org/D100788 Review: Ulrich Weigand	2021-05-26 10:08:32 -05:00
Anirudh Prasad	1bc0e857bf	[SystemZ][z/OS] Enable the AllowAtInName attribute for the HLASM dialect - Currently, LLVM supports symbols of the name "token1@token2". - "token2" is used to identify whether an appropriate symbol reference can be used for the symbol. - Now, if the symbol reference couldn't be found, the AsmParser usually emits an error, unless the backend is configured to accept the "@" in a symbol name - Thus, this patch aims to do that. It sets the `AllowAtInName` attribute in the SystemZ backend for the HLASM dialect. - Setting this attribute ensures that, if a particular symbol reference is found, it uses that. If it doesn't, and there exists an "@" in the symbol name, it will use that instead of explicitly erroring out. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D103111	2021-05-26 10:49:57 -04:00
jweightma	fcd32d62c0	[AMDGPU] Fix function pointer argument bug in AMDGPU Propagate Attributes pass. This patch fixes a bug in the AMDGPU Propagate Attributes pass where a call instruction with a function pointer argument is identified as a user of the passed function, and illegally replaces the called function of the instruction with the function argument. For example, given functions f and g with appropriate types, the following illegal transformation could occur without this fix: call void @f(void ()* @g) --> call void @g(void ()* @g.1) The solution introduced in this patch is to prevent the cloning and substitution if the instruction's called function and the function which might be cloned do not match. Reviewed By: arsenm, madhur13490 Differential Revision: https://reviews.llvm.org/D101847	2021-05-26 16:40:15 +02:00
Anirudh Prasad	b37a2fcd8d	[SystemZ][z/OS] Validate symbol names for z/OS for printing without quotes - Currently, before printing a label in MCSymbol.cpp (MCSymbol::print), the current code "validates" the label that is to be printed. - If it fails the validation step, then it prints the label within double quotes. - However, the validation is provided as a virtual function in MCAsmInfo.h (i.e. isAcceptableChar() function). So we can override this for the AD_HLASM dialect in SystemZMCAsmInfo.cpp. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D103091	2021-05-26 10:37:09 -04:00
Luo, Yuanke	4ed2b6cccd	[X86][AMX] Fix a bug on tile config. The previous code detect if a MBB is bottom block to determine if it is a backedge of a loop. We should check latch block instead of bottom block and we should check the header and the bottom block are in the same loop. Differential Revision: https://reviews.llvm.org/D103145	2021-05-26 21:57:49 +08:00
Andrea Di Biagio	63cc9fd579	[MCA][InOrderIssueStage] Fix LastWriteBackCycle computation. Conservatively use the instruction latency to compute the last write-back cycle. Before this patch, the last write cycle computation was incorrect for store instructions that didn't declare any register writes.	2021-05-26 14:17:43 +01:00
Kerry McLaughlin	9f76a85260	[LoopVectorize] Enable strict reductions when allowReordering() returns false When loop hints are passed via metadata, the allowReordering function in LoopVectorizationLegality will allow the order of floating point operations to be changed: bool allowReordering() const { // When enabling loop hints are provided we allow the vectorizer to change // the order of operations that is given by the scalar loop. This is not // enabled by default because can be unsafe or inefficient. The -enable-strict-reductions flag introduced in D98435 will currently only vectorize reductions in-loop if hints are used, since canVectorizeFPMath() will return false if reordering is not allowed. This patch changes canVectorizeFPMath() to query whether it is safe to vectorize the loop with ordered reductions if no hints are used. For testing purposes, an additional flag (-hints-allow-reordering) has been added to disable the reordering behaviour described above. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D101836	2021-05-26 13:59:12 +01:00
Max Kazantsev	be1a23203b	Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration" (try 2) The patch was reverted due to compile time impact of contextual SCEV queries. It also appeared that it introduced a miscompile on irreducible CFG. Changes made: 1. isKnownPredicateAt is replaced with more lightweight isKnownPredicate; 2. Irreducible CFG in live code is now detected and excluded from processing. Differential Revision: https://reviews.llvm.org/D102615	2021-05-26 19:47:14 +07:00
Max Kazantsev	0de553dce0	Revert "Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration"" This reverts commit `43d2e51c2e`. Commited wrong version.	2021-05-26 19:29:07 +07:00
Max Kazantsev	43d2e51c2e	Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration" The patch was reverted due to compile time impact of contextual SCEV queries. It also appeared that it introduced a miscompile on irreducible CFG. Changes made: 1. isKnownPredicateAt is replaced with more lightweight isKnownPredicate; 2. Irreducible CFG in live code is now detected and excluded from processing. Differential Revision: https://reviews.llvm.org/D102615	2021-05-26 19:23:21 +07:00
Andrew Savonichev	8ac66d61ea	[AArch64] Generate LD1 for anyext i8 or i16 vector load The existing LD1 patterns do not cover cases where result type does not match the memory type. This happens when illegal vector types are extended and scalarized, for example: load <2 x i16>* %v2i16 is lowered into: // first element (v4i32 (insert_subvector (v2i32 (scalar_to_vector (load anyext from i16))))) // other elements (v4i32 (insert_vector_elt (i32 (load anyext from i16)) idx)) Before this patch these patterns were compiled into LDR + INS. Now they are compiled into LD1. The problem was reported in PR24820: LLVM Generates abysmal code in simple situation. Differential Revision: https://reviews.llvm.org/D102938	2021-05-26 14:44:21 +03:00
Tomas Matheson	165321b3d2	[MC][ELF] Emit unique sections for different flags Global values imply flags such as readable, writable, executable for the sections that they will be placed in. Currently MC places all such entries into the same section, using the first set of flags seen. This can lead to situations in LTO where a writable global is placed in the same named section as a readable global from another file, and the section may not be marked writable. D72194 ensures that mergeable globals with explicit sections are placed in separate sections with compatible entry size, by emitting the `unique` assembly syntax where appropriate. This change extends that approach to include section flags, so that globals with different section flags are emitted in separate unique sections. Differential revision: https://reviews.llvm.org/D100944	2021-05-26 11:51:29 +01:00
Tomas Matheson	e79e8041c5	[MC][NFCI] Factor out ELF section unique ID calculation Precursor to D100944. The logic for determining the unique ID had become quite difficult to reason about, so I have factored this out into a separate function. Differential Revision: https://reviews.llvm.org/D102336	2021-05-26 11:51:29 +01:00
Simon Pilgrim	21aec4fdc5	[X86][SLM] Fix vector PSHUFB + variable shift resource/throughputs Match whats documented in the Intel AOM (+Agner) - PSHUFB xmm is really slow, and mmx/xmm vector shifts are half rate. Noticed while working to get the cost tables to more closely match llvm-mca analysis, in this case for shifts and truncations.	2021-05-26 11:14:21 +01:00
Mirko Brkusanin	9601849984	[AMDGPU][GlobalISel] Stop foldInsertEltToCmpSelect from changing reg banks This function can change regbank for registers which already have a selected bank. Depending on the instruction where these registers were used it can cause instruction selection to fail. Differential Revision: https://reviews.llvm.org/D98515	2021-05-26 11:57:41 +02:00
Mirko Brkusanin	7386ad4e9e	Revert "[AMDGPU][GlobalISel] Stop foldInsertEltToCmpSelect from changing reg banks" This reverts commit `18c5444702`.	2021-05-26 11:57:41 +02:00
Tim Northover	8c5ac18d71	AArch64: support post-indexed stores to bfloat types.	2021-05-26 10:35:52 +01:00
Simon Pilgrim	66978466ba	[X86][Atom] Fix vector variable shift resource/throughputs Match whats documented in the Intel AOM - the non-immediate variants of the PSLL/PSRA/PSRL* shift instructions requires BOTH ports - this was being incorrectly modelled as EITHER port. Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.	2021-05-26 10:30:59 +01:00
Roman Lebedev	8c86161a0b	[NFC][X86] clang-format X86TTIImpl::getInterleavedMemoryOpCostAVX2() I plan to make changes to it, and undoing formatting each time is not going to be fun.	2021-05-26 12:27:47 +03:00
David Sherwood	70d8365e33	Fix warning introduced by `9c766f4090`	2021-05-26 10:20:39 +01:00
David Sherwood	9c766f4090	[InstCombine] Fold extractelement + vector GEP with one use We sometimes see code like this: Case 1: %gep = getelementptr i32, i32* %a, <2 x i64> %splat %ext = extractelement <2 x i32> %gep, i32 0 or this: Case 2: %gep = getelementptr i32, <4 x i32> %a, i64 1 %ext = extractelement <4 x i32> %gep, i32 0 where there is only one use of the GEP. In such cases it makes sense to fold the two together such that we create a scalar GEP: Case 1: %ext = extractelement <2 x i64> %splat, i32 0 %gep = getelementptr i32, i32 %a, i64 %ext Case 2: %ext = extractelement <2 x i32> %a, i32 0 %gep = getelementptr i32, i32 %ext, i64 1 This may create further folding opportunities as a result, i.e. the extract of a splat vector can be completely eliminated. Also, even for the general case where the vector operand is not a splat it seems beneficial to create a scalar GEP and extract the scalar element from the operand. Therefore, in this patch I've assumed that a scalar GEP is always preferrable to a vector GEP and have added code to unconditionally fold the extract + GEP. I haven't added folds for the case when we have both a vector of pointers and a vector of indices, since this would require generating an additional extractelement operation. Tests have been added here: Transforms/InstCombine/gep-vector-indices.ll Differential Revision: https://reviews.llvm.org/D101900	2021-05-26 09:54:26 +01:00
Esme-Yi	bf809cd165	[NFC][object] Change the input parameter of the method isDebugSection. Summary: This is a NFC patch to change the input parameter of the method SectionRef::isDebugSection(), by replacing the StringRef SectionName with DataRefImpl Sec. This allows us to determine if a section is debug type in more ways than just by section name. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D102601	2021-05-26 08:47:53 +00:00
David Green	2cf0e52b85	[ARM] Add patterns for vmulh Now that vmulh can be selected, this adds the MVE patterns to make it legal and generate instructions. Differential Revision: https://reviews.llvm.org/D88011	2021-05-26 09:22:12 +01:00
Arthur Eubanks	1202f559bd	[OpaquePtr] Make atomicrmw work with opaque pointers FullTy is only necessary when we need to figure out what type an instruction works with given a pointer's pointee type. However, we just end up using the value operand's type, so FullTy isn't necessary. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D102788	2021-05-25 20:16:21 -07:00
Teresa Johnson	d35fe04fa3	[LTT] Handle merged llvm.assume when dropping type tests When the lower type test pass is invoked a second time with DropTypeTests set to true, it expects that all remaining type tests feed assume instructions, which are removed along with the type tests. In some cases the llvm.assume might have been merged with another one, i.e. from a builtin_assume instruction, in which case the type test would actually feed a phi that in turn feeds the merged assume instruction. In this case we can simply replace that operand of the phi with "true" before removing the type test. Differential Revision: https://reviews.llvm.org/D103073	2021-05-25 17:02:13 -07:00
Arthur Eubanks	ad90a6be21	[OpaquePtr] Create new bitcode encoding for atomicrmw Since the opaque pointer type won't contain the pointee type, we need to separately encode the value type for an atomicrmw. Emit this new code for atomicrmw. Handle this new code and the old one in the bitcode reader. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D103123	2021-05-25 16:30:34 -07:00
Kevin Athey	52ac114771	LLVM Detailed IR tests for introduction of flag -fsanitize-address-detect-stack-use-after-return-mode. Rework all tests that interact with use after return to correctly handle the case where the mode has been explicitly set to Never or Always. for issue: https://github.com/google/sanitizers/issues/1394 Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D102462	2021-05-25 16:17:39 -07:00
Fangrui Song	b426b45d10	[Internalize] Rename instead of removal if a to-be-internalized comdat has more than one member Beside the `comdat any` deduplication feature, instrumentations use comdat to establish dependencies among a group of sections, to prevent section based linker garbage collection from discarding some members without discarding all. LangRef acknowledges this usage with the following wording: > All global objects that specify this key will only end up in the final object file if the linker chooses that key over some other key. On ELF, for PGO instrumentation, a `__llvm_prf_cnts` section and its associated `__llvm_prf_data` section are placed in the same GRP_COMDAT group. A `__llvm_prf_data` is usually not referenced and expects the liveness of its associated `__llvm_prf_cnts` to retain it. The `setComdat(nullptr)` code (added by D10679) in InternalizePass can break the use case (a `__llvm_prf_data` may be dropped with its associated `__llvm_prf_cnts` retained). The main goal of this patch is to fix the dependency relationship. I think it makes sense for InternalizePass to internalize a comdat and thus suppress the deduplication feature, e.g. a relocatable link of a regular LTO can create an object file affected by InternalizePass. If a non-internal comdat in a.o is prevailed by an internal comdat in b.o, the a.o references to the comdat definitions will be non-resolvable (references cannot bind to STB_LOCAL definitions in b.o). On PE-COFF, for a non-external selection symbol, deduplication is naturally suppressed with link.exe and lld-link. However, this is fuzzy on ELF and I tend to believe the spec creator has not thought about this use case (see D102973). GNU ld and gold are still using the "signature is name based" interpretation. So even if D102973 for ld.lld is accepted, for portability, a better approach is to rename the comdat. A comdat with one single member is the common case, leaving the comdat can waste (sizeof(Elf64_Shdr)+4*2) bytes, so we optimize by deleting the comdat; otherwise we rename the comdat. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D103043	2021-05-25 14:15:27 -07:00
Matt Morehouse	832c99f727	Revert "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration" This reverts commit `2531fd70d1` due to performance regression on the PPC buildbot.	2021-05-25 13:58:42 -07:00
Krzysztof Parzyszek	6a2869cf1e	[Hexagon] Remove unused function from HexagonISelDAGToDAGHVX.cpp It will be reintroduced shortly with an actual use. This change is simply to eliminate a compilation warning.	2021-05-25 14:47:15 -05:00
Stanislav Mekhanoshin	3975e3277f	[AMDGPU] Fix unused variable warning. NFC.	2021-05-25 12:32:28 -07:00
Vitaly Buka	f44f2e0afc	[NFC] Fix 'unused' warning	2021-05-25 12:23:57 -07:00
Lang Hames	249cd9dd60	[JITLink][MachO][arm64] Build GOT entries for defined symbols too. During the generic x86-64 support refactor in `ecf6466f01` the implementation of MachO_arm64_GOTAndStubsBuilder::isGOTEdgeToFix was altered to only return true for external symbols. This behavior is incorrect: GOT entries may be required for defined symbols (e.g. in the large code model). This patch fixes the bug and adds a test case for it (renaming an old test case to avoid any ambiguity).	2021-05-25 12:19:09 -07:00
Benjamin Kramer	d2d4f16806	[Matrix] Use LLVM_DEBUG for a debug flag dump() doesn't exist in release builds. ld.lld: error: undefined symbol: llvm::Value::dump() const >>> referenced by LowerMatrixIntrinsics.cpp >>> LowerMatrixIntrinsics.o:((anonymous namespace)::LowerMatrixIntrinsics::Visit())	2021-05-25 21:10:19 +02:00
Nikita Popov	6300c37a46	[SCEV] Cache operands used in BEInfo (NFC) When memoized values for a SCEV expressions are dropped, we also drop all BECounts that make use of the SCEV expression. This is done by iterating over all the ExitNotTaken counts and (recursively) checking whether they use the SCEV expression. If there are many exits, this will take a lot of time. This patch improves the situation by pre-computing a set of all used operands, so that we can determine whether a certain BEInfo needs to be invalidated using a simple set lookup. Will still need to loop over all BEInfos though. This makes for a mild improvement on non-degenerate cases: https://llvm-compile-time-tracker.com/compare.php?from=b661a55a253f4a1cf5a0fbcb86e5ba7b9fb1387b&to=be1393f450e594c53f0ad7e62339a6bc831b16f6&stat=instructions For the degenerate case from https://bugs.llvm.org/show_bug.cgi?id=50384, for n=128 I'm seeing run time drop from 1.6s to 1.1s. Differential Revision: https://reviews.llvm.org/D102796	2021-05-25 21:03:33 +02:00
Nikita Popov	9c91614959	[CVP] Guard against poison in common phi value transform (PR50399) The common phi value transform replaces constants with values that have the same value as the constant on a given edge. However, LVI generally only provides information that is correct up to poison, so this can end up replacing a well-defined value with poison. D69442 addressed an instance of this problem by clearing poison flags on the generating instruction, which was sufficient at the time. rGa917fb89dc28 made LVI's edge value analysis slightly more powerful, and clearing poison flags is no longer sufficient. This patch changes the transform to instead explicitly guard against a poison value instead. This should be satisfied for most cases due to a prior branch on poison. Fixes https://bugs.llvm.org/show_bug.cgi?id=50399. Differential Revision: https://reviews.llvm.org/D102966	2021-05-25 20:47:17 +02:00
Michael Liao	c9dd29925f	[SelectionDAG] Propagate scoped AA metadata when lowering mem intrinsics. - When memory intrinsics, such as memcpy, the attached scoped AA metadata is not passed down to the backend. As a result, the backend cannot schedule relevant memory operations around them following that hint. In this patch, SelectionDAG is enhanced to propagate that metadata (scoped AA only) when they are lowered into loads and stores. Differential Revision: https://reviews.llvm.org/D102215	2021-05-25 14:42:26 -04:00
Stanislav Mekhanoshin	8de4db697f	[AMDGPU] Lower kernel LDS into a sorted structure Differential Revision: https://reviews.llvm.org/D102954	2021-05-25 11:29:29 -07:00
Sanjay Patel	ca7eaa0a54	[InstSimplify] allow undef element match in vector select condition value The semantics of select with undefined/poison condition are not explicitly stated in the LangRef, but this matches comments in the code and Alive2 appears to concur: https://alive2.llvm.org/ce/z/KXytmd We can find this pattern after demanded elements transforms. As noted in D101191, fuzzers are finding infinite loops because we may not account for this pattern in other passes.	2021-05-25 14:25:34 -04:00
Adam Nemet	dfd1bbd00a	[Matrix] Factor and distribute transposes across multiplies Now that we can fold some transposes into multiplies (CM: A * B^t and RM: A^t * B), we want to move them around to create the optimal expressions: * fold away double transposes while still using them to assert the shape * sink transposes hoping they cancel out * lift transposes when both operands are transposed This also modifies the matrix remarks to include the number of exposed transposes (i.e. transposes that we couldn't fold into a multiply). The adjustment to the test remarks-inlining is a bit subtle: I am changing the double transpose to a single transpose so that we don't remove it completely. More importantly this changes some of the total instruction count, most notable stores because we can no longer use a vector store. Differential Revision: https://reviews.llvm.org/D102733	2021-05-25 11:12:20 -07:00
Roman Lebedev	149e018d12	[LoopIdiom] 'arithmetic right-shift until zero': don't turn potentially infinite loops into finite ones Nowadays LLVM does not assume that all loops are finite, so if we want to produce a finite loop from a potentially-infinite one, we must ensure that the original loop is known to be a finite one. For this transform, it only matters for arithmetic right-shifts. For them, either the function or the loop must be known to be `mustprogress`, or the original value being shifted must be known to be non-negative (because iff the sign bit was set, it will never become zero, but will become `-1` in the "end"). It would be really good for alive2 to actually complain about this, but it currently does not: https://github.com/AliveToolkit/alive2/issues/726	2021-05-25 21:02:28 +03:00
Krzysztof Parzyszek	e7c839b192	[Hexagon] Improve argument packing in vector shuffle selection	2021-05-25 12:48:14 -05:00
Mirko Brkusanin	18c5444702	[AMDGPU][GlobalISel] Stop foldInsertEltToCmpSelect from changing reg banks This function can change regbank for registers which already have a selected bank. Depending on the instruction where these registers were used it can cause instruction selection to fail.	2021-05-25 19:34:09 +02:00
Sanjay Patel	ae1bc9ebf3	[InstCombine] avoid infinite loop from vector select transforms The 2nd test is based on the fuzzer example in post-commit comments of D101191 - https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34661 The 1st test shows that we don't deal with this symmetrically. We should be able to reduce both examples (possibly in instsimplify instead of instcombine).	2021-05-25 13:28:38 -04:00
Arthur Eubanks	0bbb502daa	Revert "[OpaquePtr] Make atomicrmw work with opaque pointers" This reverts commit `0bebda17be`. Causing "Invalid record" errors.	2021-05-25 10:14:58 -07:00
Philip Reames	aabca2d1da	[SCEV] Cleanup doesIVOverflowOnX checks [NFC] Stylistic changes only. 1) Don't pass a parameter just to do an early exit. 2) Use a name which matches actual behavior.	2021-05-25 10:12:24 -07:00
Philip Reames	a47b2d4567	[SCEV] Remove unused parameter from computeBECount [NFC] All callers pass "false" for the Equality parameter. Kill the dead code, and update the function block comment.	2021-05-25 09:58:56 -07:00
Jinsong Ji	882e4cbd74	[AIX][AsmPrinter] Print Symbol in comments for TOC load We are using TOCEntry symbols like `LC..0` in TOC loads, this is hard to read , at least requiring an additional step to figure out the loaded symbols. We should print out the name in comments. Reviewed By: #powerpc, shchenz Differential Revision: https://reviews.llvm.org/D102949	2021-05-25 16:37:40 +00:00
Simon Pilgrim	57250f2f3c	[X86][Atom] Fix vector PSHUFB resource/throughputs Match whats documented in the Intel AOM - the XMM variant of PSHUFB requires BOTH ports - this was being incorrectly modelled as EITHER port. Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.	2021-05-25 17:31:45 +01:00
Simon Pilgrim	def6269779	[CostModel][X86] Improve accuracy of 256-bit non-uniform vector shifts on AVX1 Determined from llvm-mca analysis, AVX1 capable targets have a higher throughput for VPBLENDVB and shuffle ops, making it cheaper to perform shift+shuffle/select shift patterns.	2021-05-25 17:31:45 +01:00
Florian Hahn	8e83ff58c9	[VectorCombine] Remove unneeded InsertPointGuard (NFCI). All users of the builder should set an insert point before using the builder. There should be no need for using InsertPointGuard here.	2021-05-25 17:01:05 +01:00
Jonas Paulsson	e77cb4ae63	[SystemZ] Return true from preferZeroCompareBranch(). Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D103057	2021-05-25 10:24:14 -05:00
Yonghong Song	6a2ea84600	BPF: Add more relocation kinds Currently, BPF only contains three relocations: R_BPF_NONE for no relocation R_BPF_64_64 for LD_imm64 and normal 64-bit data relocation R_BPF_64_32 for call insn and normal 32-bit data relocation Also .BTF and .BTF.ext sections contain symbols in allocated program and data sections. These two sections reserved 32bit space to hold the offset relative to the symbol's section. When LLVM JIT is used, the LLVM ExecutionEngine RuntimeDyld may attempt to resolve relocations for .BTF and .BTF.ext, which we want to prevent. So we used R_BPF_NONE for such relocations. This all works fine until when we try to do linking of multiple objects. . R_BPF_64_64 handling of LD_imm64 vs. normal 64-bit data is different, so lld target->relocate() needs more context to do a correct job. . The same for R_BPF_64_32. More context is needed for lld target->relocate() to differentiate call insn vs. normal 32-bit data relocation. . Since relocations in .BTF and .BTF.ext are set to R_BPF_NONE, they will not be relocated properly when multiple .BTF/.BTF.ext sections are merged by lld. This patch intends to address this issue by adding additional relocation kinds: R_BPF_64_ABS64 for normal 64-bit data relocation R_BPF_64_ABS32 for normal 32-bit data relocation R_BPF_64_NODYLD32 for .BTF and .BTF.ext style relocations. The old R_BPF_64_{64,32} semantics: R_BPF_64_64 for LD_imm64 relocation R_BPF_64_32 for call insn relocation The existing R_BPF_64_64/R_BPF_64_32 mapping to numeric values is maintained. They are the most common use cases for bpf programs and we want to maintain backward compatibility as much as possible. ExecutionEngine RuntimeDyld BPF relocations are adjusted as well. R_BPF_64_{ABS64,ABS32} relocations will be resolved properly and other relocations will be ignored. Two tests are added for RuntimeDyld. Not handling R_BPF_64_NODYLD32 in RuntimeDyldELF.cpp will result in "Relocation type not implemented yet!" fatal error. FK_SecRel_4 usages in BPFAsmBackend.cpp and BPFELFObjectWriter.cpp are removed as they are not triggered in BPF backend. BPF backend used FK_SecRel_8 for LD_imm64 instruction operands. Differential Revision: https://reviews.llvm.org/D102712	2021-05-25 08:19:13 -07:00
Anirudh Prasad	993f38d0a7	[SystemZ][z/OS] Implement getHostCPUName for z/OS - Currently, the host cpu information is not easily available on z/OS as in other platforms. - This information is stored in the Communications Vector Table (https://www.ibm.com/docs/en/zos/2.2.0?topic=information-cvt-mapping) Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D102793	2021-05-25 11:18:12 -04:00
Joe Nash	b67ea3d0c9	[AMDGPU] Allow no-modifier operands in cvtDPP NFC, since no instructions have their AsmMatchConverter changed, but prepares for that to happen. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D103046 Change-Id: I6afefad899076de7b9a412374d09b95b29e012fa	2021-05-25 10:58:06 -04:00
Simon Pilgrim	c909ddddda	[CostModel][X86] Improve accuracy of vXi64 vector non-uniform shift costs on AVX2+ targets rG1ad4f887bd7692a9e63fb42586f0ece366f2fe01 incorrectly assumed that vXi64 non-uniform shifts were slow like vXi32 were - but llvm-mca (+Agner) both confirm that Haswell/Broadwell are full rate.	2021-05-25 15:58:23 +01:00
Joe Nash	67c3707b31	[AMDGPU] More accurate names for dpp operand types NFC. Renames the variable in the dpp input operand generators from DstRC to OldRC, because that is what it actually sets. Also documents the importance of setting HasModifiers = 0 in the dpp8 asm string. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D103047 Change-Id: Ice69ae38f644de7f228a75ca47c43e88b1f7d9e1	2021-05-25 10:35:25 -04:00
David Goldblatt	8607a02357	[InstSimplify] Transform X * Y % Y --> 0 simplifyDiv already handles the case X * Y / Y --> X (barring overflow). This adds the equivalent handling to simplifyRem. Correctness: https://alive2.llvm.org/ce/z/J2cUbS https://alive2.llvm.org/ce/z/us9NUM https://alive2.llvm.org/ce/z/AvaDGJ https://alive2.llvm.org/ce/z/kq9ige Extending the situations in which we apply this transform would not be correct: https://alive2.llvm.org/ce/z/Lf9V63 https://alive2.llvm.org/ce/z/6RPQK3 https://alive2.llvm.org/ce/z/p9UdxC https://alive2.llvm.org/ce/z/A2zlhE https://alive2.llvm.org/ce/z/vHTtLw https://alive2.llvm.org/ce/z/lvpH42 Differential Revision: https://reviews.llvm.org/D102864	2021-05-25 10:16:04 -04:00
Florian Hahn	575e2aff55	[VectorCombine] Use constant range info for index scalarization legality. We can only scalarize memory accesses if we know the index is valid. This patch adjusts canScalarizeAcceess to fall back to computeConstantRange to check if the index is known to be valid. Reviewed By: nlopes Differential Revision: https://reviews.llvm.org/D102476	2021-05-25 13:58:42 +01:00
Sanjay Patel	0bab0f6161	[InstCombine] canonicalize cast before unary shuffle We could go either direction on this transform. VectorCombine already goes this way for bitcasts (and handles more complicated cases using the cost model), so let's try cast-first. Deferring completely to VectorCombine is another possibility. But the backend should be able to invert this easily when the vectors have the same shape, so it doesn't seem like a transform that we need to avoid. The motivating example from https://llvm.org/PR49081 has an int-to-float sandwiched between 2 shuffles, and the backend currently does not reduce that, so on x86, we get something like: pshufd $249, %xmm0, %xmm0] cvtdq2ps %xmm0, %xmm0 shufps $144, %xmm0, %xmm0 ...instead of just a single conversion instruction. Differential Revision: https://reviews.llvm.org/D103038	2021-05-25 08:43:09 -04:00
Chuanqi Xu	400a9d3501	[NFC] [Coroutines] Remove unused variable: UnreachableCache	2021-05-25 20:33:46 +08:00

... 2 3 4 5 6 ...

147671 Commits