llvm-project

Commit Graph

Author	SHA1	Message	Date
Fraser Cormack	4d268dc94a	[RISCV] Enable CGP to sink splat operands of VP intrinsics This patch brings better splat-matching to our VP support, by sinking splat operands of VP intrinsics back into the same block as the VP operation. The list of VP intrinsics we are interested in matches that of the regular instructions. Some optimization is still lacking. For instance, our VL nodes aren't recognized as commutative, so splats must be on the RHS. Because of this, we limit our sinking of splats to just the RHS operand for now. Improvement in this regard can come in another patch. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117703	2022-01-21 11:30:37 +00:00
OCHyams	b6a41fddcf	[DWARF][DebugInfo] Fix off-by-one error in size of DW_TAG_base_type types Fix PR53163 by rounding the byte size of DW_TAG_base_type types up. Without this fix we risk emitting types with a truncated size (including rounding less-than-byte-sized types' sizes down to zero). Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D117124	2022-01-21 11:37:49 +00:00
Nikita Popov	bfbdb5e43e	[Coroutines] Avoid some pointer element type accesses These are just verifying that pointer types are correct, which is no longer relevant under opaque pointers.	2022-01-21 12:36:19 +01:00
Nikita Popov	9c5b856dac	[CoroSplit] Avoid pointer element type accesses Use isOpaqueOrPointeeTypeMatches() for the assertions instead.	2022-01-21 12:22:09 +01:00
Sebastian Neubauer	ae2f9c8be8	[AMDGPU] Remove lz and nomip combine from codegen These combines have been moved into the IR combiner in D116042. Differential Revision: https://reviews.llvm.org/D116116	2022-01-21 12:09:08 +01:00
Sebastian Neubauer	603d18033c	[AMDGPU][InstCombine] Remove zero LOD bias If the bias is zero, we can remove it from the image instruction. Also copy other image optimizations (l->lz, mip->nomip) to IR combines. Differential Revision: https://reviews.llvm.org/D116042	2022-01-21 12:09:07 +01:00
Sebastian Neubauer	0530fdbbbb	[AMDGPU] Fix LOD bias in A16 combine As with the codegen fix in D111754, the LOD bias needs to be converted to 16 bits. Fix this in the combine. Differential Revision: https://reviews.llvm.org/D116038	2022-01-21 12:09:06 +01:00
Nikita Popov	e7762653d3	[Attributor] Avoid some pointer element type accesses	2022-01-21 11:20:10 +01:00
Florian Hahn	55689904d2	[VPlan] Move ::isCanonical outside ifdef. This fixes a build failure with assertions disabled.	2022-01-21 09:44:31 +00:00
Florian Hahn	c0cf209076	[VPlan] Add VPWidenIntOrFpInductionRecipe::isCanonical, use it (NFCI). This patch adds VPWidenIntOrFpInductionRecipe::isCanonical to check if an induction recipe is canonical. The code is also updated to use it instead of isCanonicalID. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D117551	2022-01-21 09:35:06 +00:00
serge-sans-paille	a2f6921ef2	[llvm] Remove unused headers in LLVMDemangle As a hint to the impact of the cleanup, running clang++ -E -Iinclude -I../llvm/include ../llvm/lib/Demangle/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l before: 208053 lines after: 203965 lines	2022-01-21 10:18:32 +01:00
Nikita Popov	b4900296e4	[ConstantFold] Allow all float types in reinterpret load folding Rather than hardcoding just half, float and double, allow all floating point types.	2022-01-21 09:26:51 +01:00
Simon Moll	7950010e49	[VE][NFC] Factor out helper functions Factor out some helper functions to cleanup VEISelLowering. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D117683	2022-01-21 09:15:59 +01:00
Nikita Popov	6a19cb837c	[ConstantFold] Support pointers in reinterpret load folding Peculiarly, the necessary code to handle pointers (including the check for non-integral address spaces) is already in place, because we were already allowing vectors of pointers here, just not plain pointers.	2022-01-21 09:13:37 +01:00
Nikita Popov	05cd9a0596	[ConstantFold] Simplify type check in reinterpret load folding (NFC) Keep a list of allowed types, but then always construct the map type the same way. We need an integer with the same width as the original type.	2022-01-21 09:06:35 +01:00
eopXD	e6de53b4de	[RISCV] Bump rvv-related extensions from 0.10 to 1.0 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D112987	2022-01-20 23:22:20 -08:00
Igor Kudrin	86b08ed6bb	[DebugInfo][NFC] Do not call 'isRootFile' for DWARF Version < 5 A quicker comparison should be done first. Differential Revision: https://reviews.llvm.org/D117786	2022-01-21 13:52:10 +07:00
Igor Kudrin	75184f14ae	[DebugInfo] Fix handling '# line "file"' for DWARFv5 `CppHashInfo.Filename` is a `StringRef` that references a part of the source file and it is not null-terminated at the end of the file name. `AsmParser::parseAndMatchAndEmitTargetInstruction()` passes it to `getStreamer().emitDwarfFileDirective()`, and it eventually comes to `isRootFile()`. The comparison fails because `FileName.data()` is not properly terminated. In addition, the old code might cause a significant speed degradation for long source files. The `operator!=()` for `std::string` and `const char *` can be implemented in a way that it finds the length of the second argument first, which slows the comparison for long data. `parseAndMatchAndEmitTargetInstruction()` calls `emitDwarfFileDirective()` every time if `CppHashInfo.Filename` is not empty. As a result, the longer the source file is, the slower the compilation went, and for a very long file, it might take hours instead of a couple of seconds normally. Differential Revision: https://reviews.llvm.org/D117785	2022-01-21 13:52:10 +07:00
wangpc	8def89b5dc	[RISCV] Set CostPerUse to 1 iff RVC is enabled After D86836, we can define multiple cost values for different cost models. So here we set CostPerUse to 1 iff RVC is enabled to avoid potential impact on RA. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117741	2022-01-21 14:44:26 +08:00
Zi Xuan Wu	82bb8a588d	[CSKY] Add codegen support of GlobalTLSAddress lowering TLS addresses are lowered statically or dynamically in the DAG stage, depending on the TLS model. The lowering needs the PseudoTLSLA32 pseudo to get the address of the TLS-related entry, which resides in the constant pool.	2022-01-21 14:39:55 +08:00
Craig Topper	7b3d307288	[RISCV] Add isel patterns for grevi, shfli, and unshfli to brev8/zip/unzip instructions. Zbkb makes some encodings of the general grevi, shfli, and unshfli instructions legal, so we added separate instructions for those encodings to improve the diagnostics for assembler and disassembler. To be consistent we should always use these separate instructions whenever those specific encodings of grevi/shfli/unshfli occur. So this patch adds specific isel patterns to override the generic isel patterns for these cases. The same was done for rev8 and zext.h for Zbb previously.	2022-01-20 20:43:52 -08:00
Wu Xinlong	7ee1c162cc	[RISCV][RFC] add inst support of zbkb This commit adds instruction support for `zbkb`, which is defined in the scalar cryptography extension version v1.0.0 (already ratified). Most of the zbkb directives reuse parts of the zbp and zbb directives, so this patch just modifies some of the inst aliases and predicates. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117640	2022-01-21 11:49:36 +08:00
Joao Moreira	82af95029e	[X86] Enable ibt-seal optimization when LTO is used in Kernel Intel's CET/IBT requires every indirect branch target to be an ENDBR instruction. Because of that, the compiler needs to correctly emit these instructions in function prologues. Because this is a security feature, it is desirable that only actual indirect-branch-targeted functions are emitted with ENDBRs. While it is possible to identify address-taken functions through LTO, minimizing these ENDBR instructions remains a hard task for user-space binaries because exported functions may end up being reachable through PLT entries, which use an indirect branch to reach them. Because this cannot be determined at compile time, the compiler currently emits ENDBRs to every non-local-linkage function. Despite the challenge presented for user-space, the kernel landscape is different as no PLTs are used. With the intent of providing the most fit ENDBR emission for the kernel, kernel developers proposed an optimization named "ibt-seal" which replaces the ENDBRs with NOPs directly in the binary. The discussion of this feature can be seen in [1]. This diff brings the enablement of the flag -mibt-seal, which in combination with LTO enforces a different policy for ENDBR placement when the code-model is set to "kernel". In this scenario, the compiler will only emit ENDBRs to address-taken functions, ignoring non-address-taken functions that don't have local linkage. A comparison between LTO-compiled kernel binaries without and with the -mibt-seal feature enabled shows that when -mibt-seal was used, the number of ENDBRs in the vmlinux.o binary patched by objtool decreased from 44383 to 33192, and that the number of superfluous ENDBR instructions nopped-out decreased from 11730 to 540. 
The 540 missed superfluous ENDBRs need to be investigated further, but hypotheses are: assembly code not being taken care of by the compiler, kernel exported symbols mechanisms creating bogus address taken situations or even these being removed due to other binary optimizations like kernel's static_calls. For now, I assume that the large drop in the number of ENDBR instructions already justifies the feature being merged. [1] - https://lkml.org/lkml/2021/11/22/591 Reviewed By: xiangzhangllvm Differential Revision: https://reviews.llvm.org/D116070	2022-01-21 10:55:34 +08:00
Hsiangkai Wang	ad06e65dc4	[RISCV] Fix the bug in the register allocator caused by reserved BP. Originally, hasRVVFrameObject() will scan all the stack objects to check whether there is any scalable vector object on the stack or not. However, it causes errors in the register allocator. In issue 53016, it returns false before RA because there are no RVV stack objects. After RA, it returns true because there are spilling slots for RVV values during RA. The compiler will not reserve BP during register allocation and generate BP access in the PEI pass due to the inconsistent behavior of the function. The function is changed to use hasStdExtV() as the return value. It is not precise, but it can make the register allocation correct. Refer to https://github.com/llvm/llvm-project/issues/53016. Differential Revision: https://reviews.llvm.org/D117663	2022-01-21 01:23:01 +00:00
Craig Topper	cfae2c65db	[RISCV] Factor Zve32 support into RISCVSubtarget::getMaxELENForFixedLengthVectors. This is needed to properly limit fractional LMULs for Zve32. Add new Zve32 RUN lines to the existing tests for the -riscv-v-fixed-length-vector-elen-max command line option.	2022-01-20 16:31:12 -08:00
Paweł Bylica	1d7604fdce	[InstCombine] Simplify bswap -> shift Simplify bswap(x) to shl(x) or lshr(x) if x has exactly one "active byte", i.e. all active bits are contained in boundaries of a single byte of x. https://alive2.llvm.org/ce/z/nvbbU5 https://alive2.llvm.org/ce/z/KiiL3J Reviewed By: spatel, craig.topper, lebedev.ri Differential Revision: https://reviews.llvm.org/D117680	2022-01-21 01:25:30 +01:00
Johannes Doerfert	37e0c58559	[Attributor][FIX] AAValueConstantRange should not loop unconstrained The old method to avoid unconstrained expansion of the constant range in a loop did not work as soon as there were multiple instructions in between the phi and its input. We now take a generic approach and limit the number of updates as a fallback. The old method is kept as it catches "the common case" early.	2022-01-20 18:07:04 -06:00
Johannes Doerfert	7bf9065ad7	[Attributor][NFC] Clang format	2022-01-20 18:06:53 -06:00
Craig Topper	5e88f527da	[RISCV] Remove RISCVSubtarget::hasStdExtV() and hasStdExtZve(). NFC All code should use one of the cleaner named hasVInstructions functions. Fix the two uses that weren't and delete the methods so no new uses can be created.	2022-01-20 15:05:09 -08:00
Craig Topper	fa8bb22466	[RISCV] Optimize vector_shuffles that are interleaving the lowest elements of two vectors. RISCV only has a unary shuffle that requires placing indices in a register. For interleaving two vectors this means we need at least two vrgathers and a vmerge to do a shuffle of two vectors. This patch teaches shuffle lowering to use a widening addu followed by a widening vmaccu to implement the interleave. First we extract the low half of both V1 and V2. Then we implement (zext(V1) + zext(V2)) + (zext(V2) * zext(2^eltbits - 1)) which simplifies to (zext(V1) + zext(V2) * 2^eltbits). This further simplifies to (zext(V1) + zext(V2) << eltbits). Then we bitcast the result back to the original type splitting the wide elements in half. We can only do this if we have a type with wider elements available. Because we're using extends we also have to be careful with fractional lmuls. Floating point types are supported by bitcasting to/from integer. The tests test a varied combination of LMULs split across VLEN>=128 and VLEN>=512 tests. There are a few tests with shuffle indices commuted as well as tests for undef indices. There's one test for a vXi64/vXf64 vector which we can't optimize, but verifies we don't crash. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D117743	2022-01-20 14:44:47 -08:00
Michael Kruse	1d4ca42b43	[OpenMPIRBuilder] Detect ambiguous InsertPoints for apply*WorkshareLoop. NFC. Follow-up on D117226 for applyStaticWorkshareLoop and applyDynamicWorkshareLoop checking for conflicting InsertPoints via an assert. There is no in-tree code that violates this assertion, hence nothing changes.	2022-01-20 16:10:17 -06:00
Philip Reames	c0906f6b21	[SLP] Remove stray semicolon to make bots happy Certain bots (e.g. sanitizer-x86_64-linux-android) appear to be running with strict c++98 flags which disallow ; at global scope.	2022-01-20 14:09:28 -08:00
Stanislav Mekhanoshin	41ebd19681	[AMDGPU] Do not ignore exec use where exec is read as data Compares, v_cndmask_b32, and v_readfirstlane_b32 use EXEC in a way which modifies the result. This implicit EXEC use shall not be ignored for the purposes of instruction moves. Differential Revision: https://reviews.llvm.org/D117814	2022-01-20 14:05:22 -08:00
Philip Reames	5a670f1378	[SLP] Kill an unused param and use a for-loop in calculateDependencies [NFC]	2022-01-20 13:58:20 -08:00
Philip Reames	60f6191879	[SLP] Extract formBundle helper for readability [NFC]	2022-01-20 13:08:37 -08:00
Sanjay Patel	a7a2860d0e	[InstCombine] convert mul with sexted bool and constant to select We already have the related folds for zext-of-bool, so it should make things more consistent to have this transform to select for sext-of-bool too: https://alive2.llvm.org/ce/z/YikdfA Fixes #53319	2022-01-20 15:57:01 -05:00
Craig Topper	dd7b69a61f	[RISCV] Remove HasStdExtV and HasStdZve* Predicates from tablegen. No instructions should be using these. Everything should use HasVInstructions* Predicates. Remove them so that they can't be used by accident.	2022-01-20 12:54:20 -08:00
Philip Reames	118babe67a	[SLP] Use for loops for walking bundle elements	2022-01-20 12:44:33 -08:00
Craig Topper	7a275dc354	[RISCV] Remove Zvlsseg extension. This string no longer appears in the Vector Extension specification. The segment load/store instructions are just part of the vector instruction set. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D117724	2022-01-20 12:40:07 -08:00
Philip Reames	860038e0d7	[SLP] Rename a couple lambdas to be more clearly separate from method names	2022-01-20 12:13:30 -08:00
Roman Lebedev	ba8eb31bd9	[InstCombine] Instruction sinking: fix check for function terminating block Checking for specific function terminating opcodes means we don't handle other non-hardcoded ones :) This should probably be generalized to something similar to the `IsBlockFollowedByDeoptOrUnreachable()`. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D117810	2022-01-20 22:41:31 +03:00
Craig Topper	94e69fbb4f	[RISCV] Add DAG combine to fold (fp_to_int_sat (ffloor X)) -> (select X == nan, 0, (fcvt X, rdn)) Similar for ceil, trunc, round, and roundeven. This allows us to use static rounding modes to avoid a libcall. This is similar to D116771, but for the saturating conversions. This optimization is done for AArch64 as isel patterns. RISCV doesn't have instructions for ceil/floor/trunc/round/roundeven so the operations don't stick around until isel to enable a pattern match. Thus I've implemented a DAG combine. I'm only handling saturating to i64 or i32. This could be extended to other sizes in the future. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D116864	2022-01-20 11:35:37 -08:00
Daniel Thornburgh	6b92bb4790	[Support] [DebugInfo] Lazily create cache dir. This change defers creating Support/Caching.cpp's cache directory until it actually writes to the cache. This allows using Caching library in a read-only fashion. If read-only, the cache is guaranteed not to write to disk. This keeps tools using DebugInfod (currently llvm-symbolizer) hermetic when not configured to perform remote lookups. Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D117589	2022-01-20 19:27:15 +00:00
Jonas Paulsson	792853cb78	[SystemZ] Remove the ManipulatesSP flag from backend (NFC). This flag was set in the presence of stacksave/stackrestore in order to force a frame pointer. This should however not be needed per the comment in MachineFrameInfo.h stating that a variable sized object "...is the sole condition which prevents frame pointer elimination", and experiments have also shown that there seems to be no effect whatsoever on code generation with ManipulatesSP. Review: Ulrich Weigand	2022-01-20 13:00:51 -06:00
Sanjay Patel	2d031ec5e5	[InstCombine] add one-use check to opposite shift folds Test comments say this might be intentional, but I don't see any hard evidence to support it. The extra instruction shows up as a potential regression in D117680. One test does show a missed fold that might be recovered with better demanded bits analysis.	2022-01-20 13:49:23 -05:00
Craig Topper	9abc593e98	[TargetLowering][InstCombine] Simplify BSwap demanded bits code a little. NFC Use alignDown instead of &= ~7. Replace ResultBit with NLZ. (BitWidth - NLZ - NTZ == 8) so (BitWidth - NTZ - 8 == NLZ). Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D117804	2022-01-20 10:45:17 -08:00
Alexandre Ganea	5af2433e17	[clang-cl] Support the /HOTPATCH flag This patch adds support for the MSVC /HOTPATCH flag: https://docs.microsoft.com/sv-se/cpp/build/reference/hotpatch-create-hotpatchable-image?view=msvc-170&viewFallbackFrom=vs-2019 The flag is translated to a new -fms-hotpatch flag, which in turn adds a 'patchable-function' attribute for each function in the TU. This is then picked up by the PatchableFunction pass which would generate a TargetOpcode::PATCHABLE_OP of minsize = 2 (which means the target instruction must resolve to at least two bytes). TargetOpcode::PATCHABLE_OP is only implemented for x86/x64. When targeting ARM/ARM64, /HOTPATCH isn't required (instructions are always 2/4 bytes and suitable for hotpatching). Additionally, when using /Z7, we generate a 'hot patchable' flag in the CodeView debug stream, in the S_COMPILE3 record. This flag is then picked up by LLD (or link.exe) and is used in conjunction with the linker /FUNCTIONPADMIN flag to generate extra space before each function, to accommodate for live patching long jumps. Please see: `d703b92296/lld/COFF/Writer.cpp (L1298)` The outcome is that we can finally use Live++ or Recode along with clang-cl. NOTE: It seems that MSVC cl.exe always enables /HOTPATCH on x64 by default, although if we did the same I thought we might generate sub-optimal code (if this flag was active by default). Additionally, MSVC always generates a .debug$S section and a S_COMPILE3 record, which Clang doesn't do without /Z7. Therefore, the following MSVC command-line "cl /c file.cpp" would have to be written with Clang such as "clang-cl /c file.cpp /HOTPATCH /Z7" in order to obtain the same result. Depends on D43002, D80833 and D81301 for the full feature. Differential Revision: https://reviews.llvm.org/D116511	2022-01-20 12:57:19 -05:00
Matt Arsenault	064cea9c9a	AMDGPU/GlobalISel: Try to use s_and_b64 in ptrmask selection Avoids a test diff with SDAG.	2022-01-20 12:56:53 -05:00
Matt Arsenault	2e49e0cfde	AMDGPU/GlobalISel: Directly diagnose return value use for FP atomics Emit an error if the return value is used on subtargets that do not support them. Previously we were falling back to the DAG on selection failure, where it would emit this error and then fail again.	2022-01-20 12:46:45 -05:00
Nadav Rotem	191a6e9dfa	optimize icmp-ugt-ashr This diff optimizes the sequence icmp-ugt(ashr,C_1) C_2. InstCombine already implements this optimization for sgt, and this patch adds support for ugt. @craig.topper came up with the idea and proof: define i1 @src(i8 %x, i8 %y, i8 %c) { %cp1 = add i8 %c, 1 %i = shl i8 %cp1, %y %i.2 = ashr i8 %i, %y %cmp = icmp eq i8 %cp1, %i.2 ;Assume: C + 1 == (((C + 1) << y) >> y) call void @llvm.assume(i1 %cmp) ; uncomment for the sgt case %j = shl i8 %cp1, %y %j.2 = sub i8 %j, 1 %cmp2 = icmp ne i8 %j.2, 127 ;Assume (((c + 1 ) << y) - 1) != 127 call void @llvm.assume(i1 %cmp2) %s = ashr i8 %x, %y %r = icmp sgt i8 %s, %c ret i1 %r } define i1 @tgt(i8 %x, i8 %y, i8 %c) { %cp1 = add i8 %c, 1 %j = shl i8 %cp1, %y %j.2 = sub i8 %j, 1 %r = icmp sgt i8 %x, %j.2 ret i1 %r } declare void @llvm.assume(i1) This change is related to the optimizations in D117252. Differential Revision: https://reviews.llvm.org/D117365	2022-01-20 09:31:46 -08:00
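
The sgt proof quoted in commit 191a6e9dfa can also be sanity-checked by brute force over all i8 values. The sketch below is only an illustration in Python, not part of the patch: the helper names `wrap8` and `check_sgt_ashr_fold` are hypothetical, and it models i8 wraparound by hand rather than using Alive2 or LLVM. It applies both assumptions from the quoted IR and compares `@src` with `@tgt` for every remaining input.

```python
def wrap8(v: int) -> int:
    """Wrap an integer into the signed 8-bit range, mimicking i8 arithmetic."""
    v &= 0xFF
    return v - 256 if v >= 128 else v


def check_sgt_ashr_fold() -> tuple:
    """Compare (x ashr y) sgt c with x sgt (((c + 1) shl y) - 1) over i8.

    Hypothetical helper: returns (cases checked, mismatches found).
    """
    checked = 0
    mismatches = 0
    for c in range(-128, 128):
        for y in range(8):  # shift amounts that are defined for i8
            cp1 = wrap8(c + 1)
            j = wrap8(cp1 << y)
            # Assumption 1: C + 1 == (((C + 1) << y) >> y), i.e. no bits lost.
            # Python's >> on negative ints is an arithmetic shift, like ashr.
            if (j >> y) != cp1:
                continue
            j2 = wrap8(j - 1)
            # Assumption 2: (((C + 1) << y) - 1) != 127, i.e. no wraparound.
            if j2 == 127:
                continue
            for x in range(-128, 128):
                checked += 1
                src = (x >> y) > c  # icmp sgt (ashr x, y), c
                tgt = x > j2        # icmp sgt x, ((c + 1) << y) - 1
                if src != tgt:
                    mismatches += 1
    return checked, mismatches


if __name__ == "__main__":
    checked, mismatches = check_sgt_ashr_fold()
    print(f"checked {checked} cases, {mismatches} mismatches")
```

Only the signed comparison from the quoted IR is covered here; the ugt variant that the patch itself adds would need its own preconditions.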


154397 Commits