llvm-project

Commit Graph

Author	SHA1	Message	Date
Quentin Colombet	e547a333a4	[DeadArgElim] Set unused arguments for internal functions Prior to this patch we would only set to undef the unused arguments of the external functions. The rationale was that unused arguments of internal functions wouldn't need to be turned into undef arguments because they should have been simply eliminated by the time we reach that code. This is actually not true because there are plenty of cases where we can't remove unused arguments. For instance, if the internal function is used in an indirect call, it may not be possible to change the function signature. Yet, for statically known call-sites we would still like to mark the unused arguments as undef. This patch enables the "set undef arguments" optimization on internal functions when we encounter cases where internal functions cannot be optimized. I.e., whenever an internal function is marked "live". Differential Revision: https://reviews.llvm.org/D124699	2022-05-02 11:16:32 -07:00
Jonas Paulsson	304378fd09	Reapply "[BuildLibCalls] Introduce getOrInsertLibFunc() for use when building libcalls." (was `0f8c626`). This reverts commit `14d9390`. The patch previously failed to recognize cases where user had defined a function alias with an identical name as that of the library function. Module::getFunction() would then return nullptr which is what the sanitizer discovered. In this updated version a new function isLibFuncEmittable() has as well been introduced which is now used instead of TLI->has() anytime a library function is to be emitted . It additionally also makes sure there is e.g. no function alias with the same name in the module. Reviewed By: Eli Friedman Differential Revision: https://reviews.llvm.org/D123198	2022-05-02 19:37:00 +02:00
Amy Kwan	2534dc120a	[PowerPC] Enable CR bits support for Power8 and above. This patch turns on support for CR bit accesses for Power8 and above. The reason why CR bits are turned on as the default for Power8 and above is that because later architectures make use of builtins and instructions that require CR bit accesses (such as the use of setbc in the vector string isolate predicate and bcd builtins on Power10). This patch also adds the clang portion to allow for turning on CR bits in the front end if the user so desires to. Differential Revision: https://reviews.llvm.org/D124060	2022-05-02 12:06:15 -05:00
Arthur Eubanks	b07aab8fc1	[GlobalOpt] Iterate over replaced values deterministically to constprop If there are pre-existing dead instructions, the order we visit replaced values can cause us sometimes to not delete dead instructions. The added test non-deterministically failed without the change.	2022-05-02 09:43:20 -07:00
Nikita Popov	95fedfab6c	[InstCombine] Handle non-canonical GEP index in indexed compare fold (PR55228) Normally the index type will already be canonicalized here, but this is not guaranteed depending on visitation order. The code was already accounting for a potentially needed sext, but a trunc may also be needed. Add a ConstantExpr::getSExtOrTrunc() helper method to make this simpler. This matches the corresponding IRBuilder method in behavior. Fixes https://github.com/llvm/llvm-project/issues/55228.	2022-05-02 17:56:01 +02:00
Simon Pilgrim	59dc8ce95a	[X86] Reduce some superfluous diffs between znver1/znver2 models. NFC znver2 is a mainly a search+replace of the znver1 model, but for no reason the HADD and DPPS have been moved around - try to keep these in sync (no actual changes in the models).	2022-05-02 16:45:43 +01:00
Simon Pilgrim	ce9c0faca1	[X86][AMX] combineLdSt - don't dereference dyn_cast. NFC This leads to null pointer dereference warnings - use cast<> which will assert that the cast correct.	2022-05-02 16:45:43 +01:00
Augie Fackler	c7ae423e39	BuildLibCalls: add alloc-family attribute to many allocator functions Differential Revision: https://reviews.llvm.org/D123086	2022-05-02 11:12:55 -04:00
Augie Fackler	e940456531	BuildLibCalls: infer allocptr attribute for free and realloc() family functions Differential Revision: https://reviews.llvm.org/D123084	2022-05-02 09:43:21 -04:00
Simon Pilgrim	c7662dc3e5	[X86] MOVDDUP has the same sched behaviour as MOVSHDUP/MOVSLDUP on Skylake Fixes an old TODO - confirmed on Agner + uops.info	2022-05-02 12:50:37 +01:00
David Green	2dcb2d8562	[AArch64] Cost modelling for fptoi_sat This builds on top of the target-independent cost model added in D124269 to add aarch64 specific costs for fptoui_sat and fptosi_sat intrinsics. For many common types they will be legal instructions as the AArch64 instructions will saturate naturally. For unsupported pairs of integer and floating point types, an additional min/max clamp is needed. Differential Revision: https://reviews.llvm.org/D124357	2022-05-02 11:36:05 +01:00
Simon Pilgrim	86bb7df6e6	[CostModel][X86] getScalarizationOverhead - handle vXi1 extracts with MOVMSK (pre-AVX512) We can quickly extract multiple elements of a bool vector using MOVMSK ops - since we don't know what generated the vXi1, I've been optimistic and assumed we can use PMOVMSKB to extract the maximum number of bools with a single op. The MOVMSK pattern isn't great for extract+insert round trips as vXi1 type legalization can interfere with this a lot - so this relies on us remaining good at using getScalarizationOverhead properly (and tagging both Insert and Extract modes) for those round trip cases. The AVX512 KMOV codegen for bool extraction is a bit of a mess so for now I've not included that - the per-element cost is a lot more accurate for current codegen.	2022-05-02 09:58:39 +01:00
Nikita Popov	aae5f8115a	[Local] Consider atomic loads from constant global as dead Per the guidance in https://llvm.org/docs/Atomics.html#atomics-and-ir-optimization, an atomic load from a constant global can be dropped, as there can be no stores to synchronize with. Any write to the constant global would be UB. IPSCCP will already drop such loads, but the main helper in Local doesn't recognize this currently. This is motivated by D118387. Differential Revision: https://reviews.llvm.org/D124241	2022-05-02 10:52:58 +02:00
Nikita Popov	597946a4dd	[ConstantFold] Don't convert getelementptr to ptrtoint+inttoptr ConstantFolding currently converts "getelementptr i8, Ptr, (sub 0, V)" to "inttoptr (sub (ptrtoint Ptr), V)". This transform is, taken by itself, correct, but does came with two issues: 1. It unnecessarily broadens provenance by introducing an inttoptr. We generally prefer not to introduce inttoptr during optimization. 2. For the case where V == ptrtoint Ptr, this folds to inttoptr 0, which further folds to null. In that case provenance becomes incorrect. This has been observed as a real-world miscompile with rustc. We should probably address that incorrect inttoptr 0 fold at some point, but in either case we should also drop this inttoptr-introducing fold. Instead, replace it with a fold rooted at ptrtoint(getelementptr), which seems to cover the original motivation for this fold (test2 in the changed file). Differential Revision: https://reviews.llvm.org/D124677	2022-05-02 10:24:46 +02:00
Phoebe Wang	7c04454227	[ArgPromotion][Attributor] Update min-legal-vector-width when do promotion X86 codegen uses function attribute `min-legal-vector-width` to select the proper ABI. The intention of the attribute is to reflect user's requirement when they passing or returning vector arguments. So Clang front-end will iterate the vector arguments and set `min-legal-vector-width` to the width of the maximum for both caller and callee. It is assumed any middle end optimizations won't care of the attribute expect inlining and argument promotion. - For inlining, we will propagate the attribute of inlined functions because the inlining functions become the newer caller. - For argument promotion, we check the `min-legal-vector-width` of the caller and callee and refuse to promote when they don't match. The problem comes from the optimizations' combination, as shown by https://godbolt.org/z/zo3hba8xW. The caller `foo` has two callees `bar` and `baz`. When doing argument promotion, both `foo` and `bar` has the same `min-legal-vector-width`. So the argument was promoted to vector. Then the inlining inlines `baz` to `foo` and updates `min-legal-vector-width`, which results in ABI mismatch between `foo` and `bar`. This patch fixes the problem by expanding the concept of `min-legal-vector-width` to indicator of functions arguments. That says, any passes touch functions arguments have to set `min-legal-vector-width` to the value reflects the width of vector arguments. It makes sense to me because any arguments modifications are ABI related and should response for the ABI compatibility. Differential Revision: https://reviews.llvm.org/D123284	2022-05-02 14:13:05 +08:00
Fangrui Song	2019c9b1c8	[RISCV] Lower case the first letter of LowerRISCVMachineOperandToMCOperand. NFC	2022-05-01 14:13:55 -07:00
Florian Hahn	5387a38c38	[SimpleLoopUnswitch] Freeze individual OR/AND operands. In some cases, it is not enough to freeze the final AND/OR operation when chaining a number of invariant conditions together. After creating a chain of ANDs/ORs, we assume all unswitched operands to be either true or false. But if any of the operands is poison, the rest of the operands could have any value after branching on the frozen condition. To avoid that, freeze individual operands, if needed. In some cases this may lead to unnecessary freezes, but it seems required at least for some cases (see trivial-unswitch-freeze-individual-conditions.ll) Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D124554	2022-05-01 20:11:05 +01:00
Simon Pilgrim	34f97a3709	[VectorCombine] Merge isa<>/cast<> into dyn_cast<>. NFC. We want to handle the the assert in VectorCombine so avoid the repeated isa/cast code.	2022-05-01 20:09:10 +01:00
Simon Pilgrim	ae8b10e543	[DAG] (style) Break apart if-else chain as they all return	2022-05-01 17:56:59 +01:00
Simon Pilgrim	980f41d7c4	[X86] (style) Use auto for dyn_cast<> results	2022-05-01 17:15:18 +01:00
Simon Pilgrim	d4f06ec874	[X86] (style) Don't use auto for non obvious types	2022-05-01 17:10:21 +01:00
Simon Pilgrim	09761ce295	[SLPVectorizer] Remove weird unicode character from comment. NFCI. Whatever it was, Visual Assist really didn't like it....	2022-05-01 16:37:21 +01:00
Simon Pilgrim	d5198cf92f	[CostModel][X86] Check for 'null op' truncations If the legalized src/dst types are the same, assume the "truncation" is free. This fixes some edge cases such as mul lo/hi ops and bool vectors which will get legalized back to legal vector widths	2022-05-01 12:03:40 +01:00
Simon Pilgrim	c2964746e3	[CostModel][X86] Reduce cost of vector selects on SSE2/AVX1 targets Based off the script from D103695, we were exaggerating the cost of the OR(AND(X,M),AND(Y,~M)) expansion using instruction count instead of effective throughput	2022-05-01 09:32:14 +01:00
Jack Andersen	09325d3606	[CAPI] Expose CastInst::getCastOpcode in C API Reviewed By: deadalnix Differential Revision: https://reviews.llvm.org/D91514	2022-04-30 18:40:04 -04:00
Dmitry Vassiliev	2e7e0975c0	[NVPTX] Prefix "$L__" for branch label names A global variable may have the same name as a label, and ptxas does not accept it. Prefix labels with $L__ to fix this. Reviewed By: MaskRay, tra Differential Revision: https://reviews.llvm.org/D119669	2022-04-30 21:55:20 +02:00
Florian Hahn	8b022f87b0	[SimpleLoopUnswitch] Freeze trivial conditions if needed. Trivial unswitching can also introduce new branches on undef/poison. Freeze the conditions if needed. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D124549	2022-04-30 19:53:36 +01:00
Paul Walker	f10a8f6752	[LegalizeDAG] Fix TypeSize conversion error when expanding SIGN_EXTEND_INREG SIGN_EXTEND_INREG expansion can trigger a TypeSize error because "VT.getSizeInBits() == 1" is used to detect for a boolean without first verifying VT is a scalar.	2022-04-30 19:21:48 +01:00
Craig Topper	6affe87bda	[DAGCombiner] When matching a disguised rotate by constant don't forget to apply LHSMask/RHSMask. We try to match as a disguised rotate by constant of these forms (shl (X \| Y), C1) \| (srl X, C2) --> (rotl X, C1) \| (shl Y, C1) (shl X, C1) \| (srl (X \| Y), C2) --> (rotl X, C1) \| (srl Y, C2) We may have also looked through an AND to find the shift. If we did, we need to apply a mask to the result. I'll add an AArch64 test and pre-commit it and the RISC-V test tomorrow. Fixes PR55201. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D124711	2022-04-30 11:02:30 -07:00
luxufan	e098281c27	[RISCV] Don't getDebugLoc for the end node of MBB iterator Because of shrink wrapping, the block to insert epilog may don't have instructions (Only debug instructions). And the position to insert may point to MBB.end() that don't have a DebugLoc. This patch fix this problem. The test program was copied from the issue:https://github.com/llvm/llvm-project/issues/53662 Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D123679	2022-04-30 16:00:20 +08:00
Saleem Abdulrasool	24ba1302b3	AArch64: modify Swift async frame record storage on Windows The frame layout on Windows differs from that on other platforms. It will spill the registers in descending numeric value (i.e. x30, x29, ...). Furthermore, the x29, x30 pair is particularly important as it is used for the fast stack walking. As a result, we cannot simply insert the Swift async frame record in between the store. To provide the simplistic search mechanism, always spill the async frame record prior to the spilled registers. This was caught by the assertion failure in the frame lowering code when building the runtime for Windows AArch64. Fixes: #55058 Differential Revision: https://reviews.llvm.org/D124498 Reviewed By: mstorsjo	2022-04-30 09:01:33 -07:00
Juneyoung Lee	40a2e35599	[InstCombine] Remove the undef-related workaround code in visitSelectInst This patch removes an old hack in visitSelectInst that was written to avoid miscompilation bugs in loop unswitch. (Added via https://reviews.llvm.org/D35811) The legacy loop unswitch pass will be removed after D124376, and the new simple loop unswitch pass correctly uses freeze to avoid introducing UB after D124252. Since the hack is not necessary anymore, this patch removes it. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D124426	2022-04-30 20:48:42 +09:00
Simon Pilgrim	92235e3bf4	[X86] lowerShuffleAsRepeatedMaskAndLanePermute - permit 32-bit sublane permute for unary v32i8 cases Increase the likelihood that we can lower to a permd(pshufb()) pattern, but only after we've attempted with 64-bit sublane permutes first Fixes #55066	2022-04-30 11:00:28 +01:00
Yeting Kuo	c069e37019	[RISCV] Add DAGCombine to fold base operation and reduction. Transform (<bop> x, (reduce.<bop> vec, splat(neutral_element))) to (reduce.<bop> vec, splat (x)). Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D122563	2022-04-30 14:07:05 +08:00
Craig Topper	f91690f7db	[RISCV] Don't merge addi into load/store address if addi has a FrameIndex operand. This fixes a crash from D124231. We can't fold (load (add base, (addi src, off1)), off2) -> (load (add base, src), off1+off2) if the src is a FrameIndex. FrameIndex cannot be the operand of an add. There was an immediate==0 check that I think was trying to catch the common case of FrameIndex addis where the immediate is 0, but they can also appear in non-zero form. Instead explicitly check for a FrameIndex operand.	2022-04-29 18:22:20 -07:00
Craig Topper	5aa1a7b307	[RISCV] Remove 'frameindex' from list for ComplexPattern. NFC Putting a node in this list allows the node to be used as the root of an isel pattern that would then call the ComplexPattern. The usual case is to use the ComplexPattern as the operand of another operator. AddrFI is never used as a root operation. frameindex is handled directly with custom code in RISCVISelDAGToDAG::Select. So adding frameindex to the list here serves no purpose.	2022-04-29 17:41:07 -07:00
Hongtao Yu	bdb8c50a1c	[CSSPGO] Turn on priority inlining for probe-only profile We have seen that the prioirty inliner delivered on-par performance with the old inliner for probe-only CSSPGO profile, as long as without a size budget. I'm turning on the priority inliner for probe-only profile by default. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D124632	2022-04-29 17:31:56 -07:00
Hongtao Yu	e36786d15f	[CSSPGO] Rename ProfileIsCSNested and ProfileIsCSFlat To be more clear and definitive, I'm renaming `ProfileIsCSFlat` back to `ProfileIsCS` which stands for full context-sensitive flat profiles. `ProfileIsCSNested` is now renamed to `ProfileIsPreInlined` and is extended to be applicable for CS flat profiles too. More specifically, `ProfileIsPreInlined` is for any kind of profiles (flat or nested) that contain 'ShouldBeInlined' contexts. The flag is encoded in the profile summary section for extbinary profiles and is computed on-the-fly for text profiles. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D122602	2022-04-29 17:03:52 -07:00
David Kreitzer	6918a15f43	Test commit. Fixed a typo in a comment.	2022-04-29 16:18:09 -07:00
Dmitry Vassiliev	8c49ab040c	[NVPTX] Add add.cc/addc.cc/sub.cc/subc.cc for i64 PTX supports those instructions for i64 starting from 4.3. The patch also marks corresponding DAG nodes legal for both i32 and i64. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D124698	2022-04-29 15:32:22 -07:00
Simon Dardis	938ed8ae99	[MIPS] Address instruction selection failure for abs.[sd] Previously, the choice between the instruction selection of ISD::FABS was decided at the point of setting the MIPS target lowering operation choice either `Custom` lowering or `Legal`. This lead to instruction selection failures as functions could be marked as having no NaNs. Changing the lowering to always be `Custom` and directly handling the the cases where MIPS selects the instructions for ISD::FABS resolves this crash. Thanks to kray for reporting the issue and to Simon Atanasyan for producing the reduced test case. This resolves PR/53722. Differential Revision: https://reviews.llvm.org/D124651	2022-04-29 23:10:58 +01:00
James Y Knight	02aa795785	Revert "[JumpThreading][NFC][CompileTime] Do not recompute BPI/BFI analyzes" This change has caused non-reproducibility of a self-build of Clang when using NewPM and providing profile data. This reverts commit `35f38583d2`.	2022-04-29 21:15:47 +00:00
Congzhe Cao	c428a3d2a0	[LoopCacheAnalysis] Enable delinearization of fixed sized arrays Currently loop cache cost (LCC) cannot analyze fix-sized arrays since it cannot delinearize them. This patch adds the capability to delinearize fix-sized arrays to LCC. Most of the code is ported from DependenceAnalysis.cpp and some refactoring will be done in a next patch. Reviewed By: #loopoptwg, Meinersbur Differential Revision: https://reviews.llvm.org/D122857	2022-04-29 16:01:27 -04:00
Stanislav Mekhanoshin	51e02409f0	[AMDGPU] Produce waitcounts for LDS DMA MUBUF and FLAT LDS DMA operations need a wait on vmcnt before LDS written can be accessed. A load from LDS to VMEM does not need a wait. Differential Revision: https://reviews.llvm.org/D124626	2022-04-29 11:14:11 -07:00
David Penry	dcb77643e3	Reapply [CodeGen][ARM] Enable Swing Module Scheduling for ARM Fixed "private field is not used" warning when compiled with clang. original commit: `28d09bbbc3` reverted in: `fa49021c68` ------ This patch permits Swing Modulo Scheduling for ARM targets turns it on by default for the Cortex-M7. The t2Bcc instruction is recognized as a loop-ending branch. MachinePipeliner is extended by adding support for "unpipelineable" instructions. These instructions are those which contribute to the loop exit test; in the SMS papers they are removed before creating the dependence graph and then inserted into the final schedule of the kernel and prologues. Support for these instructions was not previously necessary because current targets supporting SMS have only supported it for hardware loop branches, which have no loop-exit-contributing instructions in the loop body. The current structure of the MachinePipeliner makes it difficult to remove/exclude these instructions from the dependence graph. Therefore, this patch leaves them in the graph, but adds a "normalization" method which moves them in the schedule to stage 0, which causes them to appear properly in kernel and prologues. It was also necessary to be more careful about boundary nodes when iterating across successors in the dependence graph because the loop exit branch is now a non-artificial successor to instructions in the graph. In additional, schedules with physical use/def pairs in the same cycle should be treated as creating an invalid schedule because the scheduling logic doesn't respect physical register dependence once scheduled to the same cycle. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D122672	2022-04-29 10:54:39 -07:00
Philip Reames	3ea191ed03	[RISCV] Factor repeating code into getMaskTypeFor(VT) [nfc]	2022-04-29 10:00:57 -07:00
Joe Nash	813e521e55	[AMDGPU] Add gfx11 subtarget ELF definition This is the first patch of a series to upstream support for the new subtarget. Contributors: Jay Foad <jay.foad@amd.com> Konstantin Zhuravlyov <kzhuravl_dev@outlook.com> Patch 1/N for upstreaming AMDGPU gfx11 architectures. Reviewed By: foad, kzhuravl, #amdgpu Differential Revision: https://reviews.llvm.org/D124536	2022-04-29 12:27:17 -04:00
Paul Walker	b481512485	[SVE] Move reg+reg gather/scatter addressing optimisations from lowering into DAG combine. This is essentially a refactoring patch but allows more cases to be caught, hence the output changes to some tests. Differential Revision: https://reviews.llvm.org/D122994	2022-04-29 17:42:33 +01:00
Philip Reames	f927be0df8	[RISCV] Extract getAllOnesMask helper [nfc]	2022-04-29 09:30:18 -07:00
Alexey Bataev	484fcb9888	[SLP][NFC]Fix a comment.	2022-04-29 09:27:13 -07:00
Craig Topper	5c38373125	[RISCV] Improve constant materialization for cases that can use LUI+ADDI instead of LUI+ADDIW. It's possible that we have a constant that isn't simm32 so we can't use LUI+ADDIW, but we can use LUI+ADDI. Because ADDI uses a sign extended constant, it's possible that after subtracting it out, we end up with a simm32 that maps to LUI. This patch detects this case after removing Lo12 and before shifting the value for SLLI. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D124222	2022-04-29 08:58:32 -07:00
Simon Pilgrim	b424055b52	[X86] lowerShuffleAsRepeatedMaskAndLanePermute - move the sublane split code into a lambda helper. NFC. This is a NFC cleanup as part of the work on #55066 - the idea being that we will be able to check for multiple sub lane scales.	2022-04-29 16:03:50 +01:00
Alexey Bataev	371412e065	[COST]Fix crash for non-power-2 vector shuffle mask. Need to normalizize the mask to avoid possible crashes during attempts to estimate cost of the very long shuffles with non-power-2 number of elements in masks.	2022-04-29 07:28:07 -07:00
Florian Hahn	a80081763c	[SimplifyCFG] Avoid shifting by a too large exponent. TI->getBitWidth can be > 64 and in those cases the shift will be UB due to the exponent being too large. To fix this, cap the shift at 63. I think this should work out fine, because TableSize is itself a 64 bit type and the maximum table size must fit in the type. Also, if we would underestimate the size here, at most we get an extra ZExt. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D124608	2022-04-29 15:19:06 +01:00
Anna Thomas	205246cb64	[CompileTime] [Passes] Avoid computing unnecessary analyses. NFC Similar to `c515b2f39e`, If there are no loops in the function as seen through LI, we should avoid computing the remaining expensive analyses (such as SCEV, BPI). Reordered the analyses requests and early return if there are no loops. The logic of avoiding expensive analyses is applied to LoopVectorizer, LoopLoadElimination and LoopUnrollPass, i.e. all function passes which operate on loops. This is an NFC with compile time improvement. Differential Revision: https://reviews.llvm.org/D124529	2022-04-29 10:00:06 -04:00
Stefan Pintilie	f685bce808	[PowerPC][NFC] Add a function to determine if a call needs to be NOTOC. Add the isNoTOCCallInstr function to PPCInstrInfo to determine if a call opcode does not need a TOC restore after the call. All call opcodes should be listed in this function. A default unreachable in this function should force future call opcodes to also be added. This is a follow up patch to D122012 Reviewed By: jsji, shchenz Differential Revision: https://reviews.llvm.org/D124415	2022-04-29 08:36:07 -05:00
Paul Walker	23c509754d	[DAGCombiner] Stop invalid sign conversion in refineIndexType. When looking through extends of gather/scatter indices it's safe to convert a known positive signed index to unsigned, but unsigned indices must remain unsigned. Depends On D123318 Differential Revision: https://reviews.llvm.org/D123326	2022-04-29 14:20:13 +01:00
Paul Walker	59588f0a3d	[SVE][ISel] Ensure explicit gather/scatter offset extension isn't lost. getGatherScatterIndexIsExtended currently looks through all SIGN_EXTEND_INREG operations regardless of their input type. This patch restricts the code to only look through i32->i64 extensions, which are the ones supported implicitly by SVE addressing modes. Differential Revision: https://reviews.llvm.org/D123318	2022-04-29 14:20:13 +01:00
Joseph Huber	643c9b22ef	[OpenMP] Make generating offloading entries more generic This patch moves the logic for generating the offloading entries to the OpenMPIRBuilder. This makes it easier to re-use in other places, such as for OpenMP support in Flang or using the same method for generating offloading entires for other languages like Cuda. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D123460	2022-04-29 09:14:31 -04:00
Nikita Popov	027c728f29	[SelectionDAGBuilder] Don't create MGATHER/MSCATTER with Scale != ElemSize This is an alternative to D124530. In getUniformBase() only create scales that match the gather/scatter element size. If targets also support other scales, then they can produce those scales in target DAG combines. This is what X86 already does (as long as the resulting scale would be 1, 2, 4 or 8). This essentially restores the pre-opaque-pointer state of things. Fixes https://github.com/llvm/llvm-project/issues/55021. Differential Revision: https://reviews.llvm.org/D124605	2022-04-29 14:57:53 +02:00
Nikita Popov	1881711fbb	[InstCombine] Remove memset of undef value This removes memset with undef char. We already do this for stores of undef value. This comes with the caveat that this optimization is not, strictly speaking, legal for undef values, because we might be overwriting a poison value. However, our entire load/store model currently still operates on undef values, so we need to support undef here as well for internal consistency. Once https://github.com/llvm/llvm-project/issues/52930 is resolved, these and related folds can be limited to poison -- I've added FIXMEs to that effect. Differential Revision: https://reviews.llvm.org/D124173	2022-04-29 14:51:18 +02:00
Ricky Zhou	24a133e16f	[LV] Rename CountRoundDown to VectorTripCount (NFC) The name CountRoundDown is potentially misleading, as the number of iterations can be rounded up when folding the tail. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D119681	2022-04-29 13:50:00 +01:00
Nikita Popov	982cbed819	[InstCombine] Fold logical and/or of range icmps with nowrap flags This is an edge-case where we don't convert to bitwise and/or based on implies poison reasoning, so explicitly try to perform the fold in logical form. The transform itself is poison-safe, as both icmps are based on the same value and any nowrap flags are discarded as part of the fold (https://alive2.llvm.org/ce/z/aCwC8b for the used example).	2022-04-29 14:42:42 +02:00
Florian Hahn	e66127e69b	[VPlan] Simplify & adjust code as suggested in D123005. Improve code as suggested in D123005. Applied separately, because the comments where made a diff that has not been rebased to current main.	2022-04-29 13:34:54 +01:00
NAKAMURA Takumi	61d3a3afe2	AVRExpandPseudoInsts.cpp: Fix a warning. [-Wunused-but-set-variable] It has been enabled since llvmorg-15-init-5683-g2af845a6519c, aka D122271.	2022-04-29 21:01:47 +09:00
Paul Walker	7a0b897e86	[DAGCombiner][SVE] Ensure MGATHER/MSCATTER addressing mode combines preserve index scaling refineUniformBase and selectGatherScatterAddrMode both attempt the transformation: base(0) + index(A+splat(B)) => base(B) + index(A) However, this is only safe when index is not implicitly scaled. Differential Revision: https://reviews.llvm.org/D123222	2022-04-29 12:35:16 +01:00
Simon Pilgrim	3562f855b7	[X86] SimplifyDemandedVectorEltsForTargetNode - fold (uniform) shift(0,x) -> 0	2022-04-29 12:08:47 +01:00
Nikita Popov	57aaeefc18	[InstCombine] Pass ICmpInsts to foldAndOrOfICmpsUsingRanges() (NFC) Pass the whole instruction rather than unpacking it. This makes it easier to reuse the function in another place, as the entire logic is encapsulated.	2022-04-29 12:46:31 +02:00
Simon Pilgrim	336a1233b2	[X86] SimplifyDemandedVectorEltsForTargetNode - fold shift(0,x) -> 0	2022-04-29 11:32:54 +01:00
Nikita Popov	1f53932a95	[InstCombine] Remove foldAndOrOfEqualityCmpsWithConstants() fold This fold handles a special subset of foldAndOrOfICmpsUsingRanges(), use the more generic implementation instead. The result can differ if a representation using a range comparison is possible, in which case that is preferred over masking. There is a canonicalization opportunity here.	2022-04-29 12:23:00 +02:00
Nikita Popov	5515263e44	[InstCombine] Fold and of two ranges differing by mask This is the de Morgan conjugated variant of the existing fold for ors. Implement this by switching the range code to always work on ors and perform invert operands at the start and end. This makes reasoning easier and makes the extension more obviosuly correct.	2022-04-29 12:01:38 +02:00
Florian Hahn	fb4113ef0c	[Passes] Remove legacy LoopUnswitch pass. The legacy LoopUnswitch pass is only used in the legacy pass manager pipeline, which is deprecated. The NewPM replacement is SimpleLoopUnswitch and I think it is time to remove the legacy LoopUnswitch code. Fixes #31000. Reviewed By: aeubanks, Meinersbur, asbirlea Differential Revision: https://reviews.llvm.org/D124376	2022-04-29 10:30:49 +01:00
Simon Pilgrim	6c44e398ec	[X86] combineShuffle - reuse SDLoc. NFCI.	2022-04-29 10:30:11 +01:00
Simon Pilgrim	2d7f0b1c22	[X86] Fold ANDNP(undef,x)/ANDNP(x,undef) -> 0 Matches the fold in DAGCombiner::visitANDLike.	2022-04-29 10:20:48 +01:00
Nikita Popov	d5ee20fcc9	[InstCombine] Switch an or of icmps fold to use constant ranges We can express this fold more naturally when working on the constant range implementation. This change is not entirely NFC, because the code now also handles cases that don't match the precise pattern this previously looked for, e.g. we can omit an add on one of the ranges.	2022-04-29 11:15:54 +02:00
David Green	7047c47918	[VecCombine] Fix sort comparator logic in foldShuffleFromReductions I think this sort comparator was overly complex, and the windows expensive check bot agreed, failing as it was not giving a strict weak ordering. Change it to use the comparison of the mask values as unsigned integers. This should sort the undef elements to the end whilst keeping X<Y otherwise.	2022-04-29 09:30:02 +01:00
Nikita Popov	884e9a877b	[SimplifyCFG] Replace condition value when threading Replace the condition value with the known constant value on the threaded edge. This happens implicitly with phi threading because we replace with the incoming value, but not for non-phi threading.	2022-04-29 09:50:27 +02:00
Nikita Popov	4e545bdb35	[SimplifyCFG] Thread branches on same condition in more cases (PR54980) SimplifyCFG implements basic jump threading, if a branch is performed on a phi node with constant operands. However, InstCombine canonicalizes such phis to the condition value of a previous branch, if possible. SimplifyCFG does support this as well, but only in the very limited case where the same condition is used in a direct predecessor -- notably, this does not include the common diamond pattern (i.e. two consecutive if/elses on the same condition). This patch extends the code to look back a limited number of blocks to find a branch on the same value, rather than only looking at the direct predecessor. Fixes https://github.com/llvm/llvm-project/issues/54980. Differential Revision: https://reviews.llvm.org/D124159	2022-04-29 09:44:05 +02:00
Serge Pavlov	c96cc500f0	[SystemZ] Custom lowering of llvm.is_fpclass Differential Revision: https://reviews.llvm.org/D114695	2022-04-29 13:27:36 +07:00
LiaoChunyu	03a3654203	[RISCV] Add cost model for SK_Broadcast Add cost model for broadcast shuffle in RISCVTTIImpl::getShuffleCost with scalable vector. The cost model might not the best. For scalable vector, BasicTTIImpl::getShuffleCost return invalid cost, so this patch relies on the existing cost model in BasicTTIImpl. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D124101	2022-04-29 13:28:02 +08:00
Hsiangkai Wang	c62b014db9	[RISCV] Merge addi into load/store as there is a ADD between them This patch adds peephole optimizations for the following patterns: (load (add base, (addi src, off1)), off2) -> (load (add base, src), off1+off2) (store val, (add base, (addi src, off1)), off2) -> (store val, (add base, src), off1+off2) Differential Revision: https://reviews.llvm.org/D124231	2022-04-29 04:33:05 +00:00
Serge Pavlov	9fc58f1820	[PowerPC] Support of ppc_fp128 in lowering of llvm.is_fpclass PowerPC supports `ppc_fp128`, which is not an IEEE floating point type. The generic lowering of llvm.is_fpclass could not handle it properly. This change extends the generic lowering code to support `ppc_fp128`. The change was tested on emulator using runtime tests from https://reviews.llvm.org/D112933 and the patch for clang https://reviews.llvm.org/D112932. Differential Revision: https://reviews.llvm.org/D113908	2022-04-29 11:10:47 +07:00
Roman Lebedev	981ed72a17	[NFC][SCEV] Refactor `createNodeForSelectViaUMinSeq()` out of `createNodeForSelectOrPHIViaUMinSeq()`	2022-04-29 02:37:06 +03:00
Zequan Wu	4fe2ab5279	Revert "[DebugInfo][InstrRef] Describe value sizes when spilt to stack" This reverts commit `a15b66e76d`. This causes linker to crash at assertion: `Assertion failed: !Expr->isComplex(), file C:\b\s\w\ir\cache\builder\src\third_party\llvm\llvm\lib\CodeGen\LiveDebugValues\InstrRefBasedImpl.cpp, line 907`.	2022-04-28 16:18:16 -07:00
Mircea Trofin	49942d595f	[NFC] remove const from FunctionPropertiesAnalysis::run, keep on Result The goal in `75881d8b02` was just modifying what `Result` is, didn't need to also modify ::run.	2022-04-28 15:10:21 -07:00
CHIANG, YU-HSUN (Tommy Chiang, oToToT)	4a31af88a2	[MC][AArch64] Enable '+v8a' when nothing specified for MCSubtargetInfo Since D110065, the 'R' profile support is added to LLVM. It turns the `generic` cpu into the intersection of v8-a and v8-r. However, this makes some backward compatibility problems. The original patch makes the clang driver implicitly pass -march=armv8-a when only the triple is specified. Since it only applies to clang, other tools like llvm-objdump still faces the backward compatibility problem. This patch applies the same idea to MC related tools by enabling '+v8a' feature when nothing is specified (both CPU and FS are empty) for MCSubtargetInfo creation. This patch should fix PR53956. Reviewed by: labrinea Differential Revision: https://reviews.llvm.org/D124319	2022-04-29 04:53:22 +08:00
David Penry	fa49021c68	Revert "[CodeGen][ARM] Enable Swing Module Scheduling for ARM" This reverts commit `28d09bbbc3` while I investigate a buildbot failure.	2022-04-28 13:29:27 -07:00
Simon Pilgrim	ab17ed0723	[X86] Don't fold AND(SRL(X,Y),1) -> SETCC(BT(X,Y)) on BMI2 targets With BMI2 we have SHRX which is a lot quicker than regular x86 shifts. Fixes #55138	2022-04-28 21:28:16 +01:00
David Penry	28d09bbbc3	[CodeGen][ARM] Enable Swing Module Scheduling for ARM This patch permits Swing Modulo Scheduling for ARM targets turns it on by default for the Cortex-M7. The t2Bcc instruction is recognized as a loop-ending branch. MachinePipeliner is extended by adding support for "unpipelineable" instructions. These instructions are those which contribute to the loop exit test; in the SMS papers they are removed before creating the dependence graph and then inserted into the final schedule of the kernel and prologues. Support for these instructions was not previously necessary because current targets supporting SMS have only supported it for hardware loop branches, which have no loop-exit-contributing instructions in the loop body. The current structure of the MachinePipeliner makes it difficult to remove/exclude these instructions from the dependence graph. Therefore, this patch leaves them in the graph, but adds a "normalization" method which moves them in the schedule to stage 0, which causes them to appear properly in kernel and prologues. It was also necessary to be more careful about boundary nodes when iterating across successors in the dependence graph because the loop exit branch is now a non-artificial successor to instructions in the graph. In additional, schedules with physical use/def pairs in the same cycle should be treated as creating an invalid schedule because the scheduling logic doesn't respect physical register dependence once scheduled to the same cycle. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D122672	2022-04-28 13:01:18 -07:00
Mircea Trofin	75881d8b02	[NFC] const-ed the return type of FunctionPropertiesAnalysis The result is a data bag, this makes sure it's signaled to a user that the data can't be mutated when, for example, doing something like: auto &R = FAM.getResult<FunctionPropertiesAnalysis>(F) ... R.Uses++	2022-04-28 12:42:16 -07:00
Florian Hahn	f4e1eaa375	Revert "[VPlan] Remove uneeded needsVectorIV check." This reverts commit `43842b887e` while I investigate a buildbot failure. It also reverts the follow-up commit `2883de0514`.	2022-04-28 20:16:21 +01:00
David Tenty	8042699a30	[LLVM] Add exported visibility style for XCOFF For the AIX linker, under default options, global or weak symbols which have no visibility bits set to zero (i.e. no visibility, similar to ELF default) are only exported if specified on an export list provided to the linker. So AIX has an additional visibility style called "exported" which indicates to the linker that the symbol should be explicitly globally exported. This change maps "dllexport" in the LLVM IR to correspond to XCOFF exported as we feel this best models the intended semantic (discussion on the discourse RFC thread: https://discourse.llvm.org/t/rfc-adding-exported-visibility-style-to-the-ir-to-model-xcoff-exported-visibility/61853) and allows us to enable writing this visibility for the AIX target in the assembly path. Reviewed By: DiggerLin Differential Revision: https://reviews.llvm.org/D123951	2022-04-28 14:56:00 -04:00
David Green	ded8187e35	[VectorCombine] Try to reduce shuffle cost for commutative reduction operands Given a shuffle feeding a commutative reduction, the lane ordering of the shuffle will not alter the result. This is also true if there are a number of operations between the reduction and the shuffle, providing they only operate lane-wise. This patch searches for cases like that in Vector Combine, allowing us to check the cost of the shuffle vs an in-order identity shuffle and replace the order if possible. This only handles a single shuffle at the moment to keep things simple, and is able to ignore splats that produce results where every result is the same. This is a more powerful version of a combine that already happens in instrcombine, capable of optimizing more cases by looking through more instructions and being able to cost the shuffle. Differential Revision: https://reviews.llvm.org/D123494	2022-04-28 19:46:12 +01:00
Alan Zhao	3333c28fc0	[llvm-ml] Improve indirect call parsing In MASM, if a QWORD symbol is passed to a jmp or call instruction in 64-bit mode or a DWORD or WORD symbol is passed in 32-bit mode, then MSVC's assembler recognizes that as an indirect call. Additionally, if the operand is qualified as a ptr, then that should also be an indirect call. Furthermore, in 64-bit mode, such operands are implicitly rip-relative (in fact, MSVC's assembler ml64.exe does not allow explicitly specifying rip as a base register.) To keep this patch managable, this patch does not include: * error messages for wrong operand types (e.g. passing a QWORD in 32-bit mode) * resolving indirect calls if the symbol is declared after it's first use (llvm-ml currently only runs a single pass). * imlementing the extern keyword (required to resolve https://crbug.com/762167.) This patch is likely missing a bunch of edge cases, so please do point them out in the review. Reviewed By: epastor, hans, MaskRay Committed By: epastor (on behalf of ayzhao) Differential Revision: https://reviews.llvm.org/D124413	2022-04-28 13:17:19 -04:00
Simon Pilgrim	a9215ed9cc	[InstCombine][X86] simplifyDemandedVectorEltsIntrinsic - handle avx2 per-element vector shifts	2022-04-28 18:14:54 +01:00
Alexey Bataev	75e1cf4a6a	[COST]Improve cost model for shuffles in SLP. Introduced masks where they are not added and improved target dependent cost models to avoid returning of the incorrect cost results after adding masks. Differential Revision: https://reviews.llvm.org/D100486	2022-04-28 10:04:41 -07:00
Craig Topper	ec11fbb1d6	[RISCV] Use default promotion for (i32 (shl 1, X)) on RV64 when Zbs is enabled. This improves opportunities to use bset/bclr/binv. Unfortunately, there are no W versions of these instrcutions so this isn't always a clear win. If we use SLLW we get free sign extend and shift masking, but need to put a 1 in a register and can't remove an or/xor. If we use bset/bclr/binv we remove the immediate materializationg and logic op, but might need a mask on the shift amount and sext.w. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D124096	2022-04-28 09:58:30 -07:00
Pavel Samolysov	9197959e13	[ArgPromotion] Move ArgPart and OffsetAndArgPart to anonymous namespace The structure ArgPart and alias OffsetAndArgPart have been moved into the anonymous namespace. NFC. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D124617	2022-04-28 09:51:46 -07:00
Pavel Samolysov	6b825e50f7	[ArgPromotion] Change the condition to check the promotion limit The condition should be 'ArgParts.size() > MaxElements', so that if we have exactly 3 elements in the 'ArgParts' vector, the promotion should be allowed because the 'MaxElement' threshold is not exceeded yet. The default value for 'MaxElement' has been decreased to 2 in order to avoid an actual change in argument promoting behavior. However, this changes byval argument transformation behavior by allowing adding not more than 2 arguments to the function instead of 3 allowed before. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D124178	2022-04-28 09:42:58 -07:00
Bjorn Pettersson	3a39bb96ca	[SelectionDAG] Use correct boolean representation in FoldConstantArithmetic The description of SETCC says /// SetCC operator - This evaluates to a true value iff the condition is /// true. If the result value type is not i1 then the high bits conform /// to getBooleanContents. Without this patch, we sign extended the i1 to the used larger type regardless of getBooleanContents. This resulted in miscompiles, as shown in the attached testcase that ended up returning -1 instead of 1 when using -mattr=+v. Fixes https://github.com/llvm/llvm-project/issues/55168 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D124618	2022-04-28 18:42:16 +02:00
Simon Pilgrim	9e3b7e8e65	[X86] getTargetVShiftByConstNode - use SelectionDAG::FoldConstantArithmetic to perform constant folding. NFCI. Remove some unnecessary code duplication.	2022-04-28 17:10:20 +01:00
Craig Topper	8631a5e712	[RISCV] Fix alias printing for vmnot.m By clearing the HasDummyMask flag from mask register binary operations and mask load/store. HasDummyMask was causing an extra operand to get appended when converting from MachineInstr to MCInst. This extra operand doesn't appear in the assembly string so was mostly ignored, but it prevented the alias instruction printing from working correctly. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D124424	2022-04-28 08:33:52 -07:00
Florian Hahn	2883de0514	[VPlan] Fix comment formatting from `43842b887e`.	2022-04-28 16:31:48 +01:00
Florian Hahn	43842b887e	[VPlan] Remove uneeded needsVectorIV check. Remove one of the last remaining uses of ::needsVectorIV, preparing for its removal. Now that usesScalars is available and based on the information explicit in VPlan, there is no need to use the pre-computed needsVectorIV. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D123720	2022-04-28 16:27:34 +01:00
Alexey Bataev	9861ca0c23	Revert "[COST]Improve cost model for shuffles in SLP." This reverts commit `29a470e380` to fix a crash reported in https://reviews.llvm.org/D100486#3479989.	2022-04-28 08:11:56 -07:00
Simon Pilgrim	de7cee24b6	[X86] getBT - attempt to peek through aext(and(trunc(x),c)) mask/modulo Ideally we'd fold this with generic DAGCombiner, but that only works for !isTruncateFree cases - we might be able to adapt IsDesirableToPromoteOp to find truncated src ops in the future, but for now just use this peephole. Noticed in Issue #55138	2022-04-28 16:10:26 +01:00
Pavel Samolysov	744a837838	[ArgPromotion] Rename variables according to the code style. NFC Some loop counters ('i', 'e') and variables ('type') were named not in accordance with the code style and clang-tidy issues warnings about the using of such variables. This patch renames the variables and fixes some typos in the comments within the source file. Differential Revision: https://reviews.llvm.org/D123662	2022-04-28 15:32:05 +02:00
Chris Jackson	c792884589	[Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2] Reland `3f2b76ec90` with the test corrected to require x86-registered-target. Differential Revision: https://reviews.llvm.org/D120169	2022-04-28 14:21:56 +01:00
Chris Jackson	cd5f9efc4d	Revert "[Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2]" This reverts commit `3f2b76ec90`.	2022-04-28 14:07:31 +01:00
Nikita Popov	90dba831ae	[InstCombine] Fold or of icmp ne trunc/and This adds the de Morgan conjugated variant for the existing "and eq" style fold. Proof: https://alive2.llvm.org/ce/z/tkNAcG	2022-04-28 15:07:16 +02:00
Chris Jackson	3f2b76ec90	[Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2] Reland commit `74273d575f` following a fix for a memory leak. The DVIRecoveryRecord vectors now use unique_ptr. Differential Revision: https://reviews.llvm.org/D120169	2022-04-28 13:55:49 +01:00
Simon Pilgrim	ed8dffef4c	[X86] getFauxShuffle - don't assume an UNDEF src element for AND/ANDNP results in an UNDEF shuffle mask index The other src element might be zero, guaranteeing zero. Fixes #55157	2022-04-28 12:32:58 +01:00
Michael Forster	cfb4e78252	Revert "[llvm-pdbutil] Add options to only dump symbol record at specified offset and its parents or children with spcified depth." This reverts commit `a3b7cb015f`. symbol-offset.test fails under MSAN: [ 1] ; RUN: llvm-pdbutil yaml2pdb %p/Inputs/symbol-offset.yaml --pdb=%t.pdb [FAIL] llvm-pdbutil yaml2pdb <REDACTED>/llvm/test/tools/llvm-pdbutil/Inputs/symbol-offset.yaml --pdb=<REDACTED>/tmp/symbol-offset.test/symbol-offset.test.tmp.pdb ==9283==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x55f975e5eb91 in __libcpp_tls_set <REDACTED>/include/c++/v1/__threading_support:428:12 #1 0x55f975e5eb91 in set_pointer <REDACTED>/include/c++/v1/thread:196:5 #2 0x55f975e5eb91 in void* std::__msan::__thread_proxy<std::__msan::tuple<std::__msan::unique_ptr<std::__msan::__thread_struct, std::__msan::default_delete<std::__msan::__thread_struct> >, llvm::parallel::detail::(anonymous namespace)::ThreadPoolExecutor::ThreadPoolExecutor(llvm::ThreadPoolStrategy)::'lambda'()::operator()() const::'lambda'()> >(void*) <REDACTED>/include/c++/v1/thread:285:27 #3 0x7f74a1e55b54 in start_thread (<REDACTED>/libpthread.so.0+0xbb54) (BuildId: 64752de50ebd1a108f4b3f8d0d7e1a13) #4 0x7f74a1dc9f7e in clone (<REDACTED>/libc.so.6+0x13cf7e) (BuildId: 7cfed7708e5ab7fcb286b373de21ee76)	2022-04-28 12:42:31 +02:00
Ties Stuij	051deb2d9d	[ARM] add Armv9 build attribute The build attribute number can be found in the Arm ABI addenda32 document: https://github.com/ARM-software/abi-aa/blob/main/addenda32/addenda32.rst#335target-related-attributes Reviewed By: tmatheson Differential Revision: https://reviews.llvm.org/D124090	2022-04-28 10:48:26 +01:00
Lian Wang	dc0ae8ce18	[RISCV] Support VP_SETCC mask operations Support VP_SETCC mask operations, turn it to logical operation. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D124438	2022-04-28 08:52:29 +00:00
Nikita Popov	b9dc565147	[GVN] Encode GEPs in offset representation When using opaque pointers, convert GEPs into offset representation of the form P + V1 * Scale1 + V2 * Scale2 + ... + ConstantOffset. This allows us to recognize equivalent address calculations even if the GEPs don't use the same source element type. This fixes an opaque pointer codegen regression seen in rustc. Differential Revision: https://reviews.llvm.org/D124527	2022-04-28 09:32:05 +02:00
Luo, Yuanke	942ec5c36d	[X86][AMX] combine tile cast and load/store instruction. The `llvm.x86.cast.tile.to.vector` intrinsic is lowered to `llvm.x86.tilestored64.internal` and `load <256 x i32>`. The `llvm.x86.cast.vector.to.tile` is lowered to `store <256 x i32>` and `llvm.x86.tileloadd64.internal`. When `llvm.x86.cast.tile.to.vector` is used by `store <256 x i32>` or `load <256 x i32>` is used by `llvm.x86.cast.vector.to.tile`, they can be combined by `llvm.x86.tilestored64.internal` and `llvm.x86.tileloadd64.internal`. Differential Revision: https://reviews.llvm.org/D124378	2022-04-28 14:55:21 +08:00
Max Kazantsev	35f38583d2	[JumpThreading][NFC][CompileTime] Do not recompute BPI/BFI analyzes They can already be available, and even if not, DT/LI can be available. We should not recompute them. Old PM is unchanged because it would require changing dependencies, and we don't care enough about it. Differential Revision: https://reviews.llvm.org/D124439 Reviewed By: nikic, aeubanks	2022-04-28 10:46:08 +07:00
Wenju He	96d3be8443	[InferAddressSpaces] Check if AS are the same in isNoopPtrIntCastPair isNoopAddrSpaceCast is expecting SrcAS is different from DestAS. If the two AS are the same, consider ptrtoint/inttoptr as noop cast. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D123573	2022-04-28 11:10:55 +08:00
Liqin.Weng	6365bde658	[XCORE][CodeGen][NFC] Use ArrayRef in TargetLowering functions Reviewed By: nigelp-xmos Differential Revision: https://reviews.llvm.org/D123661	2022-04-28 02:06:46 +00:00
Shengchen Kan	6a6b0e4a63	[X86] Check the address in machine verifier 1. The scale factor must be 1, 2, 4, 8 2. The displacement must fit in 32-bit signed integer Noticed by: https://github.com/llvm/llvm-project/issues/55091 Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D124455	2022-04-28 10:05:39 +08:00
Arthur Eubanks	4e65291837	[OpaquePtr][GlobalOpt] Don't attempt to evaluate global constructors with arguments Previously all entries in global_ctors had to have the void()* type and we'd skip evaluating bitcasted functions. With opaque pointers we may see the function directly. Fixes #55147. Reviewed By: #opaque-pointers, nikic Differential Revision: https://reviews.llvm.org/D124553	2022-04-27 19:00:44 -07:00
Fangrui Song	c74a706893	[LegacyPM] Remove ThreadSanitizerLegacyPass Using the legacy PM for the optimization pipeline was deprecated in 13.0.0. Following recent changes to remove non-core features of the legacy PM/optimization pipeline, remove ThreadSanitizerLegacyPass. Reviewed By: #sanitizers, vitalybuka Differential Revision: https://reviews.llvm.org/D124209	2022-04-27 16:25:41 -07:00
Kirill Stoimenov	761366e6ae	Revert "[Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2]" This reverts commit `74273d575f`. Buildbot: https://lab.llvm.org/buildbot/#/builders/5/builds/22795 Failing with memory leak.	2022-04-27 23:11:48 +00:00
Zequan Wu	1043eeaf86	[llvm-symbolizer][native-pdb] Don't reset CurLineOffset if NextLineOffset is none	2022-04-27 16:05:19 -07:00
Matt Arsenault	7762a3ce18	Revert "BranchFolder: Assert on SSA functions" This reverts commit `6ff91d17d6`.	2022-04-27 19:02:15 -04:00
Matt Arsenault	6ff91d17d6	BranchFolder: Assert on SSA functions We probably should have the opposite of getRequiredProperties for this	2022-04-27 18:51:37 -04:00
Bill Wendling	8f2ec974d1	[X86] Move target-generic code into CodeGen [NFC] This code is the same for all platforms. Differential Revision: https://reviews.llvm.org/D124566	2022-04-27 15:37:28 -07:00
Matt Arsenault	7c2db66632	llvm-reduce: Support multiple MachineFunctions The current testcase I'm trying to reduce only reproduces with IPRA enabled and requires handling multiple functions. The only real difference vs. the IR is the extra indirect to look for the underlying MachineFunction, so treat the ReduceWorkItem as the module instead of the function. The ugliest piece of this is really the ugliness of MachineModuleInfo. It not only tracks actual module state, but has a number of transient fields used for isel and/or the asm printer. These shouldn't do any harm for the use here, though they should be separated out.	2022-04-27 18:11:59 -04:00
Zequan Wu	a3b7cb015f	[llvm-pdbutil] Add options to only dump symbol record at specified offset and its parents or children with spcified depth. Right now, if we want to dump symbol at specified offset, we need to use `grep`. And it can only show surrounding symbols in layout (not in lexical scope sense). This adds similar options to `dump` command as `llvm-dwarfdump` to allow users to dump symbol record at specified offset and its parents or children with spcified depth. `--symbol-offset=` must be used with `--modi` to dump only one symbol at given offset. `--show-parents`/`--show-children` must be used with `--symbol-offset` to dump all symbols that are parents/children of the symbol at given offset. `--parent-recurse-depth`/`--children-recurse-depth` must be used with `--show-parents`/`--show-children` to specify the max up/down depth. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D124317	2022-04-27 14:37:35 -07:00
David Blaikie	727c590fe9	DebugInfo: Use hash-based unit lookup when available in dwp files Fix a test case that had a bogus (probably I hand crafted it at some point) index that didn't point to the right data in the process.	2022-04-27 21:18:14 +00:00
Simon Pilgrim	e378577524	[X86] Use is128BitLaneRepeatedShuffleMask wrapper. NFC. We don't need to know the actual repeated mask.	2022-04-27 21:09:57 +01:00
Nicolas Abram Lujan	f8a574bf4d	[InstCombine] C0 >> (X - C1) --> (C0 << C1) >> X With the right pre-conditions, we can fold the offset into the shifted constant: https://alive2.llvm.org/ce/z/drMRBU https://alive2.llvm.org/ce/z/cUQv-_ Fixes #55016 Differential Revision: https://reviews.llvm.org/D124369	2022-04-27 14:18:30 -04:00
Craig Topper	c2614b31d9	[RISCV] Add isCommutable to scalar FMA instructions. The default implementation of findCommutedOpIndices picks the first two source operands. That's exactly what we want for the scalar FMA instructions. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D124463	2022-04-27 11:07:18 -07:00
Martin Sebor	efa0f12c0b	[InstCombine] Fold strnlen calls in equality to zero. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D123818	2022-04-27 12:03:24 -06:00
Alexey Bataev	29a470e380	[COST]Improve cost model for shuffles in SLP. Introduced masks where they are not added and improved target dependent cost models to avoid returning of the incorrect cost results after adding masks. Differential Revision: https://reviews.llvm.org/D100486	2022-04-27 10:56:26 -07:00
Wei Wang	26a0d53b15	[CHR] Skip region containing llvm.coro.id When a block containing llvm.coro.id is cloned during CHR, it inserts an invalid PHI node with token type to the beginning of the block containing llvm.coro.begin. To avoid such case, we exclude regions with llvm.coro.id. Reviewed By: ChuanqiXu Differential Revision: https://reviews.llvm.org/D124418	2022-04-27 10:27:25 -07:00
Chris Bieneman	05b765ff69	[DXIL] [NFC] Remove dead attribute code paths DXIL doesn't support attributes added after LLVM 3.7. The DXILPrepare pass removes those attributes so they should never be present by the time we reach the DXIL bitcode writer. In the event that we somehow try to write a newer attribute in the DXIL writer, we should fail hard (crash), because the output would be invalid. This case should only be possible if the DXIL writer were called without DXILPrepare being run first, which shouldn't be possible. This patch also adds a default case to the switch statement over the attribute list which covers all the removed cases and any new attribute kinds that may be added in the future. The default case is handled like other unsupported cases by a call to llvm_unreachable.	2022-04-27 10:46:59 -05:00
Simon Pilgrim	03482bccad	[X86] collectConcatOps - add ability to collect from vector 'widening' patterns Recognise insert_subvector(undef, x, lo/hi) patterns where we double the width of a vector - creating an UNDEF subvector on the fly.	2022-04-27 15:38:58 +01:00
David Green	46cef9a82d	[AArch64] Attempt to fix bots by ensuring legalized type is a vector	2022-04-27 15:36:15 +01:00
Roman Lebedev	ffafa71f64	[InstCombine] 'round up integer': if bias is just right, just reuse instructions This is only useful if we can't create new instruction because %x.aligned has other uses and already sticks around.	2022-04-27 17:27:02 +03:00
Roman Lebedev	aac0afd1dd	[InstCombine] Fold 'round up integer' pattern (when alignment is a power of two) But don't deal with non-splats. The test coverage is sufficiently exhaustive, and alive is happy about the changes there. Example with constants: https://alive2.llvm.org/ce/z/EUaJ5- / https://alive2.llvm.org/ce/z/Bkng2X General proof: https://alive2.llvm.org/ce/z/3RjJ5A	2022-04-27 17:26:55 +03:00
Shilei Tian	a6b355dd31	[SLP] Fix a typo that causes redundant assertion and potential segment fault Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D124497	2022-04-27 10:07:59 -04:00
Ivan Kosarev	6ddf2a824d	[AMDGPU] Adjust wave priority based on VMEM instructions to avoid duty-cycling. As older waves execute long sequences of VALU instructions, this may prevent younger waves from address calculation and then issuing their VMEM loads, which in turn leads the VALU unit to idle. This patch tries to prevent this by temporarily raising the wave's priority. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D124246	2022-04-27 14:37:18 +01:00
Anna Thomas	c515b2f39e	[IRCE] Avoid computing potentially unnecessary analyses. NFC IRCE is a function pass that operates on loops. If there are no loops in the function (as seen through LI), we should avoid computing the remaining expensive analyses (such as BPI). Reordered the analyses requests and early return if there are no loops. This is an NFC with compile time improvement. The same will be done in a follow-up patch for the loop vectorizer. Reviewed-By: nikic Differential Revision: https://reviews.llvm.org/D124478	2022-04-27 09:22:10 -04:00
Denis Antrushin	4059770af5	[StatepointLowering] Only export STATEPOINT results if used in nonlocal blocks. Cuurently we always export STATEPOINT results (GC pointers lowered via VRegs) to virtual registers. When processing gc.relocate instructions we have to generate CopyFromRegs node and then export it to VReg again if gc.relocate is used in other basic blocks. This results in generation of extra COPY MIR instruction if statepoint and its gc.relocate are in the same BB, but gc.relocate result is used in other blocks. This patch changes this behavior to export statepoint results only if used in other basic blocks. For local uses StatepointLoweringState.(get\|set)Location() API is used to communicate appropriate statepoint result from `LowerStatepoint()` to `visitGCRelocate()` This is NFC and is purely compile time optimization. On big methids it can improve codegen compile time up to 10%. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D124444	2022-04-27 15:53:24 +03:00
David Green	8e2a0e61f5	[AArch64] Break up larger shuffle-masks into legal sizes in getShuffleCost Given a larger-than-legal shuffle mask, the final codegen will split into multiple sub-vectors. This attempts to model that in AArch64TTIImpl::getShuffleCost, splitting masks up according to the size of the legalized vectors. If the sub-masks have at most 2 input sources we can call getShuffleCost on them and sum the costs, to get a more accurate final cost for the entire shuffle. The call to improveShuffleKindFromMask helps to improve the shuffle kind for the sub-mask cost call. Differential Revision: https://reviews.llvm.org/D123414	2022-04-27 13:51:50 +01:00
Chris Jackson	74273d575f	[Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2] This relands commit `8f550368b1`. The test is amended with REQUIRES: x86-registered-target, in line with the other debuginfo-scev-salvage tests. Differential Revision: https://reviews.llvm.org/D120169	2022-04-27 13:10:30 +01:00
Chris Jackson	855752e563	Revert [Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics[2/2] This reverts commit `8f550368b1`.	2022-04-27 13:06:03 +01:00
Chris Jackson	8f550368b1	[Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2] Second of two patches to extend SCEV-based salvaging to dbg.value intrinsics that have multiple location ops pre-LSR. This second patch adds the core implementation. Reviewers: @StephenTozer, @djtodoro Differential Revision: https://reviews.llvm.org/D120169	2022-04-27 12:47:35 +01:00

1 2 3 4 5 ...

157805 Commits