llvm-project

Commit Graph

Author	SHA1	Message	Date
Nikita Popov	bfa91f38a9	[DAG] Restore dropped condition This was dropped in `fcee33bd5a`, presumably accidentally.	2021-11-26 21:18:54 +01:00
Nikita Popov	bee8dcda1f	[SCEV] Fix and validate ValueExprMap/ExprValueMap consistency Relative to the previous landing attempt, this makes insertValueToMap() resilient against the value already being present in the map -- previously I only checked this for the createSimpleAffineAddRec() case, but the same issue can also occur for the general createNodeForPHI(). In both cases, the addrec may be constructed and added to the map in a recursive query trying to create said addrec. In this case, this happens due to the invalidation when the BE count is computed, which ends up clearing out the symbolic name as well. ----- This adds validation for consistency of ValueExprMap and ExprValueMap, and fixes identified issues: * Addrec construction directly wrote to ValueExprMap in a few places, without updating ExprValueMap. Add a helper to ensures they stay consistent. The adjustment in forgetSymbolicName() explicitly drops the old value from the map, so that we don't rely on it being overwritten. * forgetMemoizedResultsImpl() was dropping the SCEV from ExprValueMap, but not dropping the corresponding entries from ValueExprMap. Differential Revision: https://reviews.llvm.org/D113349	2021-11-26 20:57:47 +01:00
Simon Pilgrim	fcee33bd5a	[DAG] Pull out repeated isLittleEndian() calls. NFC.	2021-11-26 18:41:56 +00:00
Kazu Hirata	562356d6e3	[Target] Use range-based for loops (NFC)	2021-11-26 08:23:01 -08:00
Alexey Bataev	fc0aacf324	[SLP]Improve analysis/emission of vector operands for alternate nodes. Compiler has an analysis for perfect diamond matching but it does not support nodes with main/alternate opcodes. The problem is that the scalars themselves are different and might not match directly with other nodes, but operands and main/alternate opcodes might match and compiler might reuse some previously emitted vector instructions. Need to include this analysis in the cost model and actual vector instructions emission process. Differential Revision: https://reviews.llvm.org/D114101	2021-11-26 06:38:02 -08:00
Florian Hahn	b927aa69bf	[SCEV] Turn check in createSimpleAffineAddRec to assertion. (NFC) Accum is guaranteed to be defined outside L (via Loop::isLoopInvariant checks above). I think that should guarantee that the more powerful ScalarEvolution::isLoopInvariant also determines that the value is loop invariant. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D114634	2021-11-26 13:23:48 +00:00
Igor Kirillov	08d45e6f4d	[AArch64][SVEIntrinsicOpts] Fix: predicated SVE mul/fmul are not commutative We can not swap multiplicand and multiplier because the sve intrinsics are predicated. Imagine lanes in vectors having the following values: pg = 0 multiplicand = 1 (from dup) multiplier = 2 The resulting value should be 1, but if we swap multiplicand and multiplier it will become 2, which is incorrect. Differential Revision: https://reviews.llvm.org/D114577	2021-11-26 12:41:27 +00:00
Carl Ritson	8967d044fc	[AMDGPU] Add SIMemoryLegalizer comments to clarify bit usage Attempt to further document the intended cache policies requested by different combinations of GLC, SLC and DLC bits. GFX10 non-temporal stores are updated to set GLC. Reviewed By: t-tye Differential Revision: https://reviews.llvm.org/D114351	2021-11-26 21:05:58 +09:00
Abinav Puthan Purayil	4af45f10cc	[GlobalISel] Fold or of shifts to funnel shift. This change folds a basic funnel shift idiom: - (or (shl x, amt), (lshr y, sub(bw, amt))) -> fshl(x, y, amt) - (or (shl x, sub(bw, amt)), (lshr y, amt)) -> fshr(x, y, amt) This also helps in folding to rotate shift if x and y are equal since we already have a funnel shift to rotate combine. Differential Revision: https://reviews.llvm.org/D114499	2021-11-26 17:05:29 +05:30
David Sherwood	e20391fc5d	[LoopVectorize] When tail-folding, don't always predicate uniform loads In VPRecipeBuilder::handleReplication if we believe the instruction is predicated we then proceed to create new VP region blocks even when the load is uniform and only predicated due to tail-folding. I have updated isPredicatedInst to avoid treating a uniform load as predicated when tail-folding, which means we can do a single scalar load and a vector splat of the value. Tests added here: Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll Differential Revision: https://reviews.llvm.org/D112552	2021-11-26 11:30:54 +00:00
Bradley Smith	eafbaca977	[AArch64][SVE] Generate ASRD instructions for power of 2 signed divides Differential Revision: https://reviews.llvm.org/D113281	2021-11-26 11:08:27 +00:00
David Green	c76d6dd192	[ARM] Generate VCTP from SETCC This converts a vector SETCC([0,1,2,..], splat(n), ult) to vctp n, which can be fewer instructions and prevent the need for constant pool loads. Differential Revision: https://reviews.llvm.org/D114177	2021-11-26 10:57:14 +00:00
Simon Pilgrim	2778f9a9f6	[DAG] SimplifyDemandedVectorElts - attempt to handle ADD(x,x) as single use If the ADD node is the only user of the repeated operand, then treat this as single use - allows us to peek through shl(x,1) patterns.	2021-11-26 10:32:10 +00:00
David Sherwood	86137fb722	[CodeGen] Add scalable vector support for lowering of llvm.get.active.lane.mask Currently the generic lowering of llvm.get.active.lane.mask is done in SelectionDAGBuilder::visitIntrinsicCall and currently assumes only fixed-width vectors are used. This patch changes the code to be more generic and support scalable vectors too. I have added tests for SVE here: CodeGen/AArch64/active_lane_mask.ll although the code quality leaves a lot to be desired. The code will be improved significantly in a later patch that makes use of the SVE whilelo instruction. Differential Revision: https://reviews.llvm.org/D114541	2021-11-26 08:17:55 +00:00
Kazu Hirata	259cd6f893	[llvm] Use range-based for loops (NFC)	2021-11-25 22:17:10 -08:00
Christudasan Devadasan	654c89d85a	[AMDGPU] Make vector superclasses allocatable The combined vector register classes with both VGPRs and AGPRs are currently unallocatable. This patch turns them into allocatable as a prerequisite to enable copy between VGPR and AGPR registers during regalloc. Also, added the missing AV register classes from 192b to 1024b. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D109300	2021-11-26 00:42:12 -05:00
Jeremy Morse	536b9eb31e	[DebugInfo][InstrRef] Add extra indirection for NRVO tests In some scenarios, usually involving NRVO, we can issue indirect DBG_VALUEs after SelectionDAG, even in instruction referencing mode (if the variable is an argument). If the corresponding argument value is spilt to the stack, then we have: * Indirection from it being on the stack, * Indirection from it being a dbg.declare or a dbg.addr. However InstrRefBasedLDV only emits one level of indirection. This patch adds the second, by adding an extra DW_OP_deref if necessary. The two tests modified fail otherwise -- they feature some NRVO, and require two levels of indirection to be correct. Differential Revision: https://reviews.llvm.org/D114364	2021-11-25 21:43:38 +00:00
Quinn Pham	b11c66accf	[NFC] Inclusive language: rename master flag to main flag [NFC] As part of using inclusive language within the llvm project, this patch renames master flag to main flag in these comments. Reviewed By: ZarkoCA Differential Revision: https://reviews.llvm.org/D114090	2021-11-25 15:16:11 -06:00
Jeremy Morse	3107081e94	[DebugInfo][InstrRef] Avoid some quadratic behaviour in LiveDebugVariables This is a performance patch -- LiveDebugVariables can behave quadratically if a lot of debug instructions are inserted back into the same place, and we have to repeatedly step-over hte ones we've already inserted. To get around it, whenever we insert a debug instruction at a slot index, check whether there are more debug instructions to insert at this point, and insert them too. That avoids the repeated lookup and stepping through. It relies on the container for unlinked debug instructions being recorded in-order, which is how LiveDebugVariables currently does it. Differential Revision: https://reviews.llvm.org/D114587	2021-11-25 20:31:00 +00:00
Florian Hahn	8cb1af73c6	Recommit [ThreadPool] Support returning futures with results. This reverts commit `71a7c55f0f`. The revert broken building llvm-reduce and it is not clear it fixes an issue with LLVM_ENABLE_THREADS=OFF. See discussion in https://reviews.llvm.org/D114183 for more details.	2021-11-25 20:07:53 +00:00
Amy Kwan	150681f2f3	[PowerPC] Prevent the optimizer from producing wide vector types in IR. This patch prevents the optimizer from producing wide vectors in the IR, specifically the MMA types (v256i1, v512i1). The idea is that on Power, we only want to be producing these types only if the vector_pair and vector_quad types are used specifically. To prevent the optimizer from producing these types in the IR, vectorCostAdjustmentFactor() is updated to return an instruction cost factor or an invalid instruction cost if the current type is that of an MMA type. An invalid instruction cost returned by this function signifies to other cost computing functions to return the maximum instruction cost to inform the optimizer that producing these types within the IR is expensive, and should not be produced in the first place. This issue was first seen in the test case included within this patch. Differential Revision: https://reviews.llvm.org/D113900	2021-11-25 12:35:26 -06:00
Daniel McIntosh	71a7c55f0f	Revert "[ThreadPool] Support returning futures with results." This reverts commit `6149e57dc1`. The offending commit broke building with LLVM_ENABLE_THREADS=OFF.	2021-11-25 12:19:35 -05:00
Kazu Hirata	bfd5dd1568	[llvm] Use range-based for loops (NFC)	2021-11-25 08:55:16 -08:00
David Green	fbb61adb70	[ARM] Convert fptoi.sat to fixed point multiply This is a very small addition to the existing MVE fixed point vcvt code to also create them from FP_TO_SINT_SAT and FP_TO_UINT_SAT nodes, which should be equally valid for native saturating converts under MVE. Differential Revision: https://reviews.llvm.org/D114360	2021-11-25 15:43:45 +00:00
Jeremy Morse	102d2a8a99	[DebugInfo][InstrRef] Track variable assignments in out-of-scope blocks DBG_INSTR_REF's and DBG_VALUE's can end up in blocks that aren't in the lexical scope of their variable. It's arguable as to what we should do about this, however VarLocBasedLDV permits such variable locations to be propagated, so let's allow it in InstrRefBasedLDV. It's necessary for the modified test to work. Differential Revision: https://reviews.llvm.org/D114578	2021-11-25 14:52:11 +00:00
Alexey Bataev	4675a1654c	Revert "[SLP]Improve analysis/emission of vector operands for alternate nodes." This reverts commit `496254cf80` to fix compiler crashes reported in D114101#3152982.	2021-11-25 05:19:49 -08:00
Simon Pilgrim	63b1e58f07	[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl (REAPPLIED) If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift. For the ARM/AArch64 rev16/32 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task. Reapplied with fix for AArch64 rev patterns to matching the ARM fix. https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount) https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount) https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount) https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount) Differential Revision: https://reviews.llvm.org/D114354	2021-11-25 11:14:15 +00:00
David Green	3a700cabdc	[SDAG] Allow Unknown sizes when refining MMO alignments. NFC The changes in D113888 / `32b6c17b29` altered the memory size of a masked store, as it will store an unknown number of bytes not the full vector size. We can have situations where the masked stores is legalized and then turned to a normal store, as the mask is known to be all ones. This creates a store with an unknown size MMO that was hitting this assert. The store created can be given a better size in a followup patch. This currently adjusts the assert to handle unknown sizes.	2021-11-25 10:19:29 +00:00
Jameson Nash	0332d105b9	GlobalISel: remove assert that memcpy Src and Dst addrspace must be identical The LangRef does not require these arguments to have the same type. Differential Revision: https://reviews.llvm.org/D93154	2021-11-24 20:23:05 -05:00
Zarko Todorovski	95875d246a	[LLVM][NFC]Inclusive language: remove occurances of sanity check/test from llvm Part of work to use more inclusive language in clang/llvm. Rewording some comments and change function and variable names.	2021-11-24 17:29:55 -05:00
David Blaikie	cd93ab8947	DWARFVerifier: Don't parse all units twice Introduced/discussed in https://reviews.llvm.org/D38719 The header validation logic was also explicitly building the DWARFUnits to validate. But then other calls, like "Units.getUnitForOffset" creates the DWARFUnits again in the DWARFContext proper - so, let's avoid creating the DWARFUnits twice by walking the DWARFContext's units rather than building a new list explicitly. This does reduce some verifier power - it means that any unit with a header parsing failure won't get further validation, whereas the verifier-created units were getting some further validation despite invalid headers. I don't think this is a great loss/seems "right" in some ways to me that if the header's invalid we should stop there. Exposing the raw DWARFUnitVectors from DWARFContext feels a bit sub-optimal, but gave simple access to the getUnitForOffset to keep the rest of the code fairly similar.	2021-11-24 14:03:56 -08:00
Artur Pilipenko	aa60d169ea	[CVP] Add a cl::opt for canonicalization of signed relational comparisons This canonicalization breaks the ability to discard checks in some cases. Add a command line option to disable it. This option is on by default, so the change is NFC. See for details: https://reviews.llvm.org/D112895#3149487	2021-11-24 13:52:38 -08:00
Alexey Bataev	496254cf80	[SLP]Improve analysis/emission of vector operands for alternate nodes. Compiler has an analysis for perfect diamond matching but it does not support nodes with main/alternate opcodes. The problem is that the scalars themselves are different and might not match directly with other nodes, but operands and main/alternate opcodes might match and compiler might reuse some previously emitted vector instructions. Need to include this analysis in the cost model and actual vector instructions emission process. Differential Revision: https://reviews.llvm.org/D114101	2021-11-24 12:55:24 -08:00
Florian Hahn	2897b67665	[LV] Use OrigLoop instead of induction to get function. (NFC) Upcoming changes will result in Induction not being set/used in some cases. Use OrigLoop to get the function instead.	2021-11-24 20:17:44 +00:00
Jeremy Morse	bfadc5dcbf	[DebugInfo][InstrRef] Cope with win32 calls changing SP in LiveDebugValues Almost all of the time, call instructions don't actually lead to SP being different after they return. An exception is win32's _chkstk, which which implements stack probes. We need to recognise that as modifying SP, so that copies of the value are tracked as distinct vla pointers. This patch adds a target frame-lowering hook to see whether stack probe functions will modify the stack pointer, store that in an internal flag, and if it's true then scan CALL instructions to see whether they're a stack probe. If they are, recognise them as defining a new stack-pointer value. The added test exercises this behaviour: two calls to _chkstk should be considered as producing two different values. Differential Revision: https://reviews.llvm.org/D114443	2021-11-24 19:56:21 +00:00
Stanislav Mekhanoshin	9300b133c8	Revert "[InstCombine] (~(a \| b) & c) \| ~(c \| (a ^ b)) -> ~((a \| b) & (c \| (b ^ a)))" This reverts commit `c407769f5e`.	2021-11-24 11:14:52 -08:00
Jeremy Morse	133e25f946	[DebugInfo][InstrRef] Ignore SP clobbers on call instructions even more Avoid un-necessarily recreating DBG_VALUEs on call instructions. In LiveDebugvalues we choose to ignore any clobbers of SP by call instructions, as they're irrelevant to our model of the machine. We currently do so for tracking register values (MTracker); do the same for tracking variable locations (TTracker). Test modified to endure that a duplicate DBG_VALUE is not created after the call in struction in this test. Differential Revision: https://reviews.llvm.org/D114365	2021-11-24 17:25:48 +00:00
Peter Waller	787b66eb5f	[LoopAccessAnalysis][SVE] Bail out for scalable vectors The supplied test case, reduced from real world code, crashes with a 'Invalid size request on a scalable vector.' error. Since it's similar in spirit to an existing LAA test, rename the file to generalize it to both. Differential Revision: https://reviews.llvm.org/D114155	2021-11-24 15:52:20 +00:00
Roman Lebedev	cd8d219536	[X86][Costmodel] `getReplicationShuffleCost()`: promote 1 bit-wide elements to 32 bit when have AVX512DQ I believe, this effectively completes `X86TTIImpl::getReplicationShuffleCost()` for AVX512, other than the question of handling plain AVX512F, where we end up with some really ugly "shuffles", but then is there any CPU's that support AVX512, but not AVX512DQ/AVX512BW? Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D114315	2021-11-24 17:23:15 +03:00
Florian Hahn	8b86752c60	[VPlan] Remove unused VPInstruction constructor. (NFC) VPInstruction inherits from VPValue, so the constructor taking ArrayRef<VPValue*> covers all cases that would be covered by the removed constructor.	2021-11-24 14:06:50 +00:00
Bradley Smith	080ef0b6a6	[AArch64][SVE] Recognize all ones mask during fixed mask generation Differential Revision: https://reviews.llvm.org/D114431	2021-11-24 13:55:06 +00:00
Benjamin Kramer	d32787230d	Revert "[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl" This reverts commit `3cf4a2c620`. It makes llc hang on the following test case. ``` target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128" target triple = "aarch64-unknown-linux-gnu" define dso_local void @_PyUnicode_EncodeUTF16() local_unnamed_addr #0 { entry: br label %while.body117.i while.body117.i: ; preds = %cleanup149.i, %entry %out.6269.i = phi i16* [ undef, %cleanup149.i ], [ undef, %entry ] %0 = load i16, i16* undef, align 2 %1 = icmp eq i16 undef, -10240 br i1 %1, label %fail.i, label %cleanup149.i cleanup149.i: ; preds = %while.body117.i %or130.i = call i16 @llvm.bswap.i16(i16 %0) #2 store i16 %or130.i, i16* %out.6269.i, align 2 br label %while.body117.i fail.i: ; preds = %while.body117.i ret void } ; Function Attrs: nofree nosync nounwind readnone speculatable willreturn declare i16 @llvm.bswap.i16(i16) #1 attributes #0 = { "target-features"="+neon,+v8a" } attributes #1 = { nofree nosync nounwind readnone speculatable willreturn } attributes #2 = { mustprogress nofree norecurse nosync nounwind readnone uwtable willreturn "frame-pointer"="non-leaf" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+neon,+v8a" } ```	2021-11-24 14:42:54 +01:00
Sanjay Patel	b326c05814	[InstSimplify] fold xor logic of 2 variables, part 2 (~a & b) ^ (a \| b) --> a This is the swapped and/or (Demorgan?) sibling fold for the fold added with D114462 ( `892648b18a` ). This case is easier to specify because we are returning a root value, not a 'not': https://alive2.llvm.org/ce/z/SRzj4f	2021-11-24 08:15:47 -05:00
Nemanja Ivanovic	b7bf937bbe	[PowerPC] Provide XL-compatible vec_round implementation The XL implementation of vec_round for vector double uses "round-to-nearest, ties to even" just as the vector float `version does. However clang and gcc use "round-to-nearest-away" for vector double and "round-to-nearest, ties to even" for vector float. The XL behaviour is implemented under the __XL_COMPAT_ALTIVEC__ macro similarly to other instances of incompatibility. Differential revision: https://reviews.llvm.org/D113642	2021-11-24 06:43:56 -06:00
Simon Pilgrim	3cf4a2c620	[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift. For the ARM rev16 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task. https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount) https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount) https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount) https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount) Differential Revision: https://reviews.llvm.org/D114354	2021-11-24 11:28:35 +00:00
Jay Foad	d7e03df719	[AMDGPU] Implement widening multiplies with v_mad_i64_i32/v_mad_u64_u32 Select SelectionDAG ops smul_lohi/umul_lohi to v_mad_i64_i32/v_mad_u64_u32 respectively, with an addend of 0. v_mul_lo, v_mul_hi and v_mad_i64/u64 are all quarter-rate instructions so it is better to use one instruction than two. Further improvements are possible to make better use of the addend operand, but this is already a strict improvement over what we have now. Differential Revision: https://reviews.llvm.org/D113986	2021-11-24 11:25:02 +00:00
Jay Foad	8a52bd82e3	[AMDGPU] Only select VOP3 forms of VOP2 instructions Change VOP_PAT_GEN to default to not generating an instruction selection pattern for the VOP2 (e32) form of an instruction, only for the VOP3 (e64) form. This allows SIFoldOperands maximum freedom to fold copies into the operands of an instruction, before SIShrinkInstructions tries to shrink it back to the smaller encoding. This affects the following VOP2 instructions: v_min_i32 v_max_i32 v_min_u32 v_max_u32 v_and_b32 v_or_b32 v_xor_b32 v_lshr_b32 v_ashr_i32 v_lshl_b32 A further cleanup could simplify or remove VOP_PAT_GEN, since its optional second argument is never used. Differential Revision: https://reviews.llvm.org/D114252	2021-11-24 11:15:30 +00:00
Carl Ritson	976f3b3c9e	[AMDGPU] Only allow implicit WQM in pixel shaders Implicit derivatives are only valid in pixel shaders, hence only implicitly enable WQM for pixel shaders. This avoids unintended WQM in other shader types (e.g. compute) when image sampling instructions are used. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D114414	2021-11-24 20:04:42 +09:00
David Green	581f837355	[ARM] Fold (fadd x, (vselect c, y, -1.0)) into (vselect c, (fadd x, y), x) This is similar to D113574, but as a DAG combine, not tablegen patterns. Doing the fold as a DAG combine allows the fadd to be folded with a fmul, finally producing a predicated vfma. It performs the same fold of fadd(x, vselect(p, y, -0.0)) to vselect p, (fadd x, y), x) using -0.0 as the identity value of a fadd. Differential Revision: https://reviews.llvm.org/D113584	2021-11-24 10:41:00 +00:00
David Sherwood	cf40ca026f	[NFC] Tidy up SelectionDAGBuilder::visitIntrinsicCall to use existing sdl debug loc In quite a few places we were calling getCurSDLoc() to get the debug location, but this is already a local variable `sdl`. Differential Revision: https://reviews.llvm.org/D114447	2021-11-24 10:35:49 +00:00
Jeremy Morse	b8f68ad9cd	[DebugInfo][InstrRef] Avoid crash when values optimised out late in sdag It appears that we can emit all the instructions for a function, including debug instructions, and then optimise some of the values out late. Specifically, in the attached test case, an argument gets optimised out after DBG_VALUE / DBG_INSTR_REFs are created. This confuses MachineFunction::finalizeDebugInstrRefs, which expects to be able to find a defining instruction, and crashes instead. Fix this by identifying when there's no defining instruction, and translating that instead into a DBG_VALUE $noreg. Differential Revision: https://reviews.llvm.org/D114476	2021-11-24 10:34:48 +00:00
David Green	d9af9c2c5a	[ARM] Fold floating point select(binop) patterns Similar to D84091 which added extra predicated folds for integer operations using the identity element of the operation, this adds them for floating point operations for the form `BinOp(x, select(p, y, Identity))`. They are folded back to predicated versions of the operator, with fadd having the identity -0.0, fsub using the identity 0.0 and fmul using 1.0. Differential Revision: https://reviews.llvm.org/D113574	2021-11-24 10:22:20 +00:00
Rosie Sumpter	df32a39dd0	[LoopVectorize][CostModel] Update cost model for fmuladd intrinsic This patch updates the cost model for ordered reductions so that a call to the llvm.fmuladd intrinsic is modelled as a normal fmul instruction plus the cost of an ordered fadd reduction. Differential Revision: https://reviews.llvm.org/D111630	2021-11-24 08:50:05 +00:00
Rosie Sumpter	2d33327f9d	[LoopVectorize] Print fast-math flags for VPReductionRecipe	2021-11-24 08:50:05 +00:00
Rosie Sumpter	991074012a	[LoopVectorize] Propagate fast-math flags for VPInstruction In-loop vector reductions which use the llvm.fmuladd intrinsic involve the creation of two recipes; a VPReductionRecipe for the fadd and a VPInstruction for the fmul. If the call to llvm.fmuladd has fast-math flags these should be propagated through to the fmul instruction, so an interface setFastMathFlags has been added to the VPInstruction class to enable this. Differential Revision: https://reviews.llvm.org/D113125	2021-11-24 08:50:04 +00:00
Rosie Sumpter	c2441b6b89	[LoopVectorize] Add vector reduction support for fmuladd intrinsic Enables LoopVectorize to handle reduction patterns involving the llvm.fmuladd intrinsic. Differential Revision: https://reviews.llvm.org/D111555	2021-11-24 08:50:04 +00:00
Abinav Puthan Purayil	078da26b1c	[AMDGPU] Check for unneeded shift mask in shift PatFrags. The existing constrained shift PatFrags only dealt with masked shift from OpenCL front-ends. This change copies the X86DAGToDAGISel::isUnneededShiftMask() function to AMDGPU and uses it in the shift PatFrag predicates. Differential Revision: https://reviews.llvm.org/D113448	2021-11-24 10:53:12 +05:30
Jun Ma	17eb6b61de	Revert "[Taildup] Don't tail-duplicate loop header with multiple successors as its latches" This reverts commit `1f9fa54984`.	2021-11-24 10:26:37 +08:00
Jun Ma	07333810ca	Revert "Revert "Revert "Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values.""""" This reverts commit `c93f93b2e3`.	2021-11-24 10:26:37 +08:00
Stanislav Mekhanoshin	661a232e34	[AMDGPU] Remove a no-op check in the gfx90a hazard recognizer Also rename helper function accordingly. Differential Revision: https://reviews.llvm.org/D114289	2021-11-23 15:35:19 -08:00
Florian Mayer	6c06d8e310	[stack-safety] Check SCEV constraints at memory instructions. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D113160	2021-11-23 15:29:23 -08:00
Nemanja Ivanovic	c9cb8edc51	[PowerPC] Allow scalars for asm constraint "v" with VSX Similarly to what GCC does, we should allow scalars with the "v" constraint rather than introducing unnecessary new constraints for scalars in Altivec registers. Differential revision: https://reviews.llvm.org/D113635	2021-11-23 17:03:04 -06:00
Matt Arsenault	273a0c8bc9	PrologEpilogInserter: Use explicit control for scavenge slot placement AMDGPU is unusual in that the both stack is indexed in the same direction as stack growth (up). We therefore always need the emergency stack slots placed as low as possible to ensure they are in range of load/store instruction immediate offsets. The existing logic is mostly OK, but failed if we required stack realignment. I don't understand what the existing control isFPCloseToIncomingSP is supposed to mean, but can only be used to stop placing the scavenge slots earlier. Make this explicit so that targets can opt-in rather than opt-out only.	2021-11-23 18:01:12 -05:00
Florian Hahn	73a05cc8df	[LAA] Move visitPointers up in file (NFC). This allows easier re-use in earlier functions.	2021-11-23 22:47:26 +00:00
Sanjay Patel	892648b18a	[InstSimplify] fold xor logic of 2 variables (a & b) ^ (~a \| b) --> ~a I was looking for a shortcut to reduce some of the complex logic folds that are currently up for review (D113216 and others in that stack), and I found this missing from instcombine/instsimplify. There is a trade-off in putting it into instsimplify: because we can't create new values here, we need a strict 'not' op (no undef elements). Otherwise, the fold is not valid: https://alive2.llvm.org/ce/z/k_AGGj If this was in instcombine instead, we could create the proper 'not'. But having the fold here benefits other passes like GVN that use instsimplify as an analysis. There is a related fold where 'and' and 'or' are swapped, and that is planned as a follow-up commit. Differential Revision: https://reviews.llvm.org/D114462	2021-11-23 16:50:23 -05:00
Rong Xu	bf1138491a	[SampleFDO] Recompute BFI if the sample loader changes BPI The MIR sample loader changes the branch probability but not BFI. Here we force a recompute of BFI if the branch probabilities are changed. Also register the MIR FSAFDO passes properly. Differential Revision: https://reviews.llvm.org/D114400	2021-11-23 13:24:31 -08:00
Mehdi Amini	1392b654ff	Revert "profi - a flow-based profile inference algorithm: Part I (out of 3)" This reverts commit `884b6dd311`. The windows build is broken with a linker error.	2021-11-23 20:10:36 +00:00
Quinn Pham	1345bc5e16	[NFC][llvm] Inclusive language: remove instance of master in LiveRangeUtils.h [NFC] As part of using inclusive language within the llvm project, this patch replaces master with primary in `LiveRangeUtils.h`. Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D114191	2021-11-23 13:07:42 -06:00
spupyrev	884b6dd311	profi - a flow-based profile inference algorithm: Part I (out of 3) The benefits of sampling-based PGO crucially depends on the quality of profile data. This diff implements a flow-based algorithm, called profi, that helps to overcome the inaccuracies in a profile after it is collected. Profi is an extended and significantly re-engineered classic MCMF (min-cost max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing missing and inaccurate profiling using a minimum cost circulation algorithm]. It models profile inference as an optimization problem on a control-flow graph with the objectives and constraints capturing the desired properties of profile data. Three important challenges that are being solved by profi: - "fixing" errors in profiles caused by sampling; - converting basic block counts to edge frequencies (branch probabilities); - dealing with "dangling" blocks having no samples in the profile. The main implementation (and required docs) are in SampleProfileInference.cpp. The worst-time complexity is quadratic in the number of blocks in a function, O(\|V\|^2). However a careful engineering and extensive evaluation shows that the running time is (slightly) super-linear. In particular, instances with 1000 blocks are solved within 0.1 second. The algorithm has been extensively tested internally on prod workloads, significantly improving the quality of generated profile data and providing speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it generally improves the performance (with a few outliers) but extra work in the compiler might be needed to re-tune existing optimization passes relying on profile counts. Reviewed By: wenlei, hoy Differential Revision: https://reviews.llvm.org/D109860	2021-11-23 11:02:40 -08:00
Zarko Todorovski	0d3add216f	[llvm][NFC] Inclusive language: Reword replace uses of sanity in llvm/lib/Transform comments and asserts Reworded some comments and asserts to avoid usage of `sanity check/test` Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D114372	2021-11-23 13:22:55 -05:00
Philip Reames	03d8bc184a	[indvars] Fix lftr crash when preheader is terminated by switch This was found by oss-fuzz. The switch will get canonicalized to a branch, but if it hasn't been when we run LFTR, we crashed on an unneeded assert.	2021-11-23 09:58:46 -08:00
Nemanja Ivanovic	c933c2eb33	[PowerPC] Add BCD add/sub/cmp builtins Support for builtins that use bcdadd./bcdsub. to add/subtract Binary Coded Decimal values as well as to determine validity and compare BCD values. Differential revision: https://reviews.llvm.org/D114088	2021-11-23 11:42:36 -06:00
Florian Hahn	0a00d64e32	[LAA] Turn aggregate type check into assertion (NFCI). getPtrStride should not be called with aggregate access types. There's also an old TODO. Turn the check into an assertion.	2021-11-23 17:37:30 +00:00
Philip Reames	065f777d27	Revert "profi - a flow-based profile inference algorithm: Part I (out of 3)" This reverts commit `b00fc19822`. This change fails to build (link) on ubuntu x86,	2021-11-23 09:18:28 -08:00
Philip Reames	18086186ab	[unroll] Remove two dead variable assignments [nfc] These variables are not out-params, and we immediately return after assigning them. Thus, the assignments are dead and just confusing. I believe these used to be out-params, but they're not any more.	2021-11-23 09:12:20 -08:00
spupyrev	b00fc19822	profi - a flow-based profile inference algorithm: Part I (out of 3) The benefits of sampling-based PGO crucially depends on the quality of profile data. This diff implements a flow-based algorithm, called profi, that helps to overcome the inaccuracies in a profile after it is collected. Profi is an extended and significantly re-engineered classic MCMF (min-cost max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing missing and inaccurate profiling using a minimum cost circulation algorithm]. It models profile inference as an optimization problem on a control-flow graph with the objectives and constraints capturing the desired properties of profile data. Three important challenges that are being solved by profi: - "fixing" errors in profiles caused by sampling; - converting basic block counts to edge frequencies (branch probabilities); - dealing with "dangling" blocks having no samples in the profile. The main implementation (and required docs) are in SampleProfileInference.cpp. The worst-time complexity is quadratic in the number of blocks in a function, O(\|V\|^2). However a careful engineering and extensive evaluation shows that the running time is (slightly) super-linear. In particular, instances with 1000 blocks are solved within 0.1 second. The algorithm has been extensively tested internally on prod workloads, significantly improving the quality of generated profile data and providing speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it generally improves the performance (with a few outliers) but extra work in the compiler might be needed to re-tune existing optimization passes relying on profile counts. Reviewed By: wenlei, hoy Differential Revision: https://reviews.llvm.org/D109860	2021-11-23 09:08:30 -08:00
Philip Reames	5c77aa2b91	[unroll] Use early return in shouldFullUnroll [nfc]	2021-11-23 09:01:36 -08:00
Kazu Hirata	d45cb1d7ea	[llvm] Use range-based for loops (NFC)	2021-11-23 08:54:48 -08:00
Paul Robinson	c075566c8d	[PS4][TLI] Remove redundant line	2021-11-23 08:42:32 -08:00
alex-t	9e03e8c99e	[AMDGPU] Enable fneg and fabs divergence-driven instruction selection. Detailed description: We currently have a set of patterns to select ISD::FNEG and ISD::FABS to the bitwise operations. We need to make them predicated to select the VALU or SALU bitwise operation variant according to the SDNode divergence bit. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D114257	2021-11-23 19:37:07 +03:00
Simon Moll	1e65b93f3a	[VP] Canonicalize macros of VPIntrinsics.def Usage and naming of macros in VPIntrinsics.def has been inconsistent. Rename all property macros to VP_PROPERTY_<name>. Use BEGIN/END scope macros to attach properties to vp intrinsics and SDNodes (instead of specifying either directly with the property macro). A follow-up patch has documentation on how the macros are (intended) to be used. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D114144	2021-11-23 16:51:11 +01:00
Sanjay Patel	430ad9697d	[InstCombine] enhance bitwise select matching I noticed that adding a seemingly unrelated fold for xor caused regressions on similar patterns, and this is one of the underlying causes. This could also be a variation for code as seen in: https://llvm.org/PR34047 ...although that exact example should be fixed after: D113035 / `c36b7e21bd` The vector test shows that we are actually missing a potential canonicalization for bitcast-of-sext-of-not or the inverse. The scalar test shows that even if we had that canonicalization, it would still be possible to see this pattern due to extra uses. https://alive2.llvm.org/ce/z/y2BAgi	2021-11-23 09:57:44 -05:00
Evgeniy Brevnov	47e2644c89	[DSE][NFC] Introduce "doesn't overwrite" return code for isOverwrite Add OR_None code to indicate that there is no overwrite. This has no any effect for current uses but will be used in one of the next patches building support for PHI translation. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D105098	2021-11-23 17:11:15 +07:00
David Green	32b6c17b29	[SDAG] Use UnknownSize for masked load/store MMO size A masked load or store will load a potentially unknown number of bytes from a memory location - that is not generally known at compile time. They do not necessarily load/store the entire vector width, and treating them as such can lead to incorrect aliasing information (for example, if the underlying object is smaller than the size of the vector). This makes sure that the MMO is given an unknown size to represent this. which is less accurate that "may load/store from up to 16 bytes", but less incorrect that "will load/store from 16 bytes". Differential Revision: https://reviews.llvm.org/D113888	2021-11-23 09:47:56 +00:00
Qiu Chaofan	59f4b3d308	[PowerPC] Implement more fusion types for Power10 This implements the rest of Power10 instruction fusion pairs, according to user manual, including 'wide immediate', 'load compare', 'zero move' and 'SHA3 assist'. Only 'SHA3 assist' is enabled by default. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D112912	2021-11-23 17:21:17 +08:00
Martin Storsjö	4e5488afb2	[AArch64] [COFF] Move jump tables back to the readonly section This essentially reverts `f5884d255e` (D57277). That commit was made as a workaround since LLVM back then didn't support cross-section relative relocations (IMAGE_REL_ARM64_REL32) in COFF for ARM64. Support for this was implemented later, in `d5c5cf5ce8` (D99572) and `382c505d9c` (D102217). The commit that moved jump tables to the function section noted that it woud be ideal to utilize IMAGE_REL_ARM64_REL32. Differential Revision: https://reviews.llvm.org/D113576	2021-11-23 10:13:48 +02:00
Martin Storsjö	06d0d449d8	[COFF] [ARM64] Create symbols with regular intervals for relocations against temporary symbols For relocations against temporary symbols (that don't persist in the object file), we normally adjust them to reference the start of the section. For adrp relocations, the immediate offset from the referenced symbol is stored in the opcode as the 21 bit signed immediate; this means that the symbol referenced must be within +/- 1 MB from the referenced symbol. Create label symbols with regular intervals (1 MB intervals). For relocations against temporary symbols, pick the preceding added offset symbol and make the relocation against that instead of against the start of the section. This should fix the root issue behind https://bugs.llvm.org/show_bug.cgi?id=52378. Differential Revision: https://reviews.llvm.org/D114340	2021-11-23 10:12:41 +02:00
Kazu Hirata	d5b73a70a0	[llvm] Use range-based for loops (NFC)	2021-11-22 20:33:28 -08:00
Huihui Zhang	9cd7c534e2	[InstCombine] Enable fold select into operand for FAdd, FMul, FSub and FDiv. For FAdd, FMul, FSub and FDiv, fold select into one of the operands to enable further optimizations, i.e., floating-point reduction detection. Turn code: %C = fadd %A, %B %D = select %cond, %C, %A into: %C = select %cond, %B, -0.000000e+00 %D = fadd %A, %C Alive2 verification (with --disable-undef-input), timed out otherwise. FAdd - https://alive2.llvm.org/ce/z/eUxN4Y FMul - https://alive2.llvm.org/ce/z/5SWZz4 FSub - https://alive2.llvm.org/ce/z/Dhj8dU FDiv - https://alive2.llvm.org/ce/z/Yj_NA2 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D113442	2021-11-22 15:10:10 -08:00
Florian Hahn	6149e57dc1	[ThreadPool] Support returning futures with results. This patch adjusts ThreadPool::async to return futures that wrap the result type of the passed in callable. To do so, ThreadPool::asyncImpl first creates a shared promise. The result of the promise is set in a new callable that first executes the task. The callable is added to the task queue. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D114183	2021-11-22 21:20:55 +00:00
Jay Foad	44a3916f78	[AMDGPU] Allow VOP3 source modifiers in fpow expansion Differential Revision: https://reviews.llvm.org/D114353	2021-11-22 20:39:46 +00:00
Stanislav Mekhanoshin	c407769f5e	[InstCombine] (~(a \| b) & c) \| ~(c \| (a ^ b)) -> ~((a \| b) & (c \| (b ^ a))) Transform ``` (~(a \| b) & c) \| ~(c \| (a ^ b)) -> ~((a \| b) & (c \| (b ^ a))) ``` And swapped case: ``` (a \| ~(b & c)) & ~(a & (b ^ c)) --> ~(a \| b) \| (a ^ b ^ c) ``` ``` ---------------------------------------- define i3 @src(i3 %a, i3 %b, i3 %c) { %0: %or1 = or i3 %b, %c %not1 = xor i3 %or1, 7 %and1 = and i3 %a, %not1 %xor1 = xor i3 %b, %c %or2 = or i3 %xor1, %a %not2 = xor i3 %or2, 7 %or3 = or i3 %and1, %not2 ret i3 %or3 } => define i3 @tgt(i3 %a, i3 %b, i3 %c) { %0: %obc = or i3 %b, %c %xbc = xor i3 %b, %c %o = or i3 %a, %xbc %and = and i3 %obc, %o %r = xor i3 %and, 7 ret i3 %r } Transformation seems to be correct! ``` ``` ---------------------------------------- define i4 @src(i4 %a, i4 %b, i4 %c) { %0: %and1 = and i4 %b, %c %not1 = xor i4 %and1, 15 %or1 = or i4 %not1, %a %xor1 = xor i4 %b, %c %and2 = and i4 %xor1, %a %not2 = xor i4 %and2, 15 %and3 = and i4 %or1, %not2 ret i4 %and3 } => define i4 @tgt(i4 %a, i4 %b, i4 %c) { %0: %xor1 = xor i4 %b, %c %xor2 = xor i4 %xor1, %a %or1 = or i4 %a, %b %not1 = xor i4 %or1, 15 %or2 = or i4 %xor2, %not1 ret i4 %or2 } Transformation seems to be correct! ``` Differential Revision: https://reviews.llvm.org/D112955	2021-11-22 10:49:21 -08:00
Nico Weber	2fb3c05b34	[asm] Merge EmitMSInlineAsmStr() and EmitGCCInlineAsmStr() This basically reverts `1778831a3d`, which split them. Since they were split 9 years ago, EmitGCCInlineAsmStr() grew a bunch of features that usually weren't added to EmitMSInlineAsmStr(), and that was usually a mistake. D71677, D113932, D114167 are all examples of where things were backported to EmitMSInlineAsmStr(). The names were also not great. EmitMSInlineAsmStr() used to be called for `asm inteldialect`, which clang produces for Microsoft-style __asm { ... } blocks as well for GCC-style __asm__ / asm statements with -masm=intel. On the other hand, EmitGCCInlineAsmStr() used to be called for `asm`, whic clang produces for GCC-style __asm__ / asm statements with -masm=att (the default). It's also less code (23 insertions, 188 deletions). No behavior change. Differential Revision: https://reviews.llvm.org/D114330	2021-11-22 11:49:57 -05:00
Nico Weber	7c2d51474a	[asm] Allow labels as operands in intel asm syntax This makes a line in llvm/test/CodeGen/X86/asm-block-labels.ll pass with `asm inteldialect` too. I don't know if this is something one can hit in practice with inline asm. The test is from 2007 (`4646aa3e33`) but in 2009 blockaddr was introduced and e.g. `__asm__ __volatile__("brl %0" :: "X"(&&foo) : "memory");` compiles to call void asm sideeffect "brl $0", "X,..."(i8* blockaddress(@func, %1)) nowadays (thanks to jrtc27 for that example!). (`6c4d255bf3` switched clang to blockaddress on an opt-in basis, `e4801f7844` added docs for it, `31b132c0b7` added IR support.) I half-heartedly tried to build clang 2.8 locally, but it didn't just build. And 2.8 didn't have a prebuilt clang binary yet. The motivation is to make EmitGCCInlineAsmStr() and EmitMSInlineAsmStr() more alike, and maybe we should delete this code form EmitGCCInlineAsmStr() instead. But since it's just 3 lines and it's reachable from LLVM IR, let's do the safer thing for now. Differential Revision: https://reviews.llvm.org/D114329	2021-11-22 11:49:29 -05:00
Kazu Hirata	59a26448a6	[Target] Use range-based for loops (NFC)	2021-11-22 08:21:07 -08:00
Zarko Todorovski	dc9b5550b2	[NFC][llvm][Hexagon] Inclusive Terms remove uses of sanity in Hexagon taget Most changes are rewording comments but there are some assertions that I rephrased. Reviewed By: kparzysz Differential Revision: https://reviews.llvm.org/D114132	2021-11-22 10:08:01 -05:00
Hsiangkai Wang	137d3474ca	[RISCV] Reverse the order of loading/storing callee-saved registers. Currently, we restore the return address register as the last restoring instruction in the epilog. The next instruction is `ret` usually. It is a use of return address register. In some microarchitectures, there is load-to-use data hazard. To avoid the load-to-use data hazard, we could separate the load instruction from its use as far as possible. In this patch, we reverse the order of restoring callee-saved registers to increase the distance of `load ra` and `ret` in the epilog. Differential Revision: https://reviews.llvm.org/D113967	2021-11-22 23:02:11 +08:00
Nikita Popov	62e9acad0a	Revert "[SCEV] Fix and validate ValueExprMap/ExprValueMap consistency" This reverts commit `d633db8f9d`. Causes bootstrap assertion failures: https://lab.llvm.org/buildbot/#/builders/168/builds/3459/steps/9/logs/stdio	2021-11-22 15:47:33 +01:00
Nikita Popov	d633db8f9d	[SCEV] Fix and validate ValueExprMap/ExprValueMap consistency This adds validation for consistency of ValueExprMap and ExprValueMap, and fixes identified issues: * Addrec construction directly wrote to ValueExprMap in a few places, without updating ExprValueMap. Add a helper to ensures they stay consistent. The adjustment in forgetSymbolicName() explicitly drops the old value from the map, so that we don't rely on it being overwritten. * forgetMemoizedResultsImpl() was dropping the SCEV from ExprValueMap, but not dropping the corresponding entries from ValueExprMap. Differential Revision: https://reviews.llvm.org/D113349	2021-11-22 15:27:25 +01:00
Simon Moll	56db1c072c	[DA][NFC] Update publication - add remarks Update the reference publication for the SyncDependenceAnalysis and Divergence Analysis. Fix phrasing, formatting. Add comments on reducible loop limitation. Reviewed By: sameerds Differential Revision: https://reviews.llvm.org/D114146	2021-11-22 12:58:19 +01:00
Roman Lebedev	704d92607d	[X86][TTI] Finish costmodel for AVX512BW's VPMOVM2[BW] / VPMOV[BW]2M instructions Apparently my methodology was suboptimal, and not only did miss all the +VL tuples, i also missed some plain tuples. I believe, this adds everything missing. Indeed, these manual costmodels are just not okay long-term. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D114334	2021-11-22 14:31:34 +03:00
Roman Lebedev	8d09dd61c3	[X86][TTI] Costmodel for AVX512DQ's VPMOVM2[DQ] / VPMOV[DQ]2M instructions Much like the VPMOVM2[BW] / VPMOV[BW]2M from AVX512BW, these either sign-extent the mask register into a vector, or pack the mask from vector register. Apparently, we didn't even have MCA tests for these, added in rG2f364f6f0d3a2420ca78cbd80abb186657180e05, so i'm just guessing that their perf characteristics are optimal. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D114314	2021-11-22 14:31:34 +03:00
Diego Caballero	4348cd42c3	[LV] Drop integer poison-generating flags from instructions that need predication This patch fixes PR52111. The problem is that LV propagates poison-generating flags (`nuw`/`nsw`, `exact` and `inbounds`) in instructions that contribute to the address computation of widen loads/stores that are guarded by a condition. It may happen that when the code is vectorized and the control flow within the loop is linearized, these flags may lead to generating a poison value that is effectively used as the base address of the widen load/store. The fix drops all the integer poison-generating flags from instructions that contribute to the address computation of a widen load/store whose original instruction was in a basic block that needed predication and is not predicated after vectorization. Reviewed By: fhahn, spatel, nlopes Differential Revision: https://reviews.llvm.org/D111846	2021-11-22 10:57:29 +00:00
Sjoerd Meijer	4d21b64464	[BPI] Look-up tables for non-loop branches. NFC. This adds and uses look-up tables for non-loop branch probabilities, which have have probabilities directly encoded into the tables for the different condition codes. Compared to having this logic inlined in different functions, as it used to be the case, I think this is compacter and thus also easier to check/cross reference. This also adds a test for pointer heuristics that was missing. Differential Revision: https://reviews.llvm.org/D114009	2021-11-22 10:30:42 +00:00
David Green	760d4d03d5	[AArch64] Sink splat shuffles to lane index intrinsics This teaches AArch64TargetLowering::shouldSinkOperands to sink splat shuffles to certain neon intrinsics, so that they can make use of the lane variants of the instructions that are available. Differential Revision: https://reviews.llvm.org/D112994	2021-11-22 08:11:35 +00:00
wangpc	af0ecfccae	[RISCV] Generate pseudo instruction li Add an alias of `addi [x], zero, imm` to generate pseudo instruction li, which makes assembly mush more readable. For existed tests, users can update them by running script `llvm/utils/update_llc_test_checks.py`. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D112692	2021-11-22 14:01:37 +08:00
Kazu Hirata	49e3838145	[llvm] Use make_early_inc_range (NFC)	2021-11-21 19:24:17 -08:00
Kazu Hirata	ea5421bd0d	[llvm] Use range-based for loops (NFC)	2021-11-21 19:24:15 -08:00
Roland McGrath	b72b56016a	NFC: clang-format lib/Transforms/Instrumentation/InstrProfiling.cpp Differential Revision: https://reviews.llvm.org/D114343	2021-11-21 18:16:02 -08:00
Kazu Hirata	c133fb321f	[CodeGen] Use llvm::is_contained (NFC)	2021-11-21 10:36:20 -08:00
Kazu Hirata	fc981cedea	[llvm] Use range-based for loops (NFC)	2021-11-21 10:36:18 -08:00
Kazu Hirata	f6bce30cf9	[llvm] Use range-based for loops (NFC)	2021-11-20 18:42:10 -08:00
Phoebe Wang	6cc820a3e2	[X86][FP16] Relax the pattern condition for VZEXT_MOVL to match more cases Fixes pr52560 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D114313	2021-11-21 09:14:11 +08:00
Nikita Popov	aeba28bc62	[DSE] Drop hasAnalyzableMemoryWrite() (NFCI) The functionality of hasAnalyzableMemoryWrite() is effectively subsumed by getLocForWriteEx(), which will return None if the instruction is not analyzable. The implementations don't match exactly (e.g. getLocForWriteEx() does not limit non-calls to stores), but in conjunction with the isRemovable() check, it ends up being the same.	2021-11-20 23:20:12 +01:00
Nikita Popov	0a2bde94a0	[LVI] Drop requirement that modulus is constant If we're looking only at the lower bound, the actual modulus doesn't matter. This is a leftover from when I wanted to consider the upper bound as well, where the modulus does matter.	2021-11-20 21:06:08 +01:00
Nikita Popov	cd84cab6b3	[LVI] Support urem in implied conditions If (X urem M) >= C we know that X >= C. Make use of this fact when computing the implied condition range. In some cases we could also establish an upper bound, but that's both tricker and not interesting in practice. Alive: https://alive2.llvm.org/ce/z/R5ZGSW	2021-11-20 21:01:26 +01:00
Florian Hahn	cf8efbd30e	[VPlan] Wrap vector loop blocks in region. A first step towards modeling preheader and exit blocks in VPlan as well. Keeping the vector loop in a region allows for changing the VF as we traverse region boundaries. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D113182	2021-11-20 17:59:48 +00:00
Sanjay Patel	337948ac6e	[InstCombine] add folds for binop with sexted bool and constant operands This is a generalization/extension of the existing and/or folds noted with TODO comments. Those have a one-use constraint that is not necessary. Potential follow-ups are noted by the TODO comments in the new function. We can also call this function from other binop visit* functions, but we need to add tests first. This solves: https://llvm.org/PR52543 https://alive2.llvm.org/ce/z/NWuCR5	2021-11-20 12:33:00 -05:00
Craig Topper	a4373f6753	[X86] Don't combine (x86cmp (trunc (movmsk (bitcast X))), 0) if the truncate discards unknown bits. We have transform that tries turn a pmovmskb into movmskps/pd or movmskps to movmskpd. This transform isn't valid if the truncate discarded bits that might be set by the original movmsk. We could fix this by inserting an AND after the new movmsk to discard the equivalent of the truncated bits, but I've left that for later patch. Fixes PR52567. Differential Revision: https://reviews.llvm.org/D114306	2021-11-19 21:50:35 -08:00
Kazu Hirata	d1abf481da	[llvm] Use range-based for loops (NFC)	2021-11-19 21:12:13 -08:00
RamNalamothu	18f9351223	[AMDGPU] Do not generate ELF symbols for the local branch target labels The compiler was generating symbols in the final code object for local branch target labels. This bloats the code object, slows down the loader, and is only used to simplify disassembly. Use '--symbolize-operands' with llvm-objdump to improve readability of the branch target operands in disassembly. Fixes: SWDEV-312223 Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D114273	2021-11-20 10:32:41 +05:30
Lang Hames	43f5f6916f	[ORC][JITLink] Move JITDylib name into JITLinkDylib base class. This will enable better error messages and debug logs in JITLink.	2021-11-19 20:54:28 -08:00
ksyx	97b9e8438e	[GVN][NFC] Remove redundant check The if-check above deleted part guarantees that StoreOffset <= LoadOffset and that StoreOffset + StoreSize >= LoadOffset + LoadSize, and given that LoadOffset + LoadSize > LoadOffset when LoadSize > 0. Thus, this shows StoreOffset + StoreSize > LoadOffset is guaranteed given LoadSize > 0, while it could be meaningless to have a type with nonpositive size, so that the check could be removed. The values are converted to signed types to avoid unsigned operation with negative offsets. Part of revision D100179 Reapply commit `c35e8185d8` with fixing problem reported by mstorsjo	2021-11-19 20:24:36 -05:00
James Nagurne	241df03ce5	[NFC] Test commit, add whitespace to end-of-line	2021-11-19 18:22:00 -06:00
Ellis Hoag	de11de308b	[InstrProf] Use i32 for GEP index from lowering llvm.instrprof.increment The `llvm.instrprof.increment` intrinsic uses `i32` for the index. We should use this same type for the index into the GEP instructions. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D114268	2021-11-19 15:45:14 -08:00
Quinn Pham	3f3bee42d2	[NFC][llvm] Inclusive language: remove instance of master from Thumb2SizeReduction.cpp [NFC] As part of using inclusive language within the llvm project, this patch replaces master with main in `Thumb2SizeReduction.cpp`. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D114196	2021-11-19 16:07:58 -06:00
David Blaikie	3f3680dff3	DWARFVerifier: Simplify name lookups No need to use the dynamic fallback query when the name type is known statically at the call site.	2021-11-19 12:31:27 -08:00
Jay Foad	ff7f2cfa95	[AMDGPU] Add an implicit use of M0 to all V_MOV_B32_indirect_read/write NFCI. Previously the implicit use was added to V_MOV_B32_indirect_read when building the instruction. V_MOV_B32_indirect_write didn't have an implicit use of M0 at all, but apparently it did not cause any problems. Differential Revision: https://reviews.llvm.org/D114239	2021-11-19 19:00:17 +00:00
Fabian Wolff	7eec832def	[DSE] Improve handling of `strncpy` in Dead Store Elimination Fixes PR#52062 and one of the remaining cases of PR#47644. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D114035	2021-11-19 17:46:29 +00:00
Florian Hahn	76effb001d	[LV] Remove obsolete comment about creating a dummy block (NFC) No dummy pre-entry block is created since `a6c4969f5f`. The comment is stale now and can be removed. Mentioned by @Ayal in D113182.	2021-11-19 17:17:04 +00:00
Philip Reames	28000587e1	[SCEV] Revert two speculative compile time optimizations which made no difference Revert "[SCEV] Defer all work from `ea12c2cb` as late as possible" Revert "[SCEV] Defer loop property checks from `ea12c2cb` as late as possible" This reverts commit `734abbad79` and `1a5666acb2`. Both of these changes were speculative attempts to address a compile time regression. Neither worked, and both complicated the code in undesirable ways.	2021-11-19 08:45:56 -08:00
Philipp Tomsich	af57a71d18	[RISCV] Don't call setHasMultipleConditionRegisters(), so icmp is sunk On RISC-V, icmp is not sunk (as the following snippet shows) which generates the following suboptimal branch pattern: ``` core_list_find: lh a2, 2(a1) seqz a3, a0 << bltz a2, .LBB0_5 bnez a3, .LBB0_9 << should sink the seqz [...] j .LBB0_9 .LBB0_5: bnez a3, .LBB0_9 << should sink the seqz lh a1, 0(a1) [...] ``` due to an icmp not being sunk. The blocks after `codegenprepare` look as follows: ``` define dso_local %struct.list_head_s* @core_list_find(%struct.list_head_s* readonly %list, %struct.list_data_s* nocapture readonly %info) local_unnamed_addr #0 { entry: %idx = getelementptr inbounds %struct.list_data_s, %struct.list_data_s* %info, i64 0, i32 1 %0 = load i16, i16* %idx, align 2, !tbaa !4 %cmp = icmp sgt i16 %0, -1 %tobool.not37 = icmp eq %struct.list_head_s* %list, null br i1 %cmp, label %while.cond.preheader, label %while.cond9.preheader while.cond9.preheader: ; preds = %entry br i1 %tobool.not37, label %return, label %land.rhs11.lr.ph ``` where the `%tobool.not37` is the result of the icmp that is not sunk. Note that it is computed in the basic-block up until what becomes the `bltz` instruction and the `bnez` is a basic-block of its own. Compare this to what happens on AArch64 (where the icmp is correctly sunk): ``` define dso_local %struct.list_head_s* @core_list_find(%struct.list_head_s* readonly %list, %struct.list_data_s* nocapture readonly %info) local_unnamed_addr #0 { entry: %idx = getelementptr inbounds %struct.list_data_s, %struct.list_data_s* %info, i64 0, i32 1 %0 = load i16, i16* %idx, align 2, !tbaa !6 %cmp = icmp sgt i16 %0, -1 br i1 %cmp, label %while.cond.preheader, label %while.cond9.preheader while.cond9.preheader: ; preds = %entry %1 = icmp eq %struct.list_head_s* %list, null br i1 %1, label %return, label %land.rhs11.lr.ph ``` This is caused by sinkCmpExpression() being skipped, if multiple condition registers are supported. Given that the check for multiple condition registers affect only sinkCmpExpression() and shouldNormalizeToSelectSequence(), this change adjusts the RISC-V target as follows: * we no longer signal multiple condition registers (thus changing the behaviour of sinkCmpExpression() back to sinking the icmp) * we override shouldNormalizeToSelectSequence() to let always select the preferred normalisation strategy for our backend With both changes, the test results remain unchanged. Note that without the target-specific override to shouldNormalizeToSelectSequence(), there is worse code (more branches) generated for select-and.ll and select-or.ll. The original test case changes as expected: ``` core_list_find: lh a2, 2(a1) bltz a2, .LBB0_5 beqz a0, .LBB0_9 << [...] j .LBB0_9 .LBB0_5: beqz a0, .LBB0_9 << lh a1, 0(a1) [...] ``` Differential Revision: https://reviews.llvm.org/D98932	2021-11-19 08:32:59 -08:00
Victor Huang	86e77cdb08	[PowerPC] Add a flag for conditional trap optimization This patch adds a flag to enable/disable conditional trap optimization. Optimization disabled by default. Peer reviewed by: nemanjai	2021-11-19 10:24:54 -06:00
Ben Langmuir	4c94760f36	[ORC] Fix materialization of weak local symbols We were adding all defined weak symbols to the materialization responsibility, but local symbols will not be in the symbol table, so it failed to materialize due to the "missing" symbol. Local weak symbols come up in practice when using `ld -r` with a hidden weak symbol. rdar://85574696	2021-11-19 07:25:56 -08:00
Matt Morehouse	671f0930fe	[X86] Selective relocation relaxation for +tagged-globals For tagged-globals, we only need to disable relaxation for globals that we actually tag. With this patch function pointer relocations, which we do not instrument, can be relaxed. This patch also makes tagged-globals work properly with LTO, as -Wa,-mrelax-relocations=no doesn't work with LTO. Reviewed By: pcc Differential Revision: https://reviews.llvm.org/D113220	2021-11-19 07:18:27 -08:00
Alexey Bataev	d1fdf867b1	[SLP][NFC]Introduce TreeEntry::getVectorFactor member function, NFC. Added TreeEntry::getVectorFactor to get the final vectotization factor to simplify the code. Differential Revision: https://reviews.llvm.org/D114190	2021-11-19 06:32:19 -08:00
Nico Weber	8b76d33c59	[asm] Allow block address operands in `asm inteldialect` This makes the following program build with -masm=intel: int foo(int count) { asm goto ("dec %0; jb %l[stop]" : "+r" (count) : : : stop); return count; stop: return 0; } It's also is another step towards merging EmitGCCInlineAsmStr() and EmitMSInlineAsmStr(). Differential Revision: https://reviews.llvm.org/D114167	2021-11-19 09:27:30 -05:00
Nico Weber	4f9a5c2a14	[asm] Remove explicit branch for modifier 'l' No intended behavior change. EmitGCCInlineAsmStr() used to explicitly check for modifier 'l' after handling block address and machine basic block operands. This prevented passing a MachineOperand with 'l' modifier to PrintAsmMemoryOperand(). Conceptually that seems kind of nice, but in practice the overrides of PrintAsmMemoryOperand() in all () AsmPrinter subclasses already reject modifiers they don't know about, and none of them don't know about 'l'. So removing this doesn't have a behavior difference, is less code, and it makes EmitGCCInlineAsmStr() and EmitMSInlineAsmStr() more similar, to prepare for merging them later. (Why not _add_ the branch to EmitMSInlineAsmStr() instead? Because that always works with X86AsmPrinter I think, and X86AsmPrinter::PrintAsmMemoryOperand() very decisively rejects the 'l' modifier, so it's hard to motivate adding that branch.) : The one exception was AVRAsmPrinter, which had an llvm_unreachable instead of returning true. So this commit changes that, so that the AVR target keeps emitting an error instead of crashing when passing a mem operand with a :l modifier to it. All the other targets already don't crash on this. Differential Revision: https://reviews.llvm.org/D114216	2021-11-19 09:19:53 -05:00
Jay Foad	30b27ecfc2	[AMDGPU] Use new opcode for indexed vgpr reads Introduce V_MOV_B32_indirect_read for indexed vgpr reads (and rename the old V_MOV_B32_indirect to V_MOV_B32_indirect_write) so they can be unambiguously distinguished from regular V_MOV_B32_e32. Previously they were distinguished by looking for extra implicit operands but this is fragile because regular moves sometimes have extra implicit operands too: - either by accident, when instructions end up with duplicate implicit operands (see e.g. D100939) - or by design, when SIInstrInfo::copyPhysReg breaks a multi-dword copy into individual subreg mov instructions and adds implicit operands for the super-register. The effect of this is that SIInstrInfo::isFoldableCopy can be simplified and identifies more foldable copies. The test diffs show that more immediate 0 values have been folded as inline operands. SIInstrInfo::isReallyTriviallyReMaterializable could probably be simplified too but that is not part of this patch. Differential Revision: https://reviews.llvm.org/D114230	2021-11-19 13:08:11 +00:00
Roman Lebedev	049799c311	[X86][Costmodel] `getReplicationShuffleCost()`: promote 1 bit-wide elements to 8 bit when have AVX512BW+AVX512VBMI If in addition to AVX512BW (that provides `{k}<->{i8,i16}` casts and i16 shuffles), we have AVX512VBMI, which provides i8 shuffles, we are in an optimal situation. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D114071	2021-11-19 15:58:10 +03:00
Roman Lebedev	a751084bb4	[X86][Costmodel] `trunc v16i8 to v8i1` can appear after legalization, cost is same as for `trunc v8i8 to v8i1` Note that there are many other missing costs, i'm only adding the ones that are queried from `getReplicationShuffleCost()` for the existing (quite exhaustive) test coverage. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D114070	2021-11-19 15:57:32 +03:00
Roman Lebedev	a50fdd3fc9	[X86][Costmodel] `getReplicationShuffleCost()`: promote 1 bit-wide elements to 16 bit when have AVX512BW Here we get pretty lucky. AVX512F does not provide any instructions to convert between a `k` vector mask and a vector, but AVX512BW adds `{k}<->nX{i8,i16}`conversions, and just as it happens, with AVX512BW we have a i16 shuffle. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113915	2021-11-19 15:55:41 +03:00
Simon Pilgrim	0f652d8f52	[X86] LowerRotate - recognise hidden ROTR patterns for better vXi8 codegen Check for a hidden ISD::ROTR (rotl(sub(0,x))) - vXi8 lowering can handle both (its always beneficial for splats, but otherwise only if we have VPTERNLOG). We currently hit infinite loops in TargetLowering::expandROT if we set ISD::ROTR to custom, which needs addressing before we extend this much further.	2021-11-19 11:49:15 +00:00
Simon Pilgrim	812e64ef0c	[DAG] MatchRotate - support rotate-by-constant of illegal types Patch to fix some of the regressions in D77804. By folding to rotate/funnel-shift by constant amounts for illegal types, we prevent SimplifyDemandedBits from destroying the patterns prematurely, allowing us to use the rotate/funnel-shift legalization that was added in D112443. Differential Revision: https://reviews.llvm.org/D113192	2021-11-19 11:12:04 +00:00
Yonghong Song	8fb3f84484	BPF: Workaround InstCombine trunc+icmp => mask+icmp Optimization Patch [1] added further InstCombine trunc+icmp => mask+icmp optimization and this caused a couple of bpf selftest failure. Previous llvm BPF backend patch [2] introduced llvm.bpf.compare builtin to handle such situations. This patch further added support ">" and ">=" icmp opcodes. Tested with bpf selftests and all tests are passed including two previously failed ones. Note Patch [1] also added optimization if the to-be-compared constant is negative-power-of-2 (-C) or not-of-power-of-2 (~C). This patch didn't implement these two cases as typical bpf program compares a scalar to a positive length or boundary value, and this scalar later is used as a index into an array buffer or packet buffer. [1] https://reviews.llvm.org/D112634 [2] https://reviews.llvm.org/D112938 Differential Revision: https://reviews.llvm.org/D114215	2021-11-18 20:25:28 -08:00
Serguei Katkov	3557f49353	[AARCH64] Teach AArch64FrameLowering::getFrameIndexReferencePreferSP really prefer SP. Do more efforts to use sp if it is possible to lower a frame index. Reviewers: reames, loicottet, ostannard, t.p.northover Reviewed By: reames Subscribers: arphaman, danilaml, hiraditya, kristof.beyls, llvm-commits, Matt, yrouban Differential Revision: https://reviews.llvm.org/D111133	2021-11-19 11:14:02 +07:00
Senran Zhang	0425ea4621	[NFC][OpaquePtr][Evaluator] Remove call to PointerType::getElementType There are still another 2 uses of PointerType::getElementType in Evaluator when evaluating BitCast's on pointers. BitCast's on pointers should be removed when opaque ptr is ready, so I just keep them as is. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D114131	2021-11-19 10:32:55 +08:00
Philip Reames	734abbad79	[SCEV] Defer all work from `ea12c2cb` as late as possible This is a second speculative compile time optimization to address a reported regression. My actual suspicion is that availability of no-self-wrap is making some other bit of code trigger, but let's rule this out.	2021-11-18 17:19:52 -08:00
Stanislav Mekhanoshin	f8e615462b	[AMDGPU] Fix SIPostRABundler crash on null register used by dbg value Recently we started generate DBG_VALUEs with $noreg operands. This crashes SIPostRABundler, and it should not iterate these registers anyway. Fixes: SWDEV-311733 Differential Revision: https://reviews.llvm.org/D114202	2021-11-18 17:01:19 -08:00
Victor Huang	40c65655af	[PowerPC] Remove the redundant terminator instruction when optimizing conditional trap This patch is a follow up patch for `ae27ca9a67` to the remove redundant terminator when optimizing conditional trap. Peer reviewed by: nemanjai	2021-11-18 17:52:26 -06:00
Ahmed Bougacha	e3a7f0e2f9	[AArch64][PAC] Select llvm.ptrauth.sign/sign.generic to PAC*. The @llvm.ptrauth.sign/sign.generic intrinsics map cleanly to the various AArch64 PAC[IDG][Z][AB] instructions. Select them. Differential Revision: https://reviews.llvm.org/D91087	2021-11-18 15:21:30 -08:00
David Blaikie	3cbc4c487a	llvm-dwarfdump: Rebuild type names in dwo type units	2021-11-18 14:12:48 -08:00
Philip Reames	1a5666acb2	[SCEV] Defer loop property checks from `ea12c2cb` as late as possible This is a speculative compile time optimization to address a reported regression. It's the only thing which vaguely makes sense.	2021-11-18 13:47:45 -08:00
Nikita Popov	46c26991ae	[DSE] Remove getLocForWrite() (NFCI) This implements nearly the same logic as getLocForWriteEx(), and is only used in one place. In that context, we should also know that getLocForWriteEx() returns a non-None result. As such, consolidate everything to use one function.	2021-11-18 21:19:18 +01:00
Nikita Popov	f1295563f1	[DSE] Move removePartiallyOverlappedStores() into DSEState (NFC) So it can use getLocForWriteEx().	2021-11-18 21:19:18 +01:00
Philip Reames	ea12c2cb9c	[SCEV] Move mustprogress based no-self-wrap logic so it applies to all exit conditions This change moves logic which we'd added specifically for less than tests so that it applies to equalities and greater than tests as well. The basic idea is that if we can show an IV cycles infinitely through the same series on self-wrap, and that the exit condition must be taken to prevent UB, we can conclude that it must be taken before self-wrap and thus infer said flag. The motivation here is simple loops with unsigned induction variables w/non-one steps and inequality tests. A toy example would be: for (unsigned i = 0; i != N; i += 2) { body; } If body contains no side effects, and this is a mustprogress function, we can assume that this must be a finite loop and thus that the exit count is N/2. Differential Revision: https://reviews.llvm.org/D103991	2021-11-18 10:07:44 -08:00
Kazu Hirata	7ca14f6044	[llvm] Use range-based for loops (NFC)	2021-11-18 09:09:52 -08:00
Simon Pilgrim	3821d2ab3b	[X86] LowerRotate - pull out repeated is ISD::ROTL check. NFC.	2021-11-18 17:03:19 +00:00
Kerry McLaughlin	ff64b2933a	[LoopVectorize] Check the number of uses of an FAdd before classifying as ordered checkOrderedReductions looks for Phi nodes which can be classified as in-order, meaning they can be vectorised without unsafe math. In order to vectorise the reduction it should also be classified as in-loop by getReductionOpChain, which checks that the reduction has two uses. In this patch, a similar check is added to checkOrderedReductions so that we now return false if there are more than two uses of the FAdd instruction. This fixes PR52515. Reviewed By: fhahn, david-arm Differential Revision: https://reviews.llvm.org/D114002	2021-11-18 16:41:19 +00:00
Arnold Schwaighofer	7d11c5dac2	Coro: Remove coro_end and coro_suspend_retcon in private unprocessed functions We might emit functions that are private and never called. The coro split pass only processes functions that might be called. Remove intrinsics that we can't generate code for. rdar://84619859 Differential Revision: https://reviews.llvm.org/D114021	2021-11-18 07:48:24 -08:00
Mubashar Ahmad	8e47b83ec9	[AArch64][ARM] Enablement of Cortex-A710 Support Phabricator review: https://reviews.llvm.org/D113256	2021-11-18 10:58:05 +00:00
Eric Tang	9fe6b9e802	[TargetLowering][RISCV] Fixed a scalable vector issue when lowering [s\|u]mul.overflow intrinsics Fixed the vector type issue that where we used getVectorNumElements() should be replaced by getVectorElementCount() when lowering these intrinsics. This is similar to D94149 Signed-off-by: Eric Tang <tangxingxin1008@gmail.com> Reviewed By: craig.topper, frasercrmck Differential Revision: https://reviews.llvm.org/D109809	2021-11-18 10:16:08 +00:00
David Sherwood	9cef7c1ca9	[CodeGen][SVE] Add missing isel patterns for vector_reverse We were missing patterns for vector_reverse of unpacked FP vector types, as well as all the supported bfloat vectors. Tests added here: CodeGen/AArch64/named-vector-shuffle-reverse-sve.ll Differential Revision: https://reviews.llvm.org/D114089	2021-11-18 09:59:26 +00:00
Florian Hahn	da9f2ba3b1	[SCEV] Reorder operands checks in collectConditions. The initial two cases require a SCEVConstant as RHS. Pull up the condition to check and swap SCEVConstants from below. Also remove a redundant check & swap if RHS is SCEVUnknown.	2021-11-18 09:36:16 +00:00
Chen Zheng	9bda9a3980	[PowerPC] fix typos in comments, NFC	2021-11-18 08:55:23 +00:00
Phoebe Wang	a9fba2be35	[X86][ABI] Do not return float/double from x87 registers when x87 is disabled This is aligned with GCC's behavior. Also, alias `-mno-fp-ret-in-387` to `-mno-x87`, by which we can fix pr51498. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D112143	2021-11-18 12:31:08 +08:00
Phoebe Wang	de34a940ae	[X86] Add -mskip-rax-setup support to align with GCC AMD64 ABI mandates caller to specify the number of used SSE registers when passing variable arguments. GCC also provides option -mskip-rax-setup to skip the setup of rax when SSE is disabled. This helps to reduce the code size, see pr23258. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D112413	2021-11-18 11:20:32 +08:00
Zarko Todorovski	59c84774d2	[NFC][llvm] Inclusive language: remove uses of sanity in llvm/lib/ExecutionEngine/ Reworded and removed code comments to avoid using `sanity check` and `sanity test`.	2021-11-17 22:17:54 -05:00
Zi Xuan Wu	24d1673c8b	[llvm-tblgen][RISCV] Make llvm-tblgen RISCVCompressInstEmitter to be common infra across different targets Not only RISCV but also other target such as CSKY, there are compressed instructions mixed with normal instructions. To reuse the basic infra to compress/uncompress and predict instruction, we need reconstruct the RISCVCompressInstEmitter and make it more general and suitable for other target. Differential Revision: https://reviews.llvm.org/D113475	2021-11-18 11:14:27 +08:00
Zarko Todorovski	5b8bbbecfa	[NFC][llvm] Inclusive language: reword and remove uses of sanity in llvm/lib/Target Reworded removed code comments that contain `sanity check` and `sanity test`.	2021-11-17 21:59:00 -05:00
Luo, Yuanke	c4dba47196	[X86][AMX] Don't emit tilerelease for old AMX instrisic. We should avoid mixing old AMX instrinsic with new AMX intrinsic. For old AMX intrinsic, user is responsible for invoking tile release. This patch is to check if there is any tile config generated by compiler. If so it emit tilerelease instruction, otherwise it don't emit the instruction. Differential Revision: https://reviews.llvm.org/D114066	2021-11-18 09:28:32 +08:00
Craig Topper	d78fdf111d	[LegalizeTypes] Further limit expansion of CTTZ during type promotion. Don't expand CTTZ if CTPOP or CTLZ is supported on the promoted type. We have special handling for CTTZ expansion to use those ops with a small conversion. The setup for that doesn't generate extra code or large constants so we don't gain anything from expanding early and we make CTTZ_ZERO_UNDEF codegen worse. Follow up from post commit feedback on D112268. We don't seem to have any in tree tests that care about this.	2021-11-17 15:27:29 -08:00
Philip Reames	ad69402f3e	[SCEVAA] Avoid forming malformed pointer diff expressions This solves the same crash as in D104503, but with a different approach. The test case test_non_dom demonstrates a case where scev-aa crashes today. (If exercised either by -eval-aa or -licm.) The basic problem is that SCEV-AA expects to be able to compute a pointer difference between two SCEVs for any two pair of pointers we do an alias query on. For (valid, but out of scope) reasons, we can end up asking whether expressions in different sub-loops can alias each other. This results in a subtraction expression being formed where neither operand dominates the other. The approach this patch takes is to leverage the "defining scope" notion we introduced for flag semantics to detect and disallow the formation of the problematic SCEV. This ends up being relatively straight forward on that new infrastructure. This change does hint that we should probably be verifying a similar property for all SCEVs somewhere, but I'll leave that to a follow on change. Differential Revision: D114112	2021-11-17 12:38:04 -08:00
Simon Pilgrim	5f99f771ec	[X86] splitVector - only extract lower half subvector from splats If we're splitting a source vector that is a splat (with no undefs), just extract (for free) the lower half subvector and use it for both halfs.	2021-11-17 19:38:43 +00:00
Simon Pilgrim	3020608b61	Fix MSVC signed/unsigned mismatch warning. NFC.	2021-11-17 18:59:23 +00:00
Simon Pilgrim	e76032c173	[X86] LowerRotate - improve vXi8 rotate-by-scalar lowering with direct use of (extended) shift-by-scalar helpers. If we're rotating vXi8 by a splatted amount, then unpack to vXi16, perform a SHL by the (extended) scalar, and then pack the results. This is a vector equivalent to the "rotl(x,y) -> (((aext(x) << bw) \| zext(x)) << (y & (bw-1))) >> bw" style expansion we do for scalars in LowerFunnelShift. I think we can usefully use this for other vector types and vector funnel-shifts in the future, depending how we expand beyond D113192 for matching rotations/funnel-shifts for more type/ops.	2021-11-17 18:59:23 +00:00
Stanislav Mekhanoshin	6d3db28088	[InstCombine] Generalize complex OR patterns to AND For every pattern with only NOT, OR, and AND operations there is always a symmetrical attern with AND and OR swapped. This adds 2 transformations: https://reviews.llvm.org/D113526 ``` (~(a & b) \| c) & (~(a & c) \| b) --> ~((b ^ c) & a) (~(a & b) \| c) & ~(a & c) --> ~((b \| c) & a) ``` ``` ---------------------------------------- define i4 @src(i4 %a, i4 %b, i4 %c) { %0: %and1 = and i4 %b, %a %not1 = xor i4 %and1, 15 %and2 = and i4 %a, %c %not2 = xor i4 %and2, 15 %or = or i4 %not2, %b %r = and i4 %or, %not1 ret i4 %r } => define i4 @tgt(i4 %a, i4 %b, i4 %c) { %0: %or = or i4 %b, %c %and = and i4 %or, %a %r = xor i4 %and, 15 ret i4 %r } Transformation seems to be correct! ---------------------------------------- define i4 @src(i4 %a, i4 %b, i4 %c) { %0: %and1 = and i4 %a, %b %not1 = xor i4 %and1, 15 %or1 = or i4 %not1, %c %and2 = and i4 %a, %c %not2 = xor i4 %and2, 15 %or2 = or i4 %not2, %b %and3 = and i4 %or1, %or2 ret i4 %and3 } => define i4 @tgt(i4 %a, i4 %b, i4 %c) { %0: %xor = xor i4 %b, %c %and = and i4 %xor, %a %not = xor i4 %and, 15 ret i4 %not } Transformation seems to be correct! ``` Differential Revision: https://reviews.llvm.org/D113526	2021-11-17 10:47:36 -08:00
Nico Weber	bf834b2629	[x86/asm] Let EmitMSInlineAsmStr() handle variants too This is preparation for D113707, where I want to make `-masm=intel` emit `asm inteldialect` instructions. `{movq %rbx, %rax\|mov rax, rbx}` is supposed to evaluate to the bit between { and \| for att and to the bit between \| and } for intel. Since intel will become `asm inteldialect`, which alls EmitMSInlineAsmStr(), EmitMSInlineAsmStr() has to support variants as well. (clang translates `{...\|...}` to `$(...$\|...$)`. I'm not sure why it doesn't just send along only the first `...` or the second `...` to LLVM, but given the notes in PR23933 let's not do a big reorganization in this codepath.) Differential Revision: https://reviews.llvm.org/D113932	2021-11-17 13:31:59 -05:00
Craig Topper	0274be28d7	[RISCV] Lower vector CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF by converting to FP and extracting the exponent. If we have a large enough floating point type that can exactly represent the integer value, we can convert the value to FP and use the exponent to calculate the leading/trailing zeros. The exponent will contain log2 of the value plus the exponent bias. We can then remove the bias and convert from log2 to leading/trailing zeros. This doesn't work for zero since the exponent of zero is zero so we can only do this for CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF. If we need a value for zero we can use a vmseq and a vmerge to handle it. We need to be careful to make sure the floating point type is legal. If it isn't we'll continue using the integer expansion. We could split the vector and concatenate the results but that needs some additional work and evaluation. Differential Revision: https://reviews.llvm.org/D111904	2021-11-17 10:29:41 -08:00
Nico Weber	103cc914d6	[x86/asm] Make variants work when converting at&t inline asm input to intel asm output `asm` always has AT&T-style input (`asm inteldialect` has Intel-style asm input), so EmitGCCInlineAsmStr() always has to pick the same variant since it cares about the input asm string, not the output asm string. For PowerPC, that default variant is 1. For other targets, it's 0. Without this, the included test case errors out with error: unknown use of instruction mnemonic without a size suffix mov rax, rbx since it picks the intel branch and then tries to interpret it as AT&T when selecting intel-style output with `-x86-asm-syntax=intel`. Differential Revision: https://reviews.llvm.org/D113894	2021-11-17 13:23:18 -05:00
DianQK	1e9fa0b12a	Fix the side effect of outlined function when the register is implicit use and implicit-def in the same instruction. This is the diff associated with {D95267}, and we need to mark $x0 as live whether or not $x0 is dead. The compiler also needs to mark register $x0 as live in for the following case. ``` $x1 = ADDXri $sp, 16, 0 BL @spam, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit killed $x1, implicit-def $sp, implicit-def $x0 ``` This change fixes an issue where the wrong registers were used when -machine-outliner-reruns>0. As an example: ``` lang=c typedef struct { double v1; double v2; } D16; typedef struct { D16 v1; D16 v2; } D32; typedef long long LL8; typedef struct { long long v1; long long v2; } LL16; typedef struct { LL16 v1; LL16 v2; } LL32; typedef struct { LL32 v1; LL32 v2; } LL64; LL8 needx0(LL8 v0, LL8 v1); void bar(LL64 v1, LL32 v2, LL16 v3, LL32 v4, LL8 v5, D16 v6, D16 v7, D16 v8); LL8 foo(LL8 v0, LL64 v1, LL32 v2, LL16 v3, LL32 v4, LL8 v5, D16 v6, D16 v7, D16 v8) { LL8 result = needx0(v0, 0); bar(v1, v2, v3, v4, v5, v6, v7, v8); return result + 1; } ``` As you can see from the `foo` function, we should not modify the value of `x0` until we call `needx0`. This code is compiled to give the following instruction MIR code. ``` $sp = frame-setup SUBXri $sp, 256, 0 frame-setup STPDi killed $d13, killed $d12, $sp, 16 frame-setup STPDi killed $d11, killed $d10, $sp, 18 frame-setup STPDi killed $d9, killed $d8, $sp, 20 frame-setup STPXi killed $x26, killed $x25, $sp, 22 frame-setup STPXi killed $x24, killed $x23, $sp, 24 frame-setup STPXi killed $x22, killed $x21, $sp, 26 frame-setup STPXi killed $x20, killed $x19, $sp, 28 ... $x1 = MOVZXi 0, 0 BL @needx0, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $x1, implicit-def $sp, implicit-def $x0 ... ``` Since there are some other instruction sequences that duplicate `foo`, after the first execution of Machine Outliner you will get: ``` $sp = frame-setup SUBXri $sp, 256, 0 frame-setup STPDi killed $d13, killed $d12, $sp, 16 frame-setup STPDi killed $d11, killed $d10, $sp, 18 frame-setup STPDi killed $d9, killed $d8, $sp, 20 $x7 = ORRXrs $xzr, $lr, 0 BL @OUTLINED_FUNCTION_0, implicit-def $lr, implicit $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $x7, implicit $x19, implicit $x20, implicit $x21, implicit $x22, implicit $x23, implicit $x24, implicit $x25, implicit $x26 $lr = ORRXrs $xzr, $x7, 0 ... BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp ... ``` For the first time we outlined the following sequence: ``` frame-setup STPXi killed $x26, killed $x25, $sp, 22 frame-setup STPXi killed $x24, killed $x23, $sp, 24 frame-setup STPXi killed $x22, killed $x21, $sp, 26 frame-setup STPXi killed $x20, killed $x19, $sp, 28 ``` and ``` $x1 = MOVZXi 0, 0 BL @needx0, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $x1, implicit-def $sp, implicit-def $x0 ``` When we execute the outline again, we will get: ``` $x0 = ORRXrs $xzr, $lr, 0 <---- here BL @OUTLINED_FUNCTION_2_0, implicit-def $lr, implicit $sp, implicit-def $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $d8, implicit $d9, implicit $d10, implicit $d11, implicit $d12, implicit $d13, implicit $x0 $lr = ORRXrs $xzr, $x0, 0 $x7 = ORRXrs $xzr, $lr, 0 BL @OUTLINED_FUNCTION_0, implicit-def $lr, implicit $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $x7, implicit $x19, implicit $x20, implicit $x21, implicit $x22, implicit $x23, implicit $x24, implicit $x25, implicit $x26 $lr = ORRXrs $xzr, $x7, 0 ... BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp ``` When calling `OUTLINED_FUNCTION_2_0`, we used `x0` to save the `lr` register. The reason for the above error appears to be that: ``` BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp ``` should be: ``` BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp, implicit $x0 ``` When processing the same instruction with both `implicit-def $x0` and `implicit $x0` we should keep `implicit $x0`. A reproducible demo is available at: [https://github.com/DianQK/reproduce_outlined_function_use_live_x0](https://github.com/DianQK/reproduce_outlined_function_use_live_x0). Reviewed By: jinlin Differential Revision: https://reviews.llvm.org/D112911	2021-11-17 09:44:10 -08:00
Arthur Eubanks	e3e25b5112	[NewPM] Add option to prevent rerunning function pipeline on functions in CGSCC adaptor In a CGSCC pass manager, we may visit the same function multiple times due to SCC mutations. In the inliner pipeline, this results in running the function simplification pipeline on a function multiple times even if it hasn't been changed since the last function simplification pipeline run. We use a newly introduced analysis to keep track of whether or not a function has changed since the last time the function simplification pipeline has run on it. If we see this analysis available for a function in a CGSCCToFunctionPassAdaptor, we skip running the function passes on the function. The analysis is queried at the end of the function passes so that it's available after the first time the function simplification pipeline runs on a function. This is a per-adaptor option so it doesn't apply to every adaptor. The goal of this is to improve compile times. However, currently we can't turn this on by default at least for the higher optimization levels since the function simplification pipeline is not robust enough to be idempotent in many cases, resulting in performance regressions if we stop running the function simplification pipeline on a function multiple times. We may be able to turn this on for -O1 in the near future, but turning this on for higher optimization levels would require more investment in the function simplification pipeline. Heavily inspired by D98103. Example compile time improvements with flag turned on: https://llvm-compile-time-tracker.com/compare.php?from=998dc4a5d3491d2ae8cbe742d2e13bc1b0cacc5f&to=5c27c913687d3d5559ef3ab42b5a3d513531d61c&stat=instructions Reviewed By: asbirlea, nikic Differential Revision: https://reviews.llvm.org/D113947	2021-11-17 09:06:46 -08:00
Dmitry Vyukov	a7c57c4ec8	tsan: don't consider debug calls as calls Tsan pass does 2 optimizations based on presence of calls: 1. Don't emit function entry/exit callbacks if there are no calls and no memory accesses. 2. Combine read/write of the same variable if there are no intervening calls. However, all debug info is represented as CallInst as well and thus effectively disables these optimizations. Don't consider debug info calls as calls. Reviewed By: glider, melver Differential Revision: https://reviews.llvm.org/D114079	2021-11-17 14:42:16 +01:00
Mirko Brkusanin	db6bc2ab51	[AMDGPU][GlobalISel] Fold G_FNEG above when users cannot fold mods If possible fold fneg into instruction above if users cannot fold mods and we know it will decrease instruction count. Follows same logic as SDAG combiner in choosing opportunities to combine. Differential Revision: https://reviews.llvm.org/D112827	2021-11-17 14:25:13 +01:00
David Sherwood	8d77555b12	[Analysis] Ensure getTypeLegalizationCost returns a simple VT for TypeScalarizeScalableVector When getTypeConversion returns TypeScalarizeScalableVector we were sometimes returning a non-simple type from getTypeLegalizationCost. However, many callers depend upon this being a simple type and will crash if not. This patch changes getTypeLegalizationCost to ensure that we always a return sensible simple VT. If the vector type contains unusual integer types, e.g. <vscale x 2 x i3>, then we just set the type to MVT::i64 as a reasonable default. A test has been added here that demonstrates the vectoriser can correctly calculate the cost of vectorising a "zext i3 to i64" instruction with a VF=vscale x 1: Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll Differential Revision: https://reviews.llvm.org/D113777	2021-11-17 13:11:58 +00:00
Simon Pilgrim	5fedbd5b18	[DAG] SimplifyDemandedVectorElts - zero_extend_vector_inreg(and(x,c)) -> and(x,c') If we've only demanded the 0'th element, and it comes from a (one-use) AND, try to convert the zero_extend_vector_inreg into a mask and constant fold it with the AND.	2021-11-17 12:41:48 +00:00
David Sherwood	670dd40244	[Analysis] Fix getNumberOfParts to return 0 when the answer is unknown When asking how many parts are required for a scalable vector type there are occasions when it cannot be computed. For example, <vscale x 1 x i3> is one such vector for AArch64+SVE because at the moment no matter how we promote the i3 type we never end up with a legal vector. This means that getTypeConversion returns TypeScalarizeScalableVector as the LegalizeKind, and then getTypeLegalizationCost returns an invalid cost. This then causes BasicTTImpl::getNumberOfParts to dereference an invalid cost, which triggers an assert. This patch changes getNumberOfParts to return 0 for such cases, since the definition of getNumberOfParts in TargetTransformInfo.h states that we can use a return value of 0 to represent an unknown answer. Currently, LoopVectorize.cpp is the only place where we need to check for 0 as a return value, because all other instances will not currently ask for the number of parts for <vscale x 1 x iX> types. In addition, I have changed the target-independent interface for getNumberOfParts to return 1 and assume there is a single register that can fit the type. The loop vectoriser has lots of tests that are target-independent and they relied upon the 0 value to mean the answer is known and that we are not scalarising the vector. I have added tests here that show we correctly return an invalid cost for VF=vscale x 1 when the loop contains unusual types such as i7: Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll Differential Revision: https://reviews.llvm.org/D113772	2021-11-17 12:07:09 +00:00
Florian Hahn	e8b55cf7b7	[SCEV] Apply loop guards when computing max BTC for arbitrary steps. Similar other cases in the current function (e.g. when the step is 1 or -1), applying loop guards can lead to tighter upper bounds for the backedge-taken counts. Fixes PR52464. Reviewed By: reames, nikic Differential Revision: https://reviews.llvm.org/D113578	2021-11-17 11:00:49 +00:00
David Sherwood	ca18fcc2c0	[IR] Change CreateStepVector to work with element types smaller than i8 Currently the stepvector intrinsic only supports element types that are integers of size 8 bits or more. This patch adds support for the creation of stepvectors with smaller element types by creating the intrinsic with i8 elements that we then truncate to the requested size. It's not currently possible to write a vectoriser test to exercise this code path so I have added a unit test here: llvm/unittests/IR/IRBuilderTest.cpp Differential Revision: https://reviews.llvm.org/D113767	2021-11-17 10:47:50 +00:00
Jay Foad	3264e95938	[CodeGen] Update LiveIntervals in TargetInstrInfo::convertToThreeAddress Delegate updating of LiveIntervals to each target's convertToThreeAddress implementation, instead of repairing LiveIntervals after the fact in TwoAddressInstruction::convertInstTo3Addr. Differential Revision: https://reviews.llvm.org/D113493	2021-11-17 10:16:47 +00:00
Stanislav Mekhanoshin	c74f2e5b27	[InstCombine] Use SpecificBinaryOp_match in two more places Differential Revision: https://reviews.llvm.org/D114038	2021-11-17 01:16:06 -08:00
Roman Lebedev	2037ec725f	[X86][Costmodel] `ext v64i1 to v32i16` can appear after legalization, cost is same as for `ext v32i1 to v32i16` Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113914	2021-11-17 12:02:50 +03:00
Roman Lebedev	23b194bf18	[X86][Costmodel] `trunc v32i16 to v64i1` can appear after legalization, cost is same as for `trunc v32i16 to v32i1` Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113913	2021-11-17 12:02:50 +03:00
Eric Tang	f7eb061a5f	[SelectionDAG] Make WidenVecRes_SELECT work for scalable vectors This change make WidenVecRes_SELECT work for scalable vectors. This patch is split from [D110319](https://reviews.llvm.org/D110319) Signed-off-by: Eric Tang <tangxingxin1008@gmail.com> Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D110388	2021-11-17 08:55:11 +00:00
Aaron Puchert	b20da5117f	Don't add irrelevant items to queue in DwarfCompileUnit::createScopeChildrenDIE (NFC) Instead of popping them and then immediately throwing them away, we can just filter out globals and items in different scopes before adding them to WorkList. Shouldn't change anything but keep the queue smaller. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D113864	2021-11-17 00:01:20 +01:00
Aaron Puchert	86b3100cde	[DebugInfo] Use DbgEntityKind in DbgEntity interface (NFC) It was being used occasionally already, and using it on the constructor and getDbgEntityID has obvious type safety benefits. Also use llvm_unreachable in the switch as usual, but since only these two values are used in constructor calls I think it's still NFC. Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D113862	2021-11-17 00:01:20 +01:00
Benjamin Kramer	8b8e8704ce	[PowerPC] Fix a nullptr dereference LiMI1/LiMI2 can be null, so don't call a method on them before checking. Found by ubsan.	2021-11-16 23:52:42 +01:00
Philip Reames	8d85e945b2	[SCEV] Canonicalize X - urem X, Y patterns There are multiple possible ways to represent the X - urem X, Y pattern. SCEV was not canonicalizing, and thus, depending on which you analyzed, you could get different results. The sub representation appears to produce strictly inferior results in practice, so I decided to canonicalize to the Y * X/Y version. The motivation here is that runtime unroll produces the sub X - (and X, Y-1) pattern when Y is a power of two. SCEV is thus unable to recognize that an unrolled loop exits because we don't figure out that the new unrolled step evenly divides the trip count of the unrolled loop. After instcombine runs, we convert the the andn form which SCEV recognizes, so essentially, this is just fixing a nasty pass ordering dependency. The ARM loop hardware interaction in the test diff is opague to me, but the comments in the review from others knowledge of the infrastructure appear to indicate these are improvements in loop recognition, not regressions. Differential Revision: https://reviews.llvm.org/D114018	2021-11-16 11:59:21 -08:00
Arthur Eubanks	c95a9f46c9	[Loads] Handle addrspacecast constant expressions when determining dereferenceability Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D114015	2021-11-16 11:17:57 -08:00
Victor Huang	ae27ca9a67	[PowerPC] PPC backend optimization on conditional trap intrustions This patch adds PPC back end optimization to analyze the arguments of a conditional trap instruction to execute one of the following: 1. Delete it if never trap 2. Replace it if always trap 3. Otherwise keep it Reviewed By: nemanjai, amyk, PowerPC Differential revision: https://reviews.llvm.org/D111434	2021-11-16 13:11:57 -06:00

... 2 3 4 5 6 ...

152882 Commits