Currently the fadd optimizations in InstSimplify don't know how to do this
NoSignedZeros "X + 0.0 ==> X" fold when using the constrained intrinsics.
This adds the support.
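As a hedged sketch (not taken from the patch), this is the shape of call that
can now simplify; the 'nsz' fast-math flag on the call is what licenses
dropping the zero:
```
define float @fadd_zero_nsz(float %x) #0 {
  ; with 'nsz' the sign of the zero does not matter, so this folds to %x
  %r = call nsz float @llvm.experimental.constrained.fadd.f32(
                float %x, float 0.0,
                metadata !"round.tonearest", metadata !"fpexcept.ignore") #0
  ret float %r
}

declare float @llvm.experimental.constrained.fadd.f32(float, float, metadata, metadata)
attributes #0 = { strictfp }
```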
This review is derived from D106362 with some improvements from D107285
and is a follow-on to D111085.
Differential Revision: https://reviews.llvm.org/D111450
Running -vector-combine early can introduce new vector operations,
blocking loop/SLP vectorization. The added test case could be better
optimized by the SLPVectorizer if no new vector operations are added
early.
The patch attempts to optimize a sequence of SIMD loads from the same
base pointer:
%0 = getelementptr float, float* %base, i32 4
%1 = bitcast float* %0 to <4 x float>*
%2 = load <4 x float>, <4 x float>* %1
...
%n1 = getelementptr float, float* %base, i32 N
%n2 = bitcast float* %n1 to <4 x float>*
%n3 = load <4 x float>, <4 x float>* %n2
For AArch64 the compiler generates a sequence of LDR Qt, [Xn, #16].
However, 32-bit NEON VLD1/VST1 lack the [Wn, #imm] addressing mode, so
the address is computed before every ld/st instruction:
add r2, r0, #32
add r0, r0, #16
vld1.32 {d18, d19}, [r2]
vld1.32 {d22, d23}, [r0]
This can be improved by computing the address for the first load, and then
using a post-indexed form of VLD1/VST1 to load the rest:
add r0, r0, #16
vld1.32 {d18, d19}, [r0]!
vld1.32 {d22, d23}, [r0]
In order to do that, the patch adds more patterns to DAGCombine:
- (load (add ptr inc1)) and (add ptr inc2) are now folded if inc1
and inc2 are constants.
- (or ptr inc) is now recognized as a pointer increment if ptr is
sufficiently aligned.
In addition to that, we now search for all possible base updates and
then pick the best one.
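As a hypothetical illustration (not from the patch) of the "(or ptr inc)"
case: when the pointer is sufficiently aligned its low bits are known to be
zero, so the 'or' computes the same address as an 'add' and can be treated as
a base update:
```
define <4 x float> @or_as_increment(i8* align 32 %base) {
  ; %base is 32-byte aligned, so its low 5 bits are zero and
  ; "or ..., 16" produces the same address as "add ..., 16"
  %p = ptrtoint i8* %base to i32
  %q = or i32 %p, 16
  %addr = inttoptr i32 %q to <4 x float>*
  %v = load <4 x float>, <4 x float>* %addr, align 16
  ret <4 x float> %v
}
```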
Differential Revision: https://reviews.llvm.org/D108988
If the parameter had been annotated as nonnull because of the null
check, we want to remove the attribute, since it may no longer apply and
could result in miscompiles if left. Similarly, we also want to remove
undef-implying attributes, since they may not apply anymore either.
Fixes PR52110.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D111515
The tests that exercise the 'release' mode, where the model is AOT-ed,
check the output has certain properties, to validate that, indeed, a
different policy from the default one was exercised. For determinism, we
can't reliably check that output for an arbitrary learned policy, since
that policy may happen to mimic the default one in that particular case.
This patch adds a requirement that those tests run only when the model
is autogenerated (e.g. on build bots).
Differential Revision: https://reviews.llvm.org/D111747
This extends the foldOpIntoPhi code used when visiting a freeze user of a phi to allow any non-undef/poison operand as opposed to only non-undef/poison constants. This lets us hoist a freeze in the increment of an IV into the preheader in many cases.
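A hedged before/after sketch (illustrative names, not taken from the patch) of the IV case this enables:
```
; before: freeze of the IV phi; %iv.next is not a constant, but it is known
; not to be undef/poison (it is a flagless add of a frozen value)
define void @iv_freeze_before(i64 %start, i64 %n) {
entry:
  br label %loop
loop:
  %iv = phi i64 [ %start, %entry ], [ %iv.next, %loop ]
  %iv.fr = freeze i64 %iv
  %iv.next = add i64 %iv.fr, 1
  %cmp = icmp ult i64 %iv.next, %n
  br i1 %cmp, label %loop, label %exit
exit:
  ret void
}

; after: the freeze only needs to guard the value flowing in from the
; preheader, so it can be hoisted out of the loop
define void @iv_freeze_after(i64 %start, i64 %n) {
entry:
  %start.fr = freeze i64 %start
  br label %loop
loop:
  %iv = phi i64 [ %start.fr, %entry ], [ %iv.next, %loop ]
  %iv.next = add i64 %iv, 1
  %cmp = icmp ult i64 %iv.next, %n
  br i1 %cmp, label %loop, label %exit
exit:
  ret void
}
```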
Differential Revision: https://reviews.llvm.org/D111744
`X86TTIImpl::getGSScalarCost()` has (at least) two issues:
* it naively computes the cost of a sequence of `insertelement`/`extractelement`.
If we are operating not on XMM but on YMM/ZMM registers,
this wildly overestimates the cost of subvector insertions/extractions.
* Gather/scatter takes a vector of pointers, and scalarization results in us performing
a scalar memory operation for each of these pointers, but we never account for the cost
of extracting these pointers out of the vector of pointers.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111222
Even if there are no interesting functions, the SCCP solver would still run
before bailing out. Now bail out earlier, avoiding running the solver for nothing.
Differential Revision: https://reviews.llvm.org/D111645
This is a follow-on for D111675 which implements the gep case. I'd originally left it out because I was hoping to actually implement the inrange todo, but after a bit of staring at the code, decided to leave it as is since it doesn't affect this use case (i.e. instcombine requires the op to freeze to be an instruction).
Differential Revision: https://reviews.llvm.org/D111691
This is NFC-intended for scalar code. There are still unnecessary
m_ConstantInt restrictions in surrounding code, so this is not a
complete fix.
This prevents regressions seen with a planned follow-on to D111410.
There's a substantial pile of scalar tests for transforms that
depend on this code, but zero vector coverage. This patch adds
a vector test next to the first scalar test in each file that
is affected by foldLogOpOfMaskedICmps.
The code that handles these transforms is artificially limited
from working with vector splat constants.
The newly introduced API for checking whether poison comes solely from flags which can be dropped was out of sync. This was noticed by a reviewer post commit.
For the moment, disable the floating point flags. In a follow-up change, I plan to add support in dropPoisonGeneratingFlags, but that deserves to be a change of its own.
If we have an instruction which produces poison only when flags are specified on the instruction, then we know that freezing the operands and dropping flags is equivalent to freezing the result. If we know those flags don't result in any undefined behavior being executed, then there's no point in preserving the flags as we gain no knowledge by having them.
This patch extends the existing propagation logic which sinks freeze to single potential non-poison operands to allow dropping of flags when we know the freeze is the sole use of the instruction with poison flags.
The main value is that we tend to sink freezes towards the phi in IV cycles where the incoming value to the phi is the freeze of an IV increment. This will in turn (in a future patch) let us fold the freeze through the phi into the loop preheader. Motivated by eliminating the need for CanonicalizeFreezeInLoops for the clearly profitable cases from the onephi.ll test case in the test directory.
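A hedged sketch (not taken from the patch) of the flag-dropping step described above:
```
; %add produces poison only because of its 'nuw'/'nsw' flags, and the freeze
; is its sole use, so ...
define i64 @freeze_flags_before(i64 %x) {
  %add = add nuw nsw i64 %x, 1
  %fr = freeze i64 %add
  ret i64 %fr
}

; ... freezing the operand and dropping the flags is equivalent
define i64 @freeze_flags_after(i64 %x) {
  %x.fr = freeze i64 %x
  %add = add i64 %x.fr, 1
  ret i64 %add
}
```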
Differential Revision: https://reviews.llvm.org/D111675
This patch fixes another crash revealed by PR51614:
when *deciding* to vectorize with masked interleave groups, check if the access
is reverse (which is currently not supported).
Differential Revision: https://reviews.llvm.org/D108900
If another inlining session came after a ModuleInlinerWrapperPass, the
advisor analysis would still be cached, but its Result would be cleared.
We need to clear both.
This addresses PR52118
Differential Revision: https://reviews.llvm.org/D111586
This may not be obvious, but Alive2 agrees:
https://alive2.llvm.org/ce/z/Ld9qNT
If the mul has "nsw", then -1 * INT_MIN is poison, so the
negate can also have "nsw" because 0 - INT_MIN is poison.
If the mul has "nuw", then that means the "OtherOp" can only
be 0 or 1 (anything else multiplied by 0xfff... would wrap).
So the replacement negate must be "nsw" because it is either
"0-0" or "0-1".
This is another regression noticed with a planned follow-up
to D111410.
This patch continues unblocking optimizations that are blocked by pseudo probe instrumentation.
Unlike DbgIntrinsics, the PseudoProbe intrinsic has other attributes (such as mayread, maywrite, mayhaveSideEffect) that can block optimizations. The issues fixed are:
- Flipped default param of getFirstNonPHIOrDbg API to skip pseudo probes
- Unblocked CSE by preventing pseudo probes from clobbering memory SSA
- Unblocked induction variable simplification
- Allowed empty loop deletion by treating the probe intrinsic as droppable (isDroppable)
- Some refactoring.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D110847
collectLoopScalars collects pointer induction updates in ScalarPtrs, assuming
that the instruction will be scalar after vectorization. This may crash later
in VPReplicateRecipe::execute() if there is another user of the instruction
other than the Phi node which needs to be widened.
This changes collectLoopScalars so that if there are any other users of
Update other than a Phi node, it is not added to ScalarPtrs.
Reviewed By: david-arm, fhahn
Differential Revision: https://reviews.llvm.org/D111294
This is a follow up of D110529 that disallowed constexprs. That change
introduced a regression as this also disallowed constexprs that are function
pointers, which is actually one of the motivating use cases that we do want to
support.
Differential Revision: https://reviews.llvm.org/D111567
This patch adds a new cost heuristic that allows peeling a single
iteration off read-only loops, if the loop contains a load that
1. is feeding an exit condition,
2. dominates the latch,
3. is not already known to be dereferenceable,
4. and has a loop invariant address.
If all non-latch exits are terminated with unreachable, such loads
in the loop are guaranteed to be dereferenceable after peeling,
enabling hoisting/CSE'ing them.
This enables vectorization of loops with certain runtime-checks, like
multiple calls to `std::vector::at` if the vector is passed as a pointer.
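A hedged IR sketch (not one of the added tests) of a qualifying loop shape:
```
; %len is loaded from a loop-invariant address, feeds the exit condition,
; dominates the latch, and the only non-latch exit is unreachable
define void @bounds_checked(i64* %len.ptr, i32* %data, i64 %n) {
entry:
  br label %loop
loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
  %len = load i64, i64* %len.ptr
  %in.bounds = icmp ult i64 %iv, %len
  br i1 %in.bounds, label %latch, label %trap
trap:
  call void @llvm.trap()
  unreachable
latch:
  %addr = getelementptr i32, i32* %data, i64 %iv
  store i32 0, i32* %addr
  %iv.next = add i64 %iv, 1
  %cont = icmp ult i64 %iv.next, %n
  br i1 %cont, label %loop, label %exit
exit:
  ret void
}

declare void @llvm.trap()
```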
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D108114
There may be some other patterns like this or a generalization,
but this is an example that I noticed would definitely regress
with a planned follow-up to D111410.
https://alive2.llvm.org/ce/z/GVpQDb
At the moment, a VPValue is created for the backedge-taken count, which
is used by some recipes. To make it easier to identify the operands of
recipes using the backedge-taken count, print it at the beginning of the
VPlan if it is used.
Reviewed By: a.elovikov
Differential Revision: https://reviews.llvm.org/D111298
https://bugs.llvm.org/show_bug.cgi?id=27506
https://bugs.llvm.org/show_bug.cgi?id=31652
https://bugs.llvm.org/show_bug.cgi?id=51043
The bug reports above are caused by a problem in SimpleLoopUnswitch: when
unswitching, it hoists a branch on a condition that may be undef or poison,
which introduces undefined behavior. The transform below fixes this by
freezing the condition:
```
while (...) {
  if (C) { A }
  else   { B }
}

Into:

C' = freeze(C)
if (C') {
  while (...) { A }
} else {
  while (...) { B }
}
```
This problem can be solved by adding a freeze on hoisted branches (the transform above) and was solved by D29015.
However, D29015 was reverted due to a performance regression (2b5a897651).
It is not the first time that an added freeze has caused a performance regression.
SimplifyCFG had a similar problem with UB caused by branching on undef, which was solved by adding a freeze to the branch condition (D104569).
That change also caused performance regressions, and patches such as D105344 and D105392 were written to minimize them.
This patch fixes SimpleLoopUnswitch the same way D104569 fixed SimplifyCFG, while minimizing the performance loss with follow-up patches like D105344 and D105392. (This patch was rebased with the author's permission.)
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D106041
This patch adds further support for vectorisation of loops that involve
selecting an integer value based on a previous comparison. Consider the
following C++ loop:
int r = a;
for (int i = 0; i < n; i++) {
  if (src[i] > 3) {
    r = b;
  }
  src[i] += 2;
}
We should be able to vectorise this loop because all we are doing is
selecting between two states - 'a' and 'b' - both of which are loop
invariant. This just involves building a vector of values that contain
either 'a' or 'b', where the final reduced value will be 'b' if any lane
contains 'b'.
The IR generated by clang typically looks like this:
%phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ]
...
%pred = icmp ugt i32 %val, 3
%phi.update = select i1 %pred, i32 %b, i32 %phi
We already detect min/max patterns, which also involve a select + cmp.
However, with the min/max patterns we are selecting loaded values (and
hence loop variant) in the loop. In addition we only support certain
cmp predicates. This patch adds a new pattern matching function
(isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp.
We only support selecting values that are integer and loop invariant,
however we can support any kind of compare - integer or float.
Tests have been added here:
Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
Transforms/LoopVectorize/select-cmp-predicated.ll
Transforms/LoopVectorize/select-cmp.ll
Differential Revision: https://reviews.llvm.org/D108136
We were using the type of the loop back edge count to represent the
store size. This failed for small loop counts (e.g. in the added test,
the loop count was an i2).
Use the index type instead.
Fixes PR52104.
Differential Revision: https://reviews.llvm.org/D111401
The transformation from malloc+memset to calloc is always correct, and in many situations
it brings significant observable benefits in terms of execution speed and memory consumption [1][2].
Unfortunately there are cases where producing calloc causes performance drops [3].
As discussed here: https://reviews.llvm.org/D103009 it's possible to differentiate between those 2 scenarios.
If the optimizer is able to prove that after the malloc call it is _very_ likely to reach the memset branch,
then after emitting calloc we shouldn't observe any performance hits. Therefore, finding a "null pointer check"
pattern before the memset basic block sounds like a good justification for performing the transformation.
That method was also already suggested by GCC folks [4]. The main reason for the change is that, for now,
to be safe we check for a post-dominance relation, which is a way too conservative approach that leaves the
transformation "almost" disabled in practice. This patch aims to enable the transformation again, but with extra care.
[1] https://stackoverflow.com/questions/2688466/why-mallocmemset-is-slower-than-calloc
[2] https://vorpus.org/blog/why-does-calloc-exist/
[3] http://smalldatum.blogspot.com/2017/11/a-new-optimization-in-gcc-5x-and-mysql.html
[4] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83022
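A hedged sketch (illustrative, not the exact matcher) of the guarded pattern
described above: the memset block is only reached when malloc did not return
null, which is the situation where emitting calloc is expected to pay off:
```
define i8* @alloc_zeroed(i64 %n) {
entry:
  %p = call i8* @malloc(i64 %n)
  %isnull = icmp eq i8* %p, null
  br i1 %isnull, label %exit, label %set
set:
  ; only reached on the non-null path; candidate for calloc emission
  call void @llvm.memset.p0i8.i64(i8* %p, i8 0, i64 %n, i1 false)
  br label %exit
exit:
  ret i8* %p
}

declare i8* @malloc(i64)
declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i1 immarg)
```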
Differential Revision: https://reviews.llvm.org/D110021
The test diffs show that we have better analysis/folds for 'add'
(although we should at least have the simplifications
independently, so we don't have the one-use restriction).
This is related to solving regressions that would appear in
transforms related to D111410, and is part of a series
of enhancements that may eventually help solve PR34047.
https://alive2.llvm.org/ce/z/3tB9KG
define i1 @src(i8 %x, i8 %C, i8 %C2) {
%sub = sub nuw i8 %C2, %x
%r = icmp slt i8 %sub, %C
ret i1 %r
}
define i1 @tgt(i8 %x, i8 %C, i8 %C2) {
%Cnot = xor i8 %C, -1
%C2not = xor i8 %C2, -1
%add = add nuw i8 %x, %C2not
%r = icmp sgt i8 %add, %Cnot
ret i1 %r
}
There were 2 related but over-specified folds for:
C1 - X == C
One allowed multi-use but was limited to equal constants.
The other allowed different constants but disallowed multi-use.
This combines the 2 folds into a more general match.
The test diffs show the multi-use cases that were falling
through the cracks.
https://alive2.llvm.org/ce/z/4_hEt2
define i1 @src(i8 %x, i8 %subC, i8 %C) {
%s = sub i8 %subC, %x
%r = icmp eq i8 %s, %C
ret i1 %r
}
define i1 @tgt(i8 %x, i8 %subC, i8 %C) {
%newC = sub i8 %subC, %C
%isneg = icmp eq i8 %x, %newC
ret i1 %isneg
}
We would like to start pushing -mcpu=generic towards enabling the set of
features that improves performance for some CPUs, without hurting any
others. A blend of the performance options hopefully beneficial to all
CPUs. The largest part of that is enabling in-order scheduling using the
Cortex-A55 schedule model. This is similar to the Arm backend change
from eecb353d0e which made -mcpu=generic perform in-order scheduling
using the cortex-a8 schedule model.
The idea is that in-order cpus require the most help in instruction
scheduling, whereas out-of-order cpus can, for the most part, schedule
around different codegen out of order.
hypothesis holds. When running on an in-order core this improved
performance by 3.8% geomean on a set of DSP workloads, 2% geomean on
some other embedded benchmark and between 1% and 1.8% on a set of
singlecore and multicore workloads, all running on a Cortex-A55 cluster.
On an out-of-order cpu the results are a lot more noisy but show flat
performance or an improvement. On the set of DSP and embedded
benchmarks, run on a Cortex-A78 there was a very noisy 1% speed
improvement. Using the most detailed results I could find, SPEC2006 runs
on a Neoverse N1 show a small increase in instruction count (+0.127%),
but a decrease in cycle counts (-0.155%, on average). The instruction
count is very low noise, the cycle count is more noisy with a 0.15%
decrease not being significant. SPEC2k17 shows a small decrease (-0.2%)
in instruction count leading to a -0.296% decrease in cycle count. These
results are within noise margins but tend to show a small improvement in
general.
When specifying an Apple target, clang will set "-target-cpu apple-a7"
on the command line, so should not be affected by this change when
running from clang. This also doesn't enable more runtime unrolling like
-mcpu=cortex-a55 does, only changing the schedule used.
A lot of existing tests have been updated. This is a summary of the important
differences:
- Most changes are the same instructions in a different order.
- Sometimes this leads to very minor inefficiencies, such as requiring
an extra mov to move variables into r0/v0 for the return value of a test
function.
- misched-fusion.ll was no longer fusing the pairs of instructions it
should, as per D110561. I've changed the schedule used in the test
for now.
- neon-mla-mls.ll now uses "mul; sub" as opposed to "neg; mla" due to
the different latencies. This seems fine to me.
- Some SVE tests do not always remove movprfx where they did before due
to different register allocation giving different destructive forms.
- The tests argument-blocks-array-of-struct.ll and arm64-windows-calls.ll
produce two LDR where they previously produced an LDP due to
store-pair-suppress kicking in.
- arm64-ldp.ll and arm64-neon-copy.ll are missing pre/postinc on LDP.
- Some tests such as arm64-neon-mul-div.ll and
ragreedy-local-interval-cost.ll have more, less or just different
spilling.
- In aarch64_generated_funcs.ll.generated.expected one part of the
function is no longer outlined. Interestingly, if I switch this to use
any other schedule, even less is outlined.
Some of these are expected to happen, such as differences in outlining
or register spilling. There will be places where these result in worse
codegen, places where they are better, with the SPEC instruction counts
suggesting it is not a decrease overall, on average.
Differential Revision: https://reviews.llvm.org/D110830
566690b0 uses size information in the float semantics, but PPCDoubleDouble
left those fields empty.
As a follow-up, we can consider removing PPCDoubleDoubleLegacy and filling
in the other fields in the future.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D111398
llvm.is.constant* intrinsics are evaluated to 0 or 1 integral values.
A common use case for llvm.is.constant comes from the higher level
__builtin_constant_p. A common usage pattern of __builtin_constant_p in
the Linux kernel is:
void foo (int bar) {
  if (__builtin_constant_p(bar)) {
    // lots of code that will fold away to a constant.
  } else {
    // a little bit of code, usually a libcall.
  }
}
A minor issue in the InlineCost calculation is that when `bar` is _not_ Constant,
and still will not be after inlining, we don't discount the true branch,
and the inline cost of `foo` ends up being the cost of both branches
together, rather than just the false branch.
This leads to code like the above where inlining will not help prove bar
Constant, but it still would be beneficial to inline foo, because the
"true" branch is irrelevant from a cost perspective.
For example, IPSCCP can sink a passed constant argument to foo:
const int x = 42;
void bar (void) { foo(x); }
This improves our inlining decisions, and fixes a few head-scratching
cases where the disassembly shows a relatively small `foo` not inlined
into a lone caller.
We could further improve this modeling by tracking whether the argument
to llvm.is.constant* is a parameter of the function, and if inlining
would allow that parameter to become Constant. This idea is noted in a
FIXME comment.
Link: https://github.com/ClangBuiltLinux/linux/issues/1302
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D111272
Removed obsolete DT verification that should not be there because the
strategy of DT updates has changed.
Differential Revision: https://reviews.llvm.org/D110922
At this point it looks like a B extension will never exist. Instead
Zba, Zbb, Zbc, and Zbs are individual extensions being ratified
together as a package. Unknown at this time when or if the other
Zb* extensions will be ratified.
This patch removes references to the B extension. I've updated and
split tests accordingly.
This has been split from D110669 to make review a little easier.
Differential Revision: https://reviews.llvm.org/D111338
Added support for peeling loops with "deoptimizing" exits -
exits such that the exit block, or any of its children (or any of
their children, etc.), either has a @llvm.experimental.deoptimize
call prior to the terminating return instruction of that basic
block or is terminated with unreachable. All blocks in the
sequence must have a single successor, except maybe the last one.
Previously we only checked the exit block for being deoptimizing.
Now we check if the last reachable block from the exit is deoptimizing.
Patch by Dmitry Makogon!
Differential Revision: https://reviews.llvm.org/D110922
Reviewed By: mkazantsev
This patch fixes problems reported in PR51981.
When rotating a loop it isn't enough to just forget SCEV for that
loop nest. When rotating we might clone some instructions from the
old header into the preheader, and insert new PHI nodes to merge
values together. There could be users of the original value that are
updated to use the PHI result. And those users were not necessarily
depending on a PHI node earlier, so they weren't cleaned up when just
forgetting all SCEV:s for the loop nest. So we need to explicitly
forget those values to avoid invalid cached SCEV expressions.
Reviewed By: fhahn, mkazantsev
Differential Revision: https://reviews.llvm.org/D110813
SCEV-based salvaging will use excessive resources if it encounters
very long SCEV expressions. This patch places a limit on the length of
SCEV expression that salvaging will attempt to translate.
Reviewed by: Orlando
Differential Revision: https://reviews.llvm.org/D110558
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.
This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.
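As a hedged illustration (not from the patch itself) of what the new limit
permits, a 2^32-byte alignment can now be written directly in IR:
```
@g = global i32 0, align 4294967296

define void @use_aligned_alloca() {
  ; 4294967296 == 2^32 == 4GB, the new maximum
  %p = alloca i32, align 4294967296
  store i32 0, i32* %p
  ret void
}
```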
The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.
Updating clang's max allowed alignment will come in a future patch.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D110451
This patch removes a compile time restriction from isSCEVExprNeverPoison. We've strengthened our ability to reason about flags on scopes other than addrecs, and this bailout prevents us from using it. The comment is also suspect in that we're in the middle of constructing a SCEV for I. As such, we're going to visit all operands *anyway*.
Differential Revision: https://reviews.llvm.org/D111186
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.
This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.
The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.
Updating clang's max allowed alignment will come in a future patch.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D110451
Currently the fadd optimizations in InstSimplify don't know how to do this
"X + -0.0 ==> X" fold when using the constrained intrinsics. This adds the
support.
This commit is derived from D106362 with some improvements from D107285.
Differential Revision: https://reviews.llvm.org/D111085
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.
This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.
The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.
Updating clang's max allowed alignment will come in a future patch.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D110451
https://alive2.llvm.org/ce/z/QagQMn
This fold is handled by instcombine via SimplifyUsingDistributiveLaws(),
but we are missing the sibling fold for 'logical and' (implemented with
'select'). Retrofitting the code in instcombine looks much harder
than just adding a small adjustment here, and this is potentially more
efficient and beneficial to other passes.
We need to be better at exposing the comparison predicate to getCmpSelInstrCost calls as some targets (e.g. X86 SSE) have very different costs for different comparisons (PR48337), and we can't always rely on the optional Instruction argument.
This initial commit requires explicit condition type and predicate arguments. The next step will be to review a lot of the existing getCmpSelInstrCost calls which have used BAD_ICMP_PREDICATE even when the predicate is known.
Differential Revision: https://reviews.llvm.org/D111024
This updates a few more check lines, in some mte tests that were close
to auto generated already and some CodeGenPrepare/consthoist tests where
being able to see the entire code sequence is useful for determining
whether code differences are improvements or not.
We already handle more complicated cases like:
extelt (bitcast (inselt poison, X, 0)) --> trunc (lshr X)
But we missed this simpler pattern:
https://alive2.llvm.org/ce/z/D55h64 / https://alive2.llvm.org/ce/z/GKzzRq
This is part of solving:
https://llvm.org/PR52057
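A hedged guess at one instance of that simpler pattern (see the Alive2 links
above for the precise statement; the bit numbering assumes a little-endian
layout):
```
define i1 @before(i8 %x) {
  ; extract bit 3 of %x via a bool-vector bitcast
  %v = bitcast i8 %x to <8 x i1>
  %e = extractelement <8 x i1> %v, i32 3
  ret i1 %e
}

define i1 @after(i8 %x) {
  ; same thing as a shift + trunc on the scalar source
  %s = lshr i8 %x, 3
  %t = trunc i8 %s to i1
  ret i1 %t
}
```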
I made the transform depend on a legal/desirable int type to avoid creating
a shift of an illegal type (for example i128). I'm not sure if that
restriction is actually necessary, but we can change that as a follow-up
if the backend can deal with integer ops on too-wide illegal types.
The pile of AVX512 test changes are all neutral AFAICT - the x86 backend
seems to know how to turn that into the expected "kmov" instructions.
Differential Revision: https://reviews.llvm.org/D111082
As suggested on D111024, we should treat getCmpSelInstrCost calls without a specific predicate as matching the worst case predicate cost.
These regressions will be addressed with a mixture of D111024 and fixing other specific getCmpSelInstrCost calls to have realistic predicates.
This also removes the need to disable the mandatory inlining phase in
tests.
In a departure from the previous remark, we don't output a 'cost' in
this case, because there's no such thing. We just report that inlining
happened because of the attribute.
Differential Revision: https://reviews.llvm.org/D110891
Some initially gathered nodes missed the check for the reused scalars,
which leads to a high gather cost. Such nodes can still be represented as
m gathers + shuffle instead of n gathers, where m < n.
Differential Revision: https://reviews.llvm.org/D111153
This refactors load folding to happen in two cleanly separated
steps: ConstantFoldLoadFromConstPtr() takes a pointer to load from
and decomposes it into a constant initializer base and an offset.
Then ConstantFoldLoadFromConst() loads from that initializer at
the given offset. This makes the core logic independent of having
actual GEP expressions (and those GEP expressions having certain
structure) and will allow exposing ConstantFoldLoadFromConst() as
an independent API in the future.
This is mostly only a refactoring, but it does make the folding
logic slightly more powerful.
Differential Revision: https://reviews.llvm.org/D111023
When we check if a load is loop invariant by finding a dominating
invariant.start call, we strip bitcasts until we get to an i8* Value,
and look for an invariant.start use of the i8* Value.
We may accidentally end up at an i8 global and look at a global's uses,
which we shouldn't do in a loop pass. Although we could make this
logic work with globals, that's not currently intended.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D111098
In working on D106362 I found that a few more tests were needed. I've
been asked to pre-push the tests for that ticket. This should complete
the tests needed for now.
This patch adds a new command line option `openmp-opt-max-iterations`
that controls the maximum number of iterations the attributor will run
for when compiling OpenMP target device code. This patch also adds a
remark to indicate when the attributor failed because it did not run
for enough iterations.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110749
This fixes a violation of the wrap flag rules introduced in c4048d8f. This is an alternate fix to D106852.
The basic problem being fixed is that we infer a set of flags which is valid at some inner scope S1 (usually by correctly propagating them from IR), and then (incorrectly) extend them to a SCEV in scope S2 where S1 != S2. This is not in general safe per the wrap flags semantics recently defined.
In this patch, I include a simple inference step to handle the case where we can prove that S2 is the preheader of the loop S1, and that entry into S2 implies execution of S1. See the code for a more detailed explanation.
One worry I have with this patch is that I might be over-fitting what shows up in tests - and thus hiding negative impact we'd see in the real world. My best defense is that the rule used here very closely follows the one used to propagate the flags from IR to the inner add to start with, and thus if one is reasonable, so probably is the other. Curious what others think about that piece.
The test diffs are roughly as expected. Mostly analysis only, with two transform changes. Oddly, the result looks better in the loop-idiom test, and I don't understand the PPC output well enough to tell. Nothing terrible looking though. (For context, without the scope inference peephole, the test delta includes a couple of vectorization tests. Again, not super concerning, but slightly more so.)
Differential Revision: https://reviews.llvm.org/D109845
The first two tries at this were reverted because they caused an
infinite loop in instcombine.
That should be fixed after a series of patches that ended with
removing the faulty opposing transform:
3fabd98e5b
Original commit message:
(masked) trunc (lshr X, C) --> (masked) lshr (trunc X), C
Narrowing the shift should be better for analysis and can lead
to follow-on transforms as shown.
Attempt at a general proof in Alive2:
https://alive2.llvm.org/ce/z/tRnnSF
Here are a couple of the specific tests:
https://alive2.llvm.org/ce/z/bCnTp-
https://alive2.llvm.org/ce/z/TfaHnb
Differential Revision: https://reviews.llvm.org/D110170
This patch is changing the InsertElement's placeholder to poison without changing the LSV's behavior.
Regardless of whether `StoreTy` is FixedVectorType or not, the poison value will be overwritten with a different value.
Therefore, whether the InsertElement's placeholder is poison or undef will not affect the result of the program.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D111005
This fixes a violation of the wrap flag rules introduced in c4048d8f. This was also noted in the (very old) PR23527.
The issue being fixed is that we assume the inbounds flag on any GEP implies that all users of *any* gep (or add) which happens to map to that SCEV would also be UB if the (other) gep overflowed. That's simply not true.
In terms of the test diffs, I don't see anything seriously problematic. The lost flags are expected (given the semantic restriction on when it's legal to tag the SCEV), and there are several cases where the previously inferred flags are unsound per the new semantics.
The only common trend I noticed when looking at the deltas is that by not considering branch on poison as immediate UB in ValueTracking, we do miss a few cases we could reclaim. We may be able to claw some of these back with the follow ideas mentioned in PR51817.
It's worth noting that most of the changes are analysis result only changes. The two transform changes are pretty minimal. In one case, we miss the opportunity to infer a nuw (correctly). In the other, we fail to fold an exit and produce a loop invariant form instead. This one is probably over-reduced as the program appears to be undefined in practice, and neither before nor after exploits that.
Differential Revision: https://reviews.llvm.org/D109789
Added an additional check for constants after simplification of
"select _, true, false" pattern. We need to prevent attempts to unswitch constant
conditions for two reasons:
a) Doing that doesn't make any sense, in the best case it will just burn
some compile time.
b) SimpleLoopUnswitch isn't designed to unswitch constant conditions
(due to (a)), so attempting that can cause miscompiles. The attached
testcase is an example of such miscompile.
Also added an assertion that'll make sure we aren't trying to replace
constants, so it will help us prevent such bugs in future. The assertion
from D110751 is another layer of protection against such cases.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D110752
This is no-externally-visible-functional-difference-intended.
That is, the test diffs show identical instructions other than
name changes (those are included specifically to verify the logic).
The existing transforms created extra instructions and relied
on subsequent folds to get to the final result, but that could
conflict with other transforms like the proposed D110170 (and
caused that patch to be reverted twice so far because of infinite
combine loops).
The only sched models for cpus that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/n8aMKeo4E - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.
For store we have:
https://godbolt.org/z/n8aMKeo4E - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: =2.0`
So pick cost of `4`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110755
The only sched models for cpus that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/EM5Ean7bd - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: =1.0`
So pick cost of `2`.
For store we have:
https://godbolt.org/z/EM5Ean7bd - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `2`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110754
The only sched models for cpus that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3
For load we have:
https://godbolt.org/z/4rY96hnGT - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: =1.0`
So pick cost of `2`.
For store we have:
https://godbolt.org/z/vbo37Y3r9 - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: =0.5`
So pick cost of `1`.
I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D110753
D104809 changed `buildTree_rec` to check for extract element instructions
with scalable types. However, if the extract is extended or truncated,
these changes do not apply and we assert later on in isShuffle(), which
attempts to cast the type of the extract to FixedVectorType.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D110640
This patch adds further support for vectorisation of loops that involve
selecting an integer value based on a previous comparison. Consider the
following C++ loop:
int r = a;
for (int i = 0; i < n; i++) {
  if (src[i] > 3) {
    r = b;
  }
  src[i] += 2;
}
We should be able to vectorise this loop because all we are doing is
selecting between two states - 'a' and 'b' - both of which are loop
invariant. This just involves building a vector of values that contain
either 'a' or 'b', where the final reduced value will be 'b' if any lane
contains 'b'.
The IR generated by clang typically looks like this:
%phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ]
...
%pred = icmp ugt i32 %val, 3
%phi.update = select i1 %pred, i32 %b, i32 %phi
We already detect min/max patterns, which also involve a select + cmp.
However, with the min/max patterns we are selecting loaded values (and
hence loop variant) in the loop. In addition we only support certain
cmp predicates. This patch adds a new pattern matching function
(isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp.
We only support selecting values that are integer and loop invariant,
however we can support any kind of compare - integer or float.
Tests have been added here:
Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
Transforms/LoopVectorize/select-cmp-predicated.ll
Transforms/LoopVectorize/select-cmp.ll
Differential Revision: https://reviews.llvm.org/D108136
In situations where the coroutine function is not split we can just
replace the async.resume by null.
rdar://82591919
Differential Revision: https://reviews.llvm.org/D110191
The expansion for these was updated in https://reviews.llvm.org/D47927 but the cost model was not adjusted.
I believe the cost model was also incorrect for the old expansion.
The expansion prior to D47927 used 3 icmps using LHS, RHS, and Result
to calculate their signs, then 2 icmps to compare the signs, followed
by an And. The previous cost model was using 3 icmps and 2 selects.
Digging back through git blame, those 2 selects in the cost model used to
be 2 icmps, but were changed in https://reviews.llvm.org/D90681
Differential Revision: https://reviews.llvm.org/D110739
This patch improves the effectiveness of BDCE's debug info salvaging
by processing the instructions in reverse order and delaying
dropAllReferences until after debug info salvaging. This allows
salvaging of entire chains of deleted instructions!
Previously we would remove all references from an instruction, which
would make it impossible to use that instruction to salvage a later
instruction in the instruction stream, because its operands were
already removed.
This reapplies the previous patch with a fix for a use-after-free.
Differential Revision: https://reviews.llvm.org/D110568
This is analogous to D86156 (which preserves "lossy" BFI in loop
passes). Lossy means that the analysis preserved may not be up to date
with regards to new blocks that are added in loop passes, but BPI will
not contain stale pointers to basic blocks that are deleted by the loop
passes.
This is achieved through BasicBlockCallbackVH in BPI, which calls
eraseBlock that updates the data structures in BPI whenever a basic
block is deleted.
This patch does not have any changes in the upstream pipeline, since
none of the loop passes in the pipeline use BPI currently.
However, since BPI wasn't previously preserved in loop passes, the loop
predication pass was invoking BPI *on the entire
function* every time it ran in an LPM. This caused a massive compile time
increase in our downstream LPM invocation which contained loop predication.
See updated test with an invocation of a loop-pipeline containing loop
predication and -debug-pass turned ON.
Reviewed-By: asbirlea, modimo
Differential Revision: https://reviews.llvm.org/D110438
This updates transform test cases for
ADCE
AddDiscriminators
AggressiveInstCombine
AlignmentFromAssumptions
ArgumentPromotion
BDCE
CalledValuePropagation
DCE
Reg2Mem
WholeProgramDevirt
to use the -passes syntax when specifying the pipeline.
Given that LLVM_ENABLE_NEW_PASS_MANAGER isn't set to off (which is
a deprecated feature) the updated test cases already used the new
pass manager, but they were using the legacy syntax when specifying
the passes to run. This patch can be seen as a step toward deprecating
that interface.
This patch also removes some redundant RUN lines. Here I am
referring to test cases that had multiple RUN lines verifying both
the legacy "-passname" syntax and the new "-passes=passname" syntax.
Since we switched the default pass manager to "new PM" both RUN lines
have verified the new PM version of the pass (more or less wasting
time running the same test twice), unless LLVM_ENABLE_NEW_PASS_MANAGER
is set to "off". It is assumed that it is enough to run these tests
with the new pass manager now.
Differential Revision: https://reviews.llvm.org/D108472
It can happen that after widening of the IV, flattening may not be possible,
e.g. when it is deemed unprofitable. We were not properly checking this, which
resulted in flattening being applied when it shouldn't, also leading to
incorrect results (miscompilation).
This should fix PR51980 (https://bugs.llvm.org/show_bug.cgi?id=51980)
Differential Revision: https://reviews.llvm.org/D110712
The test is from https://llvm.org/PR51351.
There are 2 related logic bugs from over-generalizing "lshr" to "any shr",
but I'm not sure how to expose the difference for "MaskC" because instsimplify
already folds ashr of -1.
I'll extend instsimplify to catch the MaskD pattern as a follow-up, but this
patch should be enough to avoid the miscompile.
getScalarizationOverhead() results in a somewhat better cost estimation than counting the insertion/extraction costs directly. Notably, this is still overestimating the costs.
Original Patch by: @lebedev.ri (Roman Lebedev)
Differential Revision: https://reviews.llvm.org/D110713
This fixes an issue exposed by D71539, where IndVarSimplify tries
to access an invalid cached SCEV expression after making changes to the
underlying PHI instruction earlier.
When changing the incoming value of a PHI, forget the cached SCEV for
the PHI.
This is NFCI (no-functional-change-intended), but there
are benign diffs possible with commutable ops as seen in
the test diffs.
The transforms were repeated for the commutative opcodes,
but that should not be necessary if we canonicalize the
patterns that we're matching. If both operands of the
binop match, that should get folded eventually.
The transform that starts with a mask op seems to
over-constrain the use checks, so that could be a
potential enhancement.
This reverts commit f6954bf804.
This breaks the test-suite O3 build:
/home/nikic/llvm-test-suite/build-O3/tools/timeit --summary Bitcode/Benchmarks/Halide/local_laplacian/CMakeFiles/halide_local_laplacian.dir/local_laplacian.bc.o.time /home/nikic/llvm-project/build/bin/clang++ -DNDEBUG -O3 -w -Werror=date-time -save-stats=obj -save-stats=obj -std=c++11 -MD -MT Bitcode/Benchmarks/Halide/local_laplacian/CMakeFiles/halide_local_laplacian.dir/local_laplacian.bc.o -MF Bitcode/Benchmarks/Halide/local_laplacian/CMakeFiles/halide_local_laplacian.dir/local_laplacian.bc.o.d -o Bitcode/Benchmarks/Halide/local_laplacian/CMakeFiles/halide_local_laplacian.dir/local_laplacian.bc.o -c ../Bitcode/Benchmarks/Halide/local_laplacian/local_laplacian.bc
While deleting: i64 %
Use still stuck around after Def is destroyed: %12620 = mul i64 %12619, <badref>
clang++: /home/nikic/llvm-project/llvm/lib/IR/Value.cpp:103: llvm::Value::~Value(): Assertion `materialized_use_empty() && "Uses remain when a value is destroyed!"' failed.
This patch improves the effectiveness of BDCE's debug info salvaging
by processing the instructions in reverse order and delaying
dropAllReferences until after debug info salvaging. This allows
salvaging of entire chains of deleted instructions!
Previously we would remove all references from an instruction, which
would make it impossible to use that instruction to salvage a later
instruction in the instruction stream, because its operands were
already removed.
Differential Revision: https://reviews.llvm.org/D110568
This patch improves the effectiveness of ADCE's debug info salvaging
by processing the instructions in reverse order and delaying
dropAllReferences until after debug info salvaging. This allows
salvaging of entire chains of deleted instructions!
Previously we would remove all references from an instruction, which
would make it impossible to use that instruction to salvage a later
instruction in the instruction stream, because its operands were
already removed.
Differential Revision: https://reviews.llvm.org/D110462
This patch enables debug info salvaging for truncating/extending ptr/int
conversions. The testcase uncovered a bug in adce, which is
addressed separately.
rdar://80227769
Differential Revision: https://reviews.llvm.org/D110461
There are two sets of new/delete functions, one with Windows/MSVC
mangling and one with Itanium mangling. Mark one set or the other
as unavailable depending on the target.
Split the test malloc-free-delete.ll into three parts: malloc-free.ll
for the C API tests, new-delete-itanium.ll and new-delete-msvc.ll for
the target-specific new/delete tests.
Differential Revision: https://reviews.llvm.org/D110419
This commit is the InstCombine follow-up to the previous constant-folding
change that enables noticeable optimizations for CHERI-enabled targets.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D110247
I was looking at some missed optimizations in CHERI-enabled targets and
noticed that we weren't removing vtable indirection for calls via known
pointers-to-members. The underlying reason for this is that we represent
pointers-to-function-members as {i8 addrspace(200)*, i64} and generate the
constant offsets using (gep i8 null, <index>). We use a constant GEP here
since inttoptr should be avoided for CHERI capabilities. The pointer-to-member
call uses ptrtoint to extract the index, and due to this missing fold we can't
infer the actual value loaded from the vtable.
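A hedged sketch of the kind of constant expression involved (the address
space and offset are illustrative):
```
define i64 @member_fn_index() {
  ; before this change, the constant expression below was left as-is;
  ; it now constant-folds to the plain integer 8
  ret i64 ptrtoint (i8 addrspace(200)* getelementptr (i8, i8 addrspace(200)* null, i64 8) to i64)
}
```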
This is the initial constant folding change for this pattern, I will add
an InstCombine fold as a follow-up.
We could fold all inbounds GEPs on null to null (and therefore the ptrtoint to
zero) since zero is the only valid offset for an inbounds GEP on null. If the
offset is not zero, that GEP is poison and therefore returning 0 is valid
(https://alive2.llvm.org/ce/z/Gzb5iH). However, Clang currently generates
inbounds GEPs on NULL for hand-written offsetof() expressions, so this
could lead to miscompilations.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D110245
Try to improve vectorization of the PHI nodes by trying to vectorize
similar instructions at the size of the widest possible vectors, then
aggregating with compatible type PHIs and trying to vectorize again, and
only if this fails, trying smaller vector factors for
compatible PHI nodes. This restores the performance of several benchmarks
after the tuning of the fp/int conversion instruction costs.
Differential Revision: https://reviews.llvm.org/D108740
In rG6a076fa9539e, a problem with updating the old/narrow phi nodes after IV
widening was introduced. If after widening of the IV the transformation is
*not* applied, the narrow phi node was incorrectly modified, which should only
happen if flattening happens. This can be seen in the added test widen-iv2.ll,
which incorrectly had 1 incoming value, but should have its original 2 incoming
values, which is now restored.
Differential Revision: https://reviews.llvm.org/D110234
We need more coverage for commuted and (un)signed preds to
verify that things behave as expected here. Currently, we
do not transform signed preds or non-inbounds geps.
The instructions extractelement/extractvalue are not required to
be scheduled since they only depend on the source vector/aggregate (with
constant indices); the same applies to the parent basic block checks.
Improves compile time and saves scheduling budget.
Differential Revision: https://reviews.llvm.org/D108703