llvm-project

Commit Graph

Author	SHA1	Message	Date
Jon Roelofs	37b6e03c18	[Intrinsics] Make MemCpyInlineInst a MemCpyInst This opens up more optimization opportunities in passes that already handle MemCpyInst's. Differential revision: https://reviews.llvm.org/D105247	2021-07-02 10:25:24 -07:00
Eli Friedman	8f3d16905d	[ScalarEvolution] Ensure backedge-taken counts are not pointers. A backedge-taken count doesn't refer to memory; returning a pointer type is nonsense. So make sure we always return an integer. The obvious way to do this would be to just convert the operands of the icmp to integers, but that doesn't quite work out at the moment: isLoopEntryGuardedByCond currently gets confused by ptrtoint operations. So we perform the ptrtoint conversion late for lt/gt operations. The test changes are mostly innocuous. The most interesting changes are more complex SCEV expressions of the form "((-1 * (ptrtoint i8* %ptr to i64)) + %ptr)". This is expected: we can't fold this to zero because we need to preserve the pointer base. The call to isLoopEntryGuardedByCond in howFarToZero is less precise because of ptrtoint operations; this shows up in the function pr46786_c26_char in ptrtoint.ll. Fixing it here would require more complex refactoring. It should eventually be fixed by future improvements to isImpliedCond. See https://bugs.llvm.org/show_bug.cgi?id=46786 for context. Differential Revision: https://reviews.llvm.org/D103656	2021-06-21 16:24:16 -07:00
Florian Hahn	05bb969014	[LoopIdiom] Add test case that involves adds with flags and zero exts. Test coverage to ensure D104319 does not introduce a regression here.	2021-06-21 12:10:58 +01:00
Eli Friedman	925cd6b467	Regenerate a few tests related to SCEV. In preparation for https://reviews.llvm.org/D103656	2021-06-04 13:35:00 -07:00
Roman Lebedev	149e018d12	[LoopIdiom] 'arithmetic right-shift until zero': don't turn potentially infinite loops into finite ones Nowadays LLVM does not assume that all loops are finite, so if we want to produce a finite loop from a potentially-infinite one, we must ensure that the original loop is known to be a finite one. For this transform, it only matters for arithmetic right-shifts. For them, either the function or the loop must be known to be `mustprogress`, or the original value being shifted must be known to be non-negative (because iff the sign bit was set, it will never become zero, but will become `-1` in the "end"). It would be really good for alive2 to actually complain about this, but it currently does not: https://github.com/AliveToolkit/alive2/issues/726	2021-05-25 21:02:28 +03:00
Roman Lebedev	8f4db14d1c	[LoopIdiom] Support 'left-shift until zero' idiom This adds support for the "count active bits" pattern, i.e.: ``` int countBits(unsigned val) { int cnt = 0; for( ; (val << cnt) != 0; ++cnt) ; return cnt; } ``` but a somewhat more general one: ``` int countBits(unsigned val, int start, int off) { int cnt; for (cnt = start; val << (cnt + off); cnt++) ; return cnt; } ``` alive2 is happy with all the tests there. Note that, again, much like with the right-shift cases, we don't require the `val != 0` guard. This is the last pattern that was supported by `detectShiftUntilZeroIdiom()`, which now becomes obsolete.	2021-05-25 15:26:35 +03:00
Roman Lebedev	980e0107a1	[NFC][LoopIdiom] Add tests for 'left-shift until zero' idiom	2021-05-25 15:26:34 +03:00
Roman Lebedev	f1c5f78d38	[LoopIdiom] Support 'arithmetic right-shift until zero' idiom This adds support for the "count active bits" pattern, i.e.: ``` int countActiveBits(signed val) { int cnt = 0; for( ; (val >> cnt) != 0; ++cnt) ; return cnt; } ``` but a somewhat more general one: ``` int countActiveBits(signed val, int start, int off) { int cnt; for (cnt = start; val >> (cnt + off); cnt++) ; return cnt; } ``` This directly matches the existing 'logical right-shift until zero' idiom. alive2 is happy with all the tests there. Note that, again, much like with the original unsigned case, we don't require the `val != 0` guard. The old `detectShiftUntilZeroIdiom()` already supports this pattern, the idea here is that the `val` must be positive (have at least one leading zero), because otherwise the loop is non-terminating, but since it is not `while(1)`, that would have been UB.	2021-05-25 14:30:49 +03:00
Roman Lebedev	8a0e4ae772	[NFC][LoopIdiom] Add tests for 'arithmetic right-shift until zero' idiom	2021-05-25 14:30:49 +03:00
Roman Lebedev	aa3dac95ed	[LoopIdiom] 'logical right shift until zero': the value must be loop-invariant As per the reproducer provided by Mikael Holmén in post-commit review.	2021-05-24 12:15:06 +03:00
Roman Lebedev	0633d5ce7b	[LoopIdiom] 'logical right-shift until zero' ('count active bits') "on steroids" idiom recognition. I think I've added exhaustive test coverage, and I have verified that alive2 is happy with all the tests, so in principle I'm fine with landing this without review, but just in case. This adds support for the "count active bits" pattern, i.e.: ``` int countActiveBits(unsigned val) { int cnt = 0; for( ; (val >> cnt) != 0; ++cnt) ; return cnt; } ``` but a somewhat more general one, since that is what I need: ``` int countActiveBits(unsigned val, int start, int off) { int cnt; for (cnt = start; val >> (cnt + off); cnt++) ; return cnt; } ``` I've followed in the footsteps of the 'left-shift until bittest' idiom (D91038), in the sense that iff the `ctlz` intrinsic is cheap, we'll transform, regardless of all other factors. This can have a shocking effect on certain benchmarks: ``` raw.pixls.us-unique/Olympus/XZ-1$ /repositories/googlebenchmark/tools/compare.py -a benchmarks ~/rawspeed/build-{old,new}/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf RUNNING: /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmp49_28zcm 2021-05-09T01:06:05+03:00 Running /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench Run on (32 X 3600.24 MHz CPU s) CPU Caches: L1 Data 32 KiB (x16) L1 Instruction 32 KiB (x16) L2 Unified 512 KiB (x16) L3 Unified 32768 KiB (x2) Load Average: 5.26, 6.29, 3.49 Benchmark Time CPU Iterations CPUTime,s CPUTime/WallTime Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s p1319978.orf/threads:32/process_time/real_time_mean 145 ms 145 ms 128 0.145319 0.999981 10.1568M 69.8949M 69.8936M 6.88159 6.88146 0.145322 p1319978.orf/threads:32/process_time/real_time_median 145 ms 145 ms 128 0.145317 0.999986 10.1568M 69.8941M 69.8931M 6.88151 6.88141 0.145319 p1319978.orf/threads:32/process_time/real_time_stddev 0.766 ms 0.766 ms 128 766.586u 15.1302u 0 354.167k 354.098k 0.0348699 0.0348631 766.469u RUNNING: /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmpwb9sw2x0 2021-05-09T01:06:24+03:00 Running /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench Run on (32 X 3599.95 MHz CPU s) CPU Caches: L1 Data 32 KiB (x16) L1 Instruction 32 KiB (x16) L2 Unified 512 KiB (x16) L3 Unified 32768 KiB (x2) Load Average: 4.05, 5.95, 3.43 Benchmark Time CPU Iterations CPUTime,s CPUTime/WallTime Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s p1319978.orf/threads:32/process_time/real_time_mean 99.8 ms 99.8 ms 128 0.0997758 0.999972 10.1568M 101.797M 101.794M 10.0225 10.0222 0.0997786 p1319978.orf/threads:32/process_time/real_time_median 99.7 ms 99.7 ms 128 0.0997165 0.999985 10.1568M 101.857M 101.854M 10.0284 10.0281 0.0997195 p1319978.orf/threads:32/process_time/real_time_stddev 0.224 ms 0.224 ms 128 224.166u 34.345u 0 226.81k 227.231k 0.0223309 0.0223723 224.586u Comparing /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench Benchmark Time CPU Time Old Time New CPU Old CPU New p1319978.orf/threads:32/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 128 vs 128 p1319978.orf/threads:32/process_time/real_time_mean -0.3134 -0.3134 145 100 145 100 p1319978.orf/threads:32/process_time/real_time_median -0.3138 -0.3138 145 100 145 100 p1319978.orf/threads:32/process_time/real_time_stddev -0.7073 -0.7078 1 0 1 0 ``` Reviewed By: craig.topper, zhuhan0 Differential Revision: https://reviews.llvm.org/D102116	2021-05-17 20:33:33 +03:00
Roman Lebedev	4aec8f4ce0	[NFC][LoopIdiom] Add some tests for 'lshr until zero' ('count active bits') "on steroids" idiom	2021-05-09 01:07:07 +03:00
Han Zhu	da1cdffbb1	[loop-idiom] Hoist loop memcpys to loop preheader For a simple loop like: ``` struct S { int x; int y; char b; }; unsigned foo(S* __restrict__ a, S* b, int n) { for (int i = 0; i < n; i++) a[i] = b[i]; return sizeof(a[0]); } ``` We could eliminate the loop and convert it to a large memcpy of 12n bytes. Currently this is not handled. Output of `opt -loop-idiom -S < memcpy_before.ll` ``` %struct.S = type { i32, i32, i8 } define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr { entry: %cmp7 = icmp sgt i32 %n, 0 br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup for.body.preheader: ; preds = %entry br label %for.body for.cond.cleanup.loopexit: ; preds = %for.body br label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry ret i32 12 for.body: ; preds = %for.body, %for.body.preheader %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i32 %i.08 to i64 %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom %0 = bitcast %struct.S* %arrayidx2 to i8* %1 = bitcast %struct.S* %arrayidx to i8* call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %0, i8* nonnull align 4 dereferenceable(12) %1, i64 12, i1 false) %inc = add nuw nsw i32 %i.08, 1 %cmp = icmp slt i32 %inc, %n br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit } ; Function Attrs: argmemonly nofree nosync nounwind willreturn declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0 attributes #0 = { argmemonly nofree nosync nounwind willreturn } ``` The loop idiom pass currently only handles load and store instructions. Since struct S is too big to fit in a register, the loop body contains a memcpy intrinsic. 
With this change, re-run `opt -loop-idiom -S < memcpy_before.ll`. The loop memcpy is promoted to loop preheader. For this trivial case, the loop is dead and will be removed by another pass. ``` %struct.S = type { i32, i32, i8 } define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr { entry: %a1 = bitcast %struct.S* %a to i8* %b2 = bitcast %struct.S* %b to i8* %cmp7 = icmp sgt i32 %n, 0 br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup for.body.preheader: ; preds = %entry %0 = zext i32 %n to i64 %1 = mul nuw nsw i64 %0, 12 call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a1, i8* align 4 %b2, i64 %1, i1 false) br label %for.body for.cond.cleanup.loopexit: ; preds = %for.body br label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry ret i32 12 for.body: ; preds = %for.body, %for.body.preheader %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i32 %i.08 to i64 %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom %2 = bitcast %struct.S* %arrayidx2 to i8* %3 = bitcast %struct.S* %arrayidx to i8* %inc = add nuw nsw i32 %i.08, 1 %cmp = icmp slt i32 %inc, %n br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit } ; Function Attrs: argmemonly nofree nosync nounwind willreturn declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0 attributes #0 = { argmemonly nofree nosync nounwind willreturn } ``` Reviewed By: zino Differential Revision: https://reviews.llvm.org/D97667	2021-05-04 17:05:04 -07:00
Tres Popp	efce19c3b0	Revert "[loop-idiom] Hoist loop memcpys to loop preheader" This reverts commit `75d6b8bb40`. The reasoning is mentioned in https://reviews.llvm.org/D97667	2021-04-28 13:16:34 +02:00
Han Zhu	75d6b8bb40	[loop-idiom] Hoist loop memcpys to loop preheader For a simple loop like: ``` struct S { int x; int y; char b; }; unsigned foo(S* __restrict__ a, S* b, int n) { for (int i = 0; i < n; i++) a[i] = b[i]; return sizeof(a[0]); } ``` We could eliminate the loop and convert it to a large memcpy of 12n bytes. Currently this is not handled. Output of `opt -loop-idiom -S < memcpy_before.ll` ``` %struct.S = type { i32, i32, i8 } define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr { entry: %cmp7 = icmp sgt i32 %n, 0 br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup for.body.preheader: ; preds = %entry br label %for.body for.cond.cleanup.loopexit: ; preds = %for.body br label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry ret i32 12 for.body: ; preds = %for.body, %for.body.preheader %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i32 %i.08 to i64 %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom %0 = bitcast %struct.S* %arrayidx2 to i8* %1 = bitcast %struct.S* %arrayidx to i8* call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %0, i8* nonnull align 4 dereferenceable(12) %1, i64 12, i1 false) %inc = add nuw nsw i32 %i.08, 1 %cmp = icmp slt i32 %inc, %n br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit } ; Function Attrs: argmemonly nofree nosync nounwind willreturn declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0 attributes #0 = { argmemonly nofree nosync nounwind willreturn } ``` The loop idiom pass currently only handles load and store instructions. Since struct S is too big to fit in a register, the loop body contains a memcpy intrinsic. 
With this change, re-run `opt -loop-idiom -S < memcpy_before.ll`. The loop memcpy is promoted to loop preheader. For this trivial case, the loop is dead and will be removed by another pass. ``` %struct.S = type { i32, i32, i8 } define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr { entry: %a1 = bitcast %struct.S* %a to i8* %b2 = bitcast %struct.S* %b to i8* %cmp7 = icmp sgt i32 %n, 0 br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup for.body.preheader: ; preds = %entry %0 = zext i32 %n to i64 %1 = mul nuw nsw i64 %0, 12 call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a1, i8* align 4 %b2, i64 %1, i1 false) br label %for.body for.cond.cleanup.loopexit: ; preds = %for.body br label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry ret i32 12 for.body: ; preds = %for.body, %for.body.preheader %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i32 %i.08 to i64 %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom %2 = bitcast %struct.S* %arrayidx2 to i8* %3 = bitcast %struct.S* %arrayidx to i8* %inc = add nuw nsw i32 %i.08, 1 %cmp = icmp slt i32 %inc, %n br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit } ; Function Attrs: argmemonly nofree nosync nounwind willreturn declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0 attributes #0 = { argmemonly nofree nosync nounwind willreturn } ``` Reviewed By: zino Differential Revision: https://reviews.llvm.org/D97667	2021-04-27 17:37:51 -07:00
Dávid Bolvanský	f070956c10	[LoopIdiom] Added testcase for double memset (fixed in LLVM 12); NFC	2021-04-22 16:39:25 +02:00
Dávid Bolvanský	0804f0262f	[LoopIdiom] Added testcase from PR44378; NFC	2021-04-21 22:00:32 +02:00
Roman Lebedev	005881e96e	[LoopIdiom] left-shift-until-bittest: set all allowed no-wrap flags on add/sub I've checked each one of these with alive2, and this is both correct and precise.	2021-04-11 18:08:07 +03:00
Roman Lebedev	0ac1920d03	[NFC][LoopIdiom] left-shift-until-bittest: add small-bitwidth tests	2021-04-11 18:08:07 +03:00
Roman Lebedev	ee6a17eb9f	[NFC][LoopIdiom] Regenerate left-shift-until-bittest.ll	2021-04-11 18:08:07 +03:00
Sander de Smalen	672f673004	[SVE] Remove checks for warnings in scalable-vector tests. After D98856 these tests will by default break (fatal_error) if any of the wrong interfaces are used, so there's no longer a need to have a RUN line that checks for a warning message emitted by the compiler.	2021-04-07 15:59:32 +01:00
Krasimir Georgiev	8e7df996e3	Revert "[loop-idiom] Hoist loop memcpys to loop preheader" This reverts commit `92ddd3c1b6`. Causes multistage clang crashes, e.g.: https://lab.llvm.org/buildbot/#/builders/36/builds/6678	2021-03-30 11:47:12 +02:00
Han Zhu	92ddd3c1b6	[loop-idiom] Hoist loop memcpys to loop preheader For a simple loop like: ``` struct S { int x; int y; char b; }; unsigned foo(S* __restrict__ a, S* b, int n) { for (int i = 0; i < n; i++) a[i] = b[i]; return sizeof(a[0]); } ``` We could eliminate the loop and convert it to a large memcpy of 12n bytes. Currently this is not handled. Output of `opt -loop-idiom -S < memcpy_before.ll` ``` %struct.S = type { i32, i32, i8 } define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr { entry: %cmp7 = icmp sgt i32 %n, 0 br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup for.body.preheader: ; preds = %entry br label %for.body for.cond.cleanup.loopexit: ; preds = %for.body br label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry ret i32 12 for.body: ; preds = %for.body, %for.body.preheader %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i32 %i.08 to i64 %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom %0 = bitcast %struct.S* %arrayidx2 to i8* %1 = bitcast %struct.S* %arrayidx to i8* call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %0, i8* nonnull align 4 dereferenceable(12) %1, i64 12, i1 false) %inc = add nuw nsw i32 %i.08, 1 %cmp = icmp slt i32 %inc, %n br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit } ; Function Attrs: argmemonly nofree nosync nounwind willreturn declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0 attributes #0 = { argmemonly nofree nosync nounwind willreturn } ``` The loop idiom pass currently only handles load and store instructions. Since struct S is too big to fit in a register, the loop body contains a memcpy intrinsic. 
With this change, re-run `opt -loop-idiom -S < memcpy_before.ll`. The loop memcpy is promoted to loop preheader. For this trivial case, the loop is dead and will be removed by another pass. ``` %struct.S = type { i32, i32, i8 } define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr { entry: %a1 = bitcast %struct.S* %a to i8* %b2 = bitcast %struct.S* %b to i8* %cmp7 = icmp sgt i32 %n, 0 br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup for.body.preheader: ; preds = %entry %0 = zext i32 %n to i64 %1 = mul nuw nsw i64 %0, 12 call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a1, i8* align 4 %b2, i64 %1, i1 false) br label %for.body for.cond.cleanup.loopexit: ; preds = %for.body br label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry ret i32 12 for.body: ; preds = %for.body, %for.body.preheader %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i32 %i.08 to i64 %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom %2 = bitcast %struct.S* %arrayidx2 to i8* %3 = bitcast %struct.S* %arrayidx to i8* %inc = add nuw nsw i32 %i.08, 1 %cmp = icmp slt i32 %inc, %n br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit } ; Function Attrs: argmemonly nofree nosync nounwind willreturn declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0 attributes #0 = { argmemonly nofree nosync nounwind willreturn } ``` Reviewed By: zino Differential Revision: https://reviews.llvm.org/D97667	2021-03-29 23:36:26 -07:00
Han Zhu	2bd4049ceb	Revert "[loop-idiom] Hoist loop memcpys to loop preheader" This reverts commit `deb5095833`. Bad commit message.	2021-03-29 23:35:35 -07:00
Han Zhu	deb5095833	[loop-idiom] Hoist loop memcpys to loop preheader Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Blame Revision: Differential Revision: https://phabricator.intern.facebook.com/D26380397	2021-03-29 23:14:42 -07:00
Craig Topper	f24f09d256	[RISCV] Add TTI support for cpop with Zbb This will tell loop idiom recognize that it can make popcount loops countable using the ctpop intrinsic. I didn't bother checking for illegal types. Type legalization knows how to split a ctpop into multiple ctpops added together. Assuming we only receive reasonable integer bit widths, a few cpop instructions added together are probably better than the loop. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D99203	2021-03-24 10:58:42 -07:00
Dávid Bolvanský	d307d892ad	[Tests] Added test for memcpy loop idiom recognization	2021-01-13 14:55:46 +01:00
Roman Lebedev	51879a5256	[LoopIdiom] 'left-shift until bittest': don't forget to check that PHI node is in loop header Fixes an issue reported by Peter Collingbourne in https://reviews.llvm.org/D91726#2475301	2020-12-30 23:58:41 +03:00
Roman Lebedev	25aebe2ccf	[LoopIdiom] 'left-shift-until-bittest': keep no-wrap flags on shift, fix edge-case miscompilation for %x.next While `%x.curr` is always safe to compute, because `LoopBackedgeTakenCount` will always be smaller than `bitwidth(X)`, i.e. we never get poison, rewriting `%x.next` is more complicated because `X << LoopTripCount` will be poison iff `LoopTripCount == bitwidth(X)` (which will happen iff `BitPos` is `bitwidth(x) - 1` and `X` is `1`). So unless we know that isn't the case (as alive2 notes, we know it's safe to do iff shift had no-wrap flags, or bitpos does not indicate signbit, or we know that %x is never `1`), we'll need to emit an alternative, safe IR, by either just shifting the `%x.curr`, or conditionally selecting between the computed `%x.next` and `0`. The former IR looks better, so let's do that. While there, ensure that we don't drop no-wrap flags from said shift.	2020-12-24 21:20:52 +03:00
Roman Lebedev	6e074a8324	[NFC][LoopIdiom] Improve test coverage for 'left-shift-until-bittest' pattern In particular, add tests with no-wrap flags on shift, a test where %x is not `1`, and ensure that tests where %bit is a constant bitwidth-1, or is not a constant bitwidth-1 test both liveout values.	2020-12-24 21:20:51 +03:00
Roman Lebedev	2b61e7c68c	[LoopIdiom] 'left-shift until bittest' idiom: support rewriting loop as countable, allow extra cruft The current state of the transform is still not enough to support my motivational pattern, because it has one more "induction variable". I have delayed posting this patch, because originally even just rewriting the loop as countable wasn't enough to nicely transform my motivational pattern, because I expected that extra IV to be rewritten afterwards, but it wasn't happening until I fixed that in D91800. So, this patch allows the 'left-shift until bittest' loop idiom as long as the inserted ops are cheap, and lifts any and all extra use checks on the instructions. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D92754	2020-12-23 22:28:10 +03:00
Roman Lebedev	a0ddc61c5b	[LoopIdiom] 'left-shift until bittest' idiom: support canonical sign bit mask If the bitmask is for sign bit, instcombine would have canonicalized the pattern into a proper sign bit check. Supporting that is still simple, but requires a bit of a roundtrip - we first have to use `decomposeBitTestICmp()`, and the rest again just works. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D91726	2020-12-23 22:28:09 +03:00
Roman Lebedev	cb2e5980ba	[LoopIdiom] 'left-shift until bittest' idiom: support constant bit mask The handling of the case where the mask is a constant is trivial: if said constant is a power of two, the bit in question is log2(mask), and the rest just works. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D91725	2020-12-23 22:28:09 +03:00
Roman Lebedev	e124844709	[LoopIdiom] Introduce 'left-shift until bittest' idiom The motivation here is the following inner loop in the fp16/fp24 -> fp32 expander, which runs as part of the floating-point DNG decompression in the RawSpeed library: `cd380bb9a2/src/librawspeed/decompressors/DeflateDecompressor.cpp (L112-L115)` ``` while (!(fp32_fraction & (1 << 23))) { fp32_exponent -= 1; fp32_fraction <<= 1; } ``` (https://godbolt.org/z/r13YMh) As one might notice, that loop is currently uncountable, and that whole code stays scalar. Yet, it is rather trivial to make that loop countable: https://godbolt.org/z/do8WMz and we can prove that via alive2: https://alive2.llvm.org/ce/z/7vQnji (ha nice, isn't it?) ... and that allows for the whole fp16->fp32 code to vectorize: https://godbolt.org/z/7hYr13 Now, while I'd love to get there, I feel like I should take it in steps. For now, this introduces support for the most basic case, where the bit position is known as a variable, and the loop will go away (has no live-outs other than the recurrence, no extra instructions in the loop). I have added sufficient (I believe) test coverage, and alive2 is happy with those transforms. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D91038	2020-12-23 22:28:09 +03:00
Nabeel Omer	df2b9a3e02	[DebugInfo] Avoid re-ordering assignments in LCSSA The LCSSA pass makes use of a function insertDebugValuesForPHIs() to propagate dbg.value() intrinsics to newly inserted PHI instructions. Faulty behaviour occurs when the parent PHI of a newly inserted PHI is not the most recent assignment to a source variable. insertDebugValuesForPHIs ends up propagating a value that isn't the most recent assignment. This change removes the call to insertDebugValuesForPHIs() from LCSSA, preventing incorrect dbg.value intrinsics from being propagated. Propagating variable locations between blocks will occur later, during LiveDebugValues. Differential Revision: https://reviews.llvm.org/D92576	2020-12-17 16:17:32 +00:00
Roman Lebedev	aa2009fe78	[NFCI][SimplifyCFG] Mark all the SimplifyCFG tests that already don't invalidate DomTree as such First step after `e113317958`: in these tests, DomTree is valid afterwards, so mark them as such, so that they don't regress. In further steps, SimplifyCFG transforms shall be taught to preserve DomTree, in as small steps as possible.	2020-12-17 01:03:49 +03:00
Craig Topper	25067f179f	[LoopIdiomRecognize] Teach detectShiftUntilZeroIdiom to recognize loops where the counter is decrementing. This adds support for loops like unsigned clz(unsigned x) { unsigned w = sizeof (x) * CHAR_BIT; while (x) { w--; x >>= 1; } return w; } and unsigned clz(unsigned x) { unsigned w = sizeof (x) * CHAR_BIT - 1; while (x >>= 1) { w--; } return w; } To support these we look for add x, -1 as well as add x, 1 that we already matched. If the value was -1 we need to subtract from the initial counter value instead of adding to it. Fixes PR48404. Differential Revision: https://reviews.llvm.org/D92745	2020-12-14 14:25:05 -08:00
Craig Topper	2acd5a4738	[LoopIdiom] Pre-commit tests for D92745. NFC	2020-12-13 23:25:00 -08:00
Craig Topper	6e9e53895c	[LoopIdiomRecognize] Autogenerate complete checks for the X86 ctlz/cttz tests. NFC Preparation for D92745 which will add more tests to these files.	2020-12-11 15:35:37 -08:00
Mircea Trofin	f9a27df16b	[FileCheck] Enforce --allow-unused-prefixes=false for llvm/test/Transforms Explicitly opt-out llvm/test/Transforms/Attributor. Verified by flipping the default value of allow-unused-prefixes and observing that none of the failures were under llvm/test/Transforms. Differential Revision: https://reviews.llvm.org/D92404	2020-12-09 08:51:38 -08:00
Roman Lebedev	2c0536b76b	[NFC][LoopIdiom] Reshuffle left-shift-until-bittest test coverage (D91038)	2020-12-07 15:27:13 +03:00
Roman Lebedev	8b9e6dc501	[NFC][LoopIdiom] Left-shift-until-bittest: revisit test coverage	2020-11-18 21:22:27 +03:00
Roman Lebedev	45ddb245c5	[NFC][LoopIdiom] Add basic test coverage for 'left-shift until bittest` idiom	2020-11-08 22:35:41 +03:00
Dávid Bolvanský	7a2abf5aca	[InferAttrs] Add nocapture/writeonly to string/mem libcalls One step closer to fix PR47644. Differential Revision: https://reviews.llvm.org/D89645	2020-10-29 20:06:43 +01:00
Dávid Bolvanský	935cb12280	[LoopIdiom] Regenerate test checks; NFC	2020-10-18 14:07:04 +02:00
Philip Reames	de3cb9548d	Fix a bug in memset formation with vectors of non-integral pointers We were converting the non-integral store into an integer store, which is not legal.	2020-10-01 16:11:11 -07:00
David Sherwood	816663adb5	[SVE] In LoopIdiomRecognize::isLegalStore bail out for scalable vectors The function LoopIdiomRecognize::isLegalStore looks for stores in loops that could be transformed into memset or memcpy. However, the algorithm currently requires that we know how big the store is at runtime, i.e. that the store size will not overflow an unsigned integer. For scalable vectors we cannot guarantee this so I have changed the code to bail out for now. In addition, even if we add a way to query the maximum value of vscale in future we will still need to update the algorithm to cope with non-constant strides. The additional cost associated with calculating the memset and memcpy arguments will need to be taken into account as well. This patch also fixes up an implicit TypeSize -> uint64_t cast, thereby removing a warning. I've added tests here showing a fixed width vector loop being transformed into memcpy, and a scalable vector loop remaining unchanged: Transforms/LoopIdiom/memcpy-vectors.ll Differential Revision: https://reviews.llvm.org/D87439	2020-09-14 11:28:31 +01:00
Anh Tuyen Tran	68717acb24	[LoopIdiomRecognizePass] Options to disable part or the entire Loop Idiom Recognize Pass Loop Idiom Recognize Pass (LIRP) attempts to transform loops with subscripted arrays into memcpy/memset function calls. In some particular situation, this transformation introduces negative impacts. For example: https://bugs.llvm.org/show_bug.cgi?id=47300 This patch will enable users to disable a particular part of the transformation, while he/she can still enjoy the benefit brought about by the rest of LIRP. The default behavior stays unchanged: no part of LIRP is disabled by default. Reviewed By: etiotto (Ettore Tiotto) Differential Revision: https://reviews.llvm.org/D86262	2020-09-01 13:59:24 +00:00
Florian Hahn	88818491b9	[LoopIdiom,LSR] Add additional tests for SCEVExpander cleanups.	2020-08-21 13:48:31 +01:00
Florian Hahn	c70f0b9d4a	[SCEVExpander] Avoid re-using existing casts if it means updating users. Currently the SCEVExpander tries to re-use existing casts, even if they are not exactly at the insertion point it was asked to create the cast. To do so in some cases, it creates a new cast at the insertion point and updates all users to use the new cast. This behavior is problematic, because it changes the IR outside of the instructions created during the expansion. Therefore we cannot completely undo all changes made during expansion. This re-use should be only an extra optimization, so only using the new cast in the expanded instructions should not be a correctness issue. There are many cases where equivalent instructions are created during expansion. This patch also adjusts findInsertPointAfter to skip instructions inserted during expansion. This enables re-using existing casts without renaming any uses, by picking a better insertion point. Reviewed By: efriedma, lebedev.ri Differential Revision: https://reviews.llvm.org/D84399	2020-08-09 13:25:17 +01:00
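Several LoopIdiom commits above (0633d5ce7b, 8f4db14d1c, e124844709) make a shift loop countable by replacing its trip count with a closed form built on `ctlz`/`cttz`. The C below is a hand-written sketch of that equivalence for the simplest case (start = off = 0), not code from the pass, which works on LLVM IR. It assumes a 32-bit `uint32_t` and the GCC/Clang builtins `__builtin_clz`/`__builtin_ctz` (undefined for a zero argument, so zero is handled explicitly); all function names are hypothetical. The loop versions widen to `uint64_t` because shifting a 32-bit value by 32 is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration only: loop forms next to closed forms.
   Assumes 32-bit uint32_t and GCC/Clang __builtin_clz/__builtin_ctz. */

/* 'logical right-shift until zero' loop; widened so a shift by 32 is defined. */
int count_active_bits_loop(uint32_t val) {
    int cnt = 0;
    for (; ((uint64_t)val >> cnt) != 0; ++cnt)
        ;
    return cnt;
}

/* Closed form: bitwidth - ctlz(val), with val == 0 handled explicitly. */
int count_active_bits_closed(uint32_t val) {
    return val == 0 ? 0 : 32 - __builtin_clz(val);
}

/* 'left-shift until zero' loop; the widened shift is truncated back to 32 bits. */
int shl_until_zero_loop(uint32_t val) {
    int cnt = 0;
    for (; (uint32_t)((uint64_t)val << cnt) != 0; ++cnt)
        ;
    return cnt;
}

/* Closed form: bitwidth - cttz(val), with val == 0 handled explicitly. */
int shl_until_zero_closed(uint32_t val) {
    return val == 0 ? 0 : 32 - __builtin_ctz(val);
}

/* 'left-shift until bittest' loop: shift x left until bit `bit` is set and
   return the iteration count. Never terminates if bits 0..bit of x are 0. */
unsigned shift_until_bittest_loop(uint32_t x, unsigned bit) {
    unsigned n = 0;
    while (!(x & (UINT32_C(1) << bit))) {
        x <<= 1;
        ++n;
    }
    return n;
}

/* Closed form: bit - (index of the highest set bit among bits 0..bit of x). */
unsigned shift_until_bittest_closed(uint32_t x, unsigned bit) {
    assert(bit < 32);
    /* For bit == 31, 2u << 31 wraps to 0 and the mask becomes all ones. */
    uint32_t low = x & ((UINT32_C(2) << bit) - 1);
    assert(low != 0); /* the same precondition under which the loop terminates */
    unsigned highest = 31u - (unsigned)__builtin_clz(low);
    return bit - highest;
}
```

Checking the pairs against each other on a few inputs is a cheap way to convince oneself the closed forms match; for example, both `count_active_bits` variants return 4 for 12 (0b1100), and both bittest variants return 23 for `x = 1, bit = 23`, the fp32 case from e124844709.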


206 Commits