Summary:
Previously, if the threshold was 2, we were willing to speculatively
execute 2 cheap instructions in each of the two basic blocks (thus we
were willing to speculate a total cost of 4), but we weren't willing to
speculate when one BB had 3 instructions and the other one had none,
even though that would have a total cost of only 3.
This looks inconsistent to me.
I don't think `cmov`-like instructions will start executing
until both of their inputs are available: https://godbolt.org/z/zgHePf
So I don't see why the existing behavior is the correct one.
Also, let's add its own `cl::opt` for this threshold,
with default=4, so it is not stricter than the previous threshold:
it will still allow folding when there are 2 BBs, each with cost=2.
And since the logic has changed, it will also allow folding when
one BB has cost=3 and the other cost=1, or when there is only one BB
with cost=4.
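For reference, here is a minimal sketch of what the combined-budget check
amounts to; the option name and helper are illustrative (the real code lives
in SimplifyCFG.cpp), but the default and the comparison follow the
description above:
```
#include "llvm/Support/CommandLine.h"
using namespace llvm;

// Illustrative option; defaulting to 4 so the combined budget is no stricter
// than the old per-block limit of 2 + 2.
static cl::opt<unsigned> TwoEntryPHINodeFoldingThreshold(
    "two-entry-phi-node-folding-threshold", cl::Hidden, cl::init(4),
    cl::desc("Maximal total instruction cost we are willing to speculatively "
             "execute to fold a two-entry PHI node"));

// Old (per-block) check, roughly:  CostThen <= 2 && CostElse <= 2
// New (combined) check, roughly:
static bool fitsSpeculationBudget(unsigned CostThen, unsigned CostElse) {
  return CostThen + CostElse <= TwoEntryPHINodeFoldingThreshold;
}
```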
This is an alternative solution to D65148.
This fix is mainly motivated by the `signbit-like-value-extension.ll` test.
That pattern comes up in JPEG decoding, see e.g.
`Figure F.12 – Extending the sign bit of a decoded value in V`
of `ITU T.81` (the JPEG specification).
That branch is not predictable, and it is within the innermost loop,
so the fact that the pattern ends up stuck with a branch
instead of a `select` (i.e. a `CMOV` on x86) is unlikely to be beneficial.
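For context, a sketch of that pattern in plain C++, following the EXTEND
procedure of `ITU T.81` Figure F.12 (function and variable names are mine):
```
// V is the decoded magnitude, T the number of magnitude bits.
static int extendSign(int v, int t) {
  int vt = 1 << (t - 1);
  if (v < vt)             // data-dependent, unpredictable branch
    v -= (1 << t) - 1;    // the spec writes this as V + (-1 << T) + 1
  return v;
}
```
When SimplifyCFG folds the two-entry PHI here, the branch becomes a `select`,
which the x86 backend can lower to `CMOV`.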
This has great results on the final assembly (vanilla test-suite +
RawSpeed), measured with the metric pass from D67240:
| metric | old | new | delta | % |
| x86-mi-counting.NumMachineFunctions | 37720 | 37721 | 1 | 0.00% |
| x86-mi-counting.NumMachineBasicBlocks | 773545 | 771181 | -2364 | -0.31% |
| x86-mi-counting.NumMachineInstructions | 7488843 | 7486442 | -2401 | -0.03% |
| x86-mi-counting.NumUncondBR | 135770 | 135543 | -227 | -0.17% |
| x86-mi-counting.NumCondBR | 423753 | 422187 | -1566 | -0.37% |
| x86-mi-counting.NumCMOV | 24815 | 25731 | 916 | 3.69% |
| x86-mi-counting.NumVecBlend | 17 | 17 | 0 | 0.00% |
We significantly decrease basic block count, notably decrease instruction count,
significantly decrease branch count and very significantly increase `cmov` count.
Performance-wise, unsurprisingly, this has a great effect on
the target RawSpeed benchmark. I'm seeing 5 **major** improvements:
```
Benchmark Time CPU Time Old Time New CPU Old CPU New
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49
Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_mean -0.3064 -0.3064 226.9913 157.4452 226.9800 157.4384
Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_median -0.3057 -0.3057 226.8407 157.4926 226.8282 157.4828
Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_stddev -0.4985 -0.4954 0.3051 0.1530 0.3040 0.1534
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_mean -0.1747 -0.1747 80.4787 66.4227 80.4771 66.4146
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_median -0.1742 -0.1743 80.4686 66.4542 80.4690 66.4436
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_stddev +0.6089 +0.5797 0.0670 0.1078 0.0673 0.1062
Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49
Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_mean -0.1598 -0.1598 171.6996 144.2575 171.6915 144.2538
Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_median -0.1598 -0.1597 171.7109 144.2755 171.7018 144.2766
Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_stddev +0.4024 +0.3850 0.0847 0.1187 0.0848 0.1175
Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49
Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_mean -0.0550 -0.0551 280.3046 264.8800 280.3017 264.8559
Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_median -0.0554 -0.0554 280.2628 264.7360 280.2574 264.7297
Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_stddev +0.7005 +0.7041 0.2779 0.4725 0.2775 0.4729
Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 49 vs 49
Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_mean -0.0354 -0.0355 316.7396 305.5208 316.7342 305.4890
Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_median -0.0354 -0.0356 316.6969 305.4798 316.6917 305.4324
Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_stddev +0.0493 +0.0330 0.3562 0.3737 0.3563 0.3681
```
That being said, it's always best-effort, so there will likely
be cases where this worsens things.
Reviewers: efriedma, craig.topper, dmgreen, jmolloy, fhahn, Carrot, hfinkel, chandlerc
Reviewed By: jmolloy
Subscribers: xbolva00, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67318
llvm-svn: 372009
As it's causing some bot failures (and per request from kbarton),
this reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.
llvm-svn: 358546
This is the last general step to allow more IR-level speculation with a safety harness in place in CodeGenPrepare.
The intent is to restore the behavior enabled by:
http://reviews.llvm.org/rL228826
but prevent bad performance such as:
https://llvm.org/bugs/show_bug.cgi?id=24818
Earlier patches in this sequence:
D12882 (disable SimplifyCFG speculation for expensive instructions)
D13297 (have CGP despeculate expensive ops)
D14630 (have CGP despeculate special versions of cttz/ctlz)
As shown in the test cases, we only have two instructions currently affected: ctz for some x86 and fdiv generally.
Allowing exactly one expensive instruction is a bit of a hack, but it lines up with what is currently implemented
in CGP. If we make the despeculation more general in CGP, we can make the speculation here more liberal.
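For illustration, a rough sketch of what "allow exactly one expensive
instruction" means for the candidate block; the names and cost-API usage are
approximations of that era's SimplifyCFG/TTI code, not the actual
implementation:
```
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/BasicBlock.h"
using namespace llvm;

static bool blockIsCheapEnoughToSpeculate(BasicBlock &BB,
                                          const TargetTransformInfo &TTI) {
  bool SawExpensive = false;
  for (Instruction &I : BB) {
    if (!isSafeToSpeculativelyExecute(&I))
      return false;
    if (TTI.getUserCost(&I) >= TargetTransformInfo::TCC_Expensive) {
      if (SawExpensive)
        return false; // a second expensive instruction: keep the branch
      SawExpensive = true; // tolerate exactly one; CGP can despeculate it
    }
  }
  return true;
}
```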
A follow-up patch will adjust the cost for sqrt and possibly other typically expensive math intrinsics (currently
everything is cheap by default). GPU targets would likely want to override those expensive default costs (just as
they probably should already override the cost of div/rem) because just about any math is cheaper than control-flow
on those targets.
Differential Revision: http://reviews.llvm.org/D15213
llvm-svn: 255660
...because that's what the cost model was intended to do.
As discussed in D12882, this fix has a temporary unintended consequence for
SimplifyCFG: it causes us to not speculate an fdiv. However, two wrongs make
PR24818 right, and two wrongs make PR24343 act right even though it's really
still wrong.
I intend to correct SimplifyCFG and add to CodeGenPrepare to account for this
cost model change and preserve the righteousness for the bug report cases.
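A simplified sketch of what the cost-model side of this change amounts to
(illustrative only; the real change lives in the generic TTI implementation):
```
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Instruction.h"
using namespace llvm;

// fdiv/frem report the "expensive" cost tier instead of the basic one, so
// IR-level transforms stop treating them as nearly free to speculate.
static unsigned operationCost(unsigned Opcode) {
  switch (Opcode) {
  case Instruction::FDiv:
  case Instruction::FRem:
    return TargetTransformInfo::TCC_Expensive;
  default:
    return TargetTransformInfo::TCC_Basic;
  }
}
```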
https://llvm.org/bugs/show_bug.cgi?id=24818
https://llvm.org/bugs/show_bug.cgi?id=24343
Differential Revision: http://reviews.llvm.org/D12882
llvm-svn: 248439
This reverts isSafeToSpeculativelyExecute's use of ReadNone until we
split ReadNone into two pieces: one attribute which describes how the
function interacts with memory, and another attribute which determines
whether it may be speculated, CSE'd, whether it can trap, etc.
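To see why the split is needed, consider a hypothetical example: a function
can be `readnone` (e.g. via the GCC/Clang `const` attribute) and still trap,
so the memory-effects half of the attribute says nothing about whether a call
to it may be speculated.
```
// 'const' lowers to the IR 'readnone' attribute: no memory is read or
// written, yet the division can still trap when den == 0.
__attribute__((const)) static int scaled(int num, int den) {
  return num / den;
}

int safeUse(int num, int den) {
  // Safe as written; hoisting the call above the guard would introduce a trap.
  return den != 0 ? scaled(num, den) : 0;
}
```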
llvm-svn: 246331
Any call which is side-effect free is trivially OK to speculate. We
already had similar logic in EarlyCSE and GVN but we were missing it
from isSafeToSpeculativelyExecute.
This fixes PR24601.
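A rough sketch of the rule being added (shape assumed; the helper name is
mine, not the actual ValueTracking code):
```
#include "llvm/IR/Instructions.h"
using namespace llvm;

// A call that provably has no side effects and cannot unwind is treated as
// trivially speculatable, matching the reasoning EarlyCSE and GVN already used.
static bool callIsTriviallySpeculatable(const CallInst &CI) {
  return CI.doesNotAccessMemory() && CI.doesNotThrow();
}
```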
llvm-svn: 246232
These are named following the IEEE-754 names for these
functions, rather than the libm fmin / fmax, to avoid
possible ambiguities. Some languages may implement something
resembling fmin / fmax which returns NaN if either operand is
NaN, in order to propagate errors. These implement the IEEE-754
semantics of returning the other operand if either operand is a NaN,
treating the NaN as representing missing data.
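An illustrative reference for the intended semantics (maxnum is analogous;
the helper name is mine):
```
#include <cmath>

// IEEE-754 minNum: a NaN in one operand is treated as missing data, so the
// other operand is returned; NaN only results when both operands are NaN.
static double minnumRef(double a, double b) {
  if (std::isnan(a)) return b;
  if (std::isnan(b)) return a;
  return a < b ? a : b;
}
```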
llvm-svn: 220341