llvm-project/llvm/lib/Transforms/InstCombine
Roman Lebedev bf21ce7b90
[InstCombine] Take 3: Perform trivial PHI CSE
The original take 1 was 6102310d81,
which taught InstSimplify to do that, which seemed better at the time,
since we got EarlyCSE support for free.

However, it turned out that we cannot do that there:
the simplified-to PHI would not be reachable from the original PHI,
and that is not something InstSimplify is allowed to do,
as noted in the commit ed90f15efb
that reverted it:
> It appears to cause compilation non-determinism and caused stage3 mismatches.

Then there was take 2, 3e69871ab5,
which was InstCombine-specific, but it again showed stage2-stage3 differences,
and was reverted in bdaa3f86a0.
This is quite alarming.

Here, let's try to change how we find an existing PHI candidate:
due to the worklist order, and the way PHI nodes are inserted
(it may be inserted as the first one, or maybe not), let's look at *all*
PHI nodes in the block.
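
To illustrate the idea, here is a minimal standalone sketch of such a scan.
It is not the code from this commit: `ToyPhi`, `isIdenticalPhi()` and
`findDuplicatePhi()` are hypothetical names, and a PHI is modelled as a plain
list of (predecessor block id, incoming value id) pairs rather than an LLVM
`PHINode`. The point is only that the search walks every PHI in the block,
so the result does not depend on where the duplicate happens to sit.
```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-in for a PHI node (assumption: not LLVM's PHINode).
// Each entry is (predecessor block id, incoming value id), in operand order.
struct ToyPhi {
  int TypeId;
  std::vector<std::pair<int, int>> Incoming;
};

// Two toy PHIs are identical if they have the same type and the same
// incoming (block, value) pairs in the same order.
bool isIdenticalPhi(const ToyPhi &A, const ToyPhi &B) {
  return A.TypeId == B.TypeId && A.Incoming == B.Incoming;
}

// Look at *all* PHIs of the block (other than the one at index Self) and
// return the first one identical to it, or nullptr if there is none.
const ToyPhi *findDuplicatePhi(const std::vector<ToyPhi> &BlockPhis,
                               std::size_t Self) {
  assert(Self < BlockPhis.size() && "Self must index a PHI of this block");
  for (std::size_t I = 0; I != BlockPhis.size(); ++I) {
    if (I == Self)
      continue; // Do not match a PHI against itself.
    if (isIdenticalPhi(BlockPhis[I], BlockPhis[Self]))
      return &BlockPhis[I];
  }
  return nullptr;
}
```
In this sketch a duplicate is found whether it comes before or after the
PHI being visited; the caller would then replace all uses of the visited PHI
with the returned one.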

Effects on vanilla llvm test-suite + RawSpeed:
```
| statistic name                                     | baseline  | proposed  |      Δ |        % |    \|%\| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| asm-printer.EmittedInsts                           | 7942329   | 7942457   |    128 |    0.00% |    0.00% |
| assembler.ObjectBytes                              | 254295632 | 254312480 |  16848 |    0.01% |    0.01% |
| correlated-value-propagation.NumPhis               | 18412     | 18347     |    -65 |   -0.35% |    0.35% |
| early-cse.NumCSE                                   | 2183283   | 2183267   |    -16 |    0.00% |    0.00% |
| early-cse.NumSimplify                              | 550105    | 541842    |  -8263 |   -1.50% |    1.50% |
| instcombine.NumAggregateReconstructionsSimplified  | 73        | 4506      |   4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined                            | 3640311   | 3644419   |   4108 |    0.11% |    0.11% |
| instcombine.NumDeadInst                            | 1778204   | 1783205   |   5001 |    0.28% |    0.28% |
| instcombine.NumPHICSEs                             | 0         | 22490     |  22490 |      n/a |      n/a |
| instcombine.NumWorklistIterations                  | 2023272   | 2024400   |   1128 |    0.06% |    0.06% |
| instcount.NumCallInst                              | 1758395   | 1758802   |    407 |    0.02% |    0.02% |
| instcount.NumInvokeInst                            | 59478     | 59502     |     24 |    0.04% |    0.04% |
| instcount.NumPHIInst                               | 330557    | 330545    |    -12 |    0.00% |    0.00% |
| instcount.TotalBlocks                              | 1077138   | 1077220   |     82 |    0.01% |    0.01% |
| instcount.TotalFuncs                               | 101442    | 101441    |     -1 |    0.00% |    0.00% |
| instcount.TotalInsts                               | 8831946   | 8832606   |    660 |    0.01% |    0.01% |
| simplifycfg.NumHoistCommonCode                     | 24186     | 24187     |      1 |    0.00% |    0.00% |
| simplifycfg.NumInvokes                             | 4300      | 4410      |    110 |    2.56% |    2.56% |
| simplifycfg.NumSimpl                               | 1019813   | 999767    | -20046 |   -1.97% |    1.97% |
```
So it fires 22490 times, which is less than the ~24k that take 1 did,
but more than what take 2 did (22228 times).
It allows foldAggregateConstructionIntoAggregateReuse() to actually work
after PHI-of-extractvalue folds did their thing. Previously SimplifyCFG
would have done this PHI CSE, of all places. Additionally, this allows some
more `invoke`->`call` folds to happen (+110, +2.56%).

All in all, as expected, this catches fewer things overall,
but all the motivating cases are still caught, so all good.
2020-08-29 18:21:24 +03:00
..
CMakeLists.txt [InstCombine] Move target-specific inst combining 2020-07-22 15:59:49 +02:00
InstCombineAddSub.cpp [InstCombine] Fix typo in comment (NFC) 2020-08-29 10:17:17 +02:00
InstCombineAndOrXor.cpp [InstCombine] canonicalize 'not' ops before logical shifts 2020-08-22 09:38:13 -04:00
InstCombineAtomicRMW.cpp [InstCombine] Move target-specific inst combining 2020-07-22 15:59:49 +02:00
InstCombineCalls.cpp [NFC][InstCombine] Fix some comments: the code already uses IC::replaceInstUsesWith() 2020-08-29 15:10:14 +03:00
InstCombineCasts.cpp [InstCombine] eliminate a pointer cast around insertelement 2020-08-12 09:08:17 -04:00
InstCombineCompares.cpp [InstCombine] canonicalizeICmpPredicate(): use InstCombiner::replaceInstUsesWith() instead of RAUW 2020-08-29 15:10:14 +03:00
InstCombineInternal.h [InstCombine] canonicalizeICmpPredicate(): use InstCombiner::replaceInstUsesWith() instead of RAUW 2020-08-29 15:10:14 +03:00
InstCombineLoadStoreAlloca.cpp [InstCombine] Move target-specific inst combining 2020-07-22 15:59:49 +02:00
InstCombineMulDivRem.cpp [InstCombine] fold abs(X)/X to cmp+select 2020-08-17 08:01:28 -04:00
InstCombineNegator.cpp [InstCombine] Negator: freeze is freely negatible if it's operand is negatible 2020-08-23 23:28:19 +03:00
InstCombinePHI.cpp [InstCombine] Take 3: Perform trivial PHI CSE 2020-08-29 18:21:24 +03:00
InstCombineSelect.cpp [InstCombine] Use CreateVectorSplat(ElementCount) variant directly 2020-08-08 19:26:02 +01:00
InstCombineShifts.cpp [InstCombine] canonicalize 'not' ops before logical shifts 2020-08-22 09:38:13 -04:00
InstCombineSimplifyDemanded.cpp [InstCombine] improve demanded element analysis for vector insert-of-extract (2nd try) 2020-08-25 11:19:36 -04:00
InstCombineVectorOps.cpp [InstCombine] Return replaceInstUsesWith() result (NFC) 2020-08-29 14:49:57 +02:00
InstructionCombining.cpp [NFC][InstCombine] Add STATISTIC() for how many iterations we did 2020-08-29 15:10:13 +03:00
LLVMBuild.txt Update the file headers across all of the LLVM projects in the monorepo 2019-01-19 08:50:56 +00:00