llvm-project/llvm/lib/Transforms/IPO
Adam Nemet e54a4fa95d LLE 6/6: Add LoopLoadElimination pass
Summary:
The goal of this pass is to perform store-to-load forwarding across the
backedge of a loop.  E.g.:

  for (i)
     A[i + 1] = A[i] + B[i]

  =>

  T = A[0]
  for (i)
     T = T + B[i]
     A[i + 1] = T
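
For illustration only, the two loop forms above can be written as plain C++. This is a source-level sketch of the effect, not the pass itself (the pass works on LLVM IR); the function names and the use of std::vector are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original form: every iteration reloads A[i], which the previous
// iteration has just stored as A[i + 1].
void recurrenceWithReload(std::vector<int> &A, const std::vector<int> &B) {
  assert(A.size() == B.size() + 1 && "A needs one more element than B");
  for (std::size_t i = 0; i < B.size(); ++i)
    A[i + 1] = A[i] + B[i];
}

// Forwarded form: the stored value is carried across the backedge in T,
// so the loop body no longer loads from A at all.
void recurrenceForwarded(std::vector<int> &A, const std::vector<int> &B) {
  assert(A.size() == B.size() + 1 && "A needs one more element than B");
  int T = A[0]; // initial load, hoisted out of the loop
  for (std::size_t i = 0; i < B.size(); ++i) {
    T = T + B[i];
    A[i + 1] = T;
  }
}
```

Both functions leave A in the same state; the second only shortens the loop-carried critical path by removing the load.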

The pass relies on loop dependence analysis via LoopAccessAnalysis to
find opportunities of loop-carried dependences with a distance of one
between a store and a load.  Since it's using LoopAccessAnalysis, it was
easy to also add support for versioning away may-aliasing intervening
stores that would otherwise prevent this transformation.
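
As a rough source-level sketch of what "versioning away" a may-aliasing store means: a run-time overlap check selects between a loop with forwarding and the unmodified loop. The helper name, the signature, and the exact form of the check are assumptions for illustration; the checks LoopAccessAnalysis actually emits are generated on IR and may look different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if [P, P + PLen) and [Q, Q + QLen) do not overlap.
static bool disjoint(const int *P, std::size_t PLen,
                     const int *Q, std::size_t QLen) {
  auto PB = reinterpret_cast<std::uintptr_t>(P);
  auto QB = reinterpret_cast<std::uintptr_t>(Q);
  return PB + PLen * sizeof(int) <= QB || QB + QLen * sizeof(int) <= PB;
}

// A has N + 1 elements; B and C have N elements. The store to C[i] sits
// between the store to A[i + 1] and the next iteration's load of that
// element, and C may alias A.
void versionedLoop(int *A, const int *B, int *C, std::size_t N) {
  assert(A && B && C);
  if (disjoint(A, N + 1, C, N)) {
    // Checks passed: the store to C cannot clobber A, so forward through T.
    int T = A[0];
    for (std::size_t i = 0; i < N; ++i) {
      T = T + B[i];
      A[i + 1] = T;
      C[i] = 0;
    }
  } else {
    // Fallback: the original loop, which reloads A[i] from memory.
    for (std::size_t i = 0; i < N; ++i) {
      A[i + 1] = A[i] + B[i];
      C[i] = 0;
    }
  }
}
```

When C overlaps A, the fallback loop preserves the original behaviour, including loads that observe the zeros written through C.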

This optimization is also performed by Load-PRE in GVN without the
option of multi-versioning.  As was discussed with Daniel Berlin in
http://reviews.llvm.org/D9548, this is inferior to a more loop-aware
solution applied here.  Hopefully, we will be able to remove some
complexity from GVN/MemorySSA as a consequence.

In the long run, we may want to extend this pass (or create a new one if
there is little overlap) to also eliminate loop-independent redundant
loads and stores that *require* versioning due to may-aliasing
intervening stores/loads.  I have some motivating cases for store
elimination. My plan right now is to wait for MemorySSA to come online
first rather than using memdep for this.

The main motivation for this pass is the 456.hmmer loop in SPECint2006
where after distributing the original loop and vectorizing the top part,
we are left with the critical path exposed in the bottom loop.  Being
able to promote the memory dependence into a register dependence (even
though the HW does perform store-to-load forwarding as well) results in a
major gain (~20%).  This gain also transfers over to x86: it's
around 8-10%.

Right now the pass is off by default and can be enabled
with -enable-loop-load-elim.  On the LNT testsuite, there are two
performance changes (negative number -> improvement):

  1. -28% in Polybench/linear-algebra/solvers/dynprog: the length of the
     critical paths is reduced
  2. +2% in Polybench/stencils/adi: Unfortunately, I couldn't reproduce this
     outside of LNT

The pass is scheduled after the loop vectorizer (which is after loop
distribution).  The rationale is to try to reuse LAA state, rather than
recomputing it.  The order between LV and LLE is not critical because
normally LV does not touch scalar st->ld forwarding cases where
vectorizing would prevent the CPU's st->ld forwarding from kicking in.

LoopLoadElimination requires LAA to provide the full set of dependences
(including forward dependences).  LAA is known to omit loop-independent
dependences in certain situations.  The big comment before
removeDependencesFromMultipleStores explains why this should not occur
for the cases that we're interested in.

Reviewers: dberlin, hfinkel

Subscribers: junbuml, dberlin, mssimpso, rengolin, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D13259

llvm-svn: 252017
2015-11-03 23:50:08 +00:00
ArgumentPromotion.cpp IPO: Remove implicit ilist iterator conversions, NFC 2015-10-13 17:51:03 +00:00
BarrierNoopPass.cpp Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC) 2015-06-23 09:49:53 +00:00
CMakeLists.txt Convert SampleProfile pass into a Module pass. 2015-08-25 15:25:11 +00:00
ConstantMerge.cpp IPO: Remove implicit ilist iterator conversions, NFC 2015-10-13 17:51:03 +00:00
DeadArgumentElimination.cpp IPO: Remove implicit ilist iterator conversions, NFC 2015-10-13 17:51:03 +00:00
ElimAvailExtern.cpp Restore "Support for ThinLTO function importing and symbol linking." 2015-11-03 00:14:15 +00:00
ExtractGV.cpp IPO: Remove implicit ilist iterator conversions, NFC 2015-10-13 17:51:03 +00:00
FunctionAttrs.cpp [FunctionAttrs] Inline the prototype attribute inference to an existing 2015-10-31 00:28:37 +00:00
GlobalDCE.cpp Rangify for loops in GlobalDCE, NFC. 2015-07-18 19:57:34 +00:00
GlobalOpt.cpp [GlobalOpt] Add newlines to DEBUG messages 2015-10-28 14:30:53 +00:00
IPConstantPropagation.cpp Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC) 2015-06-23 09:49:53 +00:00
IPO.cpp [PM] Port StripDeadPrototypes to the new pass manager 2015-10-30 23:28:12 +00:00
InlineAlways.cpp [PM/AA] Rebuild LLVM's alias analysis infrastructure in a way compatible 2015-09-09 17:55:00 +00:00
InlineSimple.cpp [PM/AA] Rebuild LLVM's alias analysis infrastructure in a way compatible 2015-09-09 17:55:00 +00:00
Inliner.cpp Move dbg.declare intrinsics when merging and replacing allocas. 2015-09-29 00:30:19 +00:00
Internalize.cpp IPO: Remove implicit ilist iterator conversions, NFC 2015-10-13 17:51:03 +00:00
LLVMBuild.txt Update libdeps in LLVMipo and LLVMScalarOpts, corresponding to r245940. 2015-08-25 17:11:17 +00:00
LoopExtractor.cpp IPO: Remove implicit ilist iterator conversions, NFC 2015-10-13 17:51:03 +00:00
LowerBitSets.cpp Fix Clang-tidy modernize-use-nullptr warnings in source directories and generated files; other minor cleanups. 2015-10-06 23:24:35 +00:00
Makefile
MergeFunctions.cpp IPO: Remove implicit ilist iterator conversions, NFC 2015-10-13 17:51:03 +00:00
PartialInlining.cpp IPO: Remove implicit ilist iterator conversions, NFC 2015-10-13 17:51:03 +00:00
PassManagerBuilder.cpp LLE 6/6: Add LoopLoadElimination pass 2015-11-03 23:50:08 +00:00
PruneEH.cpp IPO: Remove implicit ilist iterator conversions, NFC 2015-10-13 17:51:03 +00:00
SampleProfile.cpp StringRef-ify DiagnosticInfoSampleProfile::Filename 2015-11-02 20:01:13 +00:00
StripDeadPrototypes.cpp [PM] Port StripDeadPrototypes to the new pass manager 2015-10-30 23:28:12 +00:00
StripSymbols.cpp IPO: Remove implicit ilist iterator conversions, NFC 2015-10-13 17:51:03 +00:00