Commit Graph

45 Commits

Author SHA1 Message Date
Adam Nemet 84a6425d61 [OptDiag,LDist] Convert remaining opt remarks to use the new API
llvm-svn: 276340
2016-07-21 21:21:34 +00:00
Adam Nemet b2593f78ca [LoopDist] Port to new PM
Summary:
The direct motivation for the port is to ensure that the OptRemarkEmitter
tests work with the new PM.

This remains a function pass because we not only create multiple loops
but could also version the original loop.

In the test I need to invoke opt
with -passes='require<aa>,loop-distribute'.  LoopDistribute does not
directly depend on AA however LAA does.  LAA uses getCachedResult so
I *think* we need manually pull in 'aa'.

Reviewers: davidxl, silvas

Subscribers: sanjoy, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D22437

llvm-svn: 275811
2016-07-18 16:29:27 +00:00
Adam Nemet 79ac42a5c9 [OptRemarkEmitter] Port to new PM
Summary:
The main goal is to able to start using the new OptRemarkEmitter
analysis from the LoopVectorizer.  Since the vectorizer was recently
converted to the new PM, it makes sense to convert this analysis as
well.

This pass is currently tested through the LoopDistribution pass, so I am
also porting LoopDistribution to get coverage for this analysis with the
new PM.

Reviewers: davidxl, silvas

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D22436

llvm-svn: 275810
2016-07-18 16:29:21 +00:00
Adam Nemet aad816083e [OptRemark,LDist] RFC: Add hotness attribute
Summary:
This is the first set of changes implementing the RFC from
http://thread.gmane.org/gmane.comp.compilers.llvm.devel/98334

This is a cross-sectional patch; rather than implementing the hotness
attribute for all optimization remarks and all passes in a patch set, it
implements it for the 'missed-optimization' remark for Loop
Distribution.  My goal is to shake out the design issues before scaling
it up to other types and passes.

Hotness is computed as an integer as the multiplication of the block
frequency with the function entry count.  It's only printed in opt
currently since clang prints the diagnostic fields directly.  E.g.:

  remark: /tmp/t.c:3:3: loop not distributed: use -Rpass-analysis=loop-distribute for more info (hotness: 300)

A new API added is similar to emitOptimizationRemarkMissed.  The
difference is that it additionally takes a code region that the
diagnostic corresponds to.  From this, hotness is computed using BFI.
The new API is exposed via an analysis pass so that it can be made
dependent on LazyBFI.  (Thanks to Hal for the analysis pass idea.)

This feature can all be enabled by setDiagnosticHotnessRequested in the
LLVM context.  If this is off, LazyBFI is not calculated (D22141) so
there should be no overhead.

A new command-line option is added to turn this on in opt.

My plan is to switch all user of emitOptimizationRemark* to use this
module instead.

Reviewers: hfinkel

Subscribers: rcox2, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D21771

llvm-svn: 275583
2016-07-15 17:23:20 +00:00
Adam Nemet 74730d9ab0 [LoopDist] Fix typo in diagnostic
llvm-svn: 275495
2016-07-14 22:33:46 +00:00
Xinliang David Li 7853c1dd73 Rename LoopAccessAnalysis to LoopAccessLegacyAnalysis /NFC
llvm-svn: 274927
2016-07-08 20:55:26 +00:00
Xinliang David Li 94734eef33 [PM] refactor LoopAccessInfo code part-2
Differential Revision: http://reviews.llvm.org/D21636

llvm-svn: 274334
2016-07-01 05:59:55 +00:00
Adam Nemet ad437fff53 [Diag] Add getter shouldAlwaysPrint. NFC
For the new hotness attribute, the API will take the pass rather than
the pass name so we can no longer play the trick of AlwaysPrint being a
special pass name. This adds a getter to help the transition.

There is also a corresponding clang patch.

llvm-svn: 274100
2016-06-29 04:55:19 +00:00
David Majnemer d770877328 Switch more loops to be range-based
This makes the code a little more concise, no functional change is
intended.

llvm-svn: 273644
2016-06-24 04:05:21 +00:00
Adam Nemet bdbc5227ce [LAA] Default getInfo to not speculate symbolic strides. NFC
Soon we won't be passing Strides to getInfo and then we'll have fewer
call sites to update.

llvm-svn: 272878
2016-06-16 08:26:56 +00:00
Xinliang David Li ecde1c7f3d Revert r272194 No need for it if loop Analysis Manager is used
llvm-svn: 272243
2016-06-09 03:22:39 +00:00
Xinliang David Li 572135f717 [PM] Refector LoopAccessInfo analysis code
This is the preparation patch to port the analysis to new PM

Differential Revision: http://reviews.llvm.org/D20560

llvm-svn: 272194
2016-06-08 20:15:37 +00:00
Adam Nemet eff76646f5 [LoopDist] Only run LAA for loops with the pragma
This should fix some compile-time regressions after r267672.  Thanks to
Chris Matthews for bisecting it.

llvm-svn: 269392
2016-05-13 04:20:31 +00:00
Andrew Kaylor 50271f787e Add opt-bisect support to additional passes that can be skipped
Differential Revision: http://reviews.llvm.org/D19882

llvm-svn: 268457
2016-05-03 22:32:30 +00:00
Adam Nemet 88ec491830 [LoopDist] Also emit optimization remark on success (-Rpass=)
The option -Rpass=loop-distribute now reports the loops that were
distributed.

llvm-svn: 268006
2016-04-29 07:10:46 +00:00
Adam Nemet 4338d6769e [LoopDist] Pass 'Function' to main class. NFC
Next patch will add another use for 'Function' inside the class.

llvm-svn: 268005
2016-04-29 07:10:39 +00:00
Adam Nemet 0ba164bbcb [LoopDist] Emit optimization remarks (-Rpass*)
I closely followed the precedents set by the vectorizer:

* With -Rpass-missed, the loop is reported with further details pointing
to -Rpass--analysis.

* -Rpass-analysis reports the details why distribution has failed.

* Regardless of -Rpass*, when distribution fails for a loop where
distribution was forced with the pragma, a warning is produced according
to -Wpass-failed.  In this case the analysis info is also printed even
without -Rpass-analysis.

llvm-svn: 267952
2016-04-28 23:08:32 +00:00
Adam Nemet adeccf7658 [LoopDist] Improve debug messages
The next patch will start using these for -Rpass-analysis so they won't
be internal-only anymore.

Move the 'Skipping; ' prefix that some of the message are using into the
'fail' function.  We don't want to include this prefix in
the -Rpass-analysis report.

llvm-svn: 267951
2016-04-28 23:08:30 +00:00
Adam Nemet 7f38e1199a [LoopDist] Add helper to print debug message when distribution fails. NFC
This will form the basis to emit optimization remarks (-Rpass*).

llvm-svn: 267950
2016-04-28 23:08:27 +00:00
Adam Nemet d2fa414718 [LoopDist] Add llvm.loop.distribute.enable loop metadata
Summary:
D19403 adds a new pragma for loop distribution.  This change adds
support for the corresponding metadata that the pragma is translated to
by the FE.

As part of this I had to rethink the flag -enable-loop-distribute.  My
goal was to be backward compatible with the existing behavior:

  A1. pass is off by default from the optimization pipeline
  unless -enable-loop-distribute is specified

  A2. pass is on when invoked directly from opt (e.g. for unit-testing)

The new pragma/metadata overrides these defaults so the new behavior is:

  B1. A1 + enable distribution for individual loop with the pragma/metadata

  B2. A2 + disable distribution for individual loop with the pragma/metadata

The default value whether the pass is on or off comes from the initiator
of the pass.  From the PassManagerBuilder the default is off, from opt
it's on.

I moved -enable-loop-distribute under the pass.  If the flag is
specified it overrides the default from above.

Then the pragma/metadata can further modifies this per loop.

As a side-effect, we can now also use -enable-loop-distribute=0 from opt
to emulate the default from the optimization pipeline.  So to be precise
this is the new behavior:

  C1. pass is off by default from the optimization pipeline
  unless -enable-loop-distribute or the pragma/metadata enables it

  C2. pass is on when invoked directly from opt
  unless -enable-loop-distribute=0 or the pragma/metadata disables it

Reviewers: hfinkel

Subscribers: joker.eph, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D19431

llvm-svn: 267672
2016-04-27 05:28:18 +00:00
Adam Nemet 61399ac424 [LoopDist] Split main class. NFC
This splits out the per-loop functionality from the Pass class.

With this the fact whether the loop is forced-distribute with the new
metadata/pragma can be cached in the per-loop class rather than passed
around.

llvm-svn: 267643
2016-04-27 00:31:03 +00:00
Adam Nemet 5eccf07df3 [LoopVersioning] Annotate versioned loop with noalias metadata
Summary:
If we decide to version a loop to benefit a transformation, it makes
sense to record the now non-aliasing accesses in the newly versioned
loop.  This allows non-aliasing information to be used by subsequent
passes.

One example is 456.hmmer in SPECint2006 where after loop distribution,
we vectorize one of the newly distributed loops.  To vectorize we
version this loop to fully disambiguate may-aliasing accesses.  If we
add the noalias markers, we can use the same information in a later DSE
pass to eliminate some dead stores which amounts to ~25% of the
instructions of this hot memory-pipeline-bound loop.  The overall
performance improves by 18% on our ARM64.

The scoped noalias annotation is added in LoopVersioning.  The patch
then enables this for loop distribution.  A follow-on patch will enable
it for the vectorizer.  Eventually this should be run by default when
versioning the loop but first I'd like to get some feedback whether my
understanding and application of scoped noalias metadata is correct.

Essentially my approach was to have a separate alias domain for each
versioning of the loop.  For example, if we first version in loop
distribution and then in vectorization of the distributed loops, we have
a different set of memchecks for each versioning.  By keeping the scopes
in different domains they can conveniently be defined independently
since different alias domains don't affect each other.

As written, I also have a separate domain for each loop.  This is not
necessary and we could save some metadata here by using the same domain
across the different loops.  I don't think it's a big deal either way.

Probably the best is to review the tests first to see if I mapped this
problem correctly to scoped noalias markers.  I have plenty of comments
in the tests.

Note that the interface is prepared for the vectorizer which needs the
annotateInstWithNoAlias API.  The vectorizer does not use LoopVersioning
so we need a way to pass in the versioned instructions.  This is also
why the maps have to become part of the object state.

Also currently, we only have an AA-aware DSE after the vectorizer if we
also run the LTO pipeline.  Depending how widely this triggers we may
want to schedule a DSE toward the end of the regular pass pipeline.

Reviewers: hfinkel, nadav, ashutosh.nema

Subscribers: mssimpso, aemerson, llvm-commits, mcrosier

Differential Revision: http://reviews.llvm.org/D16712

llvm-svn: 263743
2016-03-17 20:32:32 +00:00
Justin Bogner 843fb204b7 LPM: Stop threading `Pass *` through all of the loop utility APIs. NFC
A large number of loop utility functions take a `Pass *` and reach
into it to find out which analyses to preserve. There are a number of
problems with this:

- The APIs have access to pretty well any Pass state they want, so
  it's hard to tell what they may or may not do.

- Other APIs have copied these and pass around a `Pass *` even though
  they don't even use it. Some of these just hand a nullptr to the API
  since the callers don't even have a pass available.

- Passes in the new pass manager don't work like the current ones, so
  the APIs can't be used as is there.

Instead, we should explicitly thread the analysis results that we
actually care about through these APIs. This is both simpler and more
reusable.

llvm-svn: 255669
2015-12-15 19:40:57 +00:00
Silviu Baranga 9cd9a7e310 Re-commit r255115, with the PredicatedScalarEvolution class moved to
ScalarEvolution.h, in order to avoid cyclic dependencies between the Transform
and Analysis modules:

[LV][LAA] Add a layer over SCEV to apply run-time checked knowledge on SCEV expressions

Summary:
This change creates a layer over ScalarEvolution for LAA and LV, and centralizes the
usage of SCEV predicates. The SCEVPredicatedLayer takes the statically deduced knowledge
by ScalarEvolution and applies the knowledge from the SCEV predicates. The end goal is
that both LAA and LV should use this interface everywhere.

This also solves a problem involving the result of SCEV expression rewritting when
the predicate changes. Suppose we have the expression (sext {a,+,b}) and two predicates
  P1: {a,+,b} has nsw
  P2: b = 1.

Applying P1 and then P2 gives us {a,+,1}, while applying P2 and the P1 gives us
sext({a,+,1}) (the AddRec expression was changed by P2 so P1 no longer applies).
The SCEVPredicatedLayer maintains the order of transformations by feeding back
the results of previous transformations into new transformations, and therefore
avoiding this issue.

The SCEVPredicatedLayer maintains a cache to remember the results of previous
SCEV rewritting results. This also has the benefit of reducing the overall number
of expression rewrites.

Reviewers: mzolotukhin, anemet

Subscribers: jmolloy, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D14296

llvm-svn: 255122
2015-12-09 16:06:28 +00:00
Silviu Baranga ad1ccb357b Revert r255115 until we figure out how to fix the bot failures.
llvm-svn: 255117
2015-12-09 15:25:28 +00:00
Silviu Baranga 41eb682501 [LV][LAA] Add a layer over SCEV to apply run-time checked knowledge on SCEV expressions
Summary:
This change creates a layer over ScalarEvolution for LAA and LV, and centralizes the
usage of SCEV predicates. The SCEVPredicatedLayer takes the statically deduced knowledge
by ScalarEvolution and applies the knowledge from the SCEV predicates. The end goal is
that both LAA and LV should use this interface everywhere.

This also solves a problem involving the result of SCEV expression rewritting when
the predicate changes. Suppose we have the expression (sext {a,+,b}) and two predicates
  P1: {a,+,b} has nsw
  P2: b = 1.

Applying P1 and then P2 gives us {a,+,1}, while applying P2 and the P1 gives us
sext({a,+,1}) (the AddRec expression was changed by P2 so P1 no longer applies).
The SCEVPredicatedLayer maintains the order of transformations by feeding back
the results of previous transformations into new transformations, and therefore
avoiding this issue.

The SCEVPredicatedLayer maintains a cache to remember the results of previous
SCEV rewritting results. This also has the benefit of reducing the overall number
of expression rewrites.

Reviewers: mzolotukhin, anemet

Subscribers: jmolloy, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D14296

llvm-svn: 255115
2015-12-09 15:03:52 +00:00
Silviu Baranga 2910a4f6b1 Allow LLE/LD and the loop versioning infrastructure to use SCEV predicates
Summary:
LAA currently generates a set of SCEV predicates that must be checked by users.
In the case of Loop Distribute/Loop Load Elimination, no such predicates could have
been emitted, since we don't allow stride versioning. However, in the future there
could be SCEV predicates that will need to be checked.

This change adds support for SCEV predicate versioning in the Loop Distribute, Loop
Load Eliminate and the loop versioning infrastructure.

Reviewers: anemet

Subscribers: mssimpso, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D14240

llvm-svn: 252467
2015-11-09 13:26:09 +00:00
Adam Nemet a2df750fb3 [LAA] LLE 3/6: Rename InterestingDependence to Dependences, NFC
Summary:
We now collect all types of dependences including lexically forward
deps not just "interesting" ones.

Reviewers: hfinkel

Subscribers: rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D13256

llvm-svn: 251985
2015-11-03 21:39:52 +00:00
Adam Nemet e48134093d [LVer] Fix FIXME: hide addPHINodes, NFC
Since Ashutosh made findDefsUsedOutsideOfLoop public, we can clean this
up.

Now clients that don't compute DefsUsedOutsideOfLoop can just call
versionLoop() and computing DefsUsedOutsideOfLoop will happen
implicitly.  With that there is no reason to expose addPHINodes anymore.

Ashutosh, you can now drop the calls to findDefsUsedOutsideOfLoop and
addPHINodes in LVerLICM and things should just work.

llvm-svn: 245579
2015-08-20 17:22:29 +00:00
Ashutosh Nema c5b7b55589 Exposed findDefsUsedOutsideOfLoop as a loop utility function
Exposed findDefsUsedOutsideOfLoop as a loop utility function by moving 
it from LoopDistribute to LoopUtils.

Reviewed By: anemet

llvm-svn: 245416
2015-08-19 05:40:42 +00:00
Adam Nemet 06ccf0145f [LVer] Remove unused Pass parameter from versionLoop, NFC
llvm-svn: 245032
2015-08-14 06:30:26 +00:00
Adam Nemet 15840393f3 [LAA] Make the set of runtime checks part of the state of LAA, NFC
This is the full set of checks that clients can further filter. IOW,
it's client-agnostic.  This makes LAA complete in the sense that it now
provides the two main results of its analysis precomputed:

1. memory dependences via getDepChecker().getInsterestingDependences()
2. run-time checks via getRuntimePointerCheck().getChecks()

However, as a consequence we now compute this information pro-actively.
Thus if the client decides to skip the loop based on the dependences
we've computed the checks unnecessarily.  In order to see whether this
was a significant overhead I checked compile time on SPEC2k6 LTO bitcode
files.  The change was in the noise.

The checks are generated in canCheckPtrAtRT, at the same place where we
used to call groupChecks to merge checks.

llvm-svn: 244368
2015-08-07 22:44:15 +00:00
Adam Nemet c75ad69ca5 [LDist] Filter the checks locally rather than in LAA, NFC
Before, we were passing the pointer partitions to LAA.  Now, we get all
the checks from LAA and filter out the checks within partitions in
LoopDistribution.

This effectively concludes the steps to move filtering memchecks from
LAA into its clients.  There is still some cleanup left to remove the
unused interfaces in LAA that still take PtrPartition.

(Moving this functionality to LoopDistribution also requires
needsChecking on pointers to be made public.)

llvm-svn: 243613
2015-07-30 03:29:16 +00:00
Adam Nemet 0a674401bf [LDist][LVer] Explicitly pass the set of memchecks to LoopVersioning, NFC
Before the patch, the checks were generated internally in
addRuntimeCheck.  Now, we use the new overloaded version of
addRuntimeCheck that takes the ready-made set of checks as a parameter.

The checks are now generated by the client (LoopDistribution) with the
new RuntimePointerChecking::generateChecks API.

Also the new printChecks API is used to print out the checks for
debugging.

This is to continue the transition over to the new model whereby clients
will get the full set of checks from LAA, filter it and then pass it to
LoopVersioning and in turn to addRuntimeCheck.

llvm-svn: 243382
2015-07-28 05:01:53 +00:00
Pete Cooper 7679afda82 Use make_range(rbegin(), rend()) to allow foreach loops. NFC.
Instead of the pattern

for (auto I = x.rbegin(), E = x.end(); I != E; ++I)

we can use make_range to construct the reverse range and iterate using
that instead.

llvm-svn: 243163
2015-07-24 21:13:43 +00:00
Adam Nemet 9f7dedc376 [LAA] Introduce RuntimePointerChecking::PointerInfo, NFC
Turn this structure-of-arrays (i.e. the various pointer attributes) into
array-of-structures.

llvm-svn: 242219
2015-07-14 22:32:50 +00:00
Adam Nemet 7cdebac0c8 [LAA] Lift RuntimePointerCheck out of LoopAccessInfo, NFC
I am planning to add more nested classes inside RuntimePointerCheck so
all these triple-nesting would be hard to follow.

Also rename it to RuntimePointerChecking (i.e. append 'ing').

llvm-svn: 242218
2015-07-14 22:32:44 +00:00
Adam Nemet 215746b45a [LoopDist/LoopVer] Move LoopVersioning to a new module, NFC
Summary:
The class will obviously need improvement down the road.  For one, there
is no reason that addPHINodes would have to be exposed like that.  I
will make this and other improvements in follow-up patches.

The main goal is to be able to share this functionality.  The
LoopLoadElimination pass I am working on needs it too.  Later we can
move other clients as well (LV and Ashutosh's LICMVer).

Reviewers: hfinkel, ashutosh.nema

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10577

llvm-svn: 241932
2015-07-10 18:55:13 +00:00
Adam Nemet 1a689188c4 [LoopDist] Move loop-versioning helper functions to Cloning, NFC
Summary:
This makes them available to the LoopVersioning class as that is moved
to its own module in the next patch.

Reviewers: ashutosh.nema, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10576

llvm-svn: 241931
2015-07-10 18:55:09 +00:00
Adam Nemet f530b329c7 [LoopDist] Improve variable names and comments in LoopVersioning class, NFC
As with the previous patch, the goal is to turn the class into a general
loop-versioning class.  This patch removes any references to loop
distribution.

llvm-svn: 240352
2015-06-22 22:59:40 +00:00
Adam Nemet 7632500d7a [LoopDist] Rename RuntimeCheckEmitter to LoopVersioning, NFC
llvm-svn: 240165
2015-06-19 19:32:48 +00:00
Adam Nemet 772a150614 [LoopDist] Move pointer-to-partition computation out of RuntimeCheckEmitter, NFC
This starts preparing the class to become a (more) general
LoopVersioning utility class.

llvm-svn: 240164
2015-06-19 19:32:41 +00:00
Benjamin Kramer e6987bf351 [LoopDistribute] Remove a layer of pointer indirection.
Just store InstPartitions directly into the std::list. No functional change
intended.

llvm-svn: 237930
2015-05-21 18:32:07 +00:00
Adam Nemet 2f85b7372c Attempt to fix MSVC bots
llvm-svn: 237359
2015-05-14 12:33:32 +00:00
Adam Nemet 938d3d63d6 New Loop Distribution pass
Summary:
This implements the initial version as was proposed earlier this year
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-January/080462.html).
Since then Loop Access Analysis was split out from the Loop Vectorizer
and was made into a separate analysis pass.  Loop Distribution becomes
the second user of this analysis.

The pass is off by default and can be enabled
with -enable-loop-distribution.  There is currently no notion of
profitability; if there is a loop with dependence cycles, the pass will
try to split them off from other memory operations into a separate loop.

I decided to remove the control-dependence calculation from this first
version.  This and the issues with the PDT are actively discussed so it
probably makes sense to treat it separately.  Right now I just mark all
terminator instruction required which keeps identical CFGs for each
distributed loop.  This seems to be working pretty well for 456.hmmer
where even though there is an empty if-then block in the distributed
loop initially, it gets completely removed.

The pass keeps DominatorTree and LoopInfo updated.  I've tested this
with -loop-distribute-verify with the testsuite where we distribute ~90
loops.  SimplifyLoop is violated in some cases and I have a FIXME
covering this.

Reviewers: hfinkel, nadav, aschwaighofer

Reviewed By: aschwaighofer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8831

llvm-svn: 237358
2015-05-14 12:05:18 +00:00