Random sampling (aka Arnold and Ryder) profiling. This is still preliminary, but it works on spec on x86 and alpha. The idea is to allow profiling passes to remember what profiling they inserted, then a random sampling framework is inserted which consists of duplicated basic blocks (without profiling), such that at each backedge in the program and entry into every function, the framework chooses whether to use the instrumented code or the instrumentation free code. The goal of such a framework is to make it reasonably cheap to do random sampling of very expensive profiling products (such as load-value profiling).
The code is organized into 3 parts (2 passes)
1) a linked set of profiling passes, which implement an analysis group (linked, as alias analyses are). These insert profiling into the program and remember what they inserted, so that they can later be queried about any instruction.
2) a pass that handles inserting the random sampling framework. This also has options to control how random samples are chosen. Currently implemented are global counters, register-allocated global counters, and the read cycle counter (see? there was a reason for it).
The profiling passes are almost identical to the existing ones (block, function, and null profiling is supported right now), and they are valid passes without the sampling framework (hence the existing passes can be unified with the new ones, not done yet).
Some things are a bit ugly still, but that should be fixed up soon enough.
Other todo? making the counter values not "magic 2^16 -1" values, but dynamically choosable.
llvm-svn: 24493
2005-11-28 08:58:09 +08:00
//===- RSProfiling.cpp - Various profiling using random sampling ----------===//
//
//                      The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// These passes implement random-sampling-based profiling.  Different methods
// of choosing when to sample are supported, as well as different types of
// profiling.  This is done as two passes.  The first is a sequence of profiling
// passes which insert profiling into the program, and remember what they
// inserted.
//
// The second stage duplicates all instructions in a function, ignoring the
// profiling code, then connects the two versions together at the entry and at
// backedges.  At each connection point a choice is made as to whether to jump
// to the profiled code (take a sample) or execute the unprofiled code.
//
// It is highly recommended that after this pass one runs mem2reg and adce
// (instcombine, load-vn, gdce and dse are also good to run afterwards).
//
// This design is intended to make the profiling passes independent of the RS
// framework, but any profiling pass that implements the RSProfiling interface
// is compatible with the RS framework (and thus can be sampled).
//
// TODO: obviously the block and function profiling are almost identical to the
// existing ones, so they can be unified (esp. since these passes are valid
// without the RS framework).
// TODO: Fix choice code so that frequency is not hard coded
//
//===----------------------------------------------------------------------===//
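//
// Illustration only (not part of this file's implementation): the two-version
// scheme described above can be sketched in plain C++ rather than LLVM IR.
// The loop body exists as an instrumented copy and a plain copy, and a
// countdown decides which copy runs at function entry and at each backedge.
// CountdownChooser, LoopStats and sumWithSampling are hypothetical names
// invented for this sketch, and the countdown policy is an assumption based
// on the "global counter" option.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sampling policy: count down, fire at zero, then reset.
struct CountdownChooser {
  uint64_t Counter;
  uint64_t ResetValue;

  explicit CountdownChooser(uint64_t Reset)
      : Counter(Reset), ResetValue(Reset) {
    assert(Reset > 0 && "reset value must be positive");
  }

  // Returns true when the instrumented copy should run (take a sample).
  bool takeSample() {
    if (--Counter == 0) {
      Counter = ResetValue;
      return true;
    }
    return false;
  }
};

struct LoopStats {
  uint64_t Sum;     // result computed identically by both copies
  uint64_t Samples; // number of times the instrumented copy ran
};

// Sums 0..N-1. The check at the top of each iteration stands in for the
// choice made at function entry (first iteration) and at each backedge.
LoopStats sumWithSampling(uint64_t N, CountdownChooser &C) {
  LoopStats S = {0, 0};
  for (uint64_t I = 0; I < N; ++I) {
    if (C.takeSample()) {
      ++S.Samples; // instrumented copy: profiling work plus the real work
      S.Sum += I;
    } else {
      S.Sum += I;  // uninstrumented copy: only the real work
    }
  }
  return S;
}
```

// With a reset value of 10, a 100-iteration loop runs the instrumented copy
// 10 times and the plain copy 90 times, while the computed sum is unchanged.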

#include "llvm/Pass.h"
#include "llvm/Module.h"
#include "llvm/Instructions.h"
#include "llvm/Constants.h"
#include "llvm/DerivedTypes.h"
#include "llvm/Intrinsics.h"
#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Compiler.h"
#include "llvm/Support/Debug.h"
#include "llvm/Transforms/Instrumentation.h"
#include "RSProfiling.h"
#include <set>
#include <map>
#include <queue>
#include <list>
using namespace llvm;

namespace {
  enum RandomMeth {
    GBV, GBVO, HOSTCC
  };

  static cl::opt<RandomMeth> RandomMethod("profile-randomness",
      cl::desc("How to randomly choose to profile:"),
      cl::values(
                 clEnumValN(GBV, "global", "global counter"),
                 clEnumValN(GBVO, "ra_global",
                            "register allocated global counter"),
                 clEnumValN(HOSTCC, "rdcc", "cycle counter"),
                 clEnumValEnd));

  /// NullProfilerRS - The basic profiler that does nothing.  It is the default
  /// profiler and thus terminates RSProfiler chains.  It is useful for
  /// measuring framework overhead
  class VISIBILITY_HIDDEN NullProfilerRS : public RSProfilers {
  public:
    static char ID; // Pass identification, replacement for typeid
    bool isProfiling(Value* v) {
      return false;
    }
    bool runOnModule(Module &M) {
      return false;
    }
    void getAnalysisUsage(AnalysisUsage &AU) const {
      AU.setPreservesAll();
    }
  };

  static RegisterAnalysisGroup<RSProfilers> A("Profiling passes");
  static RegisterPass<NullProfilerRS> NP("insert-null-profiling-rs",
                                         "Measure profiling framework overhead");
  static RegisterAnalysisGroup<RSProfilers, true> NPT(NP);

  /// Chooser - Something that chooses when to make a sample of the profiled code
  class VISIBILITY_HIDDEN Chooser {
  public:
    /// ProcessChoicePoint - is called for each basic block inserted to choose
    /// between normal and sample code
    virtual void ProcessChoicePoint(BasicBlock*) = 0;
    /// PrepFunction - is called once per function before other work is done.
    /// This gives the opportunity to insert new allocas and such.
    virtual void PrepFunction(Function*) = 0;
    virtual ~Chooser() {}
  };

  //Things that implement sampling policies
  //A global value that is read-mod-stored to choose when to sample.
  //A sample is taken when the global counter hits 0
  class VISIBILITY_HIDDEN GlobalRandomCounter : public Chooser {
    GlobalVariable* Counter;
    Value* ResetValue;
    const Type* T;
  public:
    GlobalRandomCounter(Module& M, const Type* t, uint64_t resetval);
    virtual ~GlobalRandomCounter();
    virtual void PrepFunction(Function* F);
    virtual void ProcessChoicePoint(BasicBlock* bb);
  };

  //Same as GRC, but allow register allocation of the global counter
  class VISIBILITY_HIDDEN GlobalRandomCounterOpt : public Chooser {
    GlobalVariable* Counter;
    Value* ResetValue;
    AllocaInst* AI;
    const Type* T;
  public:
    GlobalRandomCounterOpt(Module& M, const Type* t, uint64_t resetval);
    virtual ~GlobalRandomCounterOpt();
    virtual void PrepFunction(Function* F);
    virtual void ProcessChoicePoint(BasicBlock* bb);
  };
|
|
|
|
|
  //Use the cycle counter intrinsic as a source of pseudo randomness when
  //deciding when to sample.
  class VISIBILITY_HIDDEN CycleCounter : public Chooser {
    uint64_t rm;
    Constant *F;
  public:
    CycleCounter(Module& m, uint64_t resetmask);
    virtual ~CycleCounter();
    virtual void PrepFunction(Function* F);
    virtual void ProcessChoicePoint(BasicBlock* bb);
  };
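
// Illustrative sketch (standalone, not part of RSProfiling.cpp): one plausible
// way a cycle-counter chooser can decide when to sample. It assumes the choice
// point masks the cycle count with the reset mask and samples when the masked
// bits are all zero. The name shouldSample and the exact comparison are
// assumptions made for illustration; the real ProcessChoicePoint emits IR
// rather than calling a helper like this.

```cpp
#include <cstdint>

// Hypothetical helper: Cycles stands in for the value returned by the cycle
// counter intrinsic, ResetMask for the mask passed to the chooser.
inline bool shouldSample(uint64_t Cycles, uint64_t ResetMask) {
  // With a mask of 2^k - 1, roughly one cycle value in 2^k passes the test.
  return (Cycles & ResetMask) == 0;
}
```

// Under this assumption a mask of 0xFFFF would select about one choice point
// in 65536, provided the low bits of the cycle count are well spread.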

  /// ProfilerRS - Insert the random sampling framework
  struct VISIBILITY_HIDDEN ProfilerRS : public FunctionPass {
    static char ID; // Pass identification, replacement for typeid
    ProfilerRS() : FunctionPass((intptr_t)&ID) {}

    std::map<Value*, Value*> TransCache;
    std::set<BasicBlock*> ChoicePoints;
    Chooser* c;

    //Translate and duplicate values for the new profile-free version of the
    //code
    Value* Translate(Value* v);
    //Duplicate an entire function (without profiling)
    void Duplicate(Function& F, RSProfilers& LI);
    //Called once for each backedge; handles the insertion of choice points
    //and the interconnection of the two versions of the code
    void ProcessBackEdge(BasicBlock* src, BasicBlock* dst, Function& F);
    bool runOnFunction(Function& F);
    bool doInitialization(Module &M);
    virtual void getAnalysisUsage(AnalysisUsage &AU) const;
  };

  RegisterPass<ProfilerRS> X("insert-rs-profiling-framework",
                           "Insert random sampling instrumentation framework");
}

char RSProfilers::ID = 0;
char NullProfilerRS::ID = 0;
char ProfilerRS::ID = 0;

//Local utilities
static void ReplacePhiPred(BasicBlock* btarget,
                           BasicBlock* bold, BasicBlock* bnew);

static void CollapsePhi(BasicBlock* btarget, BasicBlock* bsrc);

template<class T>
static void recBackEdge(BasicBlock* bb, T& BackEdges,
                        std::map<BasicBlock*, int>& color,
                        std::map<BasicBlock*, int>& depth,
                        std::map<BasicBlock*, int>& finish,
                        int& time);

//find the back edges and where they go to
template<class T>
static void getBackEdges(Function& F, T& BackEdges);
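
// Illustrative sketch (standalone, not the implementation in this file): the
// color map taken by recBackEdge suggests a depth-first search that marks
// nodes as unvisited, in progress, or finished. The generic version below
// works on a plain adjacency list instead of LLVM basic blocks; Graph,
// EdgeSet, dfsBackEdges and findBackEdges are hypothetical names. An edge is
// reported as a back edge when it targets a node that is still in progress.

```cpp
#include <set>
#include <utility>
#include <vector>

using Graph = std::vector<std::vector<int>>;   // Graph[n] = successors of n
using EdgeSet = std::set<std::pair<int, int>>; // (source, destination)

enum Color { White, Grey, Black }; // unvisited, on the DFS stack, finished

inline void dfsBackEdges(const Graph &G, int Node, std::vector<Color> &C,
                         EdgeSet &Back) {
  C[Node] = Grey;
  for (int Succ : G[Node]) {
    if (C[Succ] == Grey)
      Back.insert({Node, Succ});       // target is an ancestor: back edge
    else if (C[Succ] == White)
      dfsBackEdges(G, Succ, C, Back);  // tree edge: keep descending
  }
  C[Node] = Black;
}

inline EdgeSet findBackEdges(const Graph &G, int Entry) {
  std::vector<Color> C(G.size(), White);
  EdgeSet Back;
  if (!G.empty())
    dfsBackEdges(G, Entry, C, Back);
  return Back;
}
```

// For the graph 0 -> 1 -> 2 -> {1, 3}, the only edge reported is (2, 1), the
// edge that closes the loop.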


///////////////////////////////////////
// Methods of choosing when to profile
///////////////////////////////////////

GlobalRandomCounter::GlobalRandomCounter(Module& M, const Type* t,
                                         uint64_t resetval) : T(t) {
  ConstantInt* Init = ConstantInt::get(T, resetval);
  ResetValue = Init;
  Counter = new GlobalVariable(T, false, GlobalValue::InternalLinkage,
                               Init, "RandomSteeringCounter", &M);
}

GlobalRandomCounter::~GlobalRandomCounter() {}

void GlobalRandomCounter::PrepFunction(Function* F) {}

void GlobalRandomCounter::ProcessChoicePoint(BasicBlock* bb) {
  BranchInst* t = cast<BranchInst>(bb->getTerminator());

  //decrement counter
  LoadInst* l = new LoadInst(Counter, "counter", t);

  ICmpInst* s = new ICmpInst(ICmpInst::ICMP_EQ, l, ConstantInt::get(T, 0),
                             "countercc", t);

  Value* nv = BinaryOperator::createSub(l, ConstantInt::get(T, 1),
                                        "counternew", t);
  new StoreInst(nv, Counter, t);
  t->setCondition(s);

  //reset counter
  BasicBlock* oldnext = t->getSuccessor(0);
  BasicBlock* resetblock = BasicBlock::Create("reset", oldnext->getParent(),
                                              oldnext);
  TerminatorInst* t2 = BranchInst::Create(oldnext, resetblock);
  t->setSuccessor(0, resetblock);
  new StoreInst(ResetValue, Counter, t2);
  ReplacePhiPred(oldnext, bb, resetblock);
}
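
// Illustrative sketch (standalone, not part of RSProfiling.cpp): the run-time
// behaviour that the IR inserted by GlobalRandomCounter::ProcessChoicePoint
// appears to have, written as ordinary C++. SamplingCounter and takeSample
// are hypothetical names; the 32-bit width is an assumption, since the real
// counter type is whatever Type is passed to the constructor.

```cpp
#include <cstdint>

// Stand-in for the "RandomSteeringCounter" global and its reset value.
struct SamplingCounter {
  uint32_t Value;
  uint32_t ResetValue;
};

// Returns true when the branch would take successor 0 (via the reset block).
inline bool takeSample(SamplingCounter &C) {
  uint32_t Old = C.Value;      // load "counter"
  bool Hit = (Old == 0);       // icmp eq "countercc"
  C.Value = Old - 1;           // sub "counternew", then store (wraps at zero)
  if (Hit)
    C.Value = C.ResetValue;    // the "reset" block overwrites the wrapped value
  return Hit;
}
```

// With a reset value of N, this model reports a hit once every N + 1 calls.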

GlobalRandomCounterOpt::GlobalRandomCounterOpt(Module& M, const Type* t,
                                               uint64_t resetval)
  : AI(0), T(t) {
  ConstantInt* Init = ConstantInt::get(T, resetval);
  ResetValue = Init;
  Counter = new GlobalVariable(T, false, GlobalValue::InternalLinkage,
                               Init, "RandomSteeringCounter", &M);
}

GlobalRandomCounterOpt::~GlobalRandomCounterOpt() {}
|
|
|
|
|
|
|
|
void GlobalRandomCounterOpt::PrepFunction(Function* F) {
|
|
|
|
//make a local temporary to cache the global
|
|
|
|
BasicBlock& bb = F->getEntryBlock();
|
2007-04-18 01:51:03 +08:00
|
|
|
BasicBlock::iterator InsertPt = bb.begin();
|
|
|
|
AI = new AllocaInst(T, 0, "localcounter", InsertPt);
|
|
|
|
LoadInst* l = new LoadInst(Counter, "counterload", InsertPt);
|
|
|
|
new StoreInst(l, AI, InsertPt);
|
Random sampling (aka Arnold and Ryder) profiling. This is still preliminary, but it works on spec on x86 and alpha. The idea is to allow profiling passes to remember what profiling they inserted, then a random sampling framework is inserted which consists of duplicated basic blocks (without profiling), such that at each backedge in the program and entry into every function, the framework chooses whether to use the instrumented code or the instrumentation free code. The goal of such a framework is to make it reasonably cheap to do random sampling of very expensive profiling products (such as load-value profiling).
The code is organized into 3 parts (2 passes)
1) a linked set of profiling passes, which implement an analysis group (linked, like alias analysis are). These insert profiling into the program, and remember what they inserted, so that at a later time they can be queried about any instruction.
2) a pass that handles inserting the random sampling framework. This also has options to control how random samples are choosen. Currently implemented are Global counters, register allocated global counters, and read cycle counter (see? there was a reason for it).
The profiling passes are almost identical to the existing ones (block, function, and null profiling is supported right now), and they are valid passes without the sampling framework (hence the existing passes can be unified with the new ones, not done yet).
Some things are a bit ugly still, but that should be fixed up soon enough.
Other todo? making the counter values not "magic 2^16 -1" values, but dynamically choosable.
llvm-svn: 24493
2005-11-28 08:58:09 +08:00
|
|
|
|
2005-11-29 02:10:59 +08:00
|
|
|
  // Save the local counter to the global variable before every call, invoke,
  // unwind, and return, and reload it from the global after calls and invokes.
  for(Function::iterator fib = F->begin(), fie = F->end();
      fib != fie; ++fib)
    for(BasicBlock::iterator bib = fib->begin(), bie = fib->end();
        bib != bie; ++bib)
      if (isa<CallInst>(bib)) {
        LoadInst* l = new LoadInst(AI, "counter", bib);
        new StoreInst(l, Counter, bib);
        l = new LoadInst(Counter, "counter", ++bib);
        new StoreInst(l, AI, bib--);
      } else if (isa<InvokeInst>(bib)) {
        LoadInst* l = new LoadInst(AI, "counter", bib);
        new StoreInst(l, Counter, bib);

        BasicBlock* bb = cast<InvokeInst>(bib)->getNormalDest();
        BasicBlock::iterator i = bb->begin();
        while (isa<PHINode>(i))
          ++i;
        l = new LoadInst(Counter, "counter", i);
        new StoreInst(l, AI, i);

        bb = cast<InvokeInst>(bib)->getUnwindDest();
        i = bb->begin();
        while (isa<PHINode>(i)) ++i;
        l = new LoadInst(Counter, "counter", i);
        new StoreInst(l, AI, i);
      } else if (isa<UnwindInst>(&*bib) || isa<ReturnInst>(&*bib)) {
        LoadInst* l = new LoadInst(AI, "counter", bib);
        new StoreInst(l, Counter, bib);
      }
}

void GlobalRandomCounterOpt::ProcessChoicePoint(BasicBlock* bb) {
  BranchInst* t = cast<BranchInst>(bb->getTerminator());

  //decrement counter
  LoadInst* l = new LoadInst(AI, "counter", t);

  ICmpInst* s = new ICmpInst(ICmpInst::ICMP_EQ, l, ConstantInt::get(T, 0),
                             "countercc", t);

  Value* nv = BinaryOperator::createSub(l, ConstantInt::get(T, 1),
                                        "counternew", t);
  new StoreInst(nv, AI, t);
  t->setCondition(s);

  //reset counter
  BasicBlock* oldnext = t->getSuccessor(0);
  BasicBlock* resetblock = BasicBlock::Create("reset", oldnext->getParent(),
                                              oldnext);
  TerminatorInst* t2 = BranchInst::Create(oldnext, resetblock);
  t->setSuccessor(0, resetblock);
  new StoreInst(ResetValue, AI, t2);
  ReplacePhiPred(oldnext, bb, resetblock);
}
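
// Illustration only, not part of the pass: a minimal standalone sketch of the
// runtime behaviour that the instructions inserted by ProcessChoicePoint above
// appear to produce. CountdownSampler, shouldSample and countSamples are
// hypothetical names, and starting the counter at the reset value is an
// assumption (the real initial value is set outside this function).

```cpp
#include <cstdint>

// Hypothetical stand-in for the local counter slot (AI) and ResetValue.
struct CountdownSampler {
  uint32_t counter;
  uint32_t resetValue;

  // Assumption: the counter starts out equal to the reset value.
  explicit CountdownSampler(uint32_t reset)
      : counter(reset), resetValue(reset) {}

  // One choice point: load, compare with zero, store the decremented value,
  // and on the taken path overwrite the counter with the reset value.
  bool shouldSample() {
    uint32_t old = counter;   // load "counter"
    bool take = (old == 0);   // icmp eq "countercc"
    counter = old - 1;        // sub "counternew", then store (wraps at zero)
    if (take)
      counter = resetValue;   // store performed in the "reset" block
    return take;
  }
};

// Count how many of n consecutive choice points take the sampled path.
inline unsigned countSamples(uint32_t reset, unsigned n) {
  CountdownSampler s(reset);
  unsigned taken = 0;
  for (unsigned i = 0; i < n; ++i)
    if (s.shouldSample())
      ++taken;
  return taken;
}
```

// Under these assumptions a reset value of R takes the sampled path once
// every R + 1 choice points, so a larger reset value means sparser sampling.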

CycleCounter::CycleCounter(Module& m, uint64_t resetmask) : rm(resetmask) {
  F = Intrinsic::getDeclaration(&m, Intrinsic::readcyclecounter);
}

CycleCounter::~CycleCounter() {}

void CycleCounter::PrepFunction(Function* F) {}

void CycleCounter::ProcessChoicePoint(BasicBlock* bb) {
  BranchInst* t = cast<BranchInst>(bb->getTerminator());

  CallInst* c = CallInst::Create(F, "rdcc", t);
  BinaryOperator* b =
    BinaryOperator::createAnd(c, ConstantInt::get(Type::Int64Ty, rm),
                              "mrdcc", t);

  ICmpInst *s = new ICmpInst(ICmpInst::ICMP_EQ, b,
                             ConstantInt::get(Type::Int64Ty, 0),
                             "mrdccc", t);

  t->setCondition(s);
}
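
// Illustration only, not part of the pass: a small standalone sketch of the
// test that the and/icmp pair above encodes, applied to plain integers
// instead of a real cycle-counter read. sampleOnCycles and countPassing are
// hypothetical helpers introduced for this sketch.

```cpp
#include <cstdint>

// True when the masked cycle count is zero, i.e. the condition under which
// the choice point's branch condition evaluates to true.
inline bool sampleOnCycles(uint64_t cycles, uint64_t resetMask) {
  return (cycles & resetMask) == 0;
}

// Count the values in [begin, end) that pass the mask test.
inline unsigned countPassing(uint64_t begin, uint64_t end,
                             uint64_t resetMask) {
  unsigned n = 0;
  for (uint64_t c = begin; c < end; ++c)
    if (sampleOnCycles(c, resetMask))
      ++n;
  return n;
}
```

// With a mask of the form 2^k - 1, one in every 2^k consecutive counter
// values passes the test. Real cycle counts are not consecutive between
// choice points, so the actual sampling rate is only roughly 1 in 2^k.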

///////////////////////////////////////
// Profiling:
///////////////////////////////////////
bool RSProfilers_std::isProfiling(Value* v) {
  if (profcode.find(v) != profcode.end())
    return true;
  //else
  RSProfilers& LI = getAnalysis<RSProfilers>();
  return LI.isProfiling(v);
}
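
// Illustration only, not part of the pass: a simplified standalone model of
// the lookup pattern in isProfiling above, where a profiler first checks its
// own record of inserted values and then defers to the next profiler.
// ProfilerNode is a hypothetical type; the real code reaches the next
// profiler through getAnalysis<RSProfilers>() rather than a raw pointer.

```cpp
#include <set>

struct ProfilerNode {
  std::set<const void*> inserted;      // values this profiler inserted
  const ProfilerNode* next = nullptr;  // next profiler in the chain, if any

  // Record a value as profiling code owned by this profiler.
  void remember(const void* v) { inserted.insert(v); }

  // Check the local record first, then ask the rest of the chain.
  // Running off the end of the chain answers "not profiling code".
  bool isProfiling(const void* v) const {
    if (inserted.count(v))
      return true;
    return next ? next->isProfiling(v) : false;
  }
};
```

// Querying the head of the chain is enough to learn whether any profiler in
// it inserted a given value, which is what lets a later pass skip profiling
// code when it duplicates blocks.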

void RSProfilers_std::IncrementCounterInBlock(BasicBlock *BB, unsigned CounterNum,
                                              GlobalValue *CounterArray) {
  // Insert the increment after any alloca or PHI instructions...
  BasicBlock::iterator InsertPos = BB->begin();
  while (isa<AllocaInst>(InsertPos) || isa<PHINode>(InsertPos))
    ++InsertPos;

  // Create the getelementptr constant expression
  std::vector<Constant*> Indices(2);
  Indices[0] = Constant::getNullValue(Type::Int32Ty);
  Indices[1] = ConstantInt::get(Type::Int32Ty, CounterNum);
  Constant *ElementPtr = ConstantExpr::getGetElementPtr(CounterArray,
                                                        &Indices[0], 2);

  // Load, increment and store the value back.
  Value *OldVal = new LoadInst(ElementPtr, "OldCounter", InsertPos);
  profcode.insert(OldVal);
  Value *NewVal = BinaryOperator::createAdd(OldVal,
                                            ConstantInt::get(Type::Int32Ty, 1),
                                            "NewCounter", InsertPos);
  profcode.insert(NewVal);
  profcode.insert(new StoreInst(NewVal, ElementPtr, InsertPos));
}

void RSProfilers_std::getAnalysisUsage(AnalysisUsage &AU) const {
  //grab any outstanding profiler, or get the null one
  AU.addRequired<RSProfilers>();
}
|
|
|
|
|
|
|
|
///////////////////////////////////////
|
|
|
|
// RS Framework
|
|
|
|
///////////////////////////////////////

Value* ProfilerRS::Translate(Value* v) {
  if(TransCache[v])
    return TransCache[v];

  if (BasicBlock* bb = dyn_cast<BasicBlock>(v)) {
    if (bb == &bb->getParent()->getEntryBlock())
      TransCache[bb] = bb; //don't translate entry block
    else
      TransCache[bb] = BasicBlock::Create("dup_" + bb->getName(), bb->getParent(),
                                          NULL);
    return TransCache[bb];
  } else if (Instruction* i = dyn_cast<Instruction>(v)) {
    //we have already translated this
    //do not translate entry block allocas
    if(&i->getParent()->getParent()->getEntryBlock() == i->getParent()) {
      TransCache[i] = i;
      return i;
    } else {
      //translate this
      Instruction* i2 = i->clone();
      if (i->hasName())
        i2->setName("dup_" + i->getName());
      TransCache[i] = i2;
      //NumNewInst++;
      for (unsigned x = 0; x < i2->getNumOperands(); ++x)
        i2->setOperand(x, Translate(i2->getOperand(x)));
      return i2;
    }
  } else if (isa<Function>(v) || isa<Constant>(v) || isa<Argument>(v)) {
    TransCache[v] = v;
    return v;
  }
  assert(0 && "Value not handled");
  return 0;
}
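
// Illustration only: a minimal, LLVM-free sketch of the memoized translation
// that Translate performs. Node and Translator are hypothetical names that do
// not exist in this file. "Pinned" nodes stand in for values that map to
// themselves (entry-block values, constants, arguments). The sketch uses
// find() for the cache lookup so that a miss does not insert a null entry;
// that is a choice made here, not a description of the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an LLVM Value: a named node with operands.
struct Node {
  std::string name;
  std::vector<Node*> operands;
  bool pinned; // pinned nodes translate to themselves
};

// Hypothetical translator: memoizes old -> new so each node is cloned once.
class Translator {
  std::map<Node*, Node*> cache;
  std::vector<std::unique_ptr<Node> > owned; // keeps the clones alive
public:
  Node* translate(Node* n) {
    std::map<Node*, Node*>::iterator it = cache.find(n);
    if (it != cache.end())
      return it->second; // already translated
    if (n->pinned) {
      cache[n] = n;
      return n;
    }
    owned.push_back(std::unique_ptr<Node>(
        new Node{"dup_" + n->name, n->operands, false}));
    Node* copy = owned.back().get();
    cache[n] = copy; // record before recursing so cycles terminate
    for (std::size_t i = 0; i < copy->operands.size(); ++i)
      copy->operands[i] = translate(copy->operands[i]);
    return copy;
  }
};
```

// Recording the clone in the cache before translating its operands is what
// lets a value that (transitively) uses itself resolve to its own duplicate.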

void ProfilerRS::Duplicate(Function& F, RSProfilers& LI)
{
  //perform a breadth first search, building up a duplicate of the code
  std::queue<BasicBlock*> worklist;
  std::set<BasicBlock*> seen;

  //This loop ensures proper BB order, to help performance
  for (Function::iterator fib = F.begin(), fie = F.end(); fib != fie; ++fib)
    worklist.push(fib);
  while (!worklist.empty()) {
    Translate(worklist.front());
    worklist.pop();
  }

  //remember that reg2mem created a new entry block we don't want to duplicate
  worklist.push(F.getEntryBlock().getTerminator()->getSuccessor(0));
  seen.insert(&F.getEntryBlock());

  while (!worklist.empty()) {
    BasicBlock* bb = worklist.front();
    worklist.pop();
    if(seen.find(bb) == seen.end()) {
      BasicBlock* bbtarget = cast<BasicBlock>(Translate(bb));
      BasicBlock::InstListType& instlist = bbtarget->getInstList();
      for (BasicBlock::iterator iib = bb->begin(), iie = bb->end();
           iib != iie; ++iib) {
        //NumOldInst++;
        if (!LI.isProfiling(&*iib)) {
          Instruction* i = cast<Instruction>(Translate(iib));
          instlist.insert(bbtarget->end(), i);
        }
      }
      //update search state
      seen.insert(bb);
      TerminatorInst* ti = bb->getTerminator();
      for (unsigned x = 0; x < ti->getNumSuccessors(); ++x) {
        BasicBlock* bbs = ti->getSuccessor(x);
        if (seen.find(bbs) == seen.end()) {
          worklist.push(bbs);
        }
      }
    }
  }
}
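
// Illustration only: the worklist/seen traversal used by Duplicate, reduced
// to plain integers so it can be run without LLVM. Cfg and bfsOrder are
// hypothetical names. The skip parameter plays the role of the reg2mem entry
// block, which is marked as seen up front so it is never duplicated.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <set>
#include <vector>

// Hypothetical CFG: successors[b] lists the successor block ids of block b.
typedef std::vector<std::vector<int> > Cfg;

// Returns the blocks reachable from `start` in breadth-first order, never
// visiting `skip`.
std::vector<int> bfsOrder(const Cfg& successors, int start, int skip) {
  std::vector<int> order;
  std::queue<int> worklist;
  std::set<int> seen;
  seen.insert(skip);
  worklist.push(start);
  while (!worklist.empty()) {
    int bb = worklist.front();
    worklist.pop();
    if (seen.count(bb))
      continue; // a block can be queued more than once before it is visited
    seen.insert(bb);
    order.push_back(bb);
    for (std::size_t x = 0; x < successors[bb].size(); ++x)
      if (!seen.count(successors[bb][x]))
        worklist.push(successors[bb][x]);
  }
  return order;
}
```

// As in Duplicate, the membership test is repeated when a block is popped,
// because two predecessors may both queue it before either copy is visited.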

void ProfilerRS::ProcessBackEdge(BasicBlock* src, BasicBlock* dst, Function& F) {
  //given a backedge from B -> A, and translations A' and B',
  //a: insert C and C'
  //b: add branch in C to A' and in C' to A and A'
  //c: mod terminators@B, replace A with C
  //d: mod terminators@B', replace A' with C'
  //e: mod phis@A for pred B to be pred C
  //       if multiple entries, simplify to one
  //f: mod phis@A' for pred B' to be pred C'
  //       if multiple entries, simplify to one
  //g: for all phis@A with pred C using x
  //       add in edge from C' using x'
  //       add in edge from C using x in A'

  //a:
  Function::iterator BBN = src; ++BBN;
  BasicBlock* bbC = BasicBlock::Create("choice", &F, BBN);
  //ChoicePoints.insert(bbC);
  BBN = cast<BasicBlock>(Translate(src));
  BasicBlock* bbCp = BasicBlock::Create("choice", &F, ++BBN);
  ChoicePoints.insert(bbCp);

  //b:
  BranchInst::Create(cast<BasicBlock>(Translate(dst)), bbC);
  BranchInst::Create(dst, cast<BasicBlock>(Translate(dst)),
                     ConstantInt::get(Type::Int1Ty, true), bbCp);
  //c:
  {
    TerminatorInst* iB = src->getTerminator();
    for (unsigned x = 0; x < iB->getNumSuccessors(); ++x)
      if (iB->getSuccessor(x) == dst)
        iB->setSuccessor(x, bbC);
  }
  //d:
  {
    TerminatorInst* iBp = cast<TerminatorInst>(Translate(src->getTerminator()));
    for (unsigned x = 0; x < iBp->getNumSuccessors(); ++x)
      if (iBp->getSuccessor(x) == cast<BasicBlock>(Translate(dst)))
        iBp->setSuccessor(x, bbCp);
  }
  //e:
  ReplacePhiPred(dst, src, bbC);
  //src could be a switch, in which case we are replacing several edges with one
  //thus collapse those edges in the Phi
  CollapsePhi(dst, bbC);
  //f:
  ReplacePhiPred(cast<BasicBlock>(Translate(dst)),
                 cast<BasicBlock>(Translate(src)),bbCp);
  CollapsePhi(cast<BasicBlock>(Translate(dst)), bbCp);
  //g:
  for(BasicBlock::iterator ib = dst->begin(), ie = dst->end(); ib != ie;
      ++ib)
    if (PHINode* phi = dyn_cast<PHINode>(&*ib)) {
      for(unsigned x = 0; x < phi->getNumIncomingValues(); ++x)
        if(bbC == phi->getIncomingBlock(x)) {
          phi->addIncoming(Translate(phi->getIncomingValue(x)), bbCp);
          cast<PHINode>(Translate(phi))->addIncoming(phi->getIncomingValue(x),
                                                     bbC);
        }
      phi->removeIncomingValue(bbC);
    }
}

bool ProfilerRS::runOnFunction(Function& F) {
  if (!F.isDeclaration()) {
    std::set<std::pair<BasicBlock*, BasicBlock*> > BackEdges;
    RSProfilers& LI = getAnalysis<RSProfilers>();

    getBackEdges(F, BackEdges);
    Duplicate(F, LI);
    //assume that stuff worked.  now connect the duplicated basic blocks
    //with the originals in such a way as to preserve ssa.  yuk!
    for (std::set<std::pair<BasicBlock*, BasicBlock*> >::iterator
           ib = BackEdges.begin(), ie = BackEdges.end(); ib != ie; ++ib)
      ProcessBackEdge(ib->first, ib->second, F);

    //oh, and add the edge from the reg2mem created entry node to the
    //duplicated second node
    TerminatorInst* T = F.getEntryBlock().getTerminator();
    ReplaceInstWithInst(T, BranchInst::Create(T->getSuccessor(0),
                                              cast<BasicBlock>(
                                                Translate(T->getSuccessor(0))),
                                              ConstantInt::get(Type::Int1Ty,
                                                               true)));

    //do whatever is needed now that the function is duplicated
    c->PrepFunction(&F);

    //add entry node to choice points
    ChoicePoints.insert(&F.getEntryBlock());

    for (std::set<BasicBlock*>::iterator
           ii = ChoicePoints.begin(), ie = ChoicePoints.end(); ii != ie; ++ii)
      c->ProcessChoicePoint(*ii);

    ChoicePoints.clear();
    TransCache.clear();

    return true;
  }
  return false;
}

bool ProfilerRS::doInitialization(Module &M) {
  switch (RandomMethod) {
  case GBV:
    c = new GlobalRandomCounter(M, Type::Int32Ty, (1 << 14) - 1);
    break;
  case GBVO:
    c = new GlobalRandomCounterOpt(M, Type::Int32Ty, (1 << 14) - 1);
    break;
  case HOSTCC:
    c = new CycleCounter(M, (1 << 14) - 1);
    break;
  };
  return true;
}
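
// Illustration only: one plausible reading of how a counter seeded with
// (1 << 14) - 1 could pick samples at a choice point. The counter classes
// constructed above are defined elsewhere, so everything here is an
// assumption: CountdownSampler and countSamples are hypothetical names, and
// the decrement-then-reset policy is a sketch rather than the behaviour of
// GlobalRandomCounter.

```cpp
#include <cassert>

// Hypothetical countdown sampler; not the GlobalRandomCounter implementation.
class CountdownSampler {
  unsigned counter;
  unsigned resetValue;
public:
  explicit CountdownSampler(unsigned reset)
    : counter(reset), resetValue(reset) {}

  // Returns true when the instrumented path should be taken.
  bool shouldSample() {
    if (counter == 0) {
      counter = resetValue; // start a new sampling interval
      return true;
    }
    --counter;
    return false;
  }
};

// Counts how many of `visits` choice-point visits take the instrumented path.
unsigned countSamples(CountdownSampler& s, unsigned visits) {
  unsigned n = 0;
  for (unsigned i = 0; i < visits; ++i)
    if (s.shouldSample())
      ++n;
  return n;
}
```

// Under this assumed policy a reset value of (1 << 14) - 1 yields one sample
// per 1 << 14 choice-point visits.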

void ProfilerRS::getAnalysisUsage(AnalysisUsage &AU) const {
  AU.addRequired<RSProfilers>();
  AU.addRequiredID(DemoteRegisterToMemoryID);
}

///////////////////////////////////////
// Utilities:
///////////////////////////////////////
static void ReplacePhiPred(BasicBlock* btarget,
                           BasicBlock* bold, BasicBlock* bnew) {
  for(BasicBlock::iterator ib = btarget->begin(), ie = btarget->end();
      ib != ie; ++ib)
    if (PHINode* phi = dyn_cast<PHINode>(&*ib)) {
      for(unsigned x = 0; x < phi->getNumIncomingValues(); ++x)
        if(bold == phi->getIncomingBlock(x))
          phi->setIncomingBlock(x, bnew);
    }
}

static void CollapsePhi(BasicBlock* btarget, BasicBlock* bsrc) {
  for(BasicBlock::iterator ib = btarget->begin(), ie = btarget->end();
      ib != ie; ++ib)
    if (PHINode* phi = dyn_cast<PHINode>(&*ib)) {
      std::map<BasicBlock*, Value*> counter;
      for(unsigned i = 0; i < phi->getNumIncomingValues(); ) {
        if (counter[phi->getIncomingBlock(i)]) {
          assert(phi->getIncomingValue(i) == counter[phi->getIncomingBlock(i)]);
          phi->removeIncomingValue(i, false);
        } else {
          counter[phi->getIncomingBlock(i)] = phi->getIncomingValue(i);
          ++i;
        }
      }
    }
}
|
|
|
|
|
|
|
|
// Depth-first walk used by getBackEdges. Colors: 0 = not yet visited,
// 1 = on the current DFS path, 2 = finished. An edge into a color-1 block
// is recorded as a back edge.
template<class T>
static void recBackEdge(BasicBlock* bb, T& BackEdges,
                        std::map<BasicBlock*, int>& color,
                        std::map<BasicBlock*, int>& depth,
                        std::map<BasicBlock*, int>& finish,
                        int& time)
{
  color[bb] = 1;
  ++time;
  depth[bb] = time;
  TerminatorInst* t= bb->getTerminator();
  for(unsigned i = 0; i < t->getNumSuccessors(); ++i) {
    BasicBlock* bbnew = t->getSuccessor(i);
    if (color[bbnew] == 0)
      recBackEdge(bbnew, BackEdges, color, depth, finish, time);
    else if (color[bbnew] == 1) {
      BackEdges.insert(std::make_pair(bb, bbnew));
      //NumBackEdges++;
    }
  }
  color[bb] = 2;
  ++time;
  finish[bb] = time;
}

//find the back edges and where they go to
template<class T>
static void getBackEdges(Function& F, T& BackEdges) {
  std::map<BasicBlock*, int> color;
  std::map<BasicBlock*, int> depth;
  std::map<BasicBlock*, int> finish;
  int time = 0;
  recBackEdge(&F.getEntryBlock(), BackEdges, color, depth, finish, time);
  DOUT << F.getName() << " " << BackEdges.size() << "\n";
}

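// Illustrative sketch only, not code from this file: the same three-color
// depth-first search used above for back-edge detection, restated on a plain
// adjacency list so it can be compiled and tried without LLVM. The names
// Graph, EdgeSet, visit and findBackEdges are hypothetical; integer node ids
// stand in for BasicBlock pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Node i's successors are Succ[i].
using Graph = std::vector<std::vector<int>>;
using EdgeSet = std::set<std::pair<int, int>>;

enum Color { White = 0, Grey = 1, Black = 2 };

// White = unvisited, Grey = on the current DFS path, Black = finished.
static void visit(const Graph &Succ, int Node, std::vector<Color> &State,
                  EdgeSet &BackEdges) {
  State[Node] = Grey;
  for (int Next : Succ[Node]) {
    if (State[Next] == White)
      visit(Succ, Next, State, BackEdges);
    else if (State[Next] == Grey)
      BackEdges.insert(std::make_pair(Node, Next)); // edge to an ancestor
  }
  State[Node] = Black;
}

static EdgeSet findBackEdges(const Graph &Succ, int Entry) {
  assert(Entry >= 0 && static_cast<std::size_t>(Entry) < Succ.size());
  std::vector<Color> State(Succ.size(), White);
  EdgeSet BackEdges;
  visit(Succ, Entry, State, BackEdges);
  return BackEdges;
}
```

// For the graph 0 -> 1 -> 2 -> {1, 3}, the only back edge found from entry 0
// is (2, 1). Nodes unreachable from the entry are never visited, matching the
// behavior of starting the walk at the function's entry block.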
//Creation functions
ModulePass* llvm::createNullProfilerRSPass() {
  return new NullProfilerRS();
}

FunctionPass* llvm::createRSProfilingPass() {
  return new ProfilerRS();
}