Since the analysis managers were split into explicit function and module
analysis managers, it is now completely trivial to specify this when
building up the concept and model types explicitly, and it is impossible
to end up with a type error at run time. We instantiate a template when
registering a pass that will enforce the requirement at a type-system
level, and we produce a dynamic error on all the other query paths to
the analysis manager if the pass in question isn't registered.
llvm-svn: 195447
This is supposed to be the whole type of the IR unit, and so we
shouldn't pass a pointer to it but rather the value itself. In turn, we
need to provide a 'Module *' as that type argument (for example). This
will become more relevant with SCCs or other units which may not be
passed as a pointer type, but also brings consistency with the
transformation pass templates.
llvm-svn: 195445
We already have a method for returning one loop latch but for some
reason no one has committed one for returning loop latches in the case
where there are multiple latches.
llvm-svn: 195410
rather than the constructors of passes.
This simplifies the APIs of passes significantly and removes an error
prone pattern where the *same* manager had to be given to every
different layer. With the new API the analysis managers themselves will
have to be cross connected with proxy analyses that allow a pass at one
layer to query for the analysis manager of another layer. The proxy will
both expose a handle to the other layer's manager and it will provide
the invalidation hooks to ensure things remain consistent across layers.
Finally, the outer-most analysis manager has to be passed to the run
method of the outer-most pass manager. The rest of the propagation is
automatic.
I've used SFINAE again to allow passes to completely disregard the
analysis manager if they don't need or want to care. This helps keep
simple things simple for users of the new pass manager.
Also, the system specifically supports passing a null pointer into the
outer-most run method if your pass pipeline neither needs nor wants to
deal with analyses. I find this of dubious utility as while some
*passes* don't care about analysis, I'm not sure there are any
real-world users of the pass manager itself that need to avoid even
creating an analysis manager. But it is easy to support, so there we go.
Finally I renamed the module proxy for the function analysis manager to
the more verbose but less confusing name of
FunctionAnalysisManagerModuleProxy. I hate this name, but I have no idea
what else to name these things. I'm expecting in the fullness of time to
potentially have the complete cross product of types at the proxy layer:
{Module,SCC,Function,Loop,Region}AnalysisManager{Module,SCC,Function,Loop,Region}Proxy
(except for XAnalysisManagerXProxy which doesn't make any sense)
This should make it somewhat easier to do the next phases which is to
build the upward proxy and get its invalidation correct, as well as to
make the invalidation within the Module -> Function mapping pass be more
fine grained so as to invalidate fewer fuction analyses.
After all of the proxy analyses are done and the invalidation working,
I'll finally be able to start working on the next two fun fronts: how to
adapt an existing pass to work in both the legacy pass world and the new
one, and building the SCC, Loop, and Region counterparts. Fun times!
llvm-svn: 195400
This patch is a rewrite of the original patch commited in r194542. Instead of
relying on the type legalizer to do the splitting for us, we now peform the
splitting ourselves in the DAG combiner. This is necessary for the case where
the vector mask is a legal type after promotion and still wouldn't require
splitting.
Patch by: Juergen Ributzka
NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195397
It broke, at least, i686 target. It is reproducible with "llc -mtriple=i686-unknown".
FYI, it didn't appear to add either "-O0" or "-fast-isel".
llvm-svn: 195339
it is completely optional, and sink the logic for handling the preserved
analysis set into it.
This allows us to implement the delegation logic desired in the proxy
module analysis for the function analysis manager where if the proxy
itself is preserved we assume the set of functions hasn't changed and we
do a fine grained invalidation by walking the functions in the module
and running the invalidate for them all at the manager level and letting
it try to invalidate any passes.
This in turn makes it blindingly obvious why we should hoist the
invalidate trait and have two collections of results. That allows
handling invalidation for almost all analyses without indirect calls and
it allows short circuiting when the preserved set is all.
llvm-svn: 195338
type and detect whether or not it provides an 'invalidate' member the
analysis manager should use.
This lets the overwhelming common case of *not* caring about custom
behavior when an analysis is invalidated be the the obvious default
behavior with no code written by the author of an analysis. Only when
they write code specifically to handle invalidation does it get used.
Both cases are actually covered by tests here. The test analysis uses
the default behavior, and the proxy module analysis actually has custom
behavior on invalidation that is firing correctly. (In fact, this is the
analysis which was the primary motivation for having custom invalidation
behavior in the first place.)
llvm-svn: 195332
This proxy will fill the role of proxying invalidation events down IR
unit layers so that when a module changes we correctly invalidate
function analyses. Currently this is a very coarse solution -- any
change blows away the entire thing -- but the next step is to make
invalidation handling more nuanced so that we can propagate specific
amounts of invalidation from one layer to the next.
The test is extended to place a module pass between two function pass
managers each of which have preserved function analyses which get
correctly invalidated by the module pass that might have changed what
functions are even in the module.
llvm-svn: 195304
MappingTrait template specializations can now have a validate() method which
performs semantic checking. For details, see <http://llvm.org/docs/YamlIO.html>.
llvm-svn: 195286
Enhance the tests to actually require moves in C++11 mode, in addition
to testing the moved-from state. Further enhance the tests to cover
copy-assignment into a moved-from object and moving a large-state
object. (Note that we can't really test small-state vs. large-state as
that isn't an observable property of the API really.) This should finish
addressing review on r195239.
llvm-svn: 195261
This adds a new set-like type which represents a set of preserved
analysis passes. The set is managed via the opaque PassT::ID() void*s.
The expected convenience templates for interacting with specific passes
are provided. It also supports a symbolic "all" state which is
represented by an invalid pointer in the set. This state is nicely
saturating as it comes up often. Finally, it supports intersection which
is used when finding the set of preserved passes after N different
transforms.
The pass API is then changed to return the preserved set rather than
a bool. This is much more self-documenting than the previous system.
Returning "none" is a conservatively correct solution just like
returning "true" from todays passes and not marking any passes as
preserved. Passes can also be dynamically preserved or not throughout
the run of the pass, and whatever gets returned is the binding state.
Finally, preserving "all" the passes is allowed for no-op transforms
that simply can't harm such things.
Finally, the analysis managers are changed to instead of blindly
invalidating all of the analyses, invalidate those which were not
preserved. This should rig up all of the basic preservation
functionality. This also correctly combines the preservation moving up
from one IR-layer to the another and the preservation aggregation across
N pass runs. Still to go is incrementally correct invalidation and
preservation across IR layers incrementally during N pass runs. That
will wait until we have a device for even exposing analyses across IR
layers.
While the core of this change is obvious, I'm not happy with the current
testing, so will improve it to cover at least some of the invalidation
that I can test easily in a subsequent commit.
llvm-svn: 195241
Somehow, this ADT got missed which is moderately terrifying considering
the efficiency of move for it.
The code to implement move semantics for it is pretty horrible
currently but was written to reasonably closely match the rest of the
code. Unittests that cover both copying and moving (at a basic level)
added.
llvm-svn: 195239
The FunctionPassManager is now itself a function pass. When run over
a function, it runs all N of its passes over that function. This is the
1:N mapping in the pass dimension only. This allows it to be used in
either a ModulePassManager or potentially some other manager that
works on IR units which are supersets of Functions.
This commit also adds the obvious adaptor to map from a module pass to
a function pass, running the function pass across every function in the
module.
The test has been updated to use this new pattern.
llvm-svn: 195192
Instead of permanently outputting "MVLL" as the file checksum, clang
will create gcno and gcda checksums by hashing the destination block
numbers of every arc. This allows for llvm-cov to check if the two gcov
files are synchronized.
Regenerated the test files so they contain the checksum. Also added
negative test to ensure error when the checksums don't match.
llvm-svn: 195191
a module-specific interface. This is the first of many steps necessary
to generalize the infrastructure such that we can support both
a Module-to-Function and Module-to-SCC-to-Function pass manager
nestings.
After a *lot* of attempts that never worked and didn't even make it to
a committable state, it became clear that I had gotten the layering
design of analyses flat out wrong. Four days later, I think I have most
of the plan for how to correct this, and I'm starting to reshape the
code into it. This is just a baby step I'm afraid, but starts separating
the fundamentally distinct concepts of function analysis passes and
module analysis passes so that in subsequent steps we can effectively
layer them, and have a consistent design for the eventual SCC layer.
As part of this, I've started some interface changes to make passes more
regular. The module pass accepts the module in the run method, and some
of the constructor parameters are gone. I'm still working out exactly
where constructor parameters vs. method parameters will be used, so
I expect this to fluctuate a bit.
This actually makes the invalidation less "correct" at this phase,
because now function passes don't invalidate module analysis passes, but
that was actually somewhat of a misfeature. It will return in a better
factored form which can scale to other units of IR. The documentation
has gotten less verbose and helpful.
llvm-svn: 195189
This is the first step to fix pr17918.
It extends the .section directive a bit, inspired by what the ELF one looks
like. The problem with using linkonce is that given
.section foo
.linkonce....
.section foo
.linkonce
we would already have switched sections when getting to .linkonce. The cleanest
solution seems to be to add the comdat information in the .section itself.
llvm-svn: 195148
This patch removes most of the trivial cases of weak vtables by pinning them to
a single object file. The memory leaks in this version have been fixed. Thanks
Alexey for pointing them out.
Differential Revision: http://llvm-reviews.chandlerc.com/D2068
Reviewed by Andy
llvm-svn: 195064
This change is incorrect. If you delete virtual destructor of both a base class
and a subclass, then the following code:
Base *foo = new Child();
delete foo;
will not cause the destructor for members of Child class. As a result, I observe
plently of memory leaks. Notable examples I investigated are:
ObjectBuffer and ObjectBufferStream, AttributeImpl and StringSAttributeImpl.
llvm-svn: 194997
We used to depend on running processModule before the other public functions
such as processDeclare, processValue and processLocation. We are now relaxing
the constraint by adding a module argument to the three functions and
letting the three functions to initialize the type map. This will be used in
a follow-on patch that collects nodes reachable from a Function.
llvm-svn: 194973
This adds a boolean member variable to the PassManagerBuilder to control loop
rerolling (just like we have for unrolling and the various vectorization
options). This is necessary for control by the frontend. Loop rerolling remains
disabled by default at all optimization levels.
llvm-svn: 194966
This change is the first in a series of changes improving LLVM's Block
Frequency propogation implementation to not lose probability mass in
branchy code when propogating block frequency information from a basic
block to its successors. This patch is a simple infrastructure
improvement that does not actually modify the block frequency
algorithm. The specific changes are:
1. Changes the division algorithm used when scaling block frequencies by
branch probabilities to a short division algorithm. This gives us the
remainder for free as well as provides a nice speed boost. When I
benched the old routine and the new routine on a Sandy Bridge iMac with
disabled turbo mode performing 8192 iterations on an array of length
32768, I saw ~600% increase in speed in mean/median performance.
2. Exposes a scale method that returns a remainder. This is important so
we can ensure that when we scale a block frequency by some branch
probability BP = N/D, the remainder from the division by D can be
retrieved and propagated to other children to ensure no probability mass
is lost (more to come on this).
llvm-svn: 194950
AnalysisManager. All this method did was assert something and we have
a perfectly good way to trigger that assert from the query path.
llvm-svn: 194947
Implementing this on bigendian platforms could get strange. I added a
target hook, getStackSlotRange, per Jakob's recommendation to make
this as explicit as possible.
llvm-svn: 194942
This adds a loop rerolling pass: the opposite of (partial) loop unrolling. The
transformation aims to take loops like this:
for (int i = 0; i < 3200; i += 5) {
a[i] += alpha * b[i];
a[i + 1] += alpha * b[i + 1];
a[i + 2] += alpha * b[i + 2];
a[i + 3] += alpha * b[i + 3];
a[i + 4] += alpha * b[i + 4];
}
and turn them into this:
for (int i = 0; i < 3200; ++i) {
a[i] += alpha * b[i];
}
and loops like this:
for (int i = 0; i < 500; ++i) {
x[3*i] = foo(0);
x[3*i+1] = foo(0);
x[3*i+2] = foo(0);
}
and turn them into this:
for (int i = 0; i < 1500; ++i) {
x[i] = foo(0);
}
There are two motivations for this transformation:
1. Code-size reduction (especially relevant, obviously, when compiling for
code size).
2. Providing greater choice to the loop vectorizer (and generic unroller) to
choose the unrolling factor (and a better ability to vectorize). The loop
vectorizer can take vector lengths and register pressure into account when
choosing an unrolling factor, for example, and a pre-unrolled loop limits that
choice. This is especially problematic if the manual unrolling was optimized
for a machine different from the current target.
The current implementation is limited to single basic-block loops only. The
rerolling recognition should work regardless of how the loop iterations are
intermixed within the loop body (subject to dependency and side-effect
constraints), but the significant restriction is that the order of the
instructions in each iteration must be identical. This seems sufficient to
capture all current use cases.
This pass is not currently enabled by default at any optimization level.
llvm-svn: 194939
This patch removes most of the trivial cases of weak vtables by pinning them to
a single object file.
Differential Revision: http://llvm-reviews.chandlerc.com/D2068
Reviewed by Andy
llvm-svn: 194865
0xffff does not mean that there are 65535 sections in a COFF file but
indicates that it's a COFF import library. This patch fixes SEGV error
when an import library file is passed to llvm-readobj.
llvm-svn: 194844
Stop folding constant adds into GEP when the type size doesn't match.
Otherwise, the adds' operands are effectively being promoted, changing the
conditions of an overflow. Results are different when:
sext(a) + sext(b) != sext(a + b)
Problem originally found on x86-64, but also fixed issues with ARM and PPC,
which used similar code.
<rdar://problem/15292280>
Patch by Duncan Exon Smith!
llvm-svn: 194840
Summary:
When getConstant() is called for an expanded vector type, it is split into
multiple scalar constants which are then combined using appropriate build_vector
and bitcast operations.
In addition to the usual big/little endian differences, the case where the
element-order of the vector does not have the same endianness as the elements
themselves is also accounted for. For example, for v4i32 on big-endian MIPS,
the byte-order of the vector is <3210,7654,BA98,FEDC>. For little-endian, it is
<0123,4567,89AB,CDEF>.
Handling this case turns out to be a nop since getConstant() returns a splatted
vector (so reversing the element order doesn't change the value)
This fixes a number of cases in MIPS MSA where calling getConstant() during
operation legalization introduces illegal types (e.g. to legalize v2i64 UNDEF
into a v2i64 BUILD_VECTOR of illegal i64 zeros). It should also handle bigger
differences between illegal and legal types such as legalizing v2i64 into v8i16.
lowerMSASplatImm() in the MIPS backend no longer needs to avoid calling
getConstant() so this function has been updated in the same patch.
For the sake of transparency, the steps I've taken since the review are:
* Added 'virtual' to isVectorEltOrderLittleEndian() as requested. This revealed
that the MIPS tests were falsely passing because a polymorphic function was
not actually polymorphic in the reviewed patch.
* Fixed the tests that were now failing. This involved deleting the code to
handle the MIPS MSA element-order (which was previously doing an byte-order
swap instead of an element-order swap). This left
isVectorEltOrderLittleEndian() unused and it was deleted.
* Fixed build failures caused by rebasing beyond r194467-r194472. These build
failures involved the bset, bneg, and bclr instructions added in these commits
using lowerMSASplatImm() in a way that was no longer valid after this patch.
Some of these were fixed by calling SelectionDAG::getConstant() instead,
others were fixed by a new function getBuildVectorSplat() that provided the
removed functionality of lowerMSASplatImm() in a more sensible way.
Reviewers: bkramer
Reviewed By: bkramer
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D1973
llvm-svn: 194811
This is to avoid this transformation in some cases:
fold (conv (load x)) -> (load (conv*)x)
On architectures that don't natively support some vector
loads efficiently casting the load to a smaller vector of
larger types and loading is more efficient.
Patch by Micah Villmow.
llvm-svn: 194783
Including only Debug.h did not cause a compilation error, but you couldn't
do anything (like writing something with <<) to raw_ostreams returned by
llvm::dbgs() or llvm::errs() without including raw_ostream.h. So including
it from Debug.h should make sense.
Differential Revision: http://llvm-reviews.chandlerc.com/D2183
llvm-svn: 194759
This is useful for debugging issues in the BlockFrequency implementation since
one can easily visualize where probability mass and other errors occur in the
propagation.
llvm-svn: 194654
- readInt() should check all 4 bytes can be read, not just 1.
- In the event of false data in the gcno file, it was possible to index
into a non-existent index of SmallVector, causing assertion error.
llvm-svn: 194639
According to the hazy gcov documentation, it appeared to be technically
possible for lines within a block to belong to different source files.
However, upon further investigation, gcov does not actually support
multiple source files for a single block.
This change removes a level of separation between blocks and lines by
replacing the StringMap of GCOVLines with a SmallVector of ints
representing line numbers. This also means that the GCOVLines class is
no longer needed.
This paves the way for supporting the "-a" option, which will output
block information.
llvm-svn: 194637
Unified the interface for read functions. They all return a boolean
indicating if the read from file succeeded. Functions that previously
returned the read value now store it into a variable that is passed in
by reference instead. Callers will need to check the return value to
detect if an error occurred.
Also added a new test which ensures that no assertions occur when file
contains invalid data. llvm-cov should return with error code 1 upon
failure.
llvm-svn: 194635
instructions. This patch does not include the shift right and accumulate
instructions. A number of non-overloaded intrinsics have been remove in favor
of their overloaded counterparts.
llvm-svn: 194598
Accepting quotes is a property of an assembler, not of an object file. For
example, ELF can support any names for sections and symbols, but the gnu
assembler only accepts quotes in some contexts and llvm-mc in a few more.
LLVM should not produce different symbols based on a guess about which assembler
will be reading the code it is printing.
llvm-svn: 194575
This adds a new scalar pass that reads a file with samples generated
by 'perf' during runtime. The samples read from the profile are
incorporated and emmited as IR metadata reflecting that profile.
The profile file is assumed to have been generated by an external
profile source. The profile information is converted into IR metadata,
which is later used by the analysis routines to estimate block
frequencies, edge weights and other related data.
External profile information files have no fixed format, each profiler
is free to define its own. This includes both the on-disk representation
of the profile and the kind of profile information stored in the file.
A common kind of profile is based on sampling (e.g., perf), which
essentially counts how many times each line of the program has been
executed during the run.
The SampleProfileLoader pass is organized as a scalar transformation.
On startup, it reads the file given in -sample-profile-file to
determine what kind of profile it contains. This file is assumed to
contain profile information for the whole application. The profile
data in the file is read and incorporated into the internal state of
the corresponding profiler.
To facilitate testing, I've organized the profilers to support two file
formats: text and native. The native format is whatever on-disk
representation the profiler wants to support, I think this will mostly
be bitcode files, but it could be anything the profiler wants to
support. To do this, every profiler must implement the
SampleProfile::loadNative() function.
The text format is mostly meant for debugging. Records are separated by
newlines, but each profiler is free to interpret records as it sees fit.
Profilers must implement the SampleProfile::loadText() function.
Finally, the pass will call SampleProfile::emitAnnotations() for each
function in the current translation unit. This function needs to
translate the loaded profile into IR metadata, which the analyzer will
later be able to use.
This patch implements the first steps towards the above design. I've
implemented a sample-based flat profiler. The format of the profile is
fairly simplistic. Each sampled function contains a list of relative
line locations (from the start of the function) together with a count
representing how many samples were collected at that line during
execution. I generate this profile using perf and a separate converter
tool.
Currently, I have only implemented a text format for these profiles. I
am interested in initial feedback to the whole approach before I send
the other parts of the implementation for review.
This patch implements:
- The SampleProfileLoader pass.
- The base ExternalProfile class with the core interface.
- A SampleProfile sub-class using the above interface. The profiler
generates branch weight metadata on every branch instructions that
matches the profiles.
- A text loader class to assist the implementation of
SampleProfile::loadText().
- Basic unit tests for the pass.
Additionally, the patch uses profile information to compute branch
weights based on instruction samples.
This patch converts instruction samples into branch weights. It
does a fairly simplistic conversion:
Given a multi-way branch instruction, it calculates the weight of
each branch based on the maximum sample count gathered from each
target basic block.
Note that this assignment of branch weights is somewhat lossy and can be
misleading. If a basic block has more than one incoming branch, all the
incoming branches will get the same weight. In reality, it may be that
only one of them is the most heavily taken branch.
I will adjust this assignment in subsequent patches.
llvm-svn: 194566
This bug only bit the C++98 build bots because all of the actual uses
really do move. ;] But not *quite* ready to do the whole C++11 switch
yet, so clean it up. Also add a unit test that catches this immediately.
llvm-svn: 194548
more smarts in it. This is where most of the interesting logic that used
to live in the implicit-scheduling-hackery of the old pass manager will
live.
Like the previous commits, note that this is a very early prototype!
I expect substantial changes before this is ready to use.
The core of the design is the following:
- We have an AnalysisManager which can be used across a series of
passes over a module.
- The code setting up a pass pipeline registers the analyses available
with the manager.
- Individual transform passes can check than an analysis manager
provides the analyses they require in order to fail-fast.
- There is *no* implicit registration or scheduling.
- Analysis passes are different from other passes: they produce an
analysis result that is cached and made available via the analysis
manager.
- Cached results are invalidated automatically by the pass managers.
- When a transform pass requests an analysis result, either the analysis
is run to produce the result or a cached result is provided.
There are a few aspects of this design that I *know* will change in
subsequent commits:
- Currently there is no "preservation" system, that needs to be added.
- All of the analysis management should move up to the analysis library.
- The analysis management needs to support at least SCC passes. Maybe
loop passes. Living in the analysis library will facilitate this.
- Need support for analyses which are *both* module and function passes.
- Need support for pro-actively running module analyses to have cached
results within a function pass manager.
- Need a clear design for "immutable" passes.
- Need support for requesting cached results when available and not
re-running the pass even if that would be necessary.
- Need more thorough testing of all of this infrastructure.
There are other aspects that I view as open questions I'm hoping to
resolve as I iterate a bit on the infrastructure, and especially as
I start writing actual passes against this.
- Should we have separate management layers for function, module, and
SCC analyses? I think "yes", but I'm not yet ready to switch the code.
Adding SCC support will likely resolve this definitively.
- How should the 'require' functionality work? Should *that* be the only
way to request results to ensure that passes always require things?
- How should preservation work?
- Probably some other things I'm forgetting. =]
Look forward to more patches in shorter order now that this is in place.
llvm-svn: 194538
Add user-supplied C runtime and compiler-rt library functions to
llvm.compiler.used to protect them from premature optimization by
passes like -globalopt and -ipsccp. Calls to (seemingly unused)
runtime library functions can be added by -instcombine and instruction
lowering.
Patch by Duncan Exon Smith, thanks!
Fixes <rdar://problem/14740087>
llvm-svn: 194514
This reverts commit r194485.
The variable is unused in some macro instantiations, but not others. We should
probably fix clang to not warn on this.
llvm-svn: 194486
Based on discussions with Lang Hames and Jakob Stoklund Olesen at the hacker's lab, and in the light of upcoming work on the PBQP register allocator, it was though that CalcSpillWeights does not need to be a pass. This change will enable to customize / tune the spill weight computation depending on the allocator.
Update the documentation style while there.
No functionnal change.
llvm-svn: 194356
This is still just a skeleton. I'm trying to pull together the
experimentation I've done into committable chunks, and this is the first
coherent one. Others will follow in hopefully short order that move this
more toward a useful initial implementation. I still expect the design
to continue evolving in small ways as I work through the different
requirements and features needed here though.
Keep in mind, all of this is off by default.
Currently, this mostly exercises the use of a polymorphic smart pointer
and templates to hide the polymorphism for the pass manager from the
pass implementation. The next step will be more significant, adding the
first framework of analysis support.
llvm-svn: 194325
give the files a legacy prefix in the right directory. Use forwarding
headers in the old locations to paper over the name change for most
clients during the transitional period.
No functionality changed here! This is just clearing some space to
reduce renaming churn later on with a new system.
Even when the new stuff starts to go in, it is going to be hidden behind
a flag and off-by-default as it is still WIP and under development.
This patch is specifically designed so that very little out-of-tree code
has to change. I'm going to work as hard as I can to keep that the case.
Only direct forward declarations of the PassManager class are impacted
by this change.
llvm-svn: 194324
unique ownership smart pointer which is *deep* copyable by assuming it
can call a T::clone() method to allocate a copy of the owned data.
This is mostly useful with containers or other collections of uniquely
owned data in C++98 where they *might* copy. With C++11 we can likely
remove this in favor of move-only types and containers wrapped around
those types.
llvm-svn: 194315
This patch moves the jump address materialization inside the noop slide. This
enables patching of the materialization itself or its complete removal. This
patch also adds the ability to define scratch registers that can be used safely
by the code called from the patchpoint intrinsic. At least one scratch register
is required, because that one is used for the materialization of the jump
address. This patch depends on D2009.
Differential Revision: http://llvm-reviews.chandlerc.com/D2074
Reviewed by Andy
llvm-svn: 194306
The new graph structure replaces the node and edge linked lists with vectors.
Free lists (well, free vectors) are used for fast insertion/deletion.
The ultimate aim is to make PBQP graphs cheap to clone. The motivation is that
the PBQP solver destructively consumes input graphs while computing a solution,
forcing the graph to be fully reconstructed for each round of PBQP. This
imposes a high cost on large functions, which often require several rounds of
solving/spilling to find a final register allocation. If we can cheaply clone
the PBQP graph and incrementally update it between rounds then hopefully we can
reduce this cost. Further, once we begin pooling matrix/vector values (future
work), we can cache some PBQP solver metadata and share it between cloned
graphs, allowing the PBQP solver to re-use some of the computation done in
earlier rounds.
For now this is just a data structure update. The allocator and solver still
use the graph the same way as before, fully reconstructing it between each
round. I expect no material change from this update, although it may change
the iteration order of the nodes, causing ties in the solver to break in
different directions, and this could perturb the generated allocations
(hopefully in a completely benign way).
Thanks very much to Arnaud Allard de Grandmaison for encouraging me to get back
to work on this, and for a lot of discussion and many useful PBQP test cases.
llvm-svn: 194300
The idea of the AnyReg Calling Convention is to provide the call arguments in
registers, but not to force them to be placed in a paticular order into a
specified set of registers. Instead it is up tp the register allocator to assign
any register as it sees fit. The same applies to the return value (if
applicable).
Differential Revision: http://llvm-reviews.chandlerc.com/D2009
Reviewed by Andy
llvm-svn: 194293
Based on discussions with Lang Hames and Jakob Stoklund Olesen at the hacker's lab, and in the light of upcoming work on the PBQP register allocator, it was though that CalcSpillWeights does not need to be a pass. This change will enable to customize / tune the spill weight computation depending on the allocator.
Update the documentation style while there.
No functionnal change.
llvm-svn: 194269
Patch by Michele Scandale!
Rewrite of the functions used to compute the backedge taken count of a
loop on LT and GT comparisons.
I decided to split the handling of LT and GT cases becasue the trick
"a > b == -a < -b" in some cases prevents the trip count computation
due to the multiplication by -1 on the two operands of the
comparison. This issue comes from the conservative computation of
value range of SCEVs: taking the negative SCEV of an expression that
have a small positive range (e.g. [0,31]), we would have a SCEV with a
fullset as value range.
Indeed, in the new rewritten function I tried to better handle the
maximum backedge taken count computation when MAX/MIN expression are
used to handle the cases where no entry guard is found.
Some test have been modified in order to check the new value correctly
(I manually check them and reasoning on possible overflow the new
values seem correct).
I finally added a new test case related to the multiplication by -1
issue on GT comparisons.
llvm-svn: 194116
One of the uses of the IsValid flag is to support default constructing
a ErrorOr that is not a Error or a Value. There is not much value in
doing that IMHO. If ErrorOr was to have a default constructor, it
should be implemented by default constructing the value, but even that
looks unnecessary.
The other use is to avoid calling destructors on moved objects. This
looks wrong. If the data being moved has non trivial treatment of
moves (an std::vector for example), it is its destructor that should
handle it, not ~ErrorOr.
With this change ErrorOr becomes a fairly simple wrapper and should
always be better than using an error_code + value in an API.
llvm-svn: 194109
This patch enables llvm-cov to correctly output the run count stored in
the GCDA file. GCOVProfiling currently does not generate this
information, so the GCDA run data had to be hacked on from a GCDA file
generated by gcc. This is corrected by a subsequent patch.
With the run and program data included, both llvm-cov and gcov produced
the same output.
llvm-svn: 194033
ErrorOr had quiet a bit of complexity and indirection to be able to hold a user
type with the error.
That feature is not used anymore. This patch removes it, it will live in svn
history if we ever need it again.
If we do need it again, IMHO there is one thing that should be done
differently: Holding extra info in the error is not a property a function also
returning a value or not. The ability to hold extra info should be in the error
type and ErrorOr templated over it so that we don't need the funny looking
ErrorOr<void>.
llvm-svn: 194030
As with the other loop unrolling parameters (the unrolling threshold, partial
unrolling, etc.) runtime unrolling can now also be controlled via the
constructor. This will be necessary for moving non-trivial unrolling late in
the pass manager (after loop vectorization).
No functionality change intended.
llvm-svn: 194027
stack traces by default if you use PrettyStackTraceProgram, so that existing LLVM-based
tools will continue to get it without any changes.
llvm-svn: 193971
This adds an SimplifyLibCalls case which converts the special __sinpi and
__cospi (float & double variants) into a __sincospi_stret where appropriate to
remove duplicated work.
Patch by Tim Northover
llvm-svn: 193943
linkonce_odr_auto_hide was in incomplete attempt to implement a way
for the linker to hide symbols that are known to be available in every
TU and whose addresses are not relevant for a particular DSO.
It was redundant in that it all its uses are equivalent to
linkonce_odr+unnamed_addr. Unlike those, it has never been connected
to clang or llvm's optimizers, so it was effectively dead.
Given that nothing produces it, this patch just nukes it
(other than the llvm-c enum value).
llvm-svn: 193865
Objective-C data structures.
This is allows tools such as darwin's otool(1) that uses the
LLVM disassembler take a pointer value being loaded by
an instruction and add a comment to what it is being referenced
to make following disassembly of Objective-C programs
more readable.
For example disassembling the Mac OS X TextEdit app one
will see comments like the following:
movq 0x20684(%rip), %rsi ## Objc selector ref: standardUserDefaults
movq 0x21985(%rip), %rdi ## Objc class ref: _OBJC_CLASS_$_NSUserDefaults
movq 0x1d156(%rip), %r14 ## Objc message: +[NSUserDefaults standardUserDefaults]
leaq 0x23615(%rip), %rdx ## Objc cfstring ref: @"SelectLinePanel"
callq 0x10001386c ## Objc message: -[[%rdi super] initWithWindowNibName:]
These diffs also include putting quotes around C strings
in literal pools and uses "symbol address" in the comment
when adding a symbol name to the comment to tell these
types of references apart:
leaq 0x4f(%rip), %rax ## literal pool for: "Hello world"
movq 0x1c3ea(%rip), %rax ## literal pool symbol address: ___stack_chk_guard
Of course the easy changes are in the LLVM disassembler and
the hard work is up to the implementer of the SymbolLookUp()
call back.
rdar://10602439
llvm-svn: 193833
There are two ways one could implement hiding of linkonce_odr symbols in LTO:
* LLVM tells the linker which symbols can be hidden if not used from native
files.
* The linker tells LLVM which symbols are not used from other object files,
but will be put in the dso symbol table if present.
GOLD's API is the second option. It was implemented almost 1:1 in llvm by
passing the list down to internalize.
LLVM already had partial support for the first option. It is also very similar
to how ld64 handles hiding these symbols when *not* doing LTO.
This patch then
* removes the APIs for the DSO list.
* marks LTO_SYMBOL_SCOPE_DEFAULT_CAN_BE_HIDDEN all linkonce_odr unnamed_addr
global values and other linkonce_odr whose address is not used.
* makes the gold plugin responsible for handling the API mismatch.
llvm-svn: 193800
With this patch llvm produces a weak_def_can_be_hidden for linkonce_odr
if they are also unnamed_addr or don't have their address taken.
There is not a lot of documentation about .weak_def_can_be_hidden, but
from the old discussion about linkonce_odr_auto_hide and the name of
the directive this looks correct: these symbols can be hidden.
Testing this with the ld64 in Xcode 5 linking clang reduces the number of
exported symbols from 21053 to 19049.
llvm-svn: 193718
startswith_lower is ocassionally useful and I think worth adding.
endwith_lower is added for completeness.
Differential Revision: http://llvm-reviews.chandlerc.com/D2041
llvm-svn: 193706
Also corrected the definition of the intrinsics for these instructions (the
result register is also the first operand), and added intrinsics for bsel and
bseli to clang (they already existed in the backend).
These four operations are mostly equivalent to bsel, and bseli (the difference
is which operand is tied to the result). As a result some of the tests changed
as described below.
bitwise.ll:
- bsel.v test adapted so that the mask is unknown at compile-time. This stops
it emitting bmnzi.b instead of the intended bsel.v.
- The bseli.b test now tests the right thing. Namely the case when one of the
values is an uimm8, rather than when the condition is a uimm8 (which is
covered by bmnzi.b)
compare.ll:
- bsel.v tests now (correctly) emits bmnz.v instead of bsel.v because this
is the same operation (see MSA.txt).
i8.ll
- CHECK-DAG-ized test.
- bmzi.b test now (correctly) emits equivalent bmnzi.b with swapped operands
because this is the same operation (see MSA.txt).
- bseli.b still emits bseli.b though because the immediate makes it
distinguishable from bmnzi.b.
vec.ll:
- CHECK-DAG-ized test.
- bmz.v tests now (correctly) emits bmnz.v with swapped operands (see
MSA.txt).
- bsel.v tests now (correctly) emits bmnz.v with swapped operands (see
MSA.txt).
llvm-svn: 193693
This problem was found and fixed by José Fonseca in March 2011 for
SmallPtrSet, committed r128566. But as far as I can tell, all other
llvm hash tables retain the same problem: the bucket count can grow
without bound while size() remains near constant by repeated
insert/erase cycles that tend to fill the container with tombstones.
Here is a demo that has been reduced to a trivial case:
int
main()
{
llvm::DenseSet<unsigned> d;
for (unsigned i = 0; i < 0xFFFFFFF; ++i)
{
d.insert(i);
d.erase(i);
}
}
While the container size() never grows above 1, the bucket count grows
like this:
nb = 64
nb = 128
nb = 256
nb = 512
nb = 1024
nb = 2048
nb = 4096
nb = 8192
nb = 16384
nb = 32768
nb = 65536
nb = 131072
nb = 262144
nb = 524288
nb = 1048576
nb = 2097152
nb = 4194304
nb = 8388608
nb = 16777216
nb = 33554432
nb = 67108864
nb = 134217728
nb = 268435456
The above program currently consumes a few GB ram. This patch brings
the memory consumption down by several orders of magnitude, and keeps
the bucket count at 64 for the above test.
llvm-svn: 193689
This required correcting the definition of the bins[lr]i intrinsics because
the result is also the first operand.
It also required removing the (arbitrary) check for 32-bit immediates in
MipsSEDAGToDAGISel::selectVSplat().
Currently using binsli.d with 2 bits set in the mask doesn't select binsli.d
because the constant is legalized into a ConstantPool. Similar things can
happen with binsri.d with more than 10 bits set in the mask. The resulting
code when this happens is correct but not optimal.
llvm-svn: 193687
This modifies the pass to classify every SSP-triggering AllocaInst according to
an SSPLayoutKind (LargeArray, SmallArray, AddrOf). This analysis is collected
by the pass and made available for use, but no other pass uses it yet.
The next patch will make use of this analysis in PEI and StackSlot
passes. The end goal is to support ssp-strong stack layout rules.
WIP.
Differential Revision: http://llvm-reviews.chandlerc.com/D1789
llvm-svn: 193653
Use 32-bit types for the array instead of 64. This should
generally be better anyway.
In optimized + assert builds, I saw a failure when a
cond code / type combination that is never set was loading
a non-zero value and hitting the != Promote assert.
It turns out when loading the 64-bit value to do the shift,
the assembly loads the 2 32-bit halves from non-consecutive
addresses. The address the second half of the loaded uint64_t
doesn't include the offset of the array in the struct. Instead
of being offset + 4, it's just + 4.
I'm not entirely sure why this wasn't observed before.
setCondCodeAction isn't heavily used by the in-tree targets,
and not with the higher valued vector SimpleValueTypes. Only
PPC is using one of the > 32 valued types, and that is probably
never used by anyone on a 32-bit MSVC compiled host.
I ran into this when upgrading LLVM versions, so I guess the
value loaded from the nonsense address happened to work out
before.
No test since I'm not really sure if / how it can be reproduced
with the current in tree targets, and it's not supposed to change
anything.
llvm-svn: 193650
Summary:
Use DWARF4 table of form classes to fetch attributes from DIE
in a more consistent way. This shouldn't change the functionality and
serves as a refactoring for upcoming change: DW_AT_high_pc has different
semantics depending on its form class.
Reviewers: dblaikie, echristo
Reviewed By: echristo
CC: echristo, llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D1961
llvm-svn: 193553
This commit allows the ARM integrated assembler to parse
and assemble the code with .eabi_attribute, .cpu, and
.fpu directives.
To implement the feature, this commit moves the code from
AttrEmitter to ARMTargetStreamers, and several new test
cases related to cortex-m4, cortex-r5, and cortex-a15 are
added.
Besides, this commit also change the Subtarget->isFPOnlySP()
to Subtarget->hasD16() to match the usage of .fpu directive.
This commit changes the test cases:
* Several .eabi_attribute directives in
2010-09-29-mc-asm-header-test.ll are removed because the .fpu
directive already cover the functionality.
* In the Cortex-A15 test case, the value for
Tag_Advanced_SIMD_arch has be changed from 1 to 2,
which is more precise.
llvm-svn: 193524
Most SelectionDAG code drops the TBAA info when creating a new form of a
load and store (e.g. during legalization, or when converting a plain
load to an extending one). This patch tries to catch all cases where
the TBAA information can legitimately be carried over.
The patch adds alternative forms of getLoad() and getExtLoad() that take
a MachineMemOperand instead of individual fields. (The corresponding
getTruncStore() already exists.) The idea is to use the MachineMemOperand
forms when all fields are carried over (size, pointer info, isVolatile,
isNonTemporal, alignment and TBAA info). If some adjustment is being
made, e.g. to narrow the load, then we still pass the individual fields
but also pass the TBAA info.
llvm-svn: 193517
Partial fix for PR17459: wrong code at -O3 on x86_64-linux-gnu
(affecting trunk and 3.3)
When SCEV expands a recurrence outside of a loop it attempts to scale
by the stride of the recurrence. Chained recurrences don't work that
way. We could compute binomial coefficients, but would hve to
guarantee that the chained AddRec's are in a perfectly reduced form.
llvm-svn: 193438