Commit Graph

4949 Commits

Author SHA1 Message Date
Duncan P. N. Exon Smith c9b7cfea2f blockfreq: Expose getPackagedNode()
Make `getPackagedNode()` a member function of
`BlockFrequencyInfoImplBase` so that it's available for templated code.

<rdar://problem/14292693>

llvm-svn: 207183
2014-04-25 04:38:12 +00:00
Duncan P. N. Exon Smith 1cab8a0708 blockfreq: Store the header with the members
<rdar://problem/14292693>

llvm-svn: 207182
2014-04-25 04:38:09 +00:00
Duncan P. N. Exon Smith 39cc64827e blockfreq: Encapsulate LoopData::Header
<rdar://problem/14292693>

llvm-svn: 207181
2014-04-25 04:38:06 +00:00
Duncan P. N. Exon Smith d132040ed6 blockfreq: Use LoopData directly
Instead of passing around loop headers, pass around `LoopData` directly.

<rdar://problem/14292693>

llvm-svn: 207179
2014-04-25 04:38:01 +00:00
Duncan P. N. Exon Smith fc7dc93031 blockfreq: Use a std::list for Loops
As pointed out by David Blaikie in code review, a `std::list<T>` is
simpler than a `std::vector<std::unique_ptr<T>>`.  Another option is a
`std::deque<T>` (which allocates in chunks), but I'd like to leave open
the option of inserting in the middle of the sequence for handling
irreducible control flow on the fly.

<rdar://problem/14292693>

llvm-svn: 207177
2014-04-25 04:30:06 +00:00
Chandler Carruth 91dcf0f977 [LCG] Switch a weird do/while loop that actually couldn't fail its
condition into an obviously infinite loop with an assert about the
degenerate condition. No functionality changed.

llvm-svn: 207147
2014-04-24 21:19:30 +00:00
Chandler Carruth 24553934f8 [LCG] Incorporate the core trick of improvements on the naive Tarjan's
algorithm here: http://dl.acm.org/citation.cfm?id=177301.

The idea of isolating the roots has even more relevance when using the
stack not just to implement the DFS but also to implement the recursive
step. Because we use it for the recursive step, to isolate the roots we
need to maintain two stacks: one for our recursive DFS walk, and another
of the nodes that have been walked. The nice thing is that the latter
will be half the size. It also fixes a complete hack where we scanned
backwards over the stack to find the next potential-root to continue
processing. Now that is always the top of the DFS stack.

While this is a really nice improvement already (IMO) it further opens
the door for two important simplifications:

1) De-duplicating some of the code across the two different walks. I've
   actually made the duplication a bit worse in some senses with this
   patch because the two are starting to converge.
2) Dramatically simplifying the loop structures of both walks.

I wanted to do those separately as they'll be essentially *just* CFG
restructuring. This patch on the other hand actually uses different
datastructures to implement the algorithm itself.

llvm-svn: 207098
2014-04-24 11:05:20 +00:00
Chandler Carruth 09751bf173 [LCG] Rotate logic applied to the top of the DFSStack to instead be
applied prior to pushing a node onto the DFSStack. This is the first
step toward avoiding the stack entirely for leaf nodes. It also
simplifies things a bit and I think is pointing the way toward factoring
some more of the shared logic out of the two implementations.

It is also making it more obvious how to restructure the loops
themselves to be a bit easier to read (although no different in terms of
functionality).

llvm-svn: 207095
2014-04-24 09:59:59 +00:00
Chandler Carruth 493e0a6ad0 [LCG] Switch the parent SCC tracking from a SmallSetVector to
a SmallPtrSet. Currently, there is no need for stable iteration in this
dimension, and I now thing there won't need to be going forward.

If this is ever re-introduced in any form, it needs to not be
a SetVector based solution because removal cannot be linear. There will
be many SCCs with large numbers of parents. When encountering these, the
incremental SCC update for intra-SCC edge removal was quadratic due to
linear removal (kind of).

I'm really hoping we can avoid having an ordering property here at all
though...

llvm-svn: 207091
2014-04-24 09:22:31 +00:00
Chandler Carruth d52f8e0e4d [LCG] We don't actually need a set in each SCC to track the nodes. We
can use the node -> SCC mapping in the top-level graph to test this on
the rare occasions we need it.

llvm-svn: 207090
2014-04-24 08:55:36 +00:00
Craig Topper 353eda484c [C++] Use 'nullptr'.
llvm-svn: 207083
2014-04-24 06:44:33 +00:00
Chandler Carruth 6a4fee87bc [LCG] Normalize the post-order SCC iterator to just iterate over the SCC
values rather than having pointers in weird places.

llvm-svn: 207053
2014-04-23 23:51:07 +00:00
Chandler Carruth bd5d3082c4 [LCG] Switch the primary node iterator to be a *much* more normal C++
iterator, returning a Node by reference on dereference.

llvm-svn: 207048
2014-04-23 23:34:48 +00:00
Chandler Carruth 2a898e0df6 [LCG] Make the insertion and query paths into the LCG which cannot fail
return references to better model this property.

No functionality changed.

llvm-svn: 207047
2014-04-23 23:20:36 +00:00
Chandler Carruth a10e240377 [LCG] Switch the SCC lookup to be in terms of call graph nodes rather
than functions. So far, this access pattern is *much* more common. It
seems likely that any user of this interface is going to have nodes at
the point that they are querying the SCCs.

No functionality changed.

llvm-svn: 207045
2014-04-23 23:12:06 +00:00
Chandler Carruth b4a04da0b9 [LCG] Switch the primary SCC building code to use the negative low-link
values rather than an expensive dense map query to test whether children
have already been popped into an SCC. This matches the incremental SCC
building code. I've also included the assert that I put there but
updated both of their text.

No functionality changed here.

I still don't have any great ideas for sharing the code between the two
implementations, but I may try a brute-force approach to factoring it at
some point.

llvm-svn: 207042
2014-04-23 22:28:13 +00:00
Chandler Carruth 9302fbf0ae [LCG] Add the first round of mutation support to the lazy call graph.
This implements the core functionality necessary to remove an edge from
the call graph and correctly update both the basic graph and the SCC
structure. As part of that it has to run a tiny (in number of nodes)
Tarjan-style DFS walk of an SCC being mutated to compute newly formed
SCCs, etc.

This is *very rough* and a WIP. I have a bunch of FIXMEs for code
cleanup that will reduce the boilerplate in this change substantially.
I also have a bunch of simplifications to various parts of both
algorithms that I want to make, but first I'd like to have a more
holistic picture. Ideally, I'd also like more testing. I'll probably add
quite a few more unit tests as I go here to cover the various different
aspects and corner cases of removing edges from the graph.

Still, this is, so far, successfully updating the SCC graph in-place
without disrupting the identity established for the existing SCCs even
when we do challenging things like delete the critical edge that made an
SCC cycle at all and have to reform things as a tree of smaller SCCs.
Getting this to work is really critical for the new pass manager as it
is going to associate significant state with the SCC instance and needs
it to be stable. That is also the motivation behind the return of the
newly formed SCCs. Eventually, I'll wire this all the way up to the
public API so that the pass manager can use it to correctly re-enqueue
newly formed SCCs into a fresh postorder traversal.

llvm-svn: 206968
2014-04-23 11:03:03 +00:00
Chandler Carruth cace6623c4 [LCG] Implement Tarjan's algorithm correctly this time. We have to walk
up the stack finishing the exploration of each entries children before
we're finished in addition to accounting for their low-links. Added
a unittest that really hammers home the need for this with interlocking
cycles that would each appear distinct otherwise and crash or compute
the wrong result. As part of this, nuke a stale fixme and bring the rest
of the implementation still more closely in line with the original
algorithm.

llvm-svn: 206966
2014-04-23 10:31:17 +00:00
Chandler Carruth c7bad9a5a0 [LCG] Add a unittest for the LazyCallGraph. I had a weak moment and
resisted this for too long. Just with the basic testing here I was able
to exercise the analysis in more detail and sift out both type signature
bugs in the API and a bug in the DFS numbering. All of these are fixed
here as well.

The unittests will be much more important for the mutation support where
it is necessary to craft minimal mutations and then inspect the state of
the graph. There is just no way to do that with a standard FileCheck
test. However, unittesting these kinds of analyses is really quite easy,
especially as they're designed with the new pass manager where there is
essentially no infrastructure required to rig up the core logic and
exercise it at an API level.

As a minor aside about the DFS numbering bug, the DFS numbering used in
LCG is a bit unusual. Rather than numbering from 0, we number from 1,
and use 0 as the sentinel "unvisited" state. Other implementations often
use '-1' for this, but I find it easier to deal with 0 and it shouldn't
make any real difference provided someone doesn't write silly bugs like
forgetting to actually initialize the DFS numbering. Oops. ;]

llvm-svn: 206954
2014-04-23 08:08:49 +00:00
Chandler Carruth 3f9869a8e2 [LCG] Hoist the logic for forming a new SCC from the top of the DFSStack
into a helper function. I plan to re-use it for doing incremental
DFS-based updates to the SCCs when we mutate the call graph.

llvm-svn: 206948
2014-04-23 06:09:03 +00:00
Chandler Carruth 0b623baeb3 [LCG] Switch the Callee sets to be DenseMaps pointing to the index into
the Callee list. This is going to be quite important to prevent removal
from going quadratic. No functionality changed at this point, this is
one of the refactoring patches I've broken out of my initial work toward
mutation updates of the call graph.

llvm-svn: 206938
2014-04-23 04:00:17 +00:00
Duncan P. N. Exon Smith b3380ea60a blockfreq: Skip irreducible backedges inside functions
The branch that skips irreducible backedges was only active when
propagating mass at the top-level.  In particular, when propagating mass
through a loop recognized by `LoopInfo` with irreducible control flow
inside, irreducible backedges would not be skipped.

Not sure where that idea came from, but the result was that mass was
lost until after loop exit.  Added a testcase that covers this case.

llvm-svn: 206860
2014-04-22 03:31:53 +00:00
Duncan P. N. Exon Smith d1aec79d7a blockfreq: Rename PackagedLoops => Loops
llvm-svn: 206859
2014-04-22 03:31:50 +00:00
Duncan P. N. Exon Smith 2984a64bae blockfreq: Use a pointer for ContainingLoop too
llvm-svn: 206858
2014-04-22 03:31:44 +00:00
Duncan P. N. Exon Smith e1423639bb blockfreq: Use pointers to loops instead of an index
Store pointers directly to loops inside the nodes.  This could have been
done without changing the type stored in `std::vector<>`.  However,
rather than computing the number of loops before constructing them
(which `LoopInfo` doesn't provide directly), I've switched to a
`vector<unique_ptr<LoopData>>`.

This adds some heap overhead, but the number of loops is typically
small.

llvm-svn: 206857
2014-04-22 03:31:37 +00:00
Duncan P. N. Exon Smith dc2d66e7b3 blockfreq: Implement clear() explicitly
This was implicitly with copy assignment before, which fails to actually
clear `std::vector<>`'s heap storage.  Move assignment would work, but
since MSVC can't imply those anyway, explicitly `clear()`-ing members
makes more sense.

llvm-svn: 206856
2014-04-22 03:31:34 +00:00
Duncan P. N. Exon Smith cc88ebfa5f blockfreq: Rename PackagedLoopData => LoopData
No functionality change.

llvm-svn: 206855
2014-04-22 03:31:31 +00:00
Chandler Carruth f1221bd01b [Modules] Fix potential ODR violations by sinking the DEBUG_TYPE
definition below all the header #include lines, lib/Analysis/...
edition.

This one has a bit extra as there were *other* #define's before #include
lines in addition to DEBUG_TYPE. I've sunk all of them as a block.

llvm-svn: 206843
2014-04-22 02:48:03 +00:00
Chandler Carruth 1b9dde087e [Modules] Remove potential ODR violations by sinking the DEBUG_TYPE
define below all header includes in the lib/CodeGen/... tree. While the
current modules implementation doesn't check for this kind of ODR
violation yet, it is likely to grow support for it in the future. It
also removes one layer of macro pollution across all the included
headers.

Other sub-trees will follow.

llvm-svn: 206837
2014-04-22 02:02:50 +00:00
Chandler Carruth e96dd8975f [Modules] Make Support/Debug.h modular. This requires it to not change
behavior based on other files defining DEBUG_TYPE, which means it cannot
define DEBUG_TYPE at all. This is actually better IMO as it forces folks
to define relevant DEBUG_TYPEs for their files. However, it requires all
files that currently use DEBUG(...) to define a DEBUG_TYPE if they don't
already. I've updated all such files in LLVM and will do the same for
other upstream projects.

This still leaves one important change in how LLVM uses the DEBUG_TYPE
macro going forward: we need to only define the macro *after* header
files have been #include-ed. Previously, this wasn't possible because
Debug.h required the macro to be pre-defined. This commit removes that.
By defining DEBUG_TYPE after the includes two things are fixed:

- Header files that need to provide a DEBUG_TYPE for some inline code
  can do so by defining the macro before their inline code and undef-ing
  it afterward so the macro does not escape.

- We no longer have rampant ODR violations due to including headers with
  different DEBUG_TYPE definitions. This may be mostly an academic
  violation today, but with modules these types of violations are easy
  to check for and potentially very relevant.

Where necessary to suppor headers with DEBUG_TYPE, I have moved the
definitions below the includes in this commit. I plan to move the rest
of the DEBUG_TYPE macros in LLVM in subsequent commits; this one is big
enough.

The comments in Debug.h, which were hilariously out of date already,
have been updated to reflect the recommended practice going forward.

llvm-svn: 206822
2014-04-21 22:55:11 +00:00
Duncan P. N. Exon Smith 254689fcf9 blockfreq: Some cleanup of UnsignedFloat
Change `PositiveFloat` to `UnsignedFloat`, and fix some of the comments
to indicate that it's disappearing eventually.

llvm-svn: 206771
2014-04-21 18:31:58 +00:00
Duncan P. N. Exon Smith 10be9a8868 Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl"
This reverts commit r206707, reapplying r206704.  The preceding commit
to CalcSpillWeights should have sorted out the failing buildbots.

<rdar://problem/14292693>

llvm-svn: 206766
2014-04-21 17:57:07 +00:00
Chandler Carruth 572e3407c3 [PM] Add a new-PM-style CGSCC pass manager using the newly added
LazyCallGraph analysis framework. Wire it up all the way through the opt
driver and add some very basic testing that we can build pass pipelines
including these components. Still a lot more to do in terms of testing
that all of this works, but the basic pieces are here.

There is a *lot* of boiler plate here. It's something I'm going to
actively look at reducing, but I don't have any immediate ideas that
don't end up making the code terribly complex in order to fold away the
boilerplate. Until I figure out something to minimize the boilerplate,
almost all of this is based on the code for the existing pass managers,
copied and heavily adjusted to suit the needs of the CGSCC pass
management layer.

The actual CG management still has a bunch of FIXMEs in it. Notably, we
don't do *any* updating of the CG as it is potentially invalidated.
I wanted to get this in place to motivate the new analysis, and add
update APIs to the analysis and the pass management layers in concert to
make sure that the *right* APIs are present.

llvm-svn: 206745
2014-04-21 11:12:00 +00:00
Chandler Carruth 99b756db04 [LCG] Add some basic debug output to the LCG pass.
llvm-svn: 206730
2014-04-21 05:04:24 +00:00
Duncan P. N. Exon Smith e63327e967 Revert "blockfreq: Rewrite BlockFrequencyInfoImpl"
This reverts commit r206704, as expected.

llvm-svn: 206707
2014-04-19 22:46:00 +00:00
Duncan P. N. Exon Smith 875ddfac75 Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl"
This reverts commit r206677, reapplying my BlockFrequencyInfo rewrite.

I've done a careful audit, added some asserts, and fixed a couple of
bugs (unfortunately, they were in unlikely code paths).  There's a small
chance that this will appease the failing bots [1][2].  (If so, great!)

If not, I have a follow-up commit ready that will temporarily add
-debug-only=block-freq to the two failing tests, allowing me to compare
the code path between what the failing bots and what my machines (and
the rest of the bots) are doing.  Once I've triggered those builds, I'll
revert both commits so the bots go green again.

[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816
[2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445

<rdar://problem/14292693>

llvm-svn: 206704
2014-04-19 22:34:26 +00:00
Duncan P. N. Exon Smith 76b813619a Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2)
This reverts commit r206666, as planned.

Still stumped on why the bots are failing.  Sanitizer bots haven't
turned anything up.  If anyone can help me debug either of the failures
(referenced in r206666) I'll owe them a beer.  (In the meantime, I'll be
auditing my patch for undefined behaviour.)

llvm-svn: 206677
2014-04-19 00:42:46 +00:00
Duncan P. N. Exon Smith b3caf3646f Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2)
This reverts commit r206628, reapplying r206622 (and r206626).

Two tests are failing only on buildbots [1][2]: i.e., I can't reproduce
on Darwin, and Chandler can't reproduce on Linux.  Asan and valgrind
don't tell us anything, but we're hoping the msan bot will catch it.

So, I'm applying this again to get more feedback from the bots.  I'll
leave it in long enough to trigger builds in at least the sanitizer
buildbots (it was failing for reasons unrelated to my commit last time
it was in), and hopefully a few others.... and then I expect to revert a
third time.

[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816
[2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445

llvm-svn: 206666
2014-04-18 22:30:03 +00:00
Chandler Carruth 2174f44f61 [LCG] Fix the bugs that Ben pointed out in code review (and the MSan bot
caught). Sad that we don't have warnings for these things, but bleh, no
idea how to fix that.

llvm-svn: 206646
2014-04-18 20:44:16 +00:00
Benjamin Kramer 147644d400 Remove a couple of redundant copies of SmallVector::operator==.
No functionality change.

llvm-svn: 206635
2014-04-18 19:48:03 +00:00
Duncan P. N. Exon Smith 0842ff36a6 Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2)
This reverts commit r206622 and the MSVC fixup in r206626.

Apparently the remotely failing tests are still failing, despite my
attempt to fix the nondeterminism in r206621.

llvm-svn: 206628
2014-04-18 17:56:08 +00:00
Duncan P. N. Exon Smith 38fe464df0 Fixing MSVC after r206622?
llvm-svn: 206626
2014-04-18 17:38:01 +00:00
Duncan P. N. Exon Smith f8361d127a Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl"
This reverts commit r206556, effectively reapplying commit r206548 and
its fixups in r206549 and r206550.

In an intervening commit I've added target triples to the tests that
were failing remotely [1] (but passing locally).  I'm hoping the mystery
is solved?  I'll revert this again if the tests are still failing
remotely.

[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816

llvm-svn: 206622
2014-04-18 17:22:25 +00:00
Chandler Carruth d8d865e266 [LCG] Remove all of the complexity stemming from supporting copying.
Reality is that we're never going to copy one of these. Supporting this
was becoming a nightmare because nothing even causes it to compile most
of the time. Lots of subtle errors built up that wouldn't have been
caught by any "normal" testing.

Also, make the move assignment actually work rather than the bogus swap
implementation that would just infloop if used. As part of that, factor
out the graph pointer updates into a helper to share between move
construction and move assignment.

llvm-svn: 206583
2014-04-18 11:02:33 +00:00
Chandler Carruth 18eadd9260 [LCG] Add support for building persistent and connected SCCs to the
LazyCallGraph. This is the start of the whole point of this different
abstraction, but it is just the initial bits. Here is a run-down of
what's going on here. I'm planning to incorporate some (or all) of this
into comments going forward, hopefully with better editing and wording.
=]

The crux of the problem with the traditional way of building SCCs is
that they are ephemeral. The new pass manager however really needs the
ability to associate analysis passes and results of analysis passes with
SCCs in order to expose these analysis passes to the SCC passes. Making
this work is kind-of the whole point of the new pass manager. =]

So, when we're building SCCs for the call graph, we actually want to
build persistent nodes that stick around and can be reasoned about
later. We'd also like the ability to walk the SCC graph in more complex
ways than just the traditional postorder traversal of the current CGSCC
walk. That means that in addition to being persistent, the SCCs need to
be connected into a useful graph structure.

However, we still want the SCCs to be formed lazily where possible.

These constraints are quite hard to satisfy with the SCC iterator. Also,
using that would bypass our ability to actually add data to the nodes of
the call graph to facilite implementing the Tarjan walk. So I've
re-implemented things in a more direct and embedded way. This
immediately makes it easy to get the persistence and connectivity
correct, and it also allows leveraging the existing nodes to simplify
the algorithm. I've worked somewhat to make this implementation more
closely follow the traditional paper's nomenclature and strategy,
although it is still a bit obtuse because it isn't recursive, using
an explicit stack and a tail call instead, and it is interruptable,
resuming each time we need another SCC.

The other tricky bit here, and what actually took almost all the time
and trials and errors I spent building this, is exactly *what* graph
structure to build for the SCCs. The naive thing to build is the call
graph in its newly acyclic form. I wrote about 4 versions of this which
did precisely this. Inevitably, when I experimented with them across
various use cases, they became incredibly awkward. It was all
implementable, but it felt like a complete wrong fit. Square peg, round
hole. There were two overriding aspects that pushed me in a different
direction:

1) We want to discover the SCC graph in a postorder fashion. That means
   the root node will be the *last* node we find. Using the call-SCC DAG
   as the graph structure of the SCCs results in an orphaned graph until
   we discover a root.

2) We will eventually want to walk the SCC graph in parallel, exploring
   distinct sub-graphs independently, and synchronizing at merge points.
   This again is not helped by the call-SCC DAG structure.

The structure which, quite surprisingly, ended up being completely
natural to use is the *inverse* of the call-SCC DAG. We add the leaf
SCCs to the graph as "roots", and have edges to the caller SCCs. Once
I switched to building this structure, everything just fell into place
elegantly.

Aside from general cleanups (there are FIXMEs and too few comments
overall) that are still needed, the other missing piece of this is
support for iterating across levels of the SCC graph. These will become
useful for implementing #2, but they aren't an immediate priority.

Once SCCs are in good shape, I'll be working on adding mutation support
for incremental updates and adding the pass manager that this analysis
enables.

llvm-svn: 206581
2014-04-18 10:50:32 +00:00
Duncan P. N. Exon Smith e576167df8 Revert "blockfreq: Rewrite BlockFrequencyInfoImpl"
This reverts commits r206548, r206549 and r206549.

There are some unit tests failing that aren't failing locally [1], so
reverting until I have time to investigate.

[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816

llvm-svn: 206556
2014-04-18 02:17:43 +00:00
Duncan P. N. Exon Smith 878cf2b804 blockfreq: Really fix r206548 (and r206549)
Turns out this code is dead.

llvm-svn: 206554
2014-04-18 02:10:09 +00:00
Duncan P. N. Exon Smith c7abca54cf blockfreq: Fixing MSVC after r206548?
llvm-svn: 206549
2014-04-18 02:06:24 +00:00
Duncan P. N. Exon Smith 12e68e1733 blockfreq: Rewrite BlockFrequencyInfoImpl
Rewrite the shared implementation of BlockFrequencyInfo and
MachineBlockFrequencyInfo entirely.

The old implementation had a fundamental flaw:  precision losses from
nested loops (or very wide branches) compounded past loop exits (and
convergence points).

The @nested_loops testcase at the end of
test/Analysis/BlockFrequencyAnalysis/basic.ll is motivating.  This
function has three nested loops, with branch weights in the loop headers
of 1:4000 (exit:continue).  The old analysis gives non-sensical results:

    Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
    ---- Block Freqs ----
     entry = 1.0
     for.cond1.preheader = 1.00103
     for.cond4.preheader = 5.5222
     for.body6 = 18095.19995
     for.inc8 = 4.52264
     for.inc11 = 0.00109
     for.end13 = 0.0

The new analysis gives correct results:

    Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
    block-frequency-info: nested_loops
     - entry: float = 1.0, int = 8
     - for.cond1.preheader: float = 4001.0, int = 32007
     - for.cond4.preheader: float = 16008001.0, int = 128064007
     - for.body6: float = 64048012001.0, int = 512384096007
     - for.inc8: float = 16008001.0, int = 128064007
     - for.inc11: float = 4001.0, int = 32007
     - for.end13: float = 1.0, int = 8

Most importantly, the frequency leaving each loop matches the frequency
entering it.

The new algorithm leverages BlockMass and PositiveFloat to maintain
precision, separates "probability mass distribution" from "loop
scaling", and uses dithering to eliminate probability mass loss.  I have
unit tests for these types out of tree, but it was decided in the review
to make the classes private to BlockFrequencyInfoImpl, and try to shrink
them (or remove them entirely) in follow-up commits.

The new algorithm should generally have a complexity advantage over the
old.  The previous algorithm was quadratic in the worst case.  The new
algorithm is still worst-case quadratic in the presence of irreducible
control flow, but it's linear without it.

The key difference between the old algorithm and the new is that control
flow within a loop is evaluated separately from control flow outside,
limiting propagation of precision problems and allowing loop scale to be
calculated independently of mass distribution.  Loops are visited
bottom-up, their loop scales are calculated, and they are replaced by
pseudo-nodes.  Mass is then distributed through the function, which is
now a DAG.  Finally, loops are revisited top-down to multiply through
the loop scales and the masses distributed to pseudo nodes.

There are some remaining flaws.

  - Irreducible control flow isn't modelled correctly.  LoopInfo and
    MachineLoopInfo ignore irreducible edges, so this algorithm will
    fail to scale accordingly.  There's a note in the class
    documentation about how to get closer.  See also the comments in
    test/Analysis/BlockFrequencyInfo/irreducible.ll.

  - Loop scale is limited to 4096 per loop (2^12) to avoid exhausting
    the 64-bit integer precision used downstream.

  - The "bias" calculation proposed on llvmdev is *not* incorporated
    here.  This will be added in a follow-up commit, once comments from
    this review have been handled.

llvm-svn: 206548
2014-04-18 01:57:45 +00:00
Nuno Lopes 9ced19abe8 remove some dead code
lib/Analysis/IPA/InlineCost.cpp         |   18 ------------------
 lib/Analysis/RegionPass.cpp             |    1 -
 lib/Analysis/TypeBasedAliasAnalysis.cpp |    1 -
 lib/Transforms/Scalar/LoopUnswitch.cpp  |   21 ---------------------
 lib/Transforms/Utils/LCSSA.cpp          |    2 --
 lib/Transforms/Utils/LoopSimplify.cpp   |    6 ------
 utils/TableGen/AsmWriterEmitter.cpp     |   13 -------------
 utils/TableGen/DFAPacketizerEmitter.cpp |    7 -------
 utils/TableGen/IntrinsicEmitter.cpp     |    2 --
 9 files changed, 71 deletions(-)

llvm-svn: 206506
2014-04-17 22:26:44 +00:00
Gerolf Hoflehner ecebc3730e Reverse 206485.
After some discussions the preferred semantics of
the always_inline attribute is
inline always when the compiler can determine
that it it safe to do so.

llvm-svn: 206487
2014-04-17 19:14:06 +00:00
Chandler Carruth b60cb315bc [LCG] Just move the allocator (now that we can) when moving a call
graph. This simplifies the custom move constructor operation to one of
walking the graph and updating the 'up' pointers to point to the new
location of the graph. Switch the nodes from a reference to a pointer
for the 'up' edge to facilitate this.

llvm-svn: 206450
2014-04-17 07:25:59 +00:00
Chandler Carruth 81f497d176 [LCG] Remove the Module reference member which we weren't using for
anything and doesn't make sense if assigning.

llvm-svn: 206449
2014-04-17 07:22:19 +00:00
Gerolf Hoflehner 5f6268a40e Inline a function when the always_inline attribute
is set even when it contains a indirect branch.
The attribute overrules correctness concerns
like the escape of a local block address.

This is for rdar://16501761

llvm-svn: 206429
2014-04-17 00:21:52 +00:00
Tobias Grosser 8d941ef30d RegionInfo: Do not access a value that was just moved away
This fixes a regression introduced in r206310.

llvm-svn: 206328
2014-04-15 22:09:36 +00:00
David Blaikie ec649acb82 Use unique_ptr to manage ownership of child Regions within llvm::Region
llvm-svn: 206310
2014-04-15 18:32:43 +00:00
Craig Topper 9f008867c0 [C++11] More 'nullptr' conversion. In some cases just using a boolean check instead of comparing to nullptr.
llvm-svn: 206243
2014-04-15 04:59:12 +00:00
Akira Hatanaka 5638b89944 Fix a bug in which BranchProbabilityInfo wasn't setting branch weights of basic blocks inside loops correctly.
Previously, BranchProbabilityInfo::calcLoopBranchHeuristics would determine the weights of basic blocks inside loops even when it didn't have enough information to estimate the branch probabilities correctly. This patch fixes the function to exit early if it doesn't see any exit edges or back edges and let the later heuristics determine the weights.

This fixes PR18705 and <rdar://problem/15991090>.

Differential Revision: http://reviews.llvm.org/D3363

llvm-svn: 206194
2014-04-14 16:56:19 +00:00
Duncan P. N. Exon Smith 689a50736e blockfreq: Rename BlockFrequencyImpl to BlockFrequencyInfoImpl
This is a shared implementation class for BlockFrequencyInfo and
MachineBlockFrequencyInfo, not for BlockFrequency, a related (but
distinct) class.

No functionality change.

<rdar://problem/14292693>

llvm-svn: 206083
2014-04-11 23:20:58 +00:00
Duncan P. N. Exon Smith 37bd529964 blockfreq: Use getSuccessorIndex()
No functionality change.

<rdar://problem/14292693>

llvm-svn: 206082
2014-04-11 23:20:52 +00:00
Tobias Grosser c3d9db2336 Delinearize: Extend informationin -analyze output
llvm-svn: 205838
2014-04-09 07:53:49 +00:00
Sebastian Pop b2fdacf3f2 divide by the result of the gcd
used to fail with 'Step should divide Start with no remainder.'

llvm-svn: 205802
2014-04-08 21:21:13 +00:00
Sebastian Pop 9738e83a7d handle special cases when findGCD returns 1
used to fail with 'Step should divide Start with no remainder.'

llvm-svn: 205801
2014-04-08 21:21:10 +00:00
Sebastian Pop b5b84e0963 in findGCD of multiply expr return the gcd
we used to return 1 instead of the gcd

llvm-svn: 205800
2014-04-08 21:21:05 +00:00
Eric Christopher beb2cd6b7c Handle vlas during inline cost computation if they'll be turned
into a constant size alloca by inlining.

Ran a run over the testsuite, no results out of the noise, fixes
the testcase in the PR.

PR19115.

llvm-svn: 205710
2014-04-07 13:36:21 +00:00
Hal Finkel b4e001cc81 Use TopTTI->getGEPCost from within getUserCost
The implementation of getUserCost had duplicated (and hard-coded) the default
logic in getGEPCost. Instead, it is better to use getGEPCost directly, which
limits the default logic to the implementation of one function, and allows
targets to override the behavior.

No functionality change intended.

llvm-svn: 205346
2014-04-01 18:50:06 +00:00
Arnold Schwaighofer 1a444489e9 PR15967 Fix in basicaa for faulty returning no alias.
This commit consist of two parts.
The first part fix the PR15967. The wrong conclusion was made when the MaxLookup
limit was reached. The fix introduce a out parameter (MaxLookupReached) to
DecomposeGEPExpression that the function aliasGEP can act upon.
The second part is introducing the constant MaxLookupSearchDepth to make sure
that DecomposeGEPExpression and GetUnderlyingObject use the same search depth.
This is a small cleanup to clarify the original algorithm.

Patch by Karl-Johan Karlsson!

llvm-svn: 204859
2014-03-26 21:30:19 +00:00
Duncan P. N. Exon Smith 3dbe10503a blockfreq: Implement Pass::releaseMemory()
Implement Pass::releaseMemory() in BlockFrequencyInfo and
MachineBlockFrequencyInfo.  Just delete the private implementation when
not in use.  Switch to a std::unique_ptr to make the logic more clear.

<rdar://problem/14292693>

llvm-svn: 204741
2014-03-25 18:01:38 +00:00
Benjamin Kramer e75eaca32f ScalarEvolution: Compute exit counts for loops with a power-of-2 step.
If we have a loop of the form
for (unsigned n = 0; n != (k & -32); n += 32) {}
then we know that n is always divisible by 32 and the loop must
terminate. Even if we have a condition where the loop counter will
overflow it'll always hold this invariant.

PR19183. Our loop vectorizer creates this pattern and it's also
occasionally formed by loop counters derived from pointers.

llvm-svn: 204728
2014-03-25 16:25:12 +00:00
Erik Verbruggen e706b88304 Simplify loop that worked around bugs in old GCC/Xcode.
GCC 4.0.1 and Xcode 2 are no longer supported for building llvm/clang.

llvm-svn: 204705
2014-03-25 09:06:18 +00:00
Karthik Bhat 195e9dd91b Allow constant folding of ceil function whenever feasible
llvm-svn: 204583
2014-03-24 04:36:06 +00:00
Juergen Ributzka f0dff49ad0 [Constant Hoisting] Make the constant materialization cost operand dependent
Extend the target hook to take also the operand index into account when
calculating the cost of the constant materialization.

Related to <rdar://problem/16381500>

llvm-svn: 204435
2014-03-21 06:04:45 +00:00
Juergen Ributzka 46357931ab Revert "[Constant Hoisting] Extend coverage of the constant hoisting pass."
I will break this up into smaller pieces for review and recommit.

llvm-svn: 204393
2014-03-20 20:17:13 +00:00
Juergen Ributzka 6dab520c70 [Constant Hoisting] Extend coverage of the constant hoisting pass.
This commit extends the coverage of the constant hoisting pass, adds additonal
debug output and updates the function names according to the style guide.

Related to <rdar://problem/16381500>

llvm-svn: 204389
2014-03-20 19:55:52 +00:00
Michael Zolotukhin ed0a7761e5 Add stride normalization to SCEV Normalize/Denormalize transformation.
llvm-svn: 204161
2014-03-18 17:34:03 +00:00
Alon Mishne ad312155a6 [C++11] Change DebugInfoFinder to use range-based loops
Also changes the iterators to return actual DI type over MDNode.

llvm-svn: 204130
2014-03-18 09:41:07 +00:00
Eli Bendersky 576ef3c667 Consistent use of the noduplicate attribute.
The "noduplicate" attribute of call instructions is sometimes queried directly
and sometimes through the cannotDuplicate() predicate. This patch streamlines
all queries to use the cannotDuplicate() predicate. It also adds this predicate
to InvokeInst, to mirror what CallInst has.

llvm-svn: 204049
2014-03-17 16:19:07 +00:00
Arnaud A. de Grandmaison 75c9e6dedf Remove some dead assignements found by scan-build
llvm-svn: 204013
2014-03-15 22:13:15 +00:00
Michael Zolotukhin 66806aef1e PR17473:
Don't normalize an expression during postinc transformation unless it's
invertible.

llvm-svn: 203719
2014-03-12 21:31:05 +00:00
Michael Zolotukhin 15e6e543b9 Test commit
llvm-svn: 203716
2014-03-12 21:15:56 +00:00
Tim Northover e94a518a22 IR: add a second ordering operand to cmpxhg for failure
The syntax for "cmpxchg" should now look something like:

	cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic

where the second ordering argument gives the required semantics in the case
that no exchange takes place. It should be no stronger than the first ordering
constraint and cannot be either "release" or "acq_rel" (since no store will
have taken place).

rdar://problem/15996804

llvm-svn: 203559
2014-03-11 10:48:52 +00:00
Chandler Carruth aee3ca6cfd [TTI] There is actually no realistic way to pop TTI implementations off
the stack of the analysis group because they are all immutable passes.
This is made clear by Craig's recent work to use override
systematically -- we weren't overriding anything for 'finalizePass'
because there is no such thing.

This is kind of a lame restriction on the API -- we can no longer push
and pop things, we just set up the stack and run. However, I'm not
invested in building some better solution on top of the existing
(terrifying) immutable pass and legacy pass manager.

llvm-svn: 203437
2014-03-10 02:45:14 +00:00
Chandler Carruth e9b50617b8 [LCG] Ran clang-format over this too and it pointed out some fixes.
llvm-svn: 203435
2014-03-10 02:14:14 +00:00
Chandler Carruth b9e2f8c479 [LCG] Simplify a bunch of the LCG code with range for loops and auto.
Still more work to be done here to leverage C++11, but this clears out
the glaring issues.

llvm-svn: 203395
2014-03-09 12:20:34 +00:00
Chandler Carruth cdf4788401 [C++11] Add range based accessors for the Use-Def chain of a Value.
This requires a number of steps.
1) Move value_use_iterator into the Value class as an implementation
   detail
2) Change it to actually be a *Use* iterator rather than a *User*
   iterator.
3) Add an adaptor which is a User iterator that always looks through the
   Use to the User.
4) Wrap these in Value::use_iterator and Value::user_iterator typedefs.
5) Add the range adaptors as Value::uses() and Value::users().
6) Update *all* of the callers to correctly distinguish between whether
   they wanted a use_iterator (and to explicitly dig out the User when
   needed), or a user_iterator which makes the Use itself totally
   opaque.

Because #6 requires churning essentially everything that walked the
Use-Def chains, I went ahead and added all of the range adaptors and
switched them to range-based loops where appropriate. Also because the
renaming requires at least churning every line of code, it didn't make
any sense to split these up into multiple commits -- all of which would
touch all of the same lies of code.

The result is still not quite optimal. The Value::use_iterator is a nice
regular iterator, but Value::user_iterator is an iterator over User*s
rather than over the User objects themselves. As a consequence, it fits
a bit awkwardly into the range-based world and it has the weird
extra-dereferencing 'operator->' that so many of our iterators have.
I think this could be fixed by providing something which transforms
a range of T&s into a range of T*s, but that *can* be separated into
another patch, and it isn't yet 100% clear whether this is the right
move.

However, this change gets us most of the benefit and cleans up
a substantial amount of code around Use and User. =]

llvm-svn: 203364
2014-03-09 03:16:01 +00:00
Benjamin Kramer b0f74b24fa [C++11] Convert sort predicates into lambdas.
No functionality change.

llvm-svn: 203288
2014-03-07 21:35:39 +00:00
Karthik Bhat b67688a87c Allow constant folding of round function whenever feasible
llvm-svn: 203198
2014-03-07 04:36:21 +00:00
Matt Arsenault a236ea551c Teach lint about address spaces
llvm-svn: 203132
2014-03-06 17:33:55 +00:00
Ahmed Charles 56440fd820 Replace OwningPtr<T> with std::unique_ptr<T>.
This compiles with no changes to clang/lld/lldb with MSVC and includes
overloads to various functions which are used by those projects and llvm
which have OwningPtr's as parameters. This should allow out of tree
projects some time to move. There are also no changes to libs/Target,
which should help out of tree targets have time to move, if necessary.

llvm-svn: 203083
2014-03-06 05:51:42 +00:00
Karthik Bhat daa8cd10d9 Allow constant folding of copysign
llvm-svn: 203076
2014-03-06 05:32:52 +00:00
Chandler Carruth 7da14f1ab9 [Layering] Move InstVisitor.h into the IR library as it is pretty
obviously coupled to the IR.

llvm-svn: 203064
2014-03-06 03:23:41 +00:00
Chandler Carruth 9a4c9e597b [Layering] Move DebugInfo.h into the IR library where its implementation
already lives.

llvm-svn: 203046
2014-03-06 00:46:21 +00:00
Benjamin Kramer 061d147f74 ConstantFolding: Also fold the vector overloads of our math intrinsics.
llvm-svn: 202997
2014-03-05 19:41:48 +00:00
Tobias Grosser ba49e4229c Add missing parenthesis in SCEV comment
Contributed-by: Michael Zolutukin <mzolotukhin@apple.com>
llvm-svn: 202963
2014-03-05 10:37:17 +00:00
Chandler Carruth 64e9aa5c93 [C++11] Make this interface accept const Use pointers and use override
to ensure we don't mess up any of the overrides. Necessary for cleaning
up the Value use iterators and enabling range-based traversing of use
lists.

llvm-svn: 202958
2014-03-05 10:21:48 +00:00
Craig Topper e9ba759c81 [C++11] Add 'override' keyword to virtual methods that override their base class.
llvm-svn: 202945
2014-03-05 07:30:04 +00:00
Matt Arsenault 8377858c55 Allow constant folding of fma and fmuladd
llvm-svn: 202914
2014-03-05 00:02:00 +00:00
Matt Arsenault f8ecf9b447 Fix duplicate code in ConstantFolding
llvm-svn: 202913
2014-03-05 00:01:58 +00:00
Chandler Carruth 8cd041ef19 [Modules] Move the ConstantRange class into the IR library. This is
a bit surprising, as the class is almost entirely abstracted away from
any particular IR, however it encodes the comparsion predicates which
mutate ranges as ICmp predicate codes. This is reasonable as they're
used for both instructions and constants. Thus, it belongs in the IR
library with instructions and constants.

llvm-svn: 202838
2014-03-04 12:24:34 +00:00
Chandler Carruth aa0ab6389a [Modules] Move the PredIteratorCache into the IR library -- it is
hardcoded to use IR BasicBlocks.

llvm-svn: 202835
2014-03-04 12:09:19 +00:00
Chandler Carruth 1305dc3351 [Modules] Move CFG.h to the IR library as it defines graph traits over
IR types.

llvm-svn: 202827
2014-03-04 11:45:46 +00:00
Chandler Carruth 4220e9c154 [Modules] Move ValueHandle into the IR library where Value itself lives.
Move the test for this class into the IR unittests as well.

This uncovers that ValueMap too is in the IR library. Ironically, the
unittest for ValueMap is useless in the Support library (honestly, so
was the ValueHandle test) and so it already lives in the IR unittests.
Mmmm, tasty layering.

llvm-svn: 202821
2014-03-04 11:17:44 +00:00
Chandler Carruth 820a908df7 [Modules] Move the LLVM IR pattern match header into the IR library, it
obviously is coupled to the IR.

llvm-svn: 202818
2014-03-04 11:08:18 +00:00
Chandler Carruth 219b89b987 [Modules] Move CallSite into the IR library where it belogs. It is
abstracting between a CallInst and an InvokeInst, both of which are IR
concepts.

llvm-svn: 202816
2014-03-04 11:01:28 +00:00
Chandler Carruth 03eb0de93d [Modules] Move GetElementPtrTypeIterator into the IR library. As its
name might indicate, it is an iterator over the types in an instruction
in the IR.... You see where this is going.

Another step of modularizing the support library.

llvm-svn: 202815
2014-03-04 10:40:04 +00:00
Chandler Carruth 8394857f43 [Modules] Move InstIterator out of the Support library, where it had no
business.

This header includes Function and BasicBlock and directly uses the
interfaces of both classes. It has to do with the IR, it even has that
in the name. =] Put it in the library it belongs to.

This is one step toward making LLVM's Support library survive a C++
modules bootstrap.

llvm-svn: 202814
2014-03-04 10:30:26 +00:00
Chandler Carruth 442f784814 [cleanup] Re-sort all the includes with utils/sort_includes.py.
llvm-svn: 202811
2014-03-04 10:07:28 +00:00
Tobias Grosser 4abf9d3a54 [C++11] Add a basic block range view for RegionInfo
This also switches the users in LLVM to ensure this functionality is tested.

llvm-svn: 202705
2014-03-03 13:00:39 +00:00
Chandler Carruth 1583e99c23 [C++11] Add two range adaptor views to User: operands and
operand_values. The first provides a range view over operand Use
objects, and the second provides a range view over the Value*s being
used by those operands.

The naming is "STL-style" rather than "LLVM-style" because we have
historically named iterator methods STL-style, and range methods seem to
have far more in common with their iterator counterparts than with
"normal" APIs. Feel free to bikeshed on this one if you want, I'm happy
to change these around if people feel strongly.

I've switched code in SROA and LCG to exercise these mostly to ensure
they work correctly -- we don't really have an easy way to unittest this
and they're trivial.

llvm-svn: 202687
2014-03-03 10:42:58 +00:00
Benjamin Kramer d6f1f84f51 [C++11] Replace llvm::tie with std::tie.
The old implementation is no longer needed in C++11.

llvm-svn: 202644
2014-03-02 13:30:33 +00:00
Benjamin Kramer b6d0bd48bd [C++11] Replace llvm::next and llvm::prior with std::next and std::prev.
Remove the old functions.

llvm-svn: 202636
2014-03-02 12:27:27 +00:00
Craig Topper 73156025e0 Switch all uses of LLVM_OVERRIDE to just use 'override' directly.
llvm-svn: 202621
2014-03-02 09:09:27 +00:00
Craig Topper 77dfe45f81 Switch all uses of LLVM_FINAL to just use 'final', and remove the macro.
llvm-svn: 202618
2014-03-02 08:08:51 +00:00
Chandler Carruth 002da5db29 [C++11] Switch all uses of the llvm_move macro to use std::move
directly, and remove the macro.

llvm-svn: 202612
2014-03-02 04:08:41 +00:00
Chandler Carruth 172f7c37b9 [C++11] Remove the use of LLVM_HAS_RVALUE_REFERENCES from the rest of
the core LLVM libraries.

llvm-svn: 202582
2014-03-01 09:32:03 +00:00
Eric Christopher a13839f5ca Remove unnecessary llvm:: qualification.
llvm-svn: 202316
2014-02-26 23:27:16 +00:00
Paul Robinson 0c12b1d23c Constify the Optnone checks in IR passes.
llvm-svn: 202213
2014-02-26 01:23:26 +00:00
Rafael Espindola 339430f993 Use DataLayout from the module when easily available.
Eventually DataLayoutPass should go away, but for now that is the only easy
way to get a DataLayout in some APIs. This patch only changes the ones that
have easy access to a Module.

One interesting issue with sometimes using DataLayoutPass and sometimes
fetching it from the Module is that we have to make sure they are equivalent.
We can get most of the way there by always constructing the pass with a Module.
In fact, the pass could be changed to point to an external DataLayout instead
of owning one to make this stricter.

Unfortunately, the C api passes a DataLayout, so it has to be up to the caller
to make sure the pass and the module are in sync.

llvm-svn: 202204
2014-02-25 23:25:17 +00:00
Rafael Espindola 935125126c Make DataLayout a plain object, not a pass.
Instead, have a DataLayoutPass that holds one. This will allow parts of LLVM
don't don't handle passes to also use DataLayout.

llvm-svn: 202168
2014-02-25 17:30:31 +00:00
Rafael Espindola aeff8a9c05 Make some DataLayout pointers const.
No functionality change. Just reduces the noise of an upcoming patch.

llvm-svn: 202087
2014-02-24 23:12:18 +00:00
Rafael Espindola 90c7f1cc16 Replace the F_Binary flag with a F_Text one.
After this I will set the default back to F_None. The advantage is that
before this patch forgetting to set F_Binary would corrupt a file on windows.
Forgetting to set F_Text produces one that cannot be read in notepad, which
is a better failure mode :-)

llvm-svn: 202052
2014-02-24 18:20:12 +00:00
Rafael Espindola 7dbcdd08c2 Don't make F_None the default.
This will make it easier to switch the default to being binary files.

llvm-svn: 202042
2014-02-24 15:07:20 +00:00
Rafael Espindola 5f57f462a8 Rename a few more DataLayout variables from TD to DL.
llvm-svn: 201870
2014-02-21 18:34:28 +00:00
Sebastian Pop 64f12d5324 fix a corner case in delinearization
handle special cases Step==1, Step==-1, GCD==1, and GCD==-1

llvm-svn: 201868
2014-02-21 18:15:15 +00:00
Sebastian Pop 29026d3e52 normalize the last delinearized dimension
in the dependence test, we used to discard some information that the
delinearization provides: the size of the innermost dimension of an array,
i.e., the size of scalars stored in the array, and the remainder of the
delinearization that provides the offset from which the array reads start,
i.e., the base address of the array.

To avoid losing this data in the rest of the data dependence analysis, the fix
is to multiply the access function in the last delinearized dimension by its
size, effectively making the size of the last dimension to always be in bytes,
and then add the remainder of delinearization to the last subscript,
effectively making the last subscript start at the base address of the array.

llvm-svn: 201867
2014-02-21 18:15:11 +00:00
Sebastian Pop 5133d2e9d4 fail delinearization when the size of subscripts differs
Because the delinearization is not a global analysis pass, it will compute the
delinearization independently of knowledge about the way the delinearization
happened for other data accesses to the same array: the dependence analysis will
only trigger the delinearization on a tuple of access functions, and thus
delinearization may compute different subscripts sizes for a same array.  When
that happens the safest is to discard the delinearized information.

llvm-svn: 201866
2014-02-21 18:15:07 +00:00
Rafael Espindola 37dc9e19f5 Rename many DataLayout variables from TD to DL.
I am really sorry for the noise, but the current state where some parts of the
code use TD (from the old name: TargetData) and other parts use DL makes it
hard to write a patch that changes where those variables come from and how
they are passed along.

llvm-svn: 201827
2014-02-21 00:06:31 +00:00
Rafael Espindola 7c68bebb9c Rename some member variables from TD to DL.
TargetData was renamed DataLayout back in r165242.

llvm-svn: 201581
2014-02-18 15:33:12 +00:00
Arnold Schwaighofer 26f567d8a4 SCEVExpander: Try hard not to create derived induction variables in other loops
During LSR of one loop we can run into a situation where we have to expand the
start of a recurrence of a loop induction variable in this loop. This start
value is a value derived of the induction variable of a preceeding loop. SCEV
has cannonicalized this value to a different recurrence than the recurrence of
the preceeding loop's induction variable (the type and/or step direction) has
changed). When we come to instantiate this SCEV we created a second induction
variable in this preceeding loop.  This patch tries to base such derived
induction variables of the preceeding loop's induction variable.

This helps twolf on arm and seems to help scimark2 on x86.

Reapply with a fix for the case of a value derived from a pointer.

radar://15970709

llvm-svn: 201496
2014-02-16 15:49:50 +00:00
Arnold Schwaighofer 847d96142c Revert "SCEVExpander: Try hard not to create derived induction variables in other loops"
This reverts commit r201465. It broke an arm bot.

llvm-svn: 201466
2014-02-15 18:16:56 +00:00
Arnold Schwaighofer 1e12f8563d SCEVExpander: Try hard not to create derived induction variables in other loops
During LSR of one loop we can run into a situation where we have to expand the
start of a recurrence of a loop induction variable in this loop. This start
value is a value derived of the induction variable of a preceeding loop. SCEV
has cannonicalized this value to a different recurrence than the recurrence of
the preceeding loop's induction variable (the type and/or step direction) has
changed). When we come to instantiate this SCEV we created a second induction
variable in this preceeding loop.  This patch tries to base such derived
induction variables of the preceeding loop's induction variable.

This helps twolf on arm and seems to help scimark2 on x86.

radar://15970709

llvm-svn: 201465
2014-02-15 17:11:56 +00:00
Benjamin Kramer 989b92936c Reduce code duplication resulting from the ConstantVector/ConstantDataVector split.
No intended functionality change.

llvm-svn: 201344
2014-02-13 16:48:38 +00:00
Andrea Di Biagio b7882b3bd1 [Vectorizer] Add a new 'OperandValueKind' in TargetTransformInfo called
'OK_NonUniformConstValue' to identify operands which are constants but
not constant splats.

The cost model now allows returning 'OK_NonUniformConstValue'
for non splat operands that are instances of ConstantVector or
ConstantDataVector.

With this change, targets are now able to compute different costs
for instructions with non-uniform constant operands.
For example, On X86 the cost of a vector shift may vary depending on whether
the second operand is a uniform or non-uniform constant.

This patch applies the following changes:
 - The cost model computation now takes into account non-uniform constants;
 - The cost of vector shift instructions has been improved in
   X86TargetTransformInfo analysis pass;
 - BBVectorize, SLPVectorizer and LoopVectorize now know how to distinguish
   between non-uniform and uniform constant operands.

Added a new test to verify that the output of opt
'-cost-model -analyze' is valid in the following configurations: SSE2,
SSE4.1, AVX, AVX2.

llvm-svn: 201272
2014-02-12 23:43:47 +00:00
Benjamin Kramer 987b850cf2 SCEV: Cast switched values to make -Wswitch more useful.
llvm-svn: 201170
2014-02-11 19:02:55 +00:00
Benjamin Kramer 5a188549ad ScalarEvolution: Analyze trip count of loops with a switch guarding the exit.
llvm-svn: 201159
2014-02-11 15:44:32 +00:00
Benjamin Kramer 3c29c0704b Make succ_iterator a real random access iterator and clean up a couple of users.
llvm-svn: 201088
2014-02-10 14:17:42 +00:00
Benjamin Kramer b8266d2062 GlobalsModRef: Unify and clean up duplicated pointer analysis code.
llvm-svn: 201087
2014-02-10 14:17:30 +00:00
Chandler Carruth d1ba2efb8f [PM] Fix horrible typos that somehow didn't cause a failure in a C++11
build but spectacularly changed behavior of the C++98 build. =]

This shows my one problem with not having unittests -- basic API
expectations aren't well exercised by the integration tests because they
*happen* to not come up, even though they might later. I'll probably add
a basic unittest to complement the integration testing later, but
I wanted to revive the bots.

llvm-svn: 200905
2014-02-06 05:17:02 +00:00
Chandler Carruth bf71a34eb9 [PM] Add a new "lazy" call graph analysis pass for the new pass manager.
The primary motivation for this pass is to separate the call graph
analysis used by the new pass manager's CGSCC pass management from the
existing call graph analysis pass. That analysis pass is (somewhat
unfortunately) over-constrained by the existing CallGraphSCCPassManager
requirements. Those requirements make it *really* hard to cleanly layer
the needed functionality for the new pass manager on top of the existing
analysis.

However, there are also a bunch of things that the pass manager would
specifically benefit from doing differently from the existing call graph
analysis, and this new implementation tries to address several of them:

- Be lazy about scanning function definitions. The existing pass eagerly
  scans the entire module to build the initial graph. This new pass is
  significantly more lazy, and I plan to push this even further to
  maximize locality during CGSCC walks.
- Don't use a single synthetic node to partition functions with an
  indirect call from functions whose address is taken. This node creates
  a huge choke-point which would preclude good parallelization across
  the fanout of the SCC graph when we got to the point of looking at
  such changes to LLVM.
- Use a memory dense and lightweight representation of the call graph
  rather than value handles and tracking call instructions. This will
  require explicit update calls instead of some updates working
  transparently, but should end up being significantly more efficient.
  The explicit update calls ended up being needed in many cases for the
  existing call graph so we don't really lose anything.
- Doesn't explicitly model SCCs and thus doesn't provide an "identity"
  for an SCC which is stable across updates. This is essential for the
  new pass manager to work correctly.
- Only form the graph necessary for traversing all of the functions in
  an SCC friendly order. This is a much simpler graph structure and
  should be more memory dense. It does limit the ways in which it is
  appropriate to use this analysis. I wish I had a better name than
  "call graph". I've commented extensively this aspect.

This is still very much a WIP, in fact it is really just the initial
bits. But it is about the fourth version of the initial bits that I've
implemented with each of the others running into really frustrating
problms. This looks like it will actually work and I'd like to split the
actual complexity across commits for the sake of my reviewers. =] The
rest of the implementation along with lots of wiring will follow
somewhat more rapidly now that there is a good path forward.

Naturally, this doesn't impact any of the existing optimizer. This code
is specific to the new pass manager.

A bunch of thanks are deserved for the various folks that have helped
with the design of this, especially Nick Lewycky who actually sat with
me to go through the fundamentals of the final version here.

llvm-svn: 200903
2014-02-06 04:37:03 +00:00
Paul Robinson af4e64d095 Disable most IR-level transform passes on functions marked 'optnone'.
Ideally only those transform passes that run at -O0 remain enabled,
in reality we get as close as we reasonably can.
Passes are responsible for disabling themselves, it's not the job of
the pass manager to do it for them.

llvm-svn: 200892
2014-02-06 00:07:05 +00:00
Duncan P. N. Exon Smith 8e661efc00 cleanup: scc_iterator consumers should use isAtEnd
No functional change.  Updated loops from:

    for (I = scc_begin(), E = scc_end(); I != E; ++I)

to:

    for (I = scc_begin(); !I.isAtEnd(); ++I)

for teh win.

llvm-svn: 200789
2014-02-04 19:19:07 +00:00
Chandler Carruth 6b4cc8b66a [inliner] Skip debug intrinsics even earlier in computing the inline
cost so that they don't impact the vector bonus. Fundamentally, counting
unsimplified instructions is just *wrong*; it will continue to introduce
instability as things which do not generate code bizarrely impact
inlining. For example, sufficiently nested inlined functions could turn
off the vector bonus with lifetime markers just like the debug
intrinsics do. =/

This is a short-term tactical fix. Long term, I think we need to remove
the vector bonus entirely. That's a separate patch and discussion
though.

The patch to fix this provided by Dario Domizioli. I've added some
comments about the planned direction and used a heavily pruned form of
debug info intrinsics for the test case. While this debug info doesn't
work or "do" anything useful, it lets us easily test all manner of
interference easily, and I suspect this will not be the last time we
want to craft a pattern where debug info interferes with the inliner in
a problematic way.

llvm-svn: 200609
2014-02-01 10:38:17 +00:00
Chandler Carruth 394e34f5c2 [inliner] Print out extra stats about the cost, threshold, and vector
bonus in the inline cost analysis.

Split out of a patch by Dario Domizioli to commit separately.

llvm-svn: 200586
2014-01-31 22:32:32 +00:00
Matt Arsenault ee364ee729 Allow speculating llvm.sqrt, fma and fmuladd
This doesn't set errno, so this should be OK.
Also update the documentation to explicitly state
that errno are not set.

llvm-svn: 200501
2014-01-31 00:09:00 +00:00
Reid Kleckner 26af2cae05 Update optimization passes to handle inalloca arguments
Summary:
I searched Transforms/ and Analysis/ for 'ByVal' and updated those call
sites to check for inalloca if appropriate.

I added tests for any change that would allow an optimization to fire on
inalloca.

Reviewers: nlewycky

Differential Revision: http://llvm-reviews.chandlerc.com/D2449

llvm-svn: 200281
2014-01-28 02:38:36 +00:00
Nick Lewycky 629199ccb3 Fix crasher introduced in r200203 and caught by a libc++ buildbot. Don't assume that getMulExpr returns a SCEVMulExpr, it may have simplified it to something else!
llvm-svn: 200210
2014-01-27 10:47:44 +00:00
Nick Lewycky 31eaca5513 Teach SCEV to handle more cases of 'and X, CST', specifically where CST is any number of contiguous 1 bits in a row, with any number of leading and trailing 0 bits.
Unfortunately, this in turn led to some lower quality SCEVs due to some different paths through expression simplification, so add getUDivExactExpr and use it. This fixes all instances of the problems that I found, but we can make that function smarter as necessary.

Merge test "xor-and.ll" into "and-xor.ll" since I needed to update it anyways. Test 'nsw-offset.ll' analyzes a little deeper, %n now gets a scev in terms of %no instead of a SCEVUnknown.

llvm-svn: 200203
2014-01-27 10:04:03 +00:00
Juergen Ributzka f26beda7c7 Revert "Revert "Add Constant Hoisting Pass" (r200034)"
This reverts commit r200058 and adds the using directive for
ARMTargetTransformInfo to silence two g++ overload warnings.

llvm-svn: 200062
2014-01-25 02:02:55 +00:00
Hans Wennborg 4d67a2e85a Revert "Add Constant Hoisting Pass" (r200034)
This commit caused -Woverloaded-virtual warnings. The two new
TargetTransformInfo::getIntImmCost functions were only added to the superclass,
and to the X86 subclass. The other targets were not updated, and the
warning highlighted this by pointing out that e.g. ARMTTI::getIntImmCost was
hiding the two new getIntImmCost variants.

We could pacify the warning by adding "using TargetTransformInfo::getIntImmCost"
to the various subclasses, or turning it off, but I suspect that it's wrong to
leave the functions unimplemnted in those targets. The default implementations
return TCC_Free, which I don't think is right e.g. for ARM.

llvm-svn: 200058
2014-01-25 01:18:18 +00:00
Juergen Ributzka 4f3df4ad64 Add Constant Hoisting Pass
Retry commit r200022 with a fix for the build bot errors. Constant expressions
have (unlike instructions) module scope use lists and therefore may have users
in different functions. The fix is to simply ignore these out-of-function uses.

llvm-svn: 200034
2014-01-24 20:18:00 +00:00