1. Added debug messages when in OptimizeIndividualCalls we move calls into predecessors and then erase the original call.
2. Added debug messages when in the process of moving calls in ObjCARCOpt::MoveCalls we create new RR and delete old RR.
3. Added a debug message when we visit a specific retain instruction in ObjCARCOpt::PerformCodePlacement.
llvm-svn: 171988
peculiar headers under include/llvm.
This struct still doesn't make a lot of sense, but it makes more sense
down in TargetLowering than it did before.
llvm-svn: 171739
already in a class, just inline the four of them. I suspect that this
class could be simplified some to not always keep distinct variables for
these things, but it wasn't clear to me how given the usage so I opted
for a trivial and mechanical translation.
This removes one of the two remaining users of a header in include/llvm
which does nothing more than define a 4 member struct.
llvm-svn: 171738
TargetTransformInfo rather than TargetLowering, removing one of the
primary instances of the layering violation of Transforms depending
directly on Target.
This is a really big deal because LSR used to be a "special" pass that
could only be tested fully using llc and by looking at the full output
of it. It also couldn't run with any other loop passes because it had to
be created by the backend. No longer is this true. LSR is now just
a normal pass and we should probably lift the creation of LSR out of
lib/CodeGen/Passes.cpp and into the PassManagerBuilder. =] I've not done
this, or updated all of the tests to use opt and a triple, because
I suspect someone more familiar with LSR would do a better job. This
change should be essentially without functional impact for normal
compilations, and only change behvaior of targetless compilations.
The conversion required changing all of the LSR code to refer to the TTI
interfaces, which fortunately are very similar to TargetLowering's
interfaces. However, it also allowed us to *always* expect to have some
implementation around. I've pushed that simplification through the pass,
and leveraged it to simplify code somewhat. It required some test
updates for one of two things: either we used to skip some checks
altogether but now we get the default "no" answer for them, or we used
to have no information about the target and now we do have some.
I've also started the process of removing AddrMode, as the TTI interface
doesn't use it any longer. In some cases this simplifies code, and in
others it adds some complexity, but I think it's not a bad tradeoff even
there. Subsequent patches will try to clean this up even further and use
other (more appropriate) abstractions.
Yet again, almost all of the formatting changes brought to you by
clang-format. =]
llvm-svn: 171735
I'm sorry for duplicating bad style here, but I wanted to keep
consistency. I've pinged the code review thread where this style was
reviewed and changes were requested.
llvm-svn: 171714
through as a reference rather than a pointer. There is always *some*
implementation of this available, so this simplifies code by not having
to test for whether it is available or not.
Further, it turns out there were piles of places where SimplifyCFG was
recursing and not passing down either TD or TTI. These are fixed to be
more pedantically consistent even though I don't have any particular
cases where it would matter.
llvm-svn: 171691
next to its only user. This helper relies on TargetLowering information
that shouldn't be generally used throughout the Transfoms library, and
so it made little sense as a generic utility.
This also consolidates the file where we need to remove the remaining
uses of TargetLowering in favor of the IR-layer abstract interface in
TargetTransformInfo.
llvm-svn: 171590
into their new header subdirectory: include/llvm/IR. This matches the
directory structure of lib, and begins to correct a long standing point
of file layout clutter in LLVM.
There are still more header files to move here, but I wanted to handle
them in separate commits to make tracking what files make sense at each
layer easier.
The only really questionable files here are the target intrinsic
tablegen files. But that's a battle I'd rather not fight today.
I've updated both CMake and Makefile build systems (I think, and my
tests think, but I may have missed something).
I've also re-sorted the includes throughout the project. I'll be
committing updates to Clang, DragonEgg, and Polly momentarily.
llvm-svn: 171366
utils/sort_includes.py script.
Most of these are updating the new R600 target and fixing up a few
regressions that have creeped in since the last time I sorted the
includes.
llvm-svn: 171362
Specifically these calls return their argument verbatim, as a low-level
optimization. However, this makes high-level optimizations
harder. We undo any uses of this optimization that the front-end
emitted. We redo them later in the contract pass.
llvm-svn: 171346
The later API is nicer than the former, and is correct regarding wrap-around offsets (if anyone cares).
There are a few more places left with duplicated code, which I'll remove soon.
llvm-svn: 171259
directly.
This is in preparation for removing the use of the 'Attribute' class as a
collection of attributes. That will shift to the AttributeSet class instead.
llvm-svn: 171253
Similarly inlining of the function is inhibited, if that would duplicate the call (in particular inlining is still allowed when there is only one callsite and the function has internal linkage).
llvm-svn: 170704
This was a silly oversight, we weren't pruning allocas which were used
by variable-length memory intrinsics from the set that could be widened
and promoted as integers. Fix that.
llvm-svn: 170353
This also cleans up a bit of the memcpy call rewriting by sinking some
irrelevant code further down and making the call-emitting code a bit
more concrete.
Previously, memcpy of a subvector would actually miscompile (!!!) the
copy into a single vector element copy. I have no idea how this ever
worked. =/ This is the memcpy half of PR14478 which we probably weren't
noticing previously because it didn't actually assert.
The rewrite relies on the newly refactored insert- and extractVector
functions to do the heavy lifting, and those are the same as used for
loads and stores which makes the test coverage a bit more meaningful
here.
llvm-svn: 170338
Check whether a BB is known as reachable before adding it to the worklist.
This way BB's with multiple predecessors are added to the list no more than
once.
llvm-svn: 170335
The first half of fixing this bug was actually in r170328, but was
entirely coincidental. It did however get me to realize the nature of
the bug, and adapt the test case to test more interesting behavior. In
turn, that uncovered the rest of the bug which I've fixed here.
This should fix two new asserts that showed up in the vectorize nightly
tester.
llvm-svn: 170333
I noticed this while looking at r170328. We only ever do a vector
rewrite when the alloca *is* the vector type, so it's good to not paper
over bugs here by doing a convertValue that isn't needed.
llvm-svn: 170331
This will allow its use inside of memcpy rewriting as well. This routine
is more complex than extractVector, and some of its uses are not 100%
where I want them to be so there is still some work to do here.
While this can technically change the output in some cases, it shouldn't
be a change that matters -- IE, it can leave some dead code lying around
that prior versions did not, etc.
Yet another step in the refactorings leading up to the solution to the
last component of PR14478.
llvm-svn: 170328
The method helpers all implicitly act upon the alloca, and what we
really want is a fully generic helper. Doing memcpy rewrites is more
special than all other rewrites because we are at times rewriting
instructions which touch pointers *other* than the alloca. As
a consequence all of the helpers needed by memcpy rewriting of
sub-vector copies will need to be generalized fully.
Note that all of these helpers ({insert,extract}{Integer,Vector}) are
woefully uncommented. I'm going to go back through and document them
once I get the factoring correct.
No functionality changed.
llvm-svn: 170325
PR14478 highlights a serious problem in SROA that simply wasn't being
exercised due to a lack of vector input code mixed with C-library
function calls. Part of SROA was written carefully to handle subvector
accesses via memset and memcpy, but the rewriter never grew support for
this. Fixing it required refactoring the subvector access code in other
parts of SROA so it could be shared, and then fixing the splat formation
logic and using subvector insertion (this patch).
The PR isn't quite fixed yet, as memcpy is still broken in the same way.
I'm starting on that series of patches now.
Hopefully this will be enough to bring the bullet benchmark back to life
with the bb-vectorizer enabled, but that may require fixing memcpy as
well.
llvm-svn: 170301
No functionality changed. Refactoring leading up to the fix for PR14478
which requires some significant changes to the memset and memcpy
rewriting.
llvm-svn: 170299
This visitor provides infrastructure for recursively traversing the
use-graph of a pointer-producing instruction like an alloca or a malloc.
It maintains a worklist of uses to visit, so it can handle very deep
recursions. It automatically looks through instructions which simply
translate one pointer to another (bitcasts and GEPs). It tracks the
offset relative to the original pointer as long as that offset remains
constant and exposes it during the visit as an APInt offset. Finally, it
performs conservative escape analysis.
However, currently it has some limitations that should be addressed
going forward:
1) It doesn't handle vectors of pointers.
2) It doesn't provide a cheaper visitor when the constant offset
tracking isn't needed.
3) It doesn't support non-instruction pointer values.
The current functionality is exactly what is required to implement the
SROA pointer-use visitors in terms of this one, rather than in terms of
their own ad-hoc base visitor, which was always very poorly specified.
SROA has been converted to use this, and the code there deleted which
this utility now provides.
Technically speaking, using this new visitor allows SROA to handle a few
more cases than it previously did. It is now more aggressive in ignoring
chains of instructions which look like they would defeat SROA, but in
fact do not because they never result in a read or write of memory.
While this is "neat", it shouldn't be interesting for real programs as
any such chains should have been removed by others passes long before we
get to SROA. As a consequence, I've not added any tests for these
features -- it shouldn't be part of SROA's contract to perform such
heroics.
The goal is to extend the functionality of this visitor going forward,
and re-use it from passes like ASan that can benefit from doing
a detailed walk of the uses of a pointer.
Thanks to Ben Kramer for the code review rounds and lots of help
reviewing and debugging this patch.
llvm-svn: 169728
When SROA was evaluating a mixture of i1 and i8 loads and stores, in
just a particular case, it would tickle a latent bug where we compared
bits to bytes rather than bits to bits. As a consequence of the latent
bug, we would allow integers through which were not byte-size multiples,
a situation the later rewriting code was never intended to handle.
In release builds this could trigger all manner of oddities, but the
reported issue in PR14548 was forming invalid bitcast instructions.
The only downside of this fix is that it makes it more clear that SROA
in its current form is not capable of handling mixed i1 and i8 loads and
stores. Sometimes with the previous code this would work by luck, but
usually it would crash, so I'm not terribly worried. I'll watch the LNT
numbers just to be sure.
llvm-svn: 169719
This will more closely match the behavior of the new PtrUseVisitor that
I am adding. Hopefully this will not change the actual behavior in any
way, but by making the processing order more similar help in debugging.
llvm-svn: 169697
There are still bugs in this pass, as well as other issues that are
being worked on, but the bugs are crashers that occur pretty easily in
the wild. Test cases have been sent to the original commit's review
thread.
This reverts the commits:
r169671: Fix a logic error.
r169604: Move the popcnt tests to an X86 subdirectory.
r168931: Initial commit adding the pass.
llvm-svn: 169683
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.
Many forward declarations and missing includes were added to a header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]
llvm-svn: 169131
The partitioning logic attempted to handle uses of an alloca with an
offset starting before the alloca so long as the use had some overlap
with the alloca itself. However, there was a bug where we tested
'(uint64_t)Offset >= AllocSize' without first checking whether 'Offset'
was positive. As a consequence, essentially every negative offset (that
is, starting *before* the alloca does) would be thrown out, even if it
was overlapping. The subsequent code to throw out negative offsets which
were actually non-overlapping was essentially dead. The code to *handle*
overlapping negative offsets was actually dead!
I've just removed all of this, and taught SROA to discard any uses which
start prior to the alloca from the beginning. It has the lovely property
of simplifying the code. =] All the tests still pass, and in fact no new
tests are needed as this is already covered by our testsuite. Fixing the
code so that negative offsets work the way the comments indicate they
were supposed to work causes regressions. That's how I found this.
Anyways, this is all progress in the correct direction -- tightening up
SROA to be maximally aggressive. Some day, I really hope to turn
out-of-bounds accesses to an alloca into 'unreachable'.
llvm-svn: 169120
The original patch removed a bunch of code that the SjLjEHPrepare pass placed
into the entry block if all of the landing pads were removed during the
CodeGenPrepare class. The more natural way of doing things is to run the CGP
*before* we run the SjLjEHPrepare pass.
Make it so!
llvm-svn: 169044
The simplify-libcalls pass maintained a statistic to count the number
of library calls that have been simplified. Now that library call
simplification is being carried out in instcombine the statistic should
be moved to there.
llvm-svn: 168975
depends on the IR infrastructure, there is no sense in it being off in
Support land.
This is in preparation to start working to expand InstVisitor into more
special-purpose visitors that are still generic and can be re-used
across different passes. The expansion will go into the Analylis tree
though as nothing in VMCore needs it.
llvm-svn: 168972
This revision attempts to recognize following population-count pattern:
while(a) { c++; ... ; a &= a - 1; ... },
where <c> and <a>could be used multiple times in the loop body.
TODO: On X8664 and ARM, __buildin_ctpop() are not expanded to a efficent
instruction sequence, which need to be improved in the following commits.
Reviewed by Nadav, really appreciate!
llvm-svn: 168931
the last invoke instruction in the function. This also removes the last landing
pad in an function. This is fine, but with SjLj EH code, we've already placed a
bunch of code in the 'entry' block, which expects the landing pad to stick
around.
When we get to the situation where CGP has removed the last landing pad, go
ahead and nuke the SjLj instructions from the 'entry' block.
<rdar://problem/12721258>
llvm-svn: 168930
This patch migrates the puts optimizations from the simplify-libcalls
pass into the instcombine library call simplifier.
All the simplifiers from simplify-libcalls have now been migrated to
instcombine. Yay! Just a few other bits to migrate (prototype attribute
inference and a few statistics) and simplify-libcalls can finally be put
to rest.
llvm-svn: 168925
Now if we can transform an alloca into a single vector value, but it has
subvector, non-element accesses, we form the appropriate shufflevectors
to allow SROA to proceed. This fixes PR14055 which pointed out a very
common pattern that SROA couldn't handle -- mixed vec3 and vec4
operations on a single alloca.
llvm-svn: 168418
The issue is that we may end up with newly OOB loads when speculating
a load into the predecessors of a PHI node, and this confuses the new
integer splitting logic in some cases, triggering an assertion failure.
In fact, the branch in question must be dead code as it loads from
a too-narrow alloca. Add code to handle this gracefully and leave the
requisite FIXMEs for both optimizing more aggressively and doing more to
aid sanitizing invalid code which triggers these patterns.
llvm-svn: 168361
to properly handle the combinations of these with split integer loads
and stores. This essentially replaces Evan's r168227 by refactoring the
code in a different way, and trynig to mirror that refactoring in both
the load and store sides of the rewriting.
Generally speaking there was some really problematic duplicated code
here that led to poorly founded assumptions and then subtle bugs. Now
much of the code actually flows through and follows a more consistent
style and logical path. There is still a tiny bit of duplication on the
store side of things, but it is much less bad.
This also changes the logic to never re-use a load or store instruction
as that was simply too error prone in practice.
I've added a few tests (one a reduction of the one in Evan's original
patch, which happened to be the same as the report in PR14349). I'm
going to look at adding a few more tests for things I found and fixed in
passing (such as the volatile tests in the vectorizable predicate).
This patch has survived bootstrap, and modulo one bugfix survived
Duncan's test suite, but let me know if anything else explodes.
llvm-svn: 168346
operands of the expression being written was wrongly thought to be reusable as
an inner node of the expression resulting in it turning up as both an inner node
*and* a leaf, creating a cycle in the def-use graph. This would have caused the
verifier to blow up if things had gotten that far, however it managed to provoke
an infinite loop first.
llvm-svn: 168291
the utility for extracting a chain of operations from the IR, thought that it
might as well combine any constants it came across (rather than just returning
them along with everything else). On the other hand, the factorization code
would like to see the individual constants (this is quite reasonable: it is
much easier to pull a factor of 3 out of 2*3 than it is to pull it out of 6;
you may think 6/3 isn't so hard, but due to overflow it's not as easy to undo
multiplications of constants as it may at first appear). This patch therefore
makes LinearizeExprTree stupider: it now leaves optimizing to the optimization
part of reassociate, and sticks to just analysing the IR.
llvm-svn: 168035
This patch migrates the math library call simplifications from the
simplify-libcalls pass into the instcombine library call simplifier.
I have typically migrated just one simplifier at a time, but the math
simplifiers are interdependent because:
1. CosOpt, PowOpt, and Exp2Opt all depend on UnaryDoubleFPOpt.
2. CosOpt, PowOpt, Exp2Opt, and UnaryDoubleFPOpt all depend on
the option -enable-double-float-shrink.
These two factors made migrating each of these simplifiers individually
more of a pain than it would be worth. So, I migrated them all together.
llvm-svn: 167815
The assertion is trigged when the Reassociater tries to transform expression
... + 2 * n * 3 + 2 * m + ...
into:
... + 2 * (n*3 + m).
In the process of the transformation, a helper routine folds the constant 2*3 into 6,
confusing optimizer which is trying the to eliminate the common factor 2, and cannot
find 2 any more.
Review is pending. But I'd like commit first in order to help those who are waiting
for this fix.
llvm-svn: 167740
The new analysis is not yet ready for prime time. It has a *critical*
flawed assumption, and some troubling shortages of testing. Until it's
been hammered into better shape, let's stick with the working code. This
should be easy to revert itself when the analysis is ready.
Fixes PR14241, a miscompile of any memcpy-able loop which uses a pointer
as the induction mechanism. If you have been seeing miscompiles in this
revision range, you really want to test with this backed out. The
results of this miscompile are a bit subtle as they can lead to
downstream passes concluding things are impossible which are in fact
possible.
Thanks to David Blaikie for the majority of the reduction of this
miscompile. I'll be checking in the test case in a non-revert commit.
Revesions reverted here:
r167045: LoopIdiom: Fix a serious missed optimization: we only turned
top-level loops into memmove.
r166877: LoopIdiom: Add checks to avoid turning memmove into an infinite
loop.
r166875: LoopIdiom: Recognize memmove loops.
r166874: LoopIdiom: Replace custom dependence analysis with
DependenceAnalysis.
llvm-svn: 167286
r165941: Resubmit the changes to llvm core to update the functions to
support different pointer sizes on a per address space basis.
Despite this commit log, this change primarily changed stuff outside of
VMCore, and those changes do not carry any tests for correctness (or
even plausibility), and we have consistently found questionable or flat
out incorrect cases in these changes. Most of them are probably correct,
but we need to devise a system that makes it more clear when we have
handled the address space concerns correctly, and ideally each pass that
gets updated would receive an accompanying test case that exercises that
pass specificaly w.r.t. alternate address spaces.
However, from this commit, I have retained the new C API entry points.
Those were an orthogonal change that probably should have been split
apart, but they seem entirely good.
In several places the changes were very obvious cleanups with no actual
multiple address space code added; these I have not reverted when
I spotted them.
In a few other places there were merge conflicts due to a cleaner
solution being implemented later, often not using address spaces at all.
In those cases, I've preserved the new code which isn't address space
dependent.
This is part of my ongoing effort to clean out the partial address space
code which carries high risk and low test coverage, and not likely to be
finished before the 3.2 release looms closer. Duncan and I would both
like to see the above issues addressed before we return to these
changes.
llvm-svn: 167222
getIntPtrType support for multiple address spaces via a pointer type,
and also introduced a crasher bug in the constant folder reported in
PR14233.
These commits also contained several problems that should really be
addressed before they are re-committed. I have avoided reverting various
cleanups to the DataLayout APIs that are reasonable to have moving
forward in order to reduce the amount of churn, and minimize the number
of commits that were reverted. I've also manually updated merge
conflicts and manually arranged for the getIntPtrType function to stay
in DataLayout and to be defined in a plausible way after this revert.
Thanks to Duncan for working through this exact strategy with me, and
Nick Lewycky for tracking down the really annoying crasher this
triggered. (Test case to follow in its own commit.)
After discussing with Duncan extensively, and based on a note from
Micah, I'm going to continue to back out some more of the more
problematic patches in this series in order to ensure we go into the
LLVM 3.2 branch with a reasonable story here. I'll send a note to
llvmdev explaining what's going on and why.
Summary of reverted revisions:
r166634: Fix a compiler warning with an unused variable.
r166607: Add some cleanup to the DataLayout changes requested by
Chandler.
r166596: Revert "Back out r166591, not sure why this made it through
since I cancelled the command. Bleh, sorry about this!
r166591: Delete a directory that wasn't supposed to be checked in yet.
r166578: Add in support for getIntPtrType to get the pointer type based
on the address space.
llvm-svn: 167221
This patch migrates the stpcpy optimizations from the simplify-libcalls
pass into the instcombine library call simplifier. Note that the
__stpcpy_chk simplifications were migrated in a previous commit.
llvm-svn: 167083
integers in that the code to handle split alloca-wide integer loads or
stores doesn't come first. It should, for the same reasons as with
integers, and the PR attests to that. Also had to fix a busted assert in
that this test case also covers.
llvm-svn: 167051
When the switch-to-lookup tables transform landed in SimplifyCFG, it
was pointed out that this could be inappropriate for some targets.
Since there was no way at the time for the pass to know anything about
the target, an awkward reverse-transform was added in CodeGenPrepare
that turned lookup tables back into switches for some targets.
This patch uses the new TargetTransformInfo to determine if a
switch should be transformed, and removes
CodeGenPrepare::ConvertLoadToSwitch.
llvm-svn: 167011
checks to avoid performing compile-time arithmetic on PPCDoubleDouble.
Now that APFloat supports arithmetic on PPCDoubleDouble, those checks
are no longer needed, and we can treat the type like any other.
llvm-svn: 166958
wrapper returns a vector of integers when passed a vector of pointers) by having
getIntPtrType itself return a vector of integers in this case. Outside of this
wrapper, I didn't find anywhere in the codebase that was relying on the old
behaviour for vectors of pointers, so give this a whirl through the buildbots.
llvm-svn: 166939
This turns loops like
for (unsigned i = 0; i != n; ++i)
p[i] = p[i+1];
into memmove, which has a highly optimized implementation in most libcs.
This was really easy with the new DependenceAnalysis :)
llvm-svn: 166875
Requires a lot less code and complexity on loop-idiom's side and the more
precise analysis can catch more cases, like the one I included as a test case.
This also fixes the edge-case miscompilation from PR9481.
Compile time performance seems to be slightly worse, but this is mostly due
to an extra LCSSA run scheduled by the PassManager and should be fixed there.
llvm-svn: 166874
smaller integer loads and stores.
The high-level motivation is that the frontend sometimes generates
a single whole-alloca integer load or store during ABI lowering of
splittable allocas. We need to be able to break this apart in order to
see the underlying elements and properly promote them to SSA values. The
hope is that this fixes some performance regressions on x86-32 with the
new SROA pass.
Unfortunately, this causes quite a bit of churn in the test cases, and
bloats some IR that comes out. When we see an alloca that consists soley
of bits and bytes being extracted and re-inserted, we now do some
splitting first, before building widened integer "bucket of bits"
representations. These are always well folded by instcombine however, so
this shouldn't actually result in missed opportunities.
If this splitting of all-integer allocas does cause problems (perhaps
due to smaller SSA values going into the RA), we could potentially go to
some extreme measures to only do this integer splitting trick when there
are non-integer component accesses of an alloca, but discovering this is
quite expensive: it adds yet another complete walk of the recursive use
tree of the alloca.
Either way, I will be watching build bots and LNT bots to see what
fallout there is here. If anyone gets x86-32 numbers before & after this
change, I would be very interested.
llvm-svn: 166662
very small but very important bugfix:
bool shouldExplore(Use *U) {
Value *V = U->get();
if (isa<CallInst>(V) || isa<InvokeInst>(V))
[...]
should have read:
bool shouldExplore(Use *U) {
Value *V = U->getUser();
if (isa<CallInst>(V) || isa<InvokeInst>(V))
Fixes PR14143!
llvm-svn: 166407
It passes all tests, produces better results than the old code but uses the
wrong pass, LoopDependenceAnalysis, which is old and unmaintained. "Why is it
still in tree?", you might ask. The answer is obviously: "To confuse developers."
Just swapping in the new dependency pass sends the pass manager into an infinte
loop, I'll try to figure out why tomorrow.
llvm-svn: 166399
Requires a lot less code and complexity on loop-idiom's side and the more
precise analysis can catch more cases, like the one I included as a test case.
This also fixes the edge-case miscompilation from PR9481. I'm not entirely
sure that all cases are handled that the old checks handled but LDA will
certainly become smarter in the future.
llvm-svn: 166390
This patch migrates the strcpy optimizations from the simplify-libcalls pass
into the instcombine library call simplifier. Note also that StrCpyChkOpt
has been updated with a few simplifications that were being done in the
simplify-libcalls version of StrCpyOpt, but not in the migrated implementation
of StrCpyOpt. There is no reason to overload StrCpyOpt with fortified and
regular simplifications in the new model since there is already a dedicated
simplifier for __strcpy_chk.
llvm-svn: 166198
operate purely on values. Sink the alloca loading and storing logic into
the rewrite routines that are specific to alloca-integer-rewrite
driving. This is just a refactoring here, but the subsequent step will
be to reuse the insertion and extraction logic when rewriting integer
loads and stores that have been split and decomposed into narrower loads
and stores.
No functionality changed other than different names for instructions.
llvm-svn: 166176
The TargetTransform changes are breaking LTO bootstraps of clang. I am
working with Nadav to figure out the problem, but I am reverting it for now
to get our buildbots working.
This reverts svn commits: 165665 165669 165670 165786 165787 165997
and I have also reverted clang svn 165741
llvm-svn: 166168
a pointer. A very bad idea. Let's not do that. Fixes PR14105.
Note that this wasn't *that* glaring of an oversight. Originally, these
routines were only called on offsets within an alloca, which are
intrinsically positive. But over the evolution of the pass, they ended
up being called for arbitrary offsets, and things went downhill...
llvm-svn: 166095
revision makes no sense. We cannot use the address space of the *post
indexed* type to conclude anything about a *pre indexed* pointer type's
size. More importantly, this index can never be over a pointer. We are
indexing over arrays and vectors here.
Of course, I have no test case here. Neither did the original patch. =/
llvm-svn: 166091
includes extracting ints for copying elsewhere and inserting ints when
copying into the alloca. This should fix the CanSROA assertion coming
out of Clang's regression test suite.
llvm-svn: 165931
cases where we have partial integer loads and stores to an otherwise
promotable alloca to widen[1] those loads and stores to cover the entire
alloca and bitcast them into the appropriate type such that promotion
can proceed.
These partial loads and stores stem from an annoying confluence of ARM's
calling convention and ABI lowering and the FCA pre-splitting which
takes place in SROA. Clang lowers a { double, double } in-register
function argument as a [4 x i32] function argument to ensure it is
placed into integer 32-bit registers (a really unnerving implicit
contract between Clang and the ARM backend I would add). This results in
a FCA load of [4 x i32]* from the { double, double } alloca, and SROA
decomposes this into a sequence of i32 loads and stores. Inlining
proceeds, code gets folded, but at the end of the day, we still have i32
stores to the low and high halves of a double alloca. Widening these to
be i64 operations, and bitcasting them to double prior to loading or
storing allows promotion to proceed for these allocas.
I looked quite a bit changing the IR which Clang produces for this case
to be more friendly, but small changes seem unlikely to help. I think
the best representation we could use currently would be to pass 4 i32
arguments thereby avoiding any FCAs, but that would still require this
fix. It seems like it might eventually be nice to somehow encode the ABI
register selection choices outside of the parameter type system so that
the parameter can be a { double, double }, but the CC register
annotations indicate that this should be passed via 4 integer registers.
This patch does not address the second problem in PR14059, which is the
reverse: when a struct alloca is loaded as a *larger* single integer.
This patch also does not address some of the code quality issues with
the FCA-splitting. Those don't actually impede any optimizations really,
but they're on my list to clean up.
[1]: Pedantic footnote: for those concerned about memory model issues
here, this is safe. For the alloca to be promotable, it cannot escape or
have any use of its address that could allow these loads or stores to be
racing. Thus, widening is always safe.
llvm-svn: 165928
Convert the internal representation of the Attributes class into a pointer to an
opaque object that's uniqued by and stored in the LLVMContext object. The
Attributes class then becomes a thin wrapper around this opaque
object. Eventually, the internal representation will be expanded to include
attributes that represent code generation options, etc.
llvm-svn: 165917
This patch migrates the strcmp and strncmp optimizations from the
simplify-libcalls pass into the instcombine library call simplifier.
llvm-svn: 165915
Erasing from the beginning or middle of the vector is expensive, remove_if can
do it in linear time even though it's a bit ugly without lambdas.
No functionality change.
llvm-svn: 165903
This patch migrates the strchr and strrchr optimizations from the
simplify-libcalls pass into the instcombine library call simplifier.
llvm-svn: 165875
This patch migrates the strcat and strncat optimizations from the
simplify-libcalls pass into the instcombine library call simplifier.
llvm-svn: 165874
type coercion code, especially when targetting ARM. Things like [1
x i32] instead of i32 are very common there.
The goal of this logic is to ensure that when we are picking an alloca
type, we look through such wrapper aggregates and across any zero-length
aggregate elements to find the simplest type possible to form a type
partition.
This logic should (generally speaking) rarely fire. It only ends up
kicking in when an alloca is accessed using two different types (for
instance, i32 and float), and the underlying alloca type has wrapper
aggregates around it. I noticed a significant amount of this occurring
looking at stepanov_abstraction generated code for arm, and suspect it
happens elsewhere as well.
Note that this doesn't yet address truly heinous IR productions such as
PR14059 is concerning. Those result in mismatched *sizes* of types in
addition to mismatched access and alloca types.
llvm-svn: 165870
help the dragonegg builders, and no test case at this point, but this
was one dimly plausible case I spotted by inspection. Hopefully will get
a testcase from those bots soon-ish, and will tidy this up with proper
testing.
llvm-svn: 165869
are single value types, the load and store should be directly based upon
the alloca and then bitcasting can fix the type as needed afterward.
This might in theory improve some of the IR coming out of SROA, but
I don't expect big changes yet and don't have any test cases on hand.
This is really just a cleanup/refactoring patch. The next patch will
cause this code path to be hit a lot more, actually get SROA to promote
more allocas and include several more test cases.
llvm-svn: 165864
We use the enums to query whether an Attributes object has that attribute. The
opaque layer is responsible for knowing where that specific attribute is stored.
llvm-svn: 165488
This class is used by LSR and a number of places in the codegen.
This is the first step in de-coupling LSR from TLI, and creating
a new interface in between them.
llvm-svn: 165455
have an alloca or a parameter, since then the alloca test should make sense
to readers, while before it probably appears too specific. No functionality
change.
llvm-svn: 165306
are in fact identity operations. We detect these and kill their
partitions so that even splitting is unaffected by them. This is
particularly important because Clang relies on emitting identity memcpy
operations for struct copies, and these fold away to constants very
often after inlining.
Fixes the last big performance FIXME I have on my plate.
llvm-svn: 165285
the rewrite visitor to make the fact that the speculation is completely
independent a bit more clear.
I promise that this is just a cut/paste of the one visitor and adding
the annonymous namespace wrappings. The diff may look completely
preposterous, it does in git for some reason.
llvm-svn: 165284
cpyDest can be mutated in some cases, which would then cause a crash later if
indeed the memory was underaligned. This brought down several buildbots, so
I guess the underaligned case is much more common than I thought!
llvm-svn: 165228
Currently, we re-visit allocas when something changes about the way they
might be *split* to allow better scalarization to take place. However,
we weren't handling the case when the *promotion* is what would change
the behavior of SROA. When an address derived from an alloca is stored
into another alloca, we consider the first to have escaped. If the
second is ever promoted to an SSA value, we will suddenly be able to run
the SROA pass on the first alloca.
This patch adds explicit support for this form if iteration. When we
detect a store of a pointer derived from an alloca, we flag the
underlying alloca for reprocessing after promotion. The logic works hard
to only do this when there is definitely going to be promotion and it
might remove impediments to the analysis of the alloca.
Thanks to Nick for the great test case and Benjamin for some sanity
check review.
llvm-svn: 165223
was less aligned than the old. In the testcase this results in an overaligned
memset: the memset alignment was correct for the original memory but is too much
for the new memory. Fix this by either increasing the alignment of the new
memory or bailing out if that isn't possible. Should fix the gcc-4.7 self-host
buildbot failure.
llvm-svn: 165220
Sorry for this being broken so long. =/
As part of this, switch all of the existing tests to be Little Endian,
which is the behavior I was asserting in them anyways! Add in a new
big-endian test that checks the interesting behavior there.
Another part of this is to tighten the rules abotu when we perform the
full-integer promotion. This logic now rejects cases where there fully
promoted integer is a non-multiple-of-8 bitwidth or cases where the
loads or stores touch bits which are in the allocated space of the
alloca but are not loaded or stored when accessing the integer. Sadly,
these aren't really observable today as the rest of the pass will
already ensure the invariants hold. However, the latter situation is
likely to become a potential concern in the future.
Thanks to Benjamin and Duncan for early review of this patch. I'm still
looking into whether there are further endianness issues, please let me
know if anyone sees BE failures persisting past this.
llvm-svn: 165219
a memcpy to reflect that '0' has a different meaning when applied to
a load or store. Now we correctly use underaligned loads and stores for
the test case added.
llvm-svn: 165101
necessary during rewriting. As part of this, fix a real think-o here
where we might have left off an alignment specification when the address
is in fact underaligned. I haven't come up with any way to trigger this,
as there is always some other factor that reduces the alignment, but it
certainly might have been an observable bug in some way I can't think
of. This also slightly changes the strategy for placing explicit
alignments on loads and stores to only do so when the alignment does not
match that required by the ABI. This causes a few redundant alignments
to go away from test cases.
I've also added a couple of tests that really push on the alignment that
we end up with on loads and stores. More to come here as I try to fix an
underlying bug I have conjectured and produced test cases for, although
it's not clear if this bug is the one currently hitting dragonegg's
gcc47 bootstrap.
llvm-svn: 165100
preserves the values of the relocated entries, unlikely remove_if. This
allows walking them and erasing them.
Also flesh out the predicate we are using for this to support the
various constraints actually imposed on a UnaryPredicate -- without this
we can't compose it with std::not1.
Thanks to Sean Silva for the review here and noticing the issue with
std::remove_if.
llvm-svn: 165073
scheduled for processing on the worklist eventually gets deleted while
we are processing another alloca, fixing the original test case in
PR13990.
To facilitate this, add a remove_if helper to the SetVector abstraction.
It's not easy to use the standard abstractions for this because of the
specifics of SetVectors types and implementation.
Finally, a nice small test case is included. Thanks to Benjamin for the
fantastic reduced test case here! All I had to do was delete some empty
basic blocks!
llvm-svn: 165065
We require that the indices into the use lists are stable in order to
build fast lookup tables to locate a particular partition use from an
operand of a PHI or select. This is (obviously in hind sight)
incompatible with erasing elements from the array. Really, we don't want
to erase anyways. It is expensive, and a rare operation. Instead, simply
weaken the contract of the PartitionUse structure to allow null Use
pointers to represent dead uses. Now we can clear out the pointer to
mark things as dead, and all it requires is adding some 'continue'
checks to the various loops.
I'm still reducing a test case for this, as the test case I have is
huge. I think this one I can get a nice test case for though, as it was
much more deterministic.
llvm-svn: 165032
being separate was that it can grow the use list. As a consequence, we
can't use the iterator-pair interface, we need an index based interface.
Expose such an interface from the AllocaPartitioning, and use it in the
speculator.
This should at least fix a use-after-free bug found by Duncan, and may
fix some of the other crashers.
I don't have a nice deterministic test case yet, but if I get a good
one, I'll add it.
llvm-svn: 165027
alignment requirements of the new alloca. As one consequence which was
reported as a bug by Duncan, we overaligned memcpy calls to ranges of
allocas after they were rewritten to types with lower alignment
requirements. Other consquences are possible, but I don't have any test
cases for them.
llvm-svn: 164937
could probably be factored still further to hoist this logic into
a generic helper, but currently I don't have particularly clean ideas
about how to handle that.
This at least allows us to drop custom load rewriting from the
speculation logic, which in turn allows the existing load rewriting
logic to fire. In theory, this could enable vector promotion or other
tricks after speculation occurs, but I've not dug into such issues. This
is primarily just cleaning up the factoring of the code and the
resulting logic.
llvm-svn: 164933
a pair of instructions, one for the used pointer and the second for the
user. This simplifies the representation and also makes it more dense.
This was noticed because of the miscompile in PR13926. In that case, we
were running up against a fundamental "bad idea" in the speculation of
PHI and select instructions: the speculation and rewriting are
interleaved, which requires phi speculation to also perform load
rewriting! This is bad, and causes us to miss opportunities to do (for
example) vector rewriting only exposed after PHI speculation, etc etc.
It also, in the old system, required us to insert *new* load uses into
the current partition's use list, which would then be ignored during
rewriting because we had already extracted an end iterator for the use
list. The appending behavior (and much of the other oddities) stem from
the strange de-duplication strategy in the PartitionUse builder.
Amusingly, all this went without notice for so long because it could
only be triggered by having *different* GEPs into the same partition of
the same alloca, where both different GEPs were operands of a single
PHI, and where the GEP which was not encountered first also had multiple
uses within that same PHI node... Hence the insane steps required to
reproduce.
So, step one in fixing this fundamental bad idea is to make the
PartitionUse actually contain a Use*, and to make the builder do proper
deduplication instead of funky de-duplication. This is enough to remove
the appending behavior, and fix the miscompile in PR13926, but there is
more work to be done here. Subsequent commits will lift the speculation
into its own visitor. It'll be a useful step toward potentially
extracting all of the speculation logic into a generic utility
transform.
The existing PHI test case for repeated operands has been made more
extreme to catch even these issues. This test case, run through the old
pass, will exactly reproduce the miscompile from PR13926. ;] We were so
close here!
llvm-svn: 164925
alignment could lose it due to the alloca type moving down to a much
smaller alignment guarantee.
Now SROA will actively compute a proper alignment, factoring the target
data, any explicit alignment, and the offset within the struct. This
will in some cases lower the alignment requirements, but when we lower
them below those of the type, we drop the alignment entirely to give
freedom to the code generator to align it however is convenient.
Thanks to Duncan for the lovely test case that pinned this down. =]
llvm-svn: 164891
rewriter in SROA to carry a proper alignment. This involves
interrogating various sources of alignment, etc. This is a more complete
and principled fix to PR13920 as well as related bugs pointed out by Eli
in review and by inspection in the area.
Also by inspection fix the integer and vector promotion paths to create
aligned loads and stores. I still need to work up test cases for
these... Sorry for the delay, they were found purely by inspection.
llvm-svn: 164689
This should really, really fix PR13916. For real this time. The
underlying bug is... a bit more subtle than I had imagined.
The setup is a code pattern that leads to an @llvm.memcpy call with two
equal pointers to an alloca in the source and dest. Now, not any pattern
will do. The alloca needs to be formed just so, and both pointers should
be wrapped in different bitcasts etc. When this precise pattern hits,
a funny sequence of events transpires. First, we correctly detect the
potential for overlap, and correctly optimize the memcpy. The first
time. However, we do simplify the set of users of the alloca, and that
causes us to run the alloca back through the SROA pass in case there are
knock-on simplifications. At this point, a curious thing has happened.
If we happen to have an i8 alloca, we have direct i8 pointer values. So
we don't bother creating a cast, we rewrite the arguments to the memcpy
to dircetly refer to the alloca.
Now, in an unrelated area of the pass, we have clever logic which
ensures that when visiting each User of a particular pointer derived
from an alloca, we only visit that User once, and directly inspect all
of its operands which refer to that particular pointer value. However,
the mechanism used to detect memcpy's with the potential to overlap
relied upon getting visited once per *Use*, not once per *User*. This is
always true *unless* the same exact value is both source and dest. It
turns out that almost nothing actually produces that pattern though.
We can hand craft test cases that more directly test this behavior of
course, and those are included. Also, note that there is a significant
missed optimization here -- we prove in many cases that there is
a non-volatile memcpy call with identical source and dest addresses. We
shouldn't prevent splitting the alloca in that case, and in fact we
should just remove such memcpy calls eagerly. I'll address that in
a subsequent commit.
llvm-svn: 164669
only a missed optimization opportunity if the store is over-aligned, but a
miscompile if the store's new type has a higher natural alignment than the
memcpy did. Fixes PR13920!
llvm-svn: 164641
reason we were getting two of the same alloca is because of a memmove/memcpy
which had the same alloca in both the src and dest. Now we detect that case
directly. This has the same testcase as before, but fixes a clang test
CodeGenObjC/exceptions.m which runs clang -O2.
llvm-svn: 164636
Chandler, it's not obvious that it's okay that this alloca gets into the list
twice to begin with. Please review and see whether this is the fix you really
want, but I wanted to get a fix checked in quickly.
llvm-svn: 164634
to chains or cycles between PHIs and/or selects. Also add a couple of
really nice test cases reduced from Kostya's reports in PR13905 and
PR13906. Both are fixed by this patch.
llvm-svn: 164596
David (I think), but I would appreciate folks verifying that this fixes
the big crasher.
I'm still working on a reduced test case, but because this was causing
problems I wanted to get the fix checked in quickly.
llvm-svn: 164585
integer promotion analogous to vector promotion. When there is an
integer alloca being accessed both as its integer type and as a narrower
integer type, promote the narrower access to "insert" and "extract" the
smaller integer from the larger one, and make the integer alloca
a candidate for promotion.
In the new formulation, we don't care about target legal integer or use
thresholds to control things. Instead, we only perform this promotion to
an integer type which the frontend has already emitted a load or store
for. This bounds the scope and prevents optimization passes from
coalescing larger and larger entities into a single integer.
llvm-svn: 164479
across the uses of the alloca. It's entirely possible for negative
numbers to come up here, and in some rare cases simply doing the 2's
complement arithmetic isn't the correct decision. Notably, we can't zext
the index of the GEP. The definition of GEP is that these offsets are
sign extended or truncated to the size of the pointer, and then wrapping
2's complement arithmetic used.
This patch fixes an issue that comes up with *no* input from the
buildbots or bootstrap afaict. The only place where it manifested,
disturbingly, is Clang's own regression test suite. A reduced and
targeted collection of tests are added to cope with this. Note that I've
tried to pin down the potential cases of overflow, but may have missed
some cases. I've tried to add a few cases to test this, but its hard
because LLVM has quite limited support for >64bit constructs.
llvm-svn: 164475
selects with a constant condition. This resulted in the operands
remaining live through the SROA rewriter. Most of the time, this just
caused some dead allocas to persist and get zapped by later passes, but
in one case found by Joerg, it caused a crash when we tried to *promote*
the alloca despite it having this dead use. We already have the
mechanisms in place to handle this, just wire select up to them.
llvm-svn: 164427
This is a follow-up from r163302, which added a transformation to
SimplifyCFG that turns some switches into loads from lookup tables.
It was pointed out that some targets, such as GPUs and deeply embedded
targets, might not find this appropriate, but SimplifyCFG doesn't have
enough information about the target to decide this.
This patch adds the reverse transformation to CodeGenPrep: it turns
loads from lookup tables back into switches for targets where we do not
build jump tables (assuming these are also the targets where lookup
tables are inappropriate).
Hopefully we will eventually get to have target information in
SimplifyCFG, and then this CodeGenPrep transformation can be removed.
llvm-svn: 164206
from the dragonegg build bots when we turned on the full version of the
pass. Included a much reduced test case for this pesky bug, despite
bugpoint's uncooperative behavior.
Also, I audited all the similar code I could find and didn't spot any
other cases where this mistake cropped up.
llvm-svn: 164178
working on FCA splitting. Instead of refusing to form a common type when
there are uses of a subsection of the alloca as well as a use of the
entire alloca, just skip the subsection uses and continue looking for
a whole-alloca use with a type that we can use.
This produces slightly prettier IR I think, and also fixes the other
failure in the test.
llvm-svn: 164146
FCAs. This is essential in order to promote allocas that are used in
struct returns by frontends like Clang. The FCA load would block the
rest of the pass from firing, resulting is significant regressions with
the bullet benchmark in the nightly test suite.
Thanks to Duncan for repeated discussions about how best to do this, and
to both him and Benjamin for review.
This appears to have blocked many places where the pass tries to fire,
and so I'm expect somewhat different results with this fix added.
As with the last big patch, I'm including a change to enable the SROA by
default *temporarily*. Ben is going to remove this as soon as the LNT
bots pick up the patch. I'm just trying to get a round of LNT numbers
from the stable machines in the lab.
NOTE: Four clang tests are expected to fail in the brief window where
this is enabled. Sorry for the noise!
llvm-svn: 164119
partition use lists a bit. No functionality changed.
These visitors are actually visiting a tuple of a Use and an offset into
the alloca. However, we use the InstVisitor to handle the dispatch over
the users, and so the Use and Offset are stored in class member
variables and set just before each call to visit(). This is fairly
awkward and makes the functions a bit harder to read, but its the only
real option we have until InstVisitor can be rewritten to use variadic
templates.
However, this pattern shouldn't be followed on the helper member
functions where there is no interface constraint from the visitor. We
already were passing the instruction as a normal parameter rather than
use the Use to get at it, start passing the offset as well. This will
become more important in subsequent patches as the offset will in some
cases change while visiting a single instruction.
llvm-svn: 164003
new one, and add support for running the new pass in that mode and in
that slot of the pass manager. With this the new pass can completely
replace the old one within the pipeline.
The strategy for enabling or disabling the SSAUpdater logic is to do it
by making the requirement of the domtree analysis optional. By default,
it is required and we get the standard mem2reg approach. This is usually
the desired strategy when run in stand-alone situations. Within the
CGSCC pass manager, we disable requiring of the domtree analysis and
consequentially trigger fallback to the SSAUpdater promotion.
In theory this would allow the pass to re-use a domtree if one happened
to be available even when run in a mode that doesn't require it. In
practice, it lets us have a single pass rather than two which was
simpler for me to wrap my head around.
There is a hidden flag to force the use of the SSAUpdater code path for
the purpose of testing. The primary testing strategy is just to run the
existing tests through that path. One notable difference is that it has
custom code to handle lifetime markers, and one of the tests has been
enhanced to exercise that code.
This has survived a bootstrap and the test suite without serious
correctness issues, however my run of the test suite produced *very*
alarming performance numbers. I don't entirely understand or trust them
though, so more investigation is on-going.
To aid my understanding of the performance impact of the new SROA now
that it runs throughout the optimization pipeline, I'm enabling it by
default in this commit, and will disable it again once the LNT bots have
picked up one iteration with it. I want to get those bots (which are
much more stable) to evaluate the impact of the change before I jump to
any conclusions.
NOTE: Several Clang tests will fail because they run -O3 and check the
result's order of output. They'll go back to passing once I disable it
again.
llvm-svn: 163965
* wrap code blocks in \code ... \endcode;
* refer to parameter names in paragraphs correctly (\arg is not what most
people want -- it starts a new paragraph);
* use \param instead of \arg to document parameters in order to be consistent
with the rest of the codebase.
llvm-svn: 163902
pointless checks in here, bad asserts, and just confusing code. I've
also added a bit more to the comment to clarify what this function is
really trying to do as it was not obvious to Duncan when studying it.
Thanks to Duncan for helping me dig through the issue.
No real functionality changed here in practical cases, and certainly no
test case. This is just cleanup spotted by inspection.
llvm-svn: 163897
inspection by Duncan during review. My suspicion is that we would still
have returned 0 anyways in this case, but doing it sooner is better.
llvm-svn: 163895
deeply suspicious and likely to go away eventually. Also fix a bogus
comment about one of the checks in the vector GEP analysis. Based on
review from Duncan.
llvm-svn: 163894
Originally I had anticipated needing to thread this through more bits of
the SROA pass itself, but that ended up not happening. In the end, this
is a much simpler way to manange the variable.
llvm-svn: 163893
This is essentially a ground up re-think of the SROA pass in LLVM. It
was initially inspired by a few problems with the existing pass:
- It is subject to the bane of my existence in optimizations: arbitrary
thresholds.
- It is overly conservative about which constructs can be split and
promoted.
- The vector value replacement aspect is separated from the splitting
logic, missing many opportunities where splitting and vector value
formation can work together.
- The splitting is entirely based around the underlying type of the
alloca, despite this type often having little to do with the reality
of how that memory is used. This is especially prevelant with unions
and base classes where we tail-pack derived members.
- When splitting fails (often due to the thresholds), the vector value
replacement (again because it is separate) can kick in for
preposterous cases where we simply should have split the value. This
results in forming i1024 and i2048 integer "bit vectors" that
tremendously slow down subsequnet IR optimizations (due to large
APInts) and impede the backend's lowering.
The new design takes an approach that fundamentally is not susceptible
to many of these problems. It is the result of a discusison between
myself and Duncan Sands over IRC about how to premptively avoid these
types of problems and how to do SROA in a more principled way. Since
then, it has evolved and grown, but this remains an important aspect: it
fixes real world problems with the SROA process today.
First, the transform of SROA actually has little to do with replacement.
It has more to do with splitting. The goal is to take an aggregate
alloca and form a composition of scalar allocas which can replace it and
will be most suitable to the eventual replacement by scalar SSA values.
The actual replacement is performed by mem2reg (and in the future
SSAUpdater).
The splitting is divided into four phases. The first phase is an
analysis of the uses of the alloca. This phase recursively walks uses,
building up a dense datastructure representing the ranges of the
alloca's memory actually used and checking for uses which inhibit any
aspects of the transform such as the escape of a pointer.
Once we have a mapping of the ranges of the alloca used by individual
operations, we compute a partitioning of the used ranges. Some uses are
inherently splittable (such as memcpy and memset), while scalar uses are
not splittable. The goal is to build a partitioning that has the minimum
number of splits while placing each unsplittable use in its own
partition. Overlapping unsplittable uses belong to the same partition.
This is the target split of the aggregate alloca, and it maximizes the
number of scalar accesses which become accesses to their own alloca and
candidates for promotion.
Third, we re-walk the uses of the alloca and assign each specific memory
access to all the partitions touched so that we have dense use-lists for
each partition.
Finally, we build a new, smaller alloca for each partition and rewrite
each use of that partition to use the new alloca. During this phase the
pass will also work very hard to transform uses of an alloca into a form
suitable for promotion, including forming vector operations, speculating
loads throguh PHI nodes and selects, etc.
After splitting is complete, each newly refined alloca that is
a candidate for promotion to a scalar SSA value is run through mem2reg.
There are lots of reasonably detailed comments in the source code about
the design and algorithms, and I'm going to be trying to improve them in
subsequent commits to ensure this is well documented, as the new pass is
in many ways more complex than the old one.
Some of this is still a WIP, but the current state is reasonbly stable.
It has passed bootstrap, the nightly test suite, and Duncan has run it
successfully through the ACATS and DragonEgg test suites. That said, it
remains behind a default-off flag until the last few pieces are in
place, and full testing can be done.
Specific areas I'm looking at next:
- Improved comments and some code cleanup from reviews.
- SSAUpdater and enabling this pass inside the CGSCC pass manager.
- Some datastructure tuning and compile-time measurements.
- More aggressive FCA splitting and vector formation.
Many thanks to Duncan Sands for the thorough final review, as well as
Benjamin Kramer for lots of review during the process of writing this
pass, and Daniel Berlin for reviewing the data structures and algorithms
and general theory of the pass. Also, several other people on IRC, over
lunch tables, etc for lots of feedback and advice.
llvm-svn: 163883
* wrap code blocks in \code ... \endcode;
* refer to parameter names in paragraphs correctly (\arg is not what most
people want -- it starts a new paragraph).
llvm-svn: 163790
pointers-to-strong-pointers may be in play. These can lead to retains and
releases happening in unstructured ways, foiling the optimizer. This fixes
rdar://12150909.
llvm-svn: 163180
- CodeGenPrepare pass for identifying div/rem ops
- Backend specifies the type mapping using addBypassSlowDivType
- Enabled only for Intel Atom with O2 32-bit -> 8-bit
- Replace IDIV with instructions which test its value and use DIVB if the value
is positive and less than 256.
- In the case when the quotient and remainder of a divide are used a DIV
and a REM instruction will be present in the IR. In the non-Atom case
they are both lowered to IDIVs and CSE removes the redundant IDIV instruction,
using the quotient and remainder from the first IDIV. However,
due to this optimization CSE is not able to eliminate redundant
IDIV instructions because they are located in different basic blocks.
This is overcome by calculating both the quotient (DIV) and remainder (REM)
in each basic block that is inserted by the optimization and reusing the result
values when a subsequent DIV or REM instruction uses the same operands.
- Test cases check for the presents of the optimization when calculating
either the quotient, remainder, or both.
Patch by Tyler Nowicki!
llvm-svn: 163150
Scan the body of the loop and find instructions that may trap.
Use this information when deciding if it is safe to hoist or sink instructions.
Notice that we can optimize the search of instructions that may throw in the case of nested loops.
rdar://11518836
llvm-svn: 163132
For example, the ARM target does not have efficient ISel handling for vector
selects with scalar conditions. This patch adds a TLI hook which allows the
different targets to report which selects are supported well and which selects
should be converted to CF duting codegen prepare.
llvm-svn: 163093
We update until we hit a fixpoint. This is probably slow but also
slightly simplifies the code. It should also fix the occasional
invalid domtrees observed when building with expensive checking.
I couldn't find a case where this had a measurable slowdown, but
if someone finds a pathological case where it does we may have
to find a cleverer way of updating dominators here.
Thanks to Duncan for the test case.
llvm-svn: 163091
The old PHI updating code in loop-rotate was replaced with SSAUpdater a while
ago, it has no problems with comples PHIs. What had to be fixed is detecting
whether a loop was already rotated and updating dominators when multiple exits
were present.
This change increases overall code size a bit, mostly due to additional loop
unrolling opportunities. Passes test-suite and selfhost with -verify-dom-info.
Fixes PR7447.
Thanks to Andy for the input on the domtree updating code.
llvm-svn: 162912
This disables malloc-specific optimization when -fno-builtin (or -ffreestanding)
is specified. This has been a problem for a long time but became more severe
with the recent memory builtin improvements.
Since the memory builtin functions are used everywhere, this required passing
TLI in many places. This means that functions that now have an optional TLI
argument, like RecursivelyDeleteTriviallyDeadFunctions, won't remove dead
mallocs anymore if the TLI argument is missing. I've updated most passes to do
the right thing.
Fixes PR13694 and probably others.
llvm-svn: 162841
No intended behavior change. This was introduced in r162023. With the fixed
algorithm a Release build of ARMInstPrinter.cpp goes from 16s to 10s on a
2011 MBP.
llvm-svn: 162559
optimizations are guarded by the -enable-double-float-shrink LLVM option.
Last bit of PR13574. Patch by Weiming Zhao <weimingz@codeaurora.org>.
llvm-svn: 162368
This optimization is really just replacing allocas wholesale with
globals, there is no scalarization.
The underlying motivation for this patch is to simplify the SROA pass
and focus it on splitting and promoting allocas.
llvm-svn: 162271
where some fact lake a=b dominates a use in a phi, but doesn't dominate the
basic block itself.
This feature could also be implemented by splitting critical edges, but at least
with the current algorithm reasoning about the dominance directly is faster.
The time for running "opt -O2" in the testcase in pr10584 is 1.003 times slower
and on gcc as a single file it is 1.0007 times faster.
llvm-svn: 162023
and allow some optimizations to turn conditional branches into unconditional.
This commit adds a simple control-flow optimization which merges two consecutive
basic blocks which are connected by a single edge. This allows the codegen to
operate on larger basic blocks.
rdar://11973998
llvm-svn: 161852
multiple scalar promotions on a single loop. This also has the effect of
preserving the order of stores sunk out of loops, which is aesthetically
pleasing, and it happens to fix the testcase in PR13542, though it doesn't
fix the underlying problem.
llvm-svn: 161459
GetBestDestForJumpOnUndef() assumes there is at least 1 successor, which isn't
true if the block ends in an indirect branch with no successors. Fix this by
bailing out earlier in this case.
llvm-svn: 160546
Fixes PR13371: indvars pass incorrectly substitutes 'undef' values.
I do not like this fix. It's needed until/unless the meaning of undef
changes. It attempts to be complete according to the IR spec, but I
don't have much confidence in the implementation given the difficulty
testing undefined behavior. Worse, this invalidates some of my
hard-fought work on indvars and LSR to optimize pointer induction
variables. It results benchmark regressions, which I'll track
internally. On x86_64 no LTO I see:
-3% huffbench
-3% 400.perlbench
-8% fhourstones
My only suggestion for recovering is to change the meaning of
undef. If we could trust an arbitrary instruction to produce a some
real value that can be manipulated (e.g. incremented) according to
non-undef rules, then this case could be easily handled with SCEV.
llvm-svn: 160421
This places limits on CollectSubexprs to constrains the number of
reassociation possibilities. It limits the recursion depth and skips
over chains of nested recurrences outside the current loop.
Fixes PR13361. Although underlying SCEV behavior is still potentially bad.
llvm-svn: 160340
All SCEV expressions used by LSR formulae must be safe to
expand. i.e. they may not contain UDiv unless we can prove nonzero
denominator.
Fixes PR11356: LSR hoists UDiv.
llvm-svn: 160205
This was always part of the VMCore library out of necessity -- it deals
entirely in the IR. The .cpp file in fact was already part of the VMCore
library. This is just a mechanical move.
I've tried to go through and re-apply the coding standard's preferred
header sort, but at 40-ish files, I may have gotten some wrong. Please
let me know if so.
I'll be committing the corresponding updates to Clang and Polly, and
Duncan has DragonEgg.
Thanks to Bill and Eric for giving the green light for this bit of cleanup.
llvm-svn: 159421
include/llvm/Analysis/DebugInfo.h to include/llvm/DebugInfo.h.
The reasoning is because the DebugInfo module is simply an interface to the
debug info MDNodes and has nothing to do with analysis.
llvm-svn: 159312
before the expression root. Any existing operators that are changed to use one
of them needs to be moved between it and the expression root, and recursively
for the operators using that one. When I rewrote RewriteExprTree I accidentally
inverted the logic, resulting in the compacting going down from operators to
operands rather than up from operands to the operators using them, oops. Fix
this, resolving PR12963.
llvm-svn: 159265
- simplifycfg: invoke undef/null -> unreachable
- instcombine: invoke new -> invoke expect(0, 0) (an arbitrary NOOP intrinsic; only done if the allocated memory is unused, of course)
- verifier: allow invoke of intrinsics (to make the previous step work)
llvm-svn: 159146
- provide more extensive set of functions to detect library allocation functions (e.g., malloc, calloc, strdup, etc)
- provide an API to compute the size and offset of an object pointed by
Move a few clients (GVN, AA, instcombine, ...) to the new API.
This implementation is a lot more aggressive than each of the custom implementations being replaced.
Patch reviewed by Nick Lewycky and Chandler Carruth, thanks.
llvm-svn: 158919
Dynamic GEPs created by SROA needed to insert extra "i32 0"
operands to index through structs and arrays to get to the
vector being indexed.
llvm-svn: 158590
For non-address users, Base and Scaled registers are not specially
associated to fit an address mode, so SCEVExpander should apply normal
expansion rules. Otherwise we may sink computation into inner loops
that have already been optimized.
llvm-svn: 158537
example degenerate phi nodes and binops that use themselves in unreachable code.
Thanks to Charles Davis for the testcase that uncovered this can of worms.
llvm-svn: 158508
since then the entire expression must equal zero (similarly for other operations
with an absorbing element). With this in place a bunch of reassociate code for
handling constants is dead since it is all taken care of when linearizing. No
intended functionality change.
llvm-svn: 158398