This commit completely removes what is left of the simplify-libcalls
pass. All of the functionality has now been migrated to the instcombine
and functionattrs passes. The following C API functions are now NOPs:
1. LLVMAddSimplifyLibCallsPass
2. LLVMPassManagerBuilderSetDisableSimplifyLibCalls
llvm-svn: 184459
Prior to this change, the considered addressing modes may be invalid since the
maximum and minimum offsets were not taking into account.
This was causing an assertion failure.
The added test case exercices that behavior.
<rdar://problem/14199725> Assertion failed: (CurScaleCost >= 0 && "Legal
addressing mode has an illegal cost!")
llvm-svn: 184341
r183584 tries to derive some info from the code *AFTER* a call and apply
these derived info to the code *BEFORE* the call, which is not always safe
as the call in question may never return, and in this case, the derived
info is invalid.
Thank Duncan for pointing out this potential bug.
rdar://14073661
llvm-svn: 183606
The MemCpyOpt pass is capable of optimizing:
callee(&S); copy N bytes from S to D.
into:
callee(&D);
subject to some legality constraints.
Assertion is triggered when the compiler tries to evalute "sizeof(typeof(D))",
while D is an opaque-typed, 'sret' formal argument of function being compiled.
i.e. the signature of the func being compiled is something like this:
T caller(...,%opaque* noalias nocapture sret %D, ...)
The fix is that when come across such situation, instead of calling some
utility functions to get the size of D's type (which will crash), we simply
assume D has at least N bytes as implified by the copy-instruction.
rdar://14073661
llvm-svn: 183584
IndVarSimplify is willing to move divide instructions outside of their
loop bodies if they are invariant of the loop. However, it may not be
safe to expand them if we do not know if they can trap.
Instead, check to see if it is not safe to expand the instruction and
skip the expansion.
This fixes PR16041.
Testcase by Rafael Ávila de Espíndola.
llvm-svn: 183239
Account for the cost of scaling factor in Loop Strength Reduce when rating the
formulae. This uses a target hook.
The default implementation of the hook is: if the addressing mode is legal, the
scaling factor is free.
<rdar://problem/13806271>
llvm-svn: 183045
Namely, check if the target allows to fold more that one register in the
addressing mode and if yes, adjust the cost accordingly.
Prior to this commit, reg1 + scale * reg2 accesses were artificially preferred
to reg1 + reg2 accesses. Indeed, the cost model wrongly assumed that reg1 + reg2
needs a temporary register for the computation, whereas it was correctly
estimated for reg1 + scale * reg2.
<rdar://problem/13973908>
llvm-svn: 183021
iteration.
This on step toward non-iterative GVN. My local hack suggests that getting rid
of iteration will speedup GVN by 30%+ on a medium sized input (2k LOC, C++).
I cannot explain why not 2x or more at this moment.
llvm-svn: 181532
Test case by Michele Scandale!
Fixes PR10293: Load not hoisted out of loop with multiple exits.
There are few regressions with this patch, now tracked by
rdar:13817079, and a roughly equal number of improvements. The
regressions are almost certainly back luck because LoopRotate has very
little idea of whether rotation is profitable. Doing better requires a
more comprehensive solution.
This checkin is a quick fix that lacks generality (PR10293 has
a counter-example). But it trivially fixes the case in PR10293 without
interfering with other cases, and it does satify the criteria that
LoopRotate is a loop canonicalization pass that should avoid
heuristics and special cases.
I can think of two approaches that would probably be better in
the long run. Ultimately they may both make sense.
(1) LoopRotate should check that the current header would make a good
loop guard, and that the loop does not already has a sufficient
guard. The artifical SimplifiedLoopLatch check would be unnecessary,
and the design would be more general and canonical. Two difficulties:
- We need a strong guarantee that we won't endlessly rotate, so the
analysis would need to be precise in order to avoid the
SimplifiedLoopLatch precondition.
- Analysis like this are usually based on SCEV, which we don't want to
rely on.
(2) Rotate on-demand in late loop passes. This could even be done by
shoving the loop back on the queue after the optimization that needs
it. This could work well when we find LICM opportunities in
multi-branch loops. This requires some work, and it doesn't really
solve the problem of SCEV wanting a loop guard before the analysis.
llvm-svn: 181230
This function consists of following steps:
1. Collect dependent memory accesses.
2. Analyze availability.
3. Perform fully redundancy elimination, or
4. Perform PRE, depending on the availability
Step 2, 3 and 4 are now moved to three helper routines.
llvm-svn: 181047
Actually it took me couple of hours trying to make sense of them and
only to find they are dead code. I guess the original author used
"allSingleSucc" to indicate if there are any critial edge emanating
from some blocks, and tried to perform code motion (actually speculation)
in the presence of these critical edges; but later on he/she changed mind
and decided to perform edge-splitting first.
llvm-svn: 180951
the things, and renames it to CBindingWrapping.h. I also moved
CBindingWrapping.h into Support/.
This new file just contains the macros for defining different wrap/unwrap
methods.
The calls to those macros, as well as any custom wrap/unwrap definitions
(like for array of Values for example), are put into corresponding C++
headers.
Doing this required some #include surgery, since some .cpp files relied
on the fact that including Wrap.h implicitly caused the inclusion of a
bunch of other things.
This also now means that the C++ headers will include their corresponding
C API headers; for example Value.h must include llvm-c/Core.h. I think
this is harmless, since the C API headers contain just external function
declarations and some C types, so I don't believe there should be any
nasty dependency issues here.
llvm-svn: 180881
When Reassociator optimize "(x | C1)" ^ "(X & C2)", it may swap the two
subexpressions, however, it forgot to swap cached constants (of C1 and C2)
accordingly.
rdar://13739160
llvm-svn: 180676
This is an edge case that can happen if we modify a chain of multiple selects.
Update all operands in that case and remove the assert. PR15805.
llvm-svn: 179982
I brazenly think this change is slightly simpler than r178793 because:
- no "state" in functor
- "OpndPtrs[i]" looks simpler than "&Opnds[OpndIndices[i]]"
While I can reproduce the probelm in Valgrind, it is rather difficult to come up
a standalone testing case. The reason is that when an iterator is invalidated,
the stale invalidated elements are not yet clobbered by nonsense data, so the
optimizer can still proceed successfully.
Thank Benjamin for fixing this bug and generously providing the test case.
llvm-svn: 179062
The fix for PR14972 in r177055 introduced a real think-o in the *store*
side, likely because I was much more focused on the load side. While we
can arbitrarily widen (or narrow) a loaded value, we can't arbitrarily
widen a value to be stored, as that changes the width of memory access!
Lock down the code path in the store rewriting which would do this to
only handle the intended circumstance.
All of the existing tests continue to pass, and I've added a test from
the PR.
llvm-svn: 178974
This optimization is unstable at this moment; it
1) block us on a very important application
2) PR15200
3) test6 and test7 in test/Transforms/ScalarRepl/dynamic-vector-gep.ll
(the CHECK command compare the output against wrong result)
I personally believe this optimization should not have any impact on the
autovectorized code, as auto-vectorizer is supposed to put gather/scatter
in a "right" way. Although in theory downstream optimizaters might reveal
some gather/scatter optimization opportunities, the chance is quite slim.
For the hand-crafted vectorizing code, in term of redundancy elimination,
load-CSE, copy-propagation and DSE can collectively achieve the same result,
but in much simpler way. On the other hand, these optimizers are able to
improve the code in a incremental way; in contrast, SROA is sort of all-or-none
approach. However, SROA might slighly win in stack size, as it tries to figure
out a stretch of memory tightenly cover the area accessed by the dynamic index.
rdar://13174884
PR15200
llvm-svn: 178912
OpndPtrs stored pointers into the Opnd vector that became invalid when the
vector grows. Store indices instead. Sadly I only have a large testcase that
only triggers under valgrind, so I didn't include it.
llvm-svn: 178793
The key part of this is ensuring that name prefixes remain in a Twine
form until we get to a point where we can nuke them under NDEBUG. This
is tricky using the old APIs as they played fast and loose with Twine,
which is prone to serious error. The inserter is much cleaner as it is
actually in the call stack leading to the setName call, and so has
a good opportunity to prepend the prefix.
This matters more than you might imagine because most runs over an
alloca find a single partition, and rewrite 3 or 4 instructions
referring to it. As a consequence doing this lazily and exclusively with
Twine allows the optimizer to delete more of it and shaves another 2% to
3% off of the release build's SROA run time for PR15412. I also think
the APIs are cleaner, and the use of Twine is more reliable, so
I consider it a win-win despite the churn required to reach this state.
llvm-svn: 177631
The simplify-libcalls pass implemented a doInitialization hook to infer
function prototype attributes for well-known functions. Given that the
simplify-libcalls pass is going away *and* that the functionattrs pass
is already in place to deduce function attributes, I am moving this logic
to the functionattrs pass. This approach was discussed during patch
review:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20121126/157465.html.
llvm-svn: 177619
This is espcially important because the new SROA pass goes to great
lengths to provide helpful names for debugging, and as a consequence
they can become very slow to render.
Good for between 5% and 15% of the SROA runtime on some slow test cases
such as the one in PR15412.
llvm-svn: 177495
- it is trivially known to be used inside the loop in a way that can not be optimized away
- there is no use outside of the loop which can take advantage of the computation hoisting
llvm-svn: 177432
The fundamental problem is that SROA didn't allow for overly wide loads
where the bits past the end of the alloca were masked away and the load
was sufficiently aligned to ensure there is no risk of page fault, or
other trapping behavior. With such widened loads, SROA would delete the
load entirely rather than clamping it to the size of the alloca in order
to allow mem2reg to fire. This was exposed by a test case that neatly
arranged for GVN to run first, widening certain loads, followed by an
inline step, and then SROA which miscompiles the code. However, I see no
reason why this hasn't been plaguing us in other contexts. It seems
deeply broken.
Diagnosing all of the above took all of 10 minutes of debugging. The
really annoying aspect is that fixing this completely breaks the pass.
;] There was an implicit reliance on the fact that no loads or stores
extended past the alloca once we decided to rewrite them in the final
stage of SROA. This was used to encode information about whether the
loads and stores had been split across multiple partitions of the
original alloca. That required threading explicit tracking of whether
a *use* of a partition is split across multiple partitions.
Once that was done, another problem arose: we allowed splitting of
integer loads and stores iff they were loads and stores to the entire
alloca. This is a really arbitrary limitation, and splitting at least
some integer loads and stores is crucial to maximize promotion
opportunities. My first attempt was to start removing the restriction
entirely, but currently that does Very Bad Things by causing *many*
common alloca patterns to be fully decomposed into i8 operations and
lots of or-ing together to produce larger integers on demand. The code
bloat is terrifying. That is still the right end-goal, but substantial
work must be done to either merge partitions or ensure that small i8
values are eagerly merged in some other pass. Sadly, figuring all this
out took essentially all the time and effort here.
So the end result is that we allow splitting only when the load or store
at least covers the alloca. That ensures widened loads and stores don't
hurt SROA, and that we don't rampantly decompose operations more than we
have previously.
All of this was already fairly well tested, and so I've just updated the
tests to cover the wide load behavior. I can add a test that crafts the
pass ordering magic which caused the original PR, but that seems really
brittle and to provide little benefit. The fundamental problem is that
widened loads should Just Work.
llvm-svn: 177055
* Only apply divide bypass optimization when not optimizing for size.
* Fixed bug caused by constant for 0 value of type Int32,
used dividend type to generate the constant instead.
* For atom x86-64 apply the divide bypass to use 16-bit divides instead of
64-bit divides when operand values are small enough.
* Added lit tests for 64-bit divide bypass.
Patch by Tyler Nowicki!
llvm-svn: 176442
This is a common pattern with dyn_cast and similar constructs, when the
PHI no longer depends on the select it can often be turned into a simpler
construct or even get hoisted out of the loop.
PR15340.
llvm-svn: 175995
The 'nobuiltin' attribute is applied to call sites to indicate that LLVM should
not treat the callee function as a built-in function. I.e., it shouldn't try to
replace that function with different code.
llvm-svn: 175835
the SCEV vector size in LoopStrengthReduce. It is observed that
the BaseRegs vector size is 4 in most cases,
and elements are frequently copied when it is initialized as
SmallVector<const SCEV *, 2> BaseRegs.
Our benchmark results show that the compilation time performance
improved by ~0.5%.
Patch by Wan Xiaofei.
llvm-svn: 174219
This name change does the following:
1. Causes the function name to use proper ARC terminology.
2. Makes it clear what the function truly does.
llvm-svn: 173609
The method PerformCodePlacement was doing too much (i.e. 3x loops, lots of
different checking). This refactoring separates the analysis section of the
method into a separate function while leaving the actual code placement and
analysis preparation in PerformCodePlacement.
*NOTE* Really this part of ObjCARC should be refactored out of the main pass
class into its own seperate class/struct. But, it is not time to make that
change yet though (don't want to make such an invasive change without fixing all
of the bugs first).
llvm-svn: 173201
Use the AttributeSet when we're talking about more than one attribute. Add a
function that adds a single attribute. No functionality change intended.
llvm-svn: 173196
generic function calls and intrinsics. This is somewhat overlapping with
an existing intrinsic cost method, but that one seems targetted at
vector intrinsics. I'll merge them or separate their names and use cases
in a separate commit.
This sinks the test of 'callIsSmall' down into TTI where targets can
control it. The whole thing feels very hack-ish to me though. I've left
a FIXME comment about the fundamental design problem this presents. It
isn't yet clear to me what the users of this function *really* care
about. I'll have to do more analysis to figure that out. Putting this
here at least provides it access to proper analysis pass tools and other
such. It also allows us to more cleanly implement the baseline cost
interfaces in TTI.
With this commit, it is now theoretically possible to simplify much of
the inline cost analysis's handling of calls by calling through to this
interface. That conversion will have to happen in subsequent commits as
it requires more extensive restructuring of the inline cost analysis.
The CodeMetrics class is now really only in the business of running over
a block of code and aggregating the metrics on that block of code, with
the actual cost evaluation done entirely in terms of TTI.
llvm-svn: 173148
is free. The whole CodeMetrics API should probably be reworked more, but
this is enough to allow deleting the duplicate code there for computing
whether an instruction is free.
All of the passes using this have been updated to pull in TTI and hand
it to the CodeMetrics stuff. Further, a dead CodeMetrics API
(analyzeFunction) is nuked for lack of users.
llvm-svn: 173036
Specifically according to the semantics of ARC -fno-objc-arc-exception simply
states that it is expected that the unwind path out of a call *MAY* not release
objects. Thus we can have the situation where a release gets moved into a catch
block which we ignore when we remove a retain/release pair resulting in (even
though we assume the program is exiting anyways) the cleanup code path
potentially blowing up before program exit.
llvm-svn: 172599
The reason that this occurs is that tail calling objc_autorelease eventually
tail calls -[NSObject autorelease] which supports fast autorelease. This can
cause us to violate the semantic gaurantees of __autoreleasing variables that
assignment to an __autoreleasing variables always yields an object that is
placed into the innermost autorelease pool.
The fix included in this patch works by:
1. In the peephole optimization function OptimizeIndividualFunctions, always
remove tail call from objc_autorelease.
2. Whenever we convert to/from an objc_autorelease, set/unset the tail call
keyword as appropriate.
*NOTE* I also handled the case where objc_autorelease is converted in
OptimizeReturns to an autoreleaseRV which still violates the ARC semantics. I
will be removing that in a later patch and I wanted to make sure that the tree
is in a consistent state vis-a-vis ARC always.
Additionally some test cases are provided and all tests that have tail call marked
objc_autorelease keywords have been modified so that tail call has been removed.
*NOTE* One test fails due to a separate bug that I am going to commit soon. Thus
I marked the check line TMP: instead of CHECK: so make check does not fail.
llvm-svn: 172287
1. Added debug messages when in OptimizeIndividualCalls we move calls into predecessors and then erase the original call.
2. Added debug messages when in the process of moving calls in ObjCARCOpt::MoveCalls we create new RR and delete old RR.
3. Added a debug message when we visit a specific retain instruction in ObjCARCOpt::PerformCodePlacement.
llvm-svn: 171988
peculiar headers under include/llvm.
This struct still doesn't make a lot of sense, but it makes more sense
down in TargetLowering than it did before.
llvm-svn: 171739
already in a class, just inline the four of them. I suspect that this
class could be simplified some to not always keep distinct variables for
these things, but it wasn't clear to me how given the usage so I opted
for a trivial and mechanical translation.
This removes one of the two remaining users of a header in include/llvm
which does nothing more than define a 4 member struct.
llvm-svn: 171738
TargetTransformInfo rather than TargetLowering, removing one of the
primary instances of the layering violation of Transforms depending
directly on Target.
This is a really big deal because LSR used to be a "special" pass that
could only be tested fully using llc and by looking at the full output
of it. It also couldn't run with any other loop passes because it had to
be created by the backend. No longer is this true. LSR is now just
a normal pass and we should probably lift the creation of LSR out of
lib/CodeGen/Passes.cpp and into the PassManagerBuilder. =] I've not done
this, or updated all of the tests to use opt and a triple, because
I suspect someone more familiar with LSR would do a better job. This
change should be essentially without functional impact for normal
compilations, and only change behvaior of targetless compilations.
The conversion required changing all of the LSR code to refer to the TTI
interfaces, which fortunately are very similar to TargetLowering's
interfaces. However, it also allowed us to *always* expect to have some
implementation around. I've pushed that simplification through the pass,
and leveraged it to simplify code somewhat. It required some test
updates for one of two things: either we used to skip some checks
altogether but now we get the default "no" answer for them, or we used
to have no information about the target and now we do have some.
I've also started the process of removing AddrMode, as the TTI interface
doesn't use it any longer. In some cases this simplifies code, and in
others it adds some complexity, but I think it's not a bad tradeoff even
there. Subsequent patches will try to clean this up even further and use
other (more appropriate) abstractions.
Yet again, almost all of the formatting changes brought to you by
clang-format. =]
llvm-svn: 171735
I'm sorry for duplicating bad style here, but I wanted to keep
consistency. I've pinged the code review thread where this style was
reviewed and changes were requested.
llvm-svn: 171714
through as a reference rather than a pointer. There is always *some*
implementation of this available, so this simplifies code by not having
to test for whether it is available or not.
Further, it turns out there were piles of places where SimplifyCFG was
recursing and not passing down either TD or TTI. These are fixed to be
more pedantically consistent even though I don't have any particular
cases where it would matter.
llvm-svn: 171691
next to its only user. This helper relies on TargetLowering information
that shouldn't be generally used throughout the Transfoms library, and
so it made little sense as a generic utility.
This also consolidates the file where we need to remove the remaining
uses of TargetLowering in favor of the IR-layer abstract interface in
TargetTransformInfo.
llvm-svn: 171590
into their new header subdirectory: include/llvm/IR. This matches the
directory structure of lib, and begins to correct a long standing point
of file layout clutter in LLVM.
There are still more header files to move here, but I wanted to handle
them in separate commits to make tracking what files make sense at each
layer easier.
The only really questionable files here are the target intrinsic
tablegen files. But that's a battle I'd rather not fight today.
I've updated both CMake and Makefile build systems (I think, and my
tests think, but I may have missed something).
I've also re-sorted the includes throughout the project. I'll be
committing updates to Clang, DragonEgg, and Polly momentarily.
llvm-svn: 171366
utils/sort_includes.py script.
Most of these are updating the new R600 target and fixing up a few
regressions that have creeped in since the last time I sorted the
includes.
llvm-svn: 171362
Specifically these calls return their argument verbatim, as a low-level
optimization. However, this makes high-level optimizations
harder. We undo any uses of this optimization that the front-end
emitted. We redo them later in the contract pass.
llvm-svn: 171346
The later API is nicer than the former, and is correct regarding wrap-around offsets (if anyone cares).
There are a few more places left with duplicated code, which I'll remove soon.
llvm-svn: 171259
directly.
This is in preparation for removing the use of the 'Attribute' class as a
collection of attributes. That will shift to the AttributeSet class instead.
llvm-svn: 171253
Similarly inlining of the function is inhibited, if that would duplicate the call (in particular inlining is still allowed when there is only one callsite and the function has internal linkage).
llvm-svn: 170704
This was a silly oversight, we weren't pruning allocas which were used
by variable-length memory intrinsics from the set that could be widened
and promoted as integers. Fix that.
llvm-svn: 170353
This also cleans up a bit of the memcpy call rewriting by sinking some
irrelevant code further down and making the call-emitting code a bit
more concrete.
Previously, memcpy of a subvector would actually miscompile (!!!) the
copy into a single vector element copy. I have no idea how this ever
worked. =/ This is the memcpy half of PR14478 which we probably weren't
noticing previously because it didn't actually assert.
The rewrite relies on the newly refactored insert- and extractVector
functions to do the heavy lifting, and those are the same as used for
loads and stores which makes the test coverage a bit more meaningful
here.
llvm-svn: 170338
Check whether a BB is known as reachable before adding it to the worklist.
This way BB's with multiple predecessors are added to the list no more than
once.
llvm-svn: 170335
The first half of fixing this bug was actually in r170328, but was
entirely coincidental. It did however get me to realize the nature of
the bug, and adapt the test case to test more interesting behavior. In
turn, that uncovered the rest of the bug which I've fixed here.
This should fix two new asserts that showed up in the vectorize nightly
tester.
llvm-svn: 170333
I noticed this while looking at r170328. We only ever do a vector
rewrite when the alloca *is* the vector type, so it's good to not paper
over bugs here by doing a convertValue that isn't needed.
llvm-svn: 170331
This will allow its use inside of memcpy rewriting as well. This routine
is more complex than extractVector, and some of its uses are not 100%
where I want them to be so there is still some work to do here.
While this can technically change the output in some cases, it shouldn't
be a change that matters -- IE, it can leave some dead code lying around
that prior versions did not, etc.
Yet another step in the refactorings leading up to the solution to the
last component of PR14478.
llvm-svn: 170328
The method helpers all implicitly act upon the alloca, and what we
really want is a fully generic helper. Doing memcpy rewrites is more
special than all other rewrites because we are at times rewriting
instructions which touch pointers *other* than the alloca. As
a consequence all of the helpers needed by memcpy rewriting of
sub-vector copies will need to be generalized fully.
Note that all of these helpers ({insert,extract}{Integer,Vector}) are
woefully uncommented. I'm going to go back through and document them
once I get the factoring correct.
No functionality changed.
llvm-svn: 170325
PR14478 highlights a serious problem in SROA that simply wasn't being
exercised due to a lack of vector input code mixed with C-library
function calls. Part of SROA was written carefully to handle subvector
accesses via memset and memcpy, but the rewriter never grew support for
this. Fixing it required refactoring the subvector access code in other
parts of SROA so it could be shared, and then fixing the splat formation
logic and using subvector insertion (this patch).
The PR isn't quite fixed yet, as memcpy is still broken in the same way.
I'm starting on that series of patches now.
Hopefully this will be enough to bring the bullet benchmark back to life
with the bb-vectorizer enabled, but that may require fixing memcpy as
well.
llvm-svn: 170301
No functionality changed. Refactoring leading up to the fix for PR14478
which requires some significant changes to the memset and memcpy
rewriting.
llvm-svn: 170299
This visitor provides infrastructure for recursively traversing the
use-graph of a pointer-producing instruction like an alloca or a malloc.
It maintains a worklist of uses to visit, so it can handle very deep
recursions. It automatically looks through instructions which simply
translate one pointer to another (bitcasts and GEPs). It tracks the
offset relative to the original pointer as long as that offset remains
constant and exposes it during the visit as an APInt offset. Finally, it
performs conservative escape analysis.
However, currently it has some limitations that should be addressed
going forward:
1) It doesn't handle vectors of pointers.
2) It doesn't provide a cheaper visitor when the constant offset
tracking isn't needed.
3) It doesn't support non-instruction pointer values.
The current functionality is exactly what is required to implement the
SROA pointer-use visitors in terms of this one, rather than in terms of
their own ad-hoc base visitor, which was always very poorly specified.
SROA has been converted to use this, and the code there deleted which
this utility now provides.
Technically speaking, using this new visitor allows SROA to handle a few
more cases than it previously did. It is now more aggressive in ignoring
chains of instructions which look like they would defeat SROA, but in
fact do not because they never result in a read or write of memory.
While this is "neat", it shouldn't be interesting for real programs as
any such chains should have been removed by others passes long before we
get to SROA. As a consequence, I've not added any tests for these
features -- it shouldn't be part of SROA's contract to perform such
heroics.
The goal is to extend the functionality of this visitor going forward,
and re-use it from passes like ASan that can benefit from doing
a detailed walk of the uses of a pointer.
Thanks to Ben Kramer for the code review rounds and lots of help
reviewing and debugging this patch.
llvm-svn: 169728
When SROA was evaluating a mixture of i1 and i8 loads and stores, in
just a particular case, it would tickle a latent bug where we compared
bits to bytes rather than bits to bits. As a consequence of the latent
bug, we would allow integers through which were not byte-size multiples,
a situation the later rewriting code was never intended to handle.
In release builds this could trigger all manner of oddities, but the
reported issue in PR14548 was forming invalid bitcast instructions.
The only downside of this fix is that it makes it more clear that SROA
in its current form is not capable of handling mixed i1 and i8 loads and
stores. Sometimes with the previous code this would work by luck, but
usually it would crash, so I'm not terribly worried. I'll watch the LNT
numbers just to be sure.
llvm-svn: 169719
This will more closely match the behavior of the new PtrUseVisitor that
I am adding. Hopefully this will not change the actual behavior in any
way, but by making the processing order more similar help in debugging.
llvm-svn: 169697
There are still bugs in this pass, as well as other issues that are
being worked on, but the bugs are crashers that occur pretty easily in
the wild. Test cases have been sent to the original commit's review
thread.
This reverts the commits:
r169671: Fix a logic error.
r169604: Move the popcnt tests to an X86 subdirectory.
r168931: Initial commit adding the pass.
llvm-svn: 169683
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.
Many forward declarations and missing includes were added to a header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]
llvm-svn: 169131
The partitioning logic attempted to handle uses of an alloca with an
offset starting before the alloca so long as the use had some overlap
with the alloca itself. However, there was a bug where we tested
'(uint64_t)Offset >= AllocSize' without first checking whether 'Offset'
was positive. As a consequence, essentially every negative offset (that
is, starting *before* the alloca does) would be thrown out, even if it
was overlapping. The subsequent code to throw out negative offsets which
were actually non-overlapping was essentially dead. The code to *handle*
overlapping negative offsets was actually dead!
I've just removed all of this, and taught SROA to discard any uses which
start prior to the alloca from the beginning. It has the lovely property
of simplifying the code. =] All the tests still pass, and in fact no new
tests are needed as this is already covered by our testsuite. Fixing the
code so that negative offsets work the way the comments indicate they
were supposed to work causes regressions. That's how I found this.
Anyways, this is all progress in the correct direction -- tightening up
SROA to be maximally aggressive. Some day, I really hope to turn
out-of-bounds accesses to an alloca into 'unreachable'.
llvm-svn: 169120
The original patch removed a bunch of code that the SjLjEHPrepare pass placed
into the entry block if all of the landing pads were removed during the
CodeGenPrepare class. The more natural way of doing things is to run the CGP
*before* we run the SjLjEHPrepare pass.
Make it so!
llvm-svn: 169044
The simplify-libcalls pass maintained a statistic to count the number
of library calls that have been simplified. Now that library call
simplification is being carried out in instcombine the statistic should
be moved to there.
llvm-svn: 168975
depends on the IR infrastructure, there is no sense in it being off in
Support land.
This is in preparation to start working to expand InstVisitor into more
special-purpose visitors that are still generic and can be re-used
across different passes. The expansion will go into the Analylis tree
though as nothing in VMCore needs it.
llvm-svn: 168972
This revision attempts to recognize following population-count pattern:
while(a) { c++; ... ; a &= a - 1; ... },
where <c> and <a>could be used multiple times in the loop body.
TODO: On X8664 and ARM, __buildin_ctpop() are not expanded to a efficent
instruction sequence, which need to be improved in the following commits.
Reviewed by Nadav, really appreciate!
llvm-svn: 168931
the last invoke instruction in the function. This also removes the last landing
pad in an function. This is fine, but with SjLj EH code, we've already placed a
bunch of code in the 'entry' block, which expects the landing pad to stick
around.
When we get to the situation where CGP has removed the last landing pad, go
ahead and nuke the SjLj instructions from the 'entry' block.
<rdar://problem/12721258>
llvm-svn: 168930
This patch migrates the puts optimizations from the simplify-libcalls
pass into the instcombine library call simplifier.
All the simplifiers from simplify-libcalls have now been migrated to
instcombine. Yay! Just a few other bits to migrate (prototype attribute
inference and a few statistics) and simplify-libcalls can finally be put
to rest.
llvm-svn: 168925
Now if we can transform an alloca into a single vector value, but it has
subvector, non-element accesses, we form the appropriate shufflevectors
to allow SROA to proceed. This fixes PR14055 which pointed out a very
common pattern that SROA couldn't handle -- mixed vec3 and vec4
operations on a single alloca.
llvm-svn: 168418
The issue is that we may end up with newly OOB loads when speculating
a load into the predecessors of a PHI node, and this confuses the new
integer splitting logic in some cases, triggering an assertion failure.
In fact, the branch in question must be dead code as it loads from
a too-narrow alloca. Add code to handle this gracefully and leave the
requisite FIXMEs for both optimizing more aggressively and doing more to
aid sanitizing invalid code which triggers these patterns.
llvm-svn: 168361
to properly handle the combinations of these with split integer loads
and stores. This essentially replaces Evan's r168227 by refactoring the
code in a different way, and trynig to mirror that refactoring in both
the load and store sides of the rewriting.
Generally speaking there was some really problematic duplicated code
here that led to poorly founded assumptions and then subtle bugs. Now
much of the code actually flows through and follows a more consistent
style and logical path. There is still a tiny bit of duplication on the
store side of things, but it is much less bad.
This also changes the logic to never re-use a load or store instruction
as that was simply too error prone in practice.
I've added a few tests (one a reduction of the one in Evan's original
patch, which happened to be the same as the report in PR14349). I'm
going to look at adding a few more tests for things I found and fixed in
passing (such as the volatile tests in the vectorizable predicate).
This patch has survived bootstrap, and modulo one bugfix survived
Duncan's test suite, but let me know if anything else explodes.
llvm-svn: 168346
operands of the expression being written was wrongly thought to be reusable as
an inner node of the expression resulting in it turning up as both an inner node
*and* a leaf, creating a cycle in the def-use graph. This would have caused the
verifier to blow up if things had gotten that far, however it managed to provoke
an infinite loop first.
llvm-svn: 168291
the utility for extracting a chain of operations from the IR, thought that it
might as well combine any constants it came across (rather than just returning
them along with everything else). On the other hand, the factorization code
would like to see the individual constants (this is quite reasonable: it is
much easier to pull a factor of 3 out of 2*3 than it is to pull it out of 6;
you may think 6/3 isn't so hard, but due to overflow it's not as easy to undo
multiplications of constants as it may at first appear). This patch therefore
makes LinearizeExprTree stupider: it now leaves optimizing to the optimization
part of reassociate, and sticks to just analysing the IR.
llvm-svn: 168035
This patch migrates the math library call simplifications from the
simplify-libcalls pass into the instcombine library call simplifier.
I have typically migrated just one simplifier at a time, but the math
simplifiers are interdependent because:
1. CosOpt, PowOpt, and Exp2Opt all depend on UnaryDoubleFPOpt.
2. CosOpt, PowOpt, Exp2Opt, and UnaryDoubleFPOpt all depend on
the option -enable-double-float-shrink.
These two factors made migrating each of these simplifiers individually
more of a pain than it would be worth. So, I migrated them all together.
llvm-svn: 167815
The assertion is trigged when the Reassociater tries to transform expression
... + 2 * n * 3 + 2 * m + ...
into:
... + 2 * (n*3 + m).
In the process of the transformation, a helper routine folds the constant 2*3 into 6,
confusing optimizer which is trying the to eliminate the common factor 2, and cannot
find 2 any more.
Review is pending. But I'd like commit first in order to help those who are waiting
for this fix.
llvm-svn: 167740
The new analysis is not yet ready for prime time. It has a *critical*
flawed assumption, and some troubling shortages of testing. Until it's
been hammered into better shape, let's stick with the working code. This
should be easy to revert itself when the analysis is ready.
Fixes PR14241, a miscompile of any memcpy-able loop which uses a pointer
as the induction mechanism. If you have been seeing miscompiles in this
revision range, you really want to test with this backed out. The
results of this miscompile are a bit subtle as they can lead to
downstream passes concluding things are impossible which are in fact
possible.
Thanks to David Blaikie for the majority of the reduction of this
miscompile. I'll be checking in the test case in a non-revert commit.
Revesions reverted here:
r167045: LoopIdiom: Fix a serious missed optimization: we only turned
top-level loops into memmove.
r166877: LoopIdiom: Add checks to avoid turning memmove into an infinite
loop.
r166875: LoopIdiom: Recognize memmove loops.
r166874: LoopIdiom: Replace custom dependence analysis with
DependenceAnalysis.
llvm-svn: 167286
r165941: Resubmit the changes to llvm core to update the functions to
support different pointer sizes on a per address space basis.
Despite this commit log, this change primarily changed stuff outside of
VMCore, and those changes do not carry any tests for correctness (or
even plausibility), and we have consistently found questionable or flat
out incorrect cases in these changes. Most of them are probably correct,
but we need to devise a system that makes it more clear when we have
handled the address space concerns correctly, and ideally each pass that
gets updated would receive an accompanying test case that exercises that
pass specificaly w.r.t. alternate address spaces.
However, from this commit, I have retained the new C API entry points.
Those were an orthogonal change that probably should have been split
apart, but they seem entirely good.
In several places the changes were very obvious cleanups with no actual
multiple address space code added; these I have not reverted when
I spotted them.
In a few other places there were merge conflicts due to a cleaner
solution being implemented later, often not using address spaces at all.
In those cases, I've preserved the new code which isn't address space
dependent.
This is part of my ongoing effort to clean out the partial address space
code which carries high risk and low test coverage, and not likely to be
finished before the 3.2 release looms closer. Duncan and I would both
like to see the above issues addressed before we return to these
changes.
llvm-svn: 167222
getIntPtrType support for multiple address spaces via a pointer type,
and also introduced a crasher bug in the constant folder reported in
PR14233.
These commits also contained several problems that should really be
addressed before they are re-committed. I have avoided reverting various
cleanups to the DataLayout APIs that are reasonable to have moving
forward in order to reduce the amount of churn, and minimize the number
of commits that were reverted. I've also manually updated merge
conflicts and manually arranged for the getIntPtrType function to stay
in DataLayout and to be defined in a plausible way after this revert.
Thanks to Duncan for working through this exact strategy with me, and
Nick Lewycky for tracking down the really annoying crasher this
triggered. (Test case to follow in its own commit.)
After discussing with Duncan extensively, and based on a note from
Micah, I'm going to continue to back out some more of the more
problematic patches in this series in order to ensure we go into the
LLVM 3.2 branch with a reasonable story here. I'll send a note to
llvmdev explaining what's going on and why.
Summary of reverted revisions:
r166634: Fix a compiler warning with an unused variable.
r166607: Add some cleanup to the DataLayout changes requested by
Chandler.
r166596: Revert "Back out r166591, not sure why this made it through
since I cancelled the command. Bleh, sorry about this!
r166591: Delete a directory that wasn't supposed to be checked in yet.
r166578: Add in support for getIntPtrType to get the pointer type based
on the address space.
llvm-svn: 167221
This patch migrates the stpcpy optimizations from the simplify-libcalls
pass into the instcombine library call simplifier. Note that the
__stpcpy_chk simplifications were migrated in a previous commit.
llvm-svn: 167083
integers in that the code to handle split alloca-wide integer loads or
stores doesn't come first. It should, for the same reasons as with
integers, and the PR attests to that. Also had to fix a busted assert in
that this test case also covers.
llvm-svn: 167051
When the switch-to-lookup tables transform landed in SimplifyCFG, it
was pointed out that this could be inappropriate for some targets.
Since there was no way at the time for the pass to know anything about
the target, an awkward reverse-transform was added in CodeGenPrepare
that turned lookup tables back into switches for some targets.
This patch uses the new TargetTransformInfo to determine if a
switch should be transformed, and removes
CodeGenPrepare::ConvertLoadToSwitch.
llvm-svn: 167011
checks to avoid performing compile-time arithmetic on PPCDoubleDouble.
Now that APFloat supports arithmetic on PPCDoubleDouble, those checks
are no longer needed, and we can treat the type like any other.
llvm-svn: 166958
wrapper returns a vector of integers when passed a vector of pointers) by having
getIntPtrType itself return a vector of integers in this case. Outside of this
wrapper, I didn't find anywhere in the codebase that was relying on the old
behaviour for vectors of pointers, so give this a whirl through the buildbots.
llvm-svn: 166939
This turns loops like
for (unsigned i = 0; i != n; ++i)
p[i] = p[i+1];
into memmove, which has a highly optimized implementation in most libcs.
This was really easy with the new DependenceAnalysis :)
llvm-svn: 166875
Requires a lot less code and complexity on loop-idiom's side and the more
precise analysis can catch more cases, like the one I included as a test case.
This also fixes the edge-case miscompilation from PR9481.
Compile time performance seems to be slightly worse, but this is mostly due
to an extra LCSSA run scheduled by the PassManager and should be fixed there.
llvm-svn: 166874
smaller integer loads and stores.
The high-level motivation is that the frontend sometimes generates
a single whole-alloca integer load or store during ABI lowering of
splittable allocas. We need to be able to break this apart in order to
see the underlying elements and properly promote them to SSA values. The
hope is that this fixes some performance regressions on x86-32 with the
new SROA pass.
Unfortunately, this causes quite a bit of churn in the test cases, and
bloats some IR that comes out. When we see an alloca that consists soley
of bits and bytes being extracted and re-inserted, we now do some
splitting first, before building widened integer "bucket of bits"
representations. These are always well folded by instcombine however, so
this shouldn't actually result in missed opportunities.
If this splitting of all-integer allocas does cause problems (perhaps
due to smaller SSA values going into the RA), we could potentially go to
some extreme measures to only do this integer splitting trick when there
are non-integer component accesses of an alloca, but discovering this is
quite expensive: it adds yet another complete walk of the recursive use
tree of the alloca.
Either way, I will be watching build bots and LNT bots to see what
fallout there is here. If anyone gets x86-32 numbers before & after this
change, I would be very interested.
llvm-svn: 166662
very small but very important bugfix:
bool shouldExplore(Use *U) {
Value *V = U->get();
if (isa<CallInst>(V) || isa<InvokeInst>(V))
[...]
should have read:
bool shouldExplore(Use *U) {
Value *V = U->getUser();
if (isa<CallInst>(V) || isa<InvokeInst>(V))
Fixes PR14143!
llvm-svn: 166407
It passes all tests, produces better results than the old code but uses the
wrong pass, LoopDependenceAnalysis, which is old and unmaintained. "Why is it
still in tree?", you might ask. The answer is obviously: "To confuse developers."
Just swapping in the new dependency pass sends the pass manager into an infinte
loop, I'll try to figure out why tomorrow.
llvm-svn: 166399
Requires a lot less code and complexity on loop-idiom's side and the more
precise analysis can catch more cases, like the one I included as a test case.
This also fixes the edge-case miscompilation from PR9481. I'm not entirely
sure that all cases are handled that the old checks handled but LDA will
certainly become smarter in the future.
llvm-svn: 166390
This patch migrates the strcpy optimizations from the simplify-libcalls pass
into the instcombine library call simplifier. Note also that StrCpyChkOpt
has been updated with a few simplifications that were being done in the
simplify-libcalls version of StrCpyOpt, but not in the migrated implementation
of StrCpyOpt. There is no reason to overload StrCpyOpt with fortified and
regular simplifications in the new model since there is already a dedicated
simplifier for __strcpy_chk.
llvm-svn: 166198
operate purely on values. Sink the alloca loading and storing logic into
the rewrite routines that are specific to alloca-integer-rewrite
driving. This is just a refactoring here, but the subsequent step will
be to reuse the insertion and extraction logic when rewriting integer
loads and stores that have been split and decomposed into narrower loads
and stores.
No functionality changed other than different names for instructions.
llvm-svn: 166176